   PCA
 This is part of the analysis module

Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.

Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique here: https://en.wikipedia.org/wiki/Principal_component_analysis

When used with molecular dynamics simulations a set of frames taken from the trajectory, $$\{X_i\}$$, or the values of a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. These output directions are some linear combination of the $$x$$, $$y$$ and $$z$$ positions if the positions were used as input or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.

As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the $$x$$, $$y$$ and $$z$$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely, be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures to be analysed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from consideration. The resulting principal components thus describe vibrational motions of the molecule.

If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be used as input for the PCAVARS action.

The atoms involved can be specified using
 ATOMS the atoms whose positions we are tracking for the purpose of analysing the data. For more information on how to specify lists of atoms see Groups and Virtual Atoms
Compulsory keywords
 STRIDE ( default=1 ) the frequency with which the data should be collected and added to the quantity being averaged CLEAR ( default=0 ) the frequency with which to clear all the accumulated data. The default value of 0 implies that all the data will be used and that the grid will never be cleared METRIC ( default=EUCLIDEAN ) how are we measuring the distances between configurations RUN ( default=0 ) the frequency with which to run the analysis algorithm. The default value of zero assumes you want to analyse the whole trajectory NLOW_DIM number of PCA coordinates required OFILE the file on which to output the eigenvectors
Options
 SERIAL ( default=off ) do the calculation in serial. Do not parallelize LOWMEM ( default=off ) lower the memory requirements TIMINGS ( default=off ) output information on the timings of the various parts of the calculation UNORMALIZED ( default=off ) output the unaveraged quantity/quantities. WRITE_CHECKPOINT ( default=off ) write out a checkpoint so that the analysis can be restarted in a later run LOGWEIGHTS list of actions that calculates log weights that should be used to weight configurations when calculating averages ARG the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears the scalars calculated by all the proceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example a DISTANCE action labelled dist may have three componets x, y and z. To take just the x component you should use dist.x, if you wish to take all three components then use dist.*.More information on the referencing of Actions can be found in the section of the manual on the PLUMED Getting started. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword i.e. ARG1, ARG2, ARG3... FMT the format that should be used in analysis output files RESTART allows per-action setting of restart (YES/NO/AUTO) UPDATE_FROM Only update this action from this time UPDATE_UNTIL Only update this action until this time
Examples

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration. The first two principal components will be output to a file called pca-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation will be performed at the end of the simulation.

PCA METRIC=OPTIMAL ATOMS=1-22 STRIDE=1 NLOW_DIM=2 OFILE=pca-comp.pdb


The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from chnages in the six distances seen in the previous lines. Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alighment has to be done when calculating the various elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called pca-comp.pdb. Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the PCA analysis will be performed twice. The REWEIGHT_BIAS keyword in this input tells PLUMED that rather that ascribing a weight of one to each of the frames when calculating averages and covariances a reweighting should be performed based and each frames' weight in these calculations should be determined based on the current value of the instantaneous bias (see REWEIGHT_BIAS).

d1: DISTANCE ATOMS=1,2
d2: DISTANCE ATOMS=1,3
d3: DISTANCE ATOMS=1,4
d4: DISTANCE ATOMS=2,3
d5: DISTANCE ATOMS=2,4
d6: DISTANCE ATOMS=3,4

PCA ARG=d1,d2,d3,d4,d5,d6 METRIC=EUCLIDEAN STRIDE=5 RUN=1000 NLOW_DIM=2 REWEIGHT_BIAS OFILE=pca-comp.pdb