PCA
 This is part of the dimred module. It is only available if you configure PLUMED with ./configure --enable-modules=dimred. Furthermore, this feature is still being developed, so take care when using it and report any problems on the mailing list.

Perform principal component analysis (PCA) using either the positions of the atoms or a large number of collective variables as input.

Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique here: https://en.wikipedia.org/wiki/Principal_component_analysis

When used with molecular dynamics simulations, the input is either a set of frames taken from the trajectory, $$\{X_i\}$$, or the values of a number of collective variables calculated from the trajectory frames. In the second case the input to the PCA algorithm is a set of high-dimensional vectors of collective variables. In either case, whether the collective variables are calculated from the positions of the atoms or the positions are used directly, the input trajectory is treated as a set of possibly correlated high-dimensional vectors. After principal component analysis has been performed, the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. These output directions are linear combinations of the $$x$$, $$y$$ and $$z$$ positions if the positions were used as input, or linear combinations of the input collective variables if a high-dimensional vector of collective variables was used as input.
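The idea that the principal components are orthogonal directions of largest variance, each a linear combination of the inputs, can be illustrated with a minimal, dependency-free Python sketch: build the covariance matrix of the input vectors and extract its leading eigenvectors. The function names are hypothetical and this is not PLUMED's implementation. Power iteration with deflation stands in for the full symmetric eigensolver a production code would use (for example numpy.linalg.eigh), and normalising the covariance by N rather than N - 1 is an assumption.

```python
import math
import random


def mean_vector(frames):
    """Average value of each input coordinate over all frames."""
    n = len(frames)
    return [sum(f[j] for f in frames) / n for j in range(len(frames[0]))]


def covariance_matrix(frames):
    """Covariance of the input coordinates, normalised by N (an assumption)."""
    mu = mean_vector(frames)
    n, d = len(frames), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        dev = [f[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b] / n
    return cov


def principal_components(frames, nlow_dim, iters=500, seed=0):
    """Return the nlow_dim leading eigenvectors of the covariance matrix and
    their eigenvalues (the variance along each direction), found by power
    iteration followed by deflation."""
    mat = covariance_matrix(frames)
    d = len(mat)
    rng = random.Random(seed)
    comps, variances = [], []
    for _ in range(nlow_dim):
        # Random positive starting vector, normalised to unit length
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        n0 = math.sqrt(sum(x * x for x in v))
        v = [x / n0 for x in v]
        for _ in range(iters):
            w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the converged vector
        mv = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = sum(v[a] * mv[a] for a in range(d))
        comps.append(v)
        variances.append(lam)
        # Remove this direction so the next pass finds the next-largest one
        for a in range(d):
            for b in range(d):
                mat[a][b] -= lam * v[a] * v[b]
    return comps, variances


# Five 2-D input vectors lying almost on the line y = x
frames = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.0], [1.0, 1.1], [2.0, 1.9]]
comps, variances = principal_components(frames, 2)
```

For these frames almost all of the variance lies along the first component, which points roughly along (0.71, 0.71), so a single low-dimensional coordinate would describe the data well.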

As explained on the Wikipedia page, you must first calculate the average and covariance of the input coordinates. In other words, you must calculate the average structure and the amount by which the system fluctuates around this average structure. The problem when the $$x$$, $$y$$ and $$z$$ coordinates of a molecule are used as input is that most of the change in the positions of the atoms comes from the translational and rotational degrees of freedom of the molecule, so the first six principal components will most likely be uninteresting. To remedy this, PLUMED can perform an RMSD alignment of all the structures to be analyzed to the first frame in the trajectory. This effectively removes translational and/or rotational motion from consideration, so the resulting principal components describe the vibrational motions of the molecule.
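The effect of rigid-body motion on the average structure and the fluctuations around it can be seen in a small Python sketch. It only removes translation, by moving each frame's centroid to the origin; the optimal RMSD alignment that PLUMED performs also removes rotation, which this sketch does not attempt. The helper names are hypothetical.

```python
import math


def centre(frame):
    """Translate one frame (a list of (x, y, z) positions) so that its
    centroid sits at the origin. Rotation is NOT removed."""
    n = len(frame)
    c = [sum(atom[k] for atom in frame) / n for k in range(3)]
    return [tuple(atom[k] - c[k] for k in range(3)) for atom in frame]


def average_structure(frames):
    """Mean position of every atom over all frames."""
    n = len(frames)
    return [
        tuple(sum(f[i][k] for f in frames) / n for k in range(3))
        for i in range(len(frames[0]))
    ]


def rms_fluctuation(frames):
    """Root-mean-square distance of each atom from its average position."""
    avg = average_structure(frames)
    n = len(frames)
    return [
        math.sqrt(sum(sum((f[i][k] - avg[i][k]) ** 2 for k in range(3))
                      for f in frames) / n)
        for i in range(len(avg))
    ]


# Two frames of a two-atom "molecule" that differ only by a rigid shift of (5, 5, 5)
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]

raw = rms_fluctuation([frame_a, frame_b])                      # about [4.33, 4.33]
aligned = rms_fluctuation([centre(frame_a), centre(frame_b)])  # [0.0, 0.0]
```

Without centring, every atom appears to fluctuate by about 4.33 even though the molecule's internal geometry never changes; after centring the fluctuations vanish, which is why unaligned Cartesian input tends to produce uninteresting leading components.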

To calculate the projection of a trajectory on a set of principal components computed by this PCA action, use the output of this action as input for the PCAVARS action.
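The basic idea of projecting a frame on a set of principal components can be sketched as a dot product between the frame's deviation from the average and each component. This is a simplified illustration under that assumption, not the PCAVARS implementation, whose exact definition (for example how alignment is handled for atomic positions) may differ. The function names are hypothetical.

```python
import math


def project(frame, mean, components):
    """Projection of one frame on each principal component: (x - mean) . v_k."""
    dev = [x - m for x, m in zip(frame, mean)]
    return [sum(dx * v for dx, v in zip(dev, comp)) for comp in components]


def project_trajectory(frames, mean, components):
    """Low-dimensional coordinates for every frame of a trajectory."""
    return [project(f, mean, components) for f in frames]


s = 1.0 / math.sqrt(2.0)
components = [[s, s], [s, -s]]  # two orthonormal directions in a 2-D input space
mean = [0.0, 0.0]
coords = project_trajectory([[1.0, 1.0], [1.0, -1.0]], mean, components)
```

Here the first frame lies entirely along the first component and the second entirely along the second, so each has one coordinate equal to the square root of 2 and the other equal to zero.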

Examples

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions of the first 22 atoms. The METRIC=OPTIMAL keyword ensures that translational and rotational degrees of freedom are removed from consideration. The first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation will be performed at the end of the simulation.

ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb


The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances d1 to d6 defined at the top of the input. Notice that here the METRIC=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected every five steps and the PCA calculation is performed every 100 steps, so in a 2000 step simulation the PCA analysis will be performed 20 times. The REWEIGHT_BIAS action in this input tells PLUMED that, rather than ascribing a weight of one to each frame when calculating averages and covariance matrices, a reweighting should be performed in which each frame's weight is determined from the value of the instantaneous bias (see REWEIGHT_BIAS).
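The weighted averages and covariance matrices described above can be sketched in Python. The sketch assumes that each frame's log-weight is V/(k_B T), so that frames with a larger instantaneous bias receive a larger weight, that energies are in kJ/mol, and that the restraint rr has the harmonic form 0.5 * KAPPA * (d1 - AT)^2. It is an illustration under those assumptions, not PLUMED's code, and the function names are hypothetical.

```python
import math

# Boltzmann constant in kJ/(mol K). ASSUMPTION: energies are in kJ/mol.
KB = 0.0083144626


def weights_from_bias(bias_values, temp):
    """Normalised frame weights w_i proportional to exp(V_i / (kB * T)).
    The largest log-weight is subtracted before exponentiating to avoid overflow."""
    beta = 1.0 / (KB * temp)
    logw = [beta * v for v in bias_values]
    top = max(logw)
    raw = [math.exp(lw - top) for lw in logw]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean_and_covariance(frames, weights):
    """Weighted average and covariance of the input vectors (weights sum to one)."""
    d = len(frames[0])
    mu = [sum(w * f[j] for f, w in zip(frames, weights)) for j in range(d)]
    cov = [
        [sum(w * (f[a] - mu[a]) * (f[b] - mu[b]) for f, w in zip(frames, weights))
         for b in range(d)]
        for a in range(d)
    ]
    return mu, cov


# Hypothetical values of d1 in three collected frames, with the ASSUMED
# harmonic restraint bias 0.5 * KAPPA * (d1 - AT)**2 using KAPPA=10, AT=0.1
d1_values = [0.1, 0.3, 0.5]
bias = [0.5 * 10.0 * (d - 0.1) ** 2 for d in d1_values]
weights = weights_from_bias(bias, 300.0)
```

Working with log-weights and subtracting the maximum before exponentiating keeps the calculation stable even when the bias is many k_B T in size; with equal bias values every frame simply gets the same weight, which recovers the unweighted averages.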

d1: DISTANCE ATOMS=1,2
d2: DISTANCE ATOMS=1,3
d3: DISTANCE ATOMS=1,4
d4: DISTANCE ATOMS=2,3
d5: DISTANCE ATOMS=2,4
d6: DISTANCE ATOMS=3,4
rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
rbias: REWEIGHT_BIAS TEMP=300
ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb

Glossary of keywords and components
Description of components

By default the value of the calculated quantity can be referenced elsewhere in the input file by using the label of the action. Alternatively this Action can be used to calculate the following quantities by employing the keywords listed below. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

 coord: the low-dimensional projections of the various input configurations

In addition, the following quantities can be calculated by employing the keywords listed below:

 gradient (GRADIENT): the gradient
 vmean (VMEAN): the norm of the mean vector. The output component can be referred to elsewhere in the input file by using label.vmean
 vsum (VSUM): the norm of the sum of vectors. The output component can be referred to elsewhere in the input file by using label.vsum
 spath (SPATH): the position on the path
 gspath (GPATH): the position on the path calculated using trigonometry
 gzpath (GPATH): the distance from the path calculated using trigonometry
 zpath (ZPATH): the distance from the path
 altmin (ALT_MIN): the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
 between (BETWEEN): the number/fraction of values within a certain range. This is calculated using one of the formulas described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
 highest (HIGHEST): the highest of the quantities calculated by this action
 lessthan (LESS_THAN): the number of values less than a target value. This is calculated using one of the formulas described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
 lowest (LOWEST): the lowest of the quantities calculated by this action
 max (MAX): the maximum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
 mean (MEAN): the mean value. The output component can be referred to elsewhere in the input file by using label.mean
 min (MIN): the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
 moment (MOMENTS): the central moments of the distribution of values. The second moment would be referenced elsewhere in the input file using label.moment-2, the third as label.moment-3, etc.
 morethan (MORE_THAN): the number of values more than a target value. This is calculated using one of the formulas described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
 sum (SUM): the sum of values
The data to analyze can be the output from another analysis algorithm:
 USE_OUTPUT_DATA_FROM: use the output of the analysis performed by this object as input to your new analysis object
Alternatively, data can be collected from the trajectory using:
 ATOMS: the list of atoms that you are going to use in the measure of distance that you are using. For more information on how to specify lists of atoms see Groups and Virtual Atoms
Compulsory keywords
 NLOW_DIM: number of low-dimensional coordinates required
 METRIC (default=EUCLIDEAN): the method that you are going to use to measure the distances between points
Options
 SERIAL (default=off): do the calculation in serial. Do not use MPI
 LOWMEM (default=off): lower the memory requirements
 ARG: the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears, the scalars calculated by all the preceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example, a DISTANCE action labelled dist may have three components x, y and z. To take just the x component you should use dist.x; if you wish to take all three components then use dist.*. More information on the referencing of Actions can be found in the section of the manual on the PLUMED Getting Started. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword, i.e. ARG1, ARG2, ARG3...