PCA

This is part of the dimred module.

It is only available if you configure PLUMED with ./configure --enable-modules=dimred. Furthermore, this feature is still being developed so take care when using it and report any problems on the mailing list.

Perform principal component analysis (PCA) using either the positions of the atoms or a large number of collective variables as input.

Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique here: https://en.wikipedia.org/wiki/Principal_component_analysis

When used with molecular dynamics simulations, the input is either a set of frames taken from the trajectory, \(\{X_i\}\), or the values of a number of collective variables calculated from those frames. In the second case the input to the PCA algorithm is a set of high-dimensional vectors of collective variables. In either case the assumption is that the input trajectory is a set of possibly correlated (high-dimensional) vectors. After principal component analysis has been performed, the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. These output directions are linear combinations of the \(x\), \(y\) and \(z\) positions if the positions were used as input, or linear combinations of the input collective variables if a high-dimensional vector of collective variables was used as input.
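The procedure described above can be illustrated with a minimal Python sketch. It is not PLUMED's implementation: the function names are hypothetical, the covariance uses a 1/N normalisation (an assumption), and the eigenvectors are found by simple power iteration with deflation rather than a full diagonalisation.

```python
import math

def mean_vector(data):
    """Average of each input coordinate over all frames."""
    n = len(data)
    return [sum(row[k] for row in data) / n for k in range(len(data[0]))]

def covariance_matrix(data):
    """Covariance of the input coordinates (1/N normalisation, an assumption)."""
    mu = mean_vector(data)
    n, d = len(data), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in data:
        dev = [row[k] - mu[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b] / n
    return cov

def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration."""
    d = len(matrix)
    # Asymmetric start vector; a start exactly orthogonal to the leading
    # eigenvector would make the iteration miss it.
    vec = [float(k + 1) for k in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        new = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue for the normalised vector.
    value = sum(vec[a] * sum(matrix[a][b] * vec[b] for b in range(d))
                for a in range(d))
    return value, vec

def principal_components(data, n_components):
    """Return (variance, direction) pairs, largest variance first."""
    cov = covariance_matrix(data)
    d = len(cov)
    found = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        found.append((value, vec))
        # Deflation: remove the component just found from the matrix.
        for a in range(d):
            for b in range(d):
                cov[a][b] -= value * vec[a] * vec[b]
    return found

# Toy data: large fluctuations along (1, 1), small ones along (1, -1).
data = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0],
        [0.1, -0.1], [-0.1, 0.1]]
components = principal_components(data, 2)
for variance, direction in components:
    print(variance, direction)
```

For the toy data the first direction found is proportional to (1, 1), the direction of largest fluctuation, and the second is orthogonal to it.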

As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the \(x\), \(y\) and \(z\) coordinates of a molecule are used as input is that the majority of the changes in the positions of the atoms come from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely, be uninteresting. To remedy this problem PLUMED provides the functionality to perform an RMSD alignment of all the structures to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from consideration. The resulting principal components thus describe vibrational motions of the molecule.
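To make the idea of alignment concrete, the following Python sketch removes translation and rotation from a frame by superimposing it on a reference. It is a simplified illustration only: it works in two dimensions, where the optimal rotation angle has a closed form, whereas PLUMED aligns three-dimensional structures. All function names are hypothetical.

```python
import math

def centre(frame):
    """Shift a frame so that its centroid sits at the origin (removes translation)."""
    n = len(frame)
    cx = sum(x for x, _ in frame) / n
    cy = sum(y for _, y in frame) / n
    return [(x - cx, y - cy) for x, y in frame]

def align_2d(mobile, reference):
    """Centre `mobile` and rotate it to minimise its RMSD from the centred reference."""
    ref = centre(reference)
    mob = centre(mobile)
    # Maximise sum_i p_i . R(theta) q_i = a*cos(theta) + b*sin(theta).
    a = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(ref, mob))
    b = sum(py * qx - px * qy for (px, py), (qx, qy) in zip(ref, mob))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * qx - s * qy, s * qx + c * qy) for qx, qy in mob]

def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames with matching atom order."""
    n = len(frame_a)
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(frame_a, frame_b))
    return math.sqrt(total / n)

# A reference triangle and a copy that has been rotated and translated rigidly.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
phi = 0.7
moved = [(math.cos(phi) * x - math.sin(phi) * y + 3.0,
          math.sin(phi) * x + math.cos(phi) * y - 1.0) for x, y in reference]

aligned = align_2d(moved, reference)
print(rmsd(aligned, centre(reference)))  # essentially zero: only rigid motion was present
```

Because the second frame differs from the reference only by a rigid motion, nothing is left after alignment. In a real trajectory the residual displacements are the internal motions that the covariance matrix is then built from.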

If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be used as input for the PCAVARS action.
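Projecting a frame on a set of principal components amounts to taking dot products between the frame's displacement from the average structure and each component. The Python sketch below shows only this arithmetic, under the assumptions that the components are orthonormal and that any alignment has already been done. The `project` function is a hypothetical helper, not part of PLUMED.

```python
import math

def project(frame, mean, components):
    """Coordinates of `frame` along each component, measured from `mean`."""
    displacement = [x - m for x, m in zip(frame, mean)]
    return [sum(d * c for d, c in zip(displacement, component))
            for component in components]

s = math.sqrt(0.5)
mean = [1.0, 1.0]                 # average structure (toy values)
components = [[s, s], [s, -s]]    # two orthonormal directions (toy values)

projection = project([2.0, 2.0], mean, components)
print(projection)
```

Here the frame is displaced from the mean purely along the first component, so its projection on the second component vanishes.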

- Examples

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions of the first 22 atoms. The METRIC=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration. The first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation will be performed at the end of the simulation.

ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances defined at the start of the input. Notice that here the METRIC=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected every five steps (STRIDE=5 on COLLECT_FRAMES) and the PCA calculation is performed every 100 steps (STRIDE=100 on OUTPUT_PCA_PROJECTION). Consequently, if you run a 2000 step simulation the PCA analysis will be performed 20 times. The REWEIGHT_BIAS action in this input tells PLUMED that, rather than ascribing a weight of one to each of the frames when calculating averages and covariance matrices, each frame's weight in these calculations should be determined from the current value of the instantaneous bias (see REWEIGHT_BIAS).

d1: DISTANCE ATOMS=1,2
d2: DISTANCE ATOMS=1,3
d3: DISTANCE ATOMS=1,4
d4: DISTANCE ATOMS=2,3
d5: DISTANCE ATOMS=2,4
d6: DISTANCE ATOMS=3,4
rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
rbias: REWEIGHT_BIAS TEMP=300
ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb
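The effect of reweighting on the averages and covariance matrix can be sketched in Python as follows. This is an illustration under stated assumptions rather than PLUMED's code: each frame is taken to carry a weight proportional to exp(V/kT), where V is the instantaneous bias, energies are assumed to be in kJ/mol, the weights are normalised to sum to one, and all function names are hypothetical.

```python
import math

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def bias_weights(bias_values, temperature):
    """Normalised weights proportional to exp(V / kT) for each frame."""
    kt = KB_KJ_PER_MOL_K * temperature
    shift = max(bias_values)  # subtract the maximum to avoid overflow
    raw = [math.exp((v - shift) / kt) for v in bias_values]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mean(data, weights):
    """Weighted average of each input coordinate."""
    d = len(data[0])
    return [sum(w * row[k] for w, row in zip(weights, data)) for k in range(d)]

def weighted_covariance(data, weights):
    """Covariance matrix in which every frame contributes according to its weight."""
    mu = weighted_mean(data, weights)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for w, row in zip(weights, data):
        dev = [row[k] - mu[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += w * dev[a] * dev[b]
    return cov

# Two frames of a single collective variable; the second frame felt a bias
# of kT*ln(3), so it should count three times as much as the first.
kt = KB_KJ_PER_MOL_K * 300.0
weights = bias_weights([0.0, kt * math.log(3.0)], 300.0)
frames = [[0.0], [4.0]]
print(weights)
print(weighted_mean(frames, weights))
print(weighted_covariance(frames, weights))
```

Setting every bias value equal recovers the unweighted case, in which each frame has the same weight.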

- Glossary of keywords and components

- Description of components

By default the value of the calculated quantity can be referenced elsewhere in the input file by using the label of the action. Alternatively this Action can be used to calculate the following quantities by employing the keywords listed below. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity | Description |

coord | the low-dimensional projections of the various input configurations |

In addition the following quantities can be calculated by employing the keywords listed below

Quantity | Keyword | Description |

gradient | GRADIENT | the gradient |

vmean | VMEAN | the norm of the mean vector. The output component can be referred to elsewhere in the input file by using the label.vmean |

vsum | VSUM | the norm of the sum of vectors. The output component can be referred to elsewhere in the input file by using the label.vsum |

spath | SPATH | the position on the path |

gspath | GPATH | the position on the path calculated using trigonometry |

gzpath | GPATH | the distance from the path calculated using trigonometry |

zpath | ZPATH | the distance from the path |

altmin | ALT_MIN | the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous. |

between | BETWEEN | the number/fraction of values within a certain range. This is calculated using one of the formulas described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters. |

highest | HIGHEST | the highest of the quantities calculated by this action |

lessthan | LESS_THAN | the number of values less than a target value. This is calculated using one of the formulas described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters. |

lowest | LOWEST | the lowest of the quantities calculated by this action |

max | MAX | the maximum value. This is calculated using the formula described in the description of the keyword so as to make it continuous. |

mean | MEAN | the mean value. The output component can be referred to elsewhere in the input file by using the label.mean |

min | MIN | the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous. |

moment | MOMENTS | the central moments of the distribution of values. The second moment would be referenced elsewhere in the input file using label.moment-2, the third as label.moment-3, etc. |

morethan | MORE_THAN | the number of values more than a target value. This is calculated using one of the formulas described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters. |

sum | SUM | the sum of values |

- The data to analyze can be the output from another analysis algorithm

USE_OUTPUT_DATA_FROM | use the output of the analysis performed by this object as input to your new analysis object |

- Alternatively data can be collected from the trajectory using

ATOMS | the list of atoms that you are going to use in the measure of distance that you are using. For more information on how to specify lists of atoms see Groups and Virtual Atoms |

- Compulsory keywords

NLOW_DIM | number of low-dimensional coordinates required |

METRIC | ( default=EUCLIDEAN ) the method that you are going to use to measure the distances between points |

- Options

SERIAL | ( default=off ) do the calculation in serial. Do not use MPI |

LOWMEM | ( default=off ) lower the memory requirements |

ARG | the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears the scalars calculated by all the preceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example a DISTANCE action labelled dist may have three components x, y and z. To take just the x component you should use dist.x; if you wish to take all three components then use dist.*. More information on the referencing of Actions can be found in the Getting Started section of the PLUMED manual. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword i.e. ARG1, ARG2, ARG3... |