
Shortcut: PCA

Module: dimred
Description: Perform principal component analysis (PCA) using either the positions of the atoms or a large number of collective variables as input.
Usage: used in 1 tutorial, used in 0 eggs
Output value: the projections of the input coordinates on the PCA components that were found from the covariance matrix
Type: matrix

Details and examples

Perform principal component analysis (PCA) using either the positions of the atoms or a large number of collective variables as input.

Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique on the Wikipedia page for principal component analysis.
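For reference, the standard textbook construction behind this description can be written compactly as follows. The symbols are notation introduced here for illustration and are not taken from the PLUMED source: $\mathbf{x}_i$ is the $i$-th input vector, $N$ is the number of vectors, and PLUMED's own normalization and weighting conventions may differ.

```latex
\begin{align}
\bar{\mathbf{x}} &= \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i
  && \text{(average)} \\
\mathbf{C} &= \frac{1}{N}\sum_{i=1}^{N}
  (\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{T}
  && \text{(covariance matrix)} \\
\mathbf{C}\,\mathbf{v}_k &= \lambda_k\,\mathbf{v}_k,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots
  && \text{(principal components)} \\
p_{ik} &= \mathbf{v}_k^{T}(\mathbf{x}_i-\bar{\mathbf{x}})
  && \text{(projection of frame $i$ on component $k$)}
\end{align}
```

Because $\mathbf{C}$ is symmetric, the eigenvectors $\mathbf{v}_k$ can be chosen orthonormal, which is why the projections $p_{ik}$ are linearly uncorrelated.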

When PCA is used with molecular dynamics simulations, the input is either a set of frames taken from the trajectory or the values of a number of collective variables calculated from those frames. In the second case the input to the PCA algorithm is a set of high-dimensional vectors of collective variables. Whether the collective variables are calculated from the positions of the atoms or the positions are used directly, the assumption is that the input trajectory is a set of possibly correlated, high-dimensional vectors. After principal component analysis has been performed, the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. These output directions are linear combinations of the x, y and z positions if the positions were used as input, or linear combinations of the input collective variables if a high-dimensional vector of collective variables was used as input.
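The idea of finding the directions of largest motion and projecting frames onto them can be illustrated with a minimal, standard-library-only sketch. This is not PLUMED's implementation: the function names, the toy data and the use of power iteration with deflation are all assumptions made for illustration.

```python
import math

def mean_and_covariance(frames):
    """Return the average vector and the covariance matrix of the frames."""
    n, dim = len(frames), len(frames[0])
    mu = [sum(f[k] for f in frames) / n for k in range(dim)]
    cov = [[sum((f[a] - mu[a]) * (f[b] - mu[b]) for f in frames) / n
            for b in range(dim)] for a in range(dim)]
    return mu, cov

def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue of a symmetric matrix.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    dim = len(matrix)
    v = [1.0 + 0.1 * k for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    lam = sum(v[a] * matrix[a][b] * v[b]
              for a in range(dim) for b in range(dim))
    return lam, v

def pca(frames, n_components):
    """Return the average and the first n_components (eigenvalue, vector) pairs."""
    mu, cov = mean_and_covariance(frames)
    dim = len(mu)
    components = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(cov)
        components.append((lam, v))
        # Deflation: remove the component just found from the matrix
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(dim)]
               for a in range(dim)]
    return mu, components

def project(frame, mu, components):
    """Project one frame on the principal components."""
    return [sum((frame[k] - mu[k]) * v[k] for k in range(len(mu)))
            for _, v in components]

# Toy data: large motion along (1, 1, 0), small motion along (1, -1, 0)
frames = [[-1.9, -2.1, 0.0], [-1.2, -0.8, 0.0], [0.2, -0.2, 0.0],
          [0.8, 1.2, 0.0], [2.1, 1.9, 0.0]]
mu, components = pca(frames, 2)
projections = [project(f, mu, components) for f in frames]
```

Each three-dimensional frame is reduced to two numbers in `projections`. The sign of each component is arbitrary, so only the magnitudes of the projections are meaningful when comparing against another implementation.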

As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the x, y and z coordinates of a molecule are used as input is that the majority of the changes in the positions of the atoms come from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely, be uninteresting. To remedy this problem, PLUMED provides the functionality to perform an RMSD alignment of all the structures to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from consideration. The resulting principal components thus describe vibrational motions of the molecule.
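The effect of aligning every frame to a reference before computing fluctuations can be seen in a small sketch. To stay short and dependency-free it works in two dimensions, where the optimal rotation has a closed form. This is a deliberate simplification and not the three-dimensional RMSD machinery that PLUMED uses; all function names and the toy data are hypothetical.

```python
import math

def centre(points):
    """Remove translation by moving the centroid to the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(p[0] - cx, p[1] - cy) for p in points]

def align_2d(frame, reference):
    """Least-squares fit of `frame` onto `reference` (both are centred first)."""
    f, r = centre(frame), centre(reference)
    # The angle that minimises sum |R f_i - r_i|^2 is atan2(s_sin, s_cos)
    s_cos = sum(fx * rx + fy * ry for (fx, fy), (rx, ry) in zip(f, r))
    s_sin = sum(fx * ry - fy * rx for (fx, fy), (rx, ry) in zip(f, r))
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * fx - s * fy, s * fx + c * fy) for fx, fy in f]

def total_variance(frames):
    """Sum of the variances of all coordinates over the set of frames."""
    flat = [[x for p in frame for x in p] for frame in frames]
    n, dim = len(flat), len(flat[0])
    means = [sum(v[k] for v in flat) / n for k in range(dim)]
    return sum(sum((v[k] - means[k]) ** 2 for v in flat) / n
               for k in range(dim))

def move(points, angle, dx, dy):
    """Rigidly rotate and translate a structure (used to build toy data)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

# A rigid triangle that only tumbles and drifts: no internal motion at all
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
frames = [reference,
          move(reference, 0.7, 3.0, -1.0),
          move(reference, -1.9, -2.0, 5.0)]
aligned = [align_2d(f, reference) for f in frames]

raw_variance = total_variance(frames)
aligned_variance = total_variance(aligned)
```

Here `raw_variance` is large even though the triangle never deforms, whereas `aligned_variance` is zero to within rounding error. A PCA on the raw coordinates would therefore be dominated by rigid-body motion, which is the problem the alignment step addresses.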

If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be used as input for the PCAVARS action.

Examples

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration. The average position and the first two principal components will be output to a file called pca-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation will be performed at the end of the simulation. The colvar file that is output contains the projections of all the positions in the high dimensional space on these vectors.

Tested on 2.11

ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
pca: PCA ARG=ff NLOW_DIM=2 FILE=pca-comp.pdb
DUMPVECTOR ARG=pca,pca_weights FILE=colvar STRIDE=0

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances defined in the input file below. In this calculation the first two principal components will be output to a file called pca-comp.pdb. Trajectory frames will be collected every five steps and the PCA calculation will be performed every 1000 steps. Consequently, if you run a 2000 step simulation the PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that, rather than ascribing a weight of one to each of the frames when calculating averages and covariance matrices, each frame's weight in these calculations should be determined from the current value of the instantaneous bias (see REWEIGHT_BIAS).

Tested on 2.11

d1: DISTANCE ATOMS=1,2
d2: DISTANCE ATOMS=1,3
d3: DISTANCE ATOMS=1,4
d4: DISTANCE ATOMS=2,3
d5: DISTANCE ATOMS=2,4
d6: DISTANCE ATOMS=3,4
rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
rbias: REWEIGHT_BIAS TEMP=300

ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
pca: PCA ARG=ff NLOW_DIM=2 FILE=pca-comp.pdb
DUMPVECTOR ARG=pca,pca_weights FILE=colvar STRIDE=1000
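The reweighting in this second example can be illustrated with a short sketch of a weighted average and covariance. It assumes each frame carries a log-weight equal to the bias divided by kT, and that energies are in kJ/mol. The function names and toy numbers are hypothetical, and PLUMED's own normalization of the covariance may differ from the one used here.

```python
import math

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def harmonic_bias(value, at, kappa):
    """Harmonic restraint energy, 0.5 * kappa * (value - at)^2."""
    return 0.5 * kappa * (value - at) ** 2

def log_weights_from_bias(bias_values, temperature):
    """Assumed log-weight of a frame: bias / kT."""
    kt = KB_KJ_PER_MOL_K * temperature
    return [v / kt for v in bias_values]

def normalised_weights(log_weights):
    """Exponentiate log-weights safely and normalise them to sum to one."""
    shift = max(log_weights)  # avoids overflow in exp
    raw = [math.exp(lw - shift) for lw in log_weights]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mean_and_covariance(frames, log_weights):
    """Weighted average vector and covariance matrix of the frames."""
    w = normalised_weights(log_weights)
    dim = len(frames[0])
    mu = [sum(wi * f[k] for wi, f in zip(w, frames)) for k in range(dim)]
    cov = [[sum(wi * (f[a] - mu[a]) * (f[b] - mu[b])
                for wi, f in zip(w, frames))
            for b in range(dim)] for a in range(dim)]
    return mu, cov

# Toy frames of two distances; the restraint acts on the first one,
# mirroring AT=0.1 and KAPPA=10 in the input above
cv_frames = [[0.10, 0.30], [0.15, 0.28], [0.30, 0.25]]
bias = [harmonic_bias(f[0], at=0.1, kappa=10.0) for f in cv_frames]
logw = log_weights_from_bias(bias, temperature=300.0)
mu_w, cov_w = weighted_mean_and_covariance(cv_frames, logw)
```

Frames that sit further from the restraint centre carry a larger bias and are therefore given more weight, which compensates for the restraint having made them less likely to be visited. The resulting `cov_w` would then take the place of the unweighted covariance matrix in the eigenvalue step.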

Full list of keywords

The following table describes the keywords and options that can be used with this action

Keyword | Type | Default | Description
ARG | compulsory | none | the arguments that you would like to make the histogram for
NLOW_DIM | compulsory | none | number of low-dimensional coordinates required
STRIDE | compulsory | 0 | the frequency with which to perform this analysis
FMT | compulsory | %f | the format to use when outputting the low dimensional coordinates
FILE | optional | not used | the file on which to output the low dimensional coordinates