PCAVARS

This is part of the mapping module |

Projection on principal component eigenvectors or other high dimensional linear subspace

The collective variables described in Distances from reference configurations allow one to calculate the distance between the instantaneous structure adopted by the system and some high-dimensional, reference configuration. The problem with doing this is that, as one gets further and further from the reference configuration, the distance from it becomes a progressively poorer and poorer collective variable. This happens because the ``number" of structures at a distance \(d\) from a reference configuration is proportional to \(d^N\) in an \(N\) dimensional space. Consequently, when \(d\) is small the distance from the reference configuration may well be a good collective variable. However, when \(d\) is large it is unlikely that the distance from the reference structure is a good CV. When the distance is large there will almost certainly be markedly different configuration that have the same CV value and hence barriers in transverse degrees of freedom.

For these reasons dimensionality reduction is often employed so a projection \(\mathbf{s}\) of a high-dimensional configuration \(\mathbf{X}\) in a lower dimensionality space using a function:

\[ \mathbf{s} = F(\mathbf{X}-\mathbf{X}^{ref}) \]

where here we have introduced some high-dimensional reference configuration \(\mathbf{X}^{ref}\). By far the simplest way to do this is to use some linear operator for \(F\). That is to say we find a low-dimensional projection by rotating the basis vectors using some linear algebra:

\[ \mathbf{s}_i = \sum_k A_{ik} ( X_{k} - X_{k}^{ref} ) \]

Here \(A\) is a \(d\) by \(D\) matrix where \(D\) is the dimensionality of the high dimensional space and \(d\) is the dimensionality of the lower dimensional subspace. In plumed when this kind of projection you can use the majority of the metrics detailed on Distances from reference configurations to calculate the displacement, \(\mathbf{X}-\mathbf{X}^{ref}\), from the reference configuration. The matrix \(A\) can be found by various means including principal component analysis and normal mode analysis. In both these methods the rows of \(A\) would be the principle eigenvectors of a square matrix. For PCA the covariance while for normal modes the Hessian.

**Bug:**- It is not possible to use the DRMSD metric with this variable. You can get around this by listing the set of distances you wish to calculate for your DRMSD in the plumed file explicitly and using the EUCLIDEAN metric. MAHALONOBIS and NORM-EUCLIDEAN also do not work with this variable but using these options makes little sense when projecting on a linear subspace.

- Examples

The following input calculates a projection on a linear subspace where the displacements from the reference configuration are calculated using the OPTIMAL metric. Consequently, both translation of the center of mass of the atoms and rotation of the reference frame are removed from these displacements. The matrix \(A\) and the reference configuration \(R^{ref}\) are specified in the pdb input file reference.pdb and the value of all projections (and the residual) are output to a file called colvar2.

Click on the labels of the actions for more information on what each action computes

pca2:PCAVARSREFERENCE=reference.pdbcompulsory keyworda pdb file containing the reference configuration and configurations that define the directions for each eigenvectorTYPE=OPTIMAL PRINTcompulsory keyword ( default=OPTIMAL )The method we are using for alignment to the reference structureARG=the input for this action is the scalar output from one or more other actions.pca2.*FILE=colvar2the name of the file on which to output these quantities

The reference configurations can be specified using a pdb file. The first configuration that you provide is the reference configuration, which is referred to in the above as \(X^{ref}\) subsequent configurations give the directions of row vectors that are contained in the matrix \(A\) above. These directions can be specified by specifying a second configuration - in this case a vector will be constructed by calculating the displacement of this second configuration from the reference configuration. A pdb input prepared in this way would look as follows:

REMARK TYPE=OPTIMAL ATOM 2 CH3 ACE 1 12.932 -14.718 -6.016 1.00 1.00 ATOM 5 C ACE 1 21.312 -9.928 -5.946 1.00 1.00 ATOM 9 CA ALA 2 19.462 -11.088 -8.986 1.00 1.00 ATOM 13 HB2 ALA 2 21.112 -10.688 -12.476 1.00 1.00 ATOM 15 C ALA 2 19.422 7.978 -14.536 1.00 1.00 ATOM 20 HH31 NME 3 20.122 -9.928 -17.746 1.00 1.00 ATOM 21 HH32 NME 3 18.572 -13.148 -16.346 1.00 1.00 END REMARK TYPE=OPTIMAL ATOM 2 CH3 ACE 1 13.932 -14.718 -6.016 1.00 1.00 ATOM 5 C ACE 1 20.312 -9.928 -5.946 1.00 1.00 ATOM 9 CA ALA 2 18.462 -11.088 -8.986 1.00 1.00 ATOM 13 HB2 ALA 2 20.112 -11.688 -12.476 1.00 1.00 ATOM 15 C ALA 2 19.422 7.978 -12.536 1.00 1.00 ATOM 20 HH31 NME 3 20.122 -9.928 -17.746 1.00 1.00 ATOM 21 HH32 NME 3 18.572 -13.148 -16.346 1.00 1.00 END

Alternatively, the second configuration can specify the components of \(A\) explicitly. In this case you need to include the keyword TYPE=DIRECTION in the remarks to the pdb as shown below.

ATOM 2 CH3 ACE 1 12.932 -14.718 -6.016 1.00 1.00 ATOM 5 C ACE 1 21.312 -9.928 -5.946 1.00 1.00 ATOM 9 CA ALA 2 19.462 -11.088 -8.986 1.00 1.00 ATOM 13 HB2 ALA 2 21.112 -10.688 -12.476 1.00 1.00 ATOM 15 C ALA 2 19.422 7.978 -14.536 1.00 1.00 ATOM 20 HH31 NME 3 20.122 -9.928 -17.746 1.00 1.00 ATOM 21 HH32 NME 3 18.572 -13.148 -16.346 1.00 1.00 END REMARK TYPE=DIRECTION ATOM 2 CH3 ACE 1 0.1414 0.3334 -0.0302 1.00 0.00 ATOM 5 C ACE 1 0.0893 -0.1095 -0.1434 1.00 0.00 ATOM 9 CA ALA 2 0.0207 -0.321 0.0321 1.00 0.00 ATOM 13 HB2 ALA 2 0.0317 -0.6085 0.0783 1.00 0.00 ATOM 15 C ALA 2 0.1282 -0.4792 0.0797 1.00 0.00 ATOM 20 HH31 NME 3 0.0053 -0.465 0.0309 1.00 0.00 ATOM 21 HH32 NME 3 -0.1019 -0.4261 -0.0082 1.00 0.00 END

If your metric involves arguments the labels of these arguments in your plumed input file should be specified in the REMARKS for each of the frames of your path. An input file in this case might look like this:

DESCRIPTION: a pca eigenvector specified using the start point and direction in the HD space. REMARK WEIGHT=1.0 REMARK ARG=d1,d2 REMARK d1=1.0 d2=1.0 END REMARK TYPE=DIRECTION REMARK ARG=d1,d2 REMARK d1=0.1 d2=0.25 END

Here we are working with the EUCLIDEAN metric and notice that we have specified the components of \(A\) using DIRECTION. Consequently, the values of d1 and d2 in the second frame above do not specify a particular coordinate in the high-dimensional space as in they do in the first frame. Instead these values are the coefficients that can be used to construct a linear combination of d1 and d2. If we wanted to specify the direction in this metric using the start and end point of the vector we would write:

DESCRIPTION: a pca eigenvector specified using the start and end point of a vector in the HD space. REMARK WEIGHT=1.0 REMARK ARG=d1,d2 REMARK d1=1.0 d2=1.0 END REMARK ARG=d1,d2 REMARK d1=1.1 d2=1.25 END

- Glossary of keywords and components

- Description of components

By default this Action calculates the following quantities. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity | Description |

eig | the projections on each eigenvalue are stored on values labeled eig-1, eig-2, ... |

residual | the distance of the configuration from the linear subspace defined by the vectors, \(e_i\), that are contained in the rows of \(A\). In other words this is \(\sqrt( r^2 - \sum_i [\mathbf{r}.\mathbf{e_i}]^2)\) where \(r\) is the distance between the instantaneous position and the reference point. |

- Compulsory keywords

REFERENCE | a pdb file containing the reference configuration and configurations that define the directions for each eigenvector |

TYPE | ( default=OPTIMAL ) The method we are using for alignment to the reference structure |

- Options

NUMERICAL_DERIVATIVES | ( default=off ) calculate the derivatives for these quantities numerically |

NOPBC | ( default=off ) ignore the periodic boundary conditions when calculating distances |

ARG | the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears the scalars calculated by all the proceeding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example a DISTANCE action labelled dist may have three components x, y and z. To take just the x component you should use dist.x, if you wish to take all three components then use dist.*.More information on the referencing of Actions can be found in the section of the manual on the PLUMED Getting Started. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword i.e. ARG1, ARG2, ARG3... |