PCAVARS

This is part of the mapping module |

Projection on principal component eigenvectors or other high dimensional linear subspace

The collective variables described in Distances from reference configurations allow one to calculate the distance between the instaneous structure adopted by the system and some high-dimensional, reference configuration. The problem with doing this is that, as one gets further and further from the reference configuration, the distance from it becomes a progressively poorer and poorer collective variable. This happens because the ``number" of structures at a distance \(d\) from a reference configuration is proportional to \(d^N\) in an \(N\) dimensional space. Consequently, when \(d\) is small the distance from the reference configuration may well be a good collective variable. However, when \(d\) is large it is unlikely that the distance from the reference structure is a good CV. When the distance is large there will almost certainly be markedly different configuration that have the same CV value and hence barriers in transverse degrees of freedom.

For these reasons dimensionality reduction is often employed so a projection \(\mathbf{s}\) of a high-dimensional configuration \(\mathbf{X}\) in a lower dimensionality space using a function:

\[ \mathbf{s} = F(\mathbf{X}-\mathbf{X}^{ref}) \]

where here we have introduced some high-dimensional reference configuration \(\mathbf{X}^{ref}\). By far the simplest way to do this is to use some linear operator for \(F\). That is to say we find a low-dimensional projection by rotating the basis vectors using some linear algebra:

\[ \mathbf{s}_i = \sum_k A_{ik} ( X_{k} - X_{k}^{ref} ) \]

Here \(A\) is a \(d\) by \(D\) matrix where \(D\) is the dimensionality of the high dimensional space and \(d\) is the dimensionality of the lower dimensional subspace. In plumed when this kind of projection you can use the majority of the metrics detailed on Distances from reference configurations to calculate the displacement, \(\mathbf{X}-\mathbf{X}^{ref}\), from the reference configuration. The matrix \(A\) can be found by various means including principal component analysis and normal mode analysis. In both these methods the rows of \(A\) would be the principle eigenvectors of a square matrix. For PCA the covariance while for normal modes the Hessian.

**Bug:**- It is not possible to use the DRMSD metric with this variable. You can get around this by listing the set of distances you wish to calculate for your DRMSD in the plumed file explicitally and using the EUCLIDEAN metric. MAHALONOBIS and NORM-EUCLIDEAN also do not work with this variable but using these options makes little sense when projecting on a linear subspace.

- Description of components

By default this Action calculates the following quantities. These quanties can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity | Description |

eig | the projections on each eigenvalue are stored on values labeled eig-1, eig-2, ... |

residual | the distance of the configuration from the linear subspace defined by the vectors, \(e_i\), that are contained in the rows of \(A\). In other words this is \(\sqrt( r^2 - \sum_i [\mathbf{r}.\mathbf{e_i}]^2)\) where \(r\) is the distance between the instantaneous position and the reference point. |

- Compulsory keywords

REFERENCE | a pdb file containing the reference configuration and configurations that define the directions for each eigenvector |

TYPE | ( default=OPTIMAL ) The method we are using for alignment to the reference structure |

- Options

NUMERICAL_DERIVATIVES | ( default=off ) calculate the derivatives for these quantities numerically |

NORMALIZE | ( default=off ) calculate the length of the eigenvector input and divide the components by it so as to have a normalised vector |

- Examples

The following input calculates a projection on a linear subspace where the displacements from the reference configuration are calculated using the OPTIMAL metric. Consequently, both translation of the center of mass of the atoms and rotation of the reference frame are removed from these displacements. The matrix \(A\) and the reference configuration \(R^{ref}\) are specified in the pdb input file reference.pdb and the value of all projections (and the residual) are output to a file called colvar2.

PCAVARS REFERENCE=reference.pdb TYPE=OPTIMAL LABEL=pca2 PRINT ARG=pca2.* FILE=colvar2

The reference configurations can be specified using a pdb file. The first configuration that you provide is the reference configuration, which is refered to in the above as \(X^{ref}\) subsequent configurations give the directions of row vectors that are contained in the matrix \(A\) above. These directions can be specified by specifying a second configuration - in this case a vector will be constructed by calculating the displacement of this second configuration from the reference configuration. A pdb input prepared in this way would look as follows:

ATOM 2 CH3 ACE 1 12.932 -14.718 -6.016 1.00 1.00 ATOM 5 C ACE 1 21.312 -9.928 -5.946 1.00 1.00 ATOM 9 CA ALA 2 19.462 -11.088 -8.986 1.00 1.00 ATOM 13 HB2 ALA 2 21.112 -10.688 -12.476 1.00 1.00 ATOM 15 C ALA 2 19.422 7.978 -14.536 1.00 1.00 ATOM 20 HH31 NME 3 20.122 -9.928 -17.746 1.00 1.00 ATOM 21 HH32 NME 3 18.572 -13.148 -16.346 1.00 1.00 END ATOM 2 CH3 ACE 1 13.932 -14.718 -6.016 1.00 1.00 ATOM 5 C ACE 1 20.312 -9.928 -5.946 1.00 1.00 ATOM 9 CA ALA 2 18.462 -11.088 -8.986 1.00 1.00 ATOM 13 HB2 ALA 2 20.112 -11.688 -12.476 1.00 1.00 ATOM 15 C ALA 2 19.422 7.978 -12.536 1.00 1.00 ATOM 20 HH31 NME 3 20.122 -9.928 -17.746 1.00 1.00 ATOM 21 HH32 NME 3 18.572 -13.148 -16.346 1.00 1.00 END

Alternatively, the second configuration can specify the components of \(A\) explicitally. In this case you need to include the keyword TYPE=DIRECTION in the remarks to the pdb as shown below.

ATOM 2 CH3 ACE 1 12.932 -14.718 -6.016 1.00 1.00 ATOM 5 C ACE 1 21.312 -9.928 -5.946 1.00 1.00 ATOM 9 CA ALA 2 19.462 -11.088 -8.986 1.00 1.00 ATOM 13 HB2 ALA 2 21.112 -10.688 -12.476 1.00 1.00 ATOM 15 C ALA 2 19.422 7.978 -14.536 1.00 1.00 ATOM 20 HH31 NME 3 20.122 -9.928 -17.746 1.00 1.00 ATOM 21 HH32 NME 3 18.572 -13.148 -16.346 1.00 1.00 END REMARK TYPE=DIRECTION ATOM 2 CH3 ACE 1 0.1414 0.3334 -0.0302 1.00 0.00 ATOM 5 C ACE 1 0.0893 -0.1095 -0.1434 1.00 0.00 ATOM 9 CA ALA 2 0.0207 -0.321 0.0321 1.00 0.00 ATOM 13 HB2 ALA 2 0.0317 -0.6085 0.0783 1.00 0.00 ATOM 15 C ALA 2 0.1282 -0.4792 0.0797 1.00 0.00 ATOM 20 HH31 NME 3 0.0053 -0.465 0.0309 1.00 0.00 ATOM 21 HH32 NME 3 -0.1019 -0.4261 -0.0082 1.00 0.00 END

If your metric involves arguments the labels of these arguments in your plumed input file should be specified in the REMARKS for each of the frames of your path. An input file in this case might look like this:

DESCRIPTION: a pca eigenvector specified using the start point and direction in the HD space. REMARK WEIGHT=1.0 REMARK ARG=d1,d2 REMARK d1=1.0 d2=1.0 END REMARK TYPE=DIRECTION REMARK ARG=d1,d2 REMARK d1=0.1 d2=0.25 END

Here we are working with the EUCLIDEAN metric and notice that we have specified the components of \(A\) using DIRECTION. Consequently, the values of d1 and d2 in the second frame above do not specify a particular coordinate in the high-dimensional space as in they do in the first frame. Instead these values are the coefficients that can be used to construct a linear combination of d1 and d2. If we wanted to specify the direction in this metric using the start and end point of the vector we would write:

DESCRIPTION: a pca eigenvector specified using the start and end point of a vector in the HD space. REMARK WEIGHT=1.0 REMARK ARG=d1,d2 REMARK d1=1.0 d2=1.0 END REMARK ARG=d1,d2 REMARK d1=1.1 d2=1.25 END