PCAVARS
This is part of the mapping module
It is only available if you configure PLUMED with ./configure –enable-modules=mapping . Furthermore, this feature is still being developed so take care when using it and report any problems on the mailing list.

Projection on principal component eigenvectors or other high dimensional linear subspace

The collective variables described in Distances from reference configurations allow one to calculate the distance between the instantaneous structure adopted by the system and some high-dimensional, reference configuration. The problem with doing this is that, as one gets further and further from the reference configuration, the distance from it becomes a progressively poorer and poorer collective variable. This happens because the ``number" of structures at a distance \(d\) from a reference configuration is proportional to \(d^N\) in an \(N\) dimensional space. Consequently, when \(d\) is small the distance from the reference configuration may well be a good collective variable. However, when \(d\) is large it is unlikely that the distance from the reference structure is a good CV. When the distance is large there will almost certainly be markedly different configuration that have the same CV value and hence barriers in transverse degrees of freedom.

For these reasons dimensionality reduction is often employed so a projection \(\mathbf{s}\) of a high-dimensional configuration \(\mathbf{X}\) in a lower dimensionality space using a function:

\[ \mathbf{s} = F(\mathbf{X}-\mathbf{X}^{ref}) \]

where here we have introduced some high-dimensional reference configuration \(\mathbf{X}^{ref}\). By far the simplest way to do this is to use some linear operator for \(F\). That is to say we find a low-dimensional projection by rotating the basis vectors using some linear algebra:

\[ \mathbf{s}_i = \sum_k A_{ik} ( X_{k} - X_{k}^{ref} ) \]

Here \(A\) is a \(d\) by \(D\) matrix where \(D\) is the dimensionality of the high dimensional space and \(d\) is the dimensionality of the lower dimensional subspace. In plumed when this kind of projection you can use the majority of the metrics detailed on Distances from reference configurations to calculate the displacement, \(\mathbf{X}-\mathbf{X}^{ref}\), from the reference configuration. The matrix \(A\) can be found by various means including principal component analysis and normal mode analysis. In both these methods the rows of \(A\) would be the principle eigenvectors of a square matrix. For PCA the covariance while for normal modes the Hessian.

Bug:
It is not possible to use the DRMSD metric with this variable. You can get around this by listing the set of distances you wish to calculate for your DRMSD in the plumed file explicitly and using the EUCLIDEAN metric. MAHALONOBIS and NORM-EUCLIDEAN also do not work with this variable but using these options makes little sense when projecting on a linear subspace.
Examples

The following input calculates a projection on a linear subspace where the displacements from the reference configuration are calculated using the OPTIMAL metric. Consequently, both translation of the center of mass of the atoms and rotation of the reference frame are removed from these displacements. The matrix \(A\) and the reference configuration \(R^{ref}\) are specified in the pdb input file reference.pdb and the value of all projections (and the residual) are output to a file called colvar2.

Click on the labels of the actions for more information on what each action computes
tested on master
pca2: PCAVARS 
REFERENCE
compulsory keyword a pdb file containing the set of reference configurations
=reference.pdb
TYPE
compulsory keyword ( default=OPTIMAL-FAST ) the manner in which distances are calculated.
=OPTIMAL PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=pca2.*
FILE
the name of the file on which to output these quantities
=colvar2

The reference configurations can be specified using a pdb file. The first configuration that you provide is the reference configuration, which is referred to in the above as \(X^{ref}\) subsequent configurations give the directions of row vectors that are contained in the matrix \(A\) above. These directions are specified by giving a second configuration that describes the components of \(A\) explicitly.

ATOM      2  CH3 ACE     1      12.932 -14.718  -6.016  1.00  1.00
ATOM      5  C   ACE     1      21.312  -9.928  -5.946  1.00  1.00
ATOM      9  CA  ALA     2      19.462 -11.088  -8.986  1.00  1.00
ATOM     13  HB2 ALA     2      21.112 -10.688 -12.476  1.00  1.00
ATOM     15  C   ALA     2      19.422   7.978 -14.536  1.00  1.00
ATOM     20 HH31 NME     3      20.122  -9.928 -17.746  1.00  1.00
ATOM     21 HH32 NME     3      18.572 -13.148 -16.346  1.00  1.00
END
REMARK
ATOM      2  CH3 ACE     1      0.1414  0.3334 -0.0302  1.00  0.00
ATOM      5  C   ACE     1      0.0893 -0.1095 -0.1434  1.00  0.00
ATOM      9  CA  ALA     2      0.0207 -0.321   0.0321  1.00  0.00
ATOM     13  HB2 ALA     2      0.0317 -0.6085  0.0783  1.00  0.00
ATOM     15  C   ALA     2      0.1282 -0.4792  0.0797  1.00  0.00
ATOM     20 HH31 NME     3      0.0053 -0.465   0.0309  1.00  0.00
ATOM     21 HH32 NME     3     -0.1019 -0.4261 -0.0082  1.00  0.00
END

Notice that the PCAVARS command is a shortcut. If you look at how the shortcut in the above input is expanded you should be able to see how the command works by calculating the RMSD displacement between the instantaneous and reference configuration and how those displacements are then projected on the eigenvector that was specified in the second frame of the pdb input above. Understanding the expanded version of this shortcut command allows you to recognise that you can project the displacement vector on any arbitrary vector. For example in the input below two reference structures are provided in the pdb file. PLUMED calculates the RMSD distance between these two reference configurations during setup and sets up a unit vector called eig that points along the director connecting the two RMSD structure. During the calculation the vector connecting the instantaneous configuration and the first of the two reference configurations in computed. This vector is then projected on the unit vector connecting the two initial structures:

Click on the labels of the actions for more information on what each action computes
tested on master
# Read in two reference configuratioms from PDB file
ref1: PDB2CONSTANT 
REFERENCE
compulsory keyword a file in pdb format containing the reference structure
=two-frames.pdb
NUMBER
compulsory keyword ( default=0 ) if there are multiple structures in the pdb file you can specify that you want the RMSD from a specific structure by specifying its place in the file here.
=1 ref1T: TRANSPOSE
ARG
the input for this action is the scalar output from one or more other actions.
=ref1 ref2: PDB2CONSTANT
REFERENCE
compulsory keyword a file in pdb format containing the reference structure
=two-frames.pdb
NUMBER
compulsory keyword ( default=0 ) if there are multiple structures in the pdb file you can specify that you want the RMSD from a specific structure by specifying its place in the file here.
=2 # Calculate the displacement vector that takes you from ref1 to ref2 eigu: RMSD_VECTOR
ARG
the input for this action is the scalar output from one or more other actions.
=ref1T,ref2
DISPLACEMENT
( default=off ) Calculate the vector of displacements instead of the length of this vector
TYPE
compulsory keyword ( default=SIMPLE ) the manner in which RMSD alignment is performed.
=OPTIMAL # Normalise the reference vector eigu2: CUSTOM
ARG
the input to this function.
=eigu.disp
FUNC
compulsory keyword the function you wish to evaluate
=x*x
PERIODIC
compulsory keyword if the output of your function is periodic then you should specify the periodicity of the function.
=NO eign2: SUM
ARG
the input to this function.
=eigu2
PERIODIC
compulsory keyword if the output of your function is periodic then you should specify the periodicity of the function.
=NO eig: CUSTOM
ARG
the input to this function.
=eigu.disp,eign2
FUNC
compulsory keyword the function you wish to evaluate
=x/sqrt(y)
PERIODIC
compulsory keyword if the output of your function is periodic then you should specify the periodicity of the function.
=NO eigT: TRANSPOSE
ARG
the input for this action is the scalar output from one or more other actions.
=eig # Everything prior to this point is only done in setup. From here onwards we have the commands that are done during the main calculate loop. # Calculate the RMSD displacement between the instaneous structure and the first reference structure rmsd: RMSD
REFERENCE
compulsory keyword a file in pdb format containing the reference structure and the atoms involved in the CV
=two-frames.pdb
NUMBER
compulsory keyword ( default=0 ) if there are multiple structures in the pdb file you can specify that you want the RMSD from a specific structure by specifying its place in the file here.
=1
TYPE
compulsory keyword ( default=SIMPLE ) the manner in which RMSD alignment is performed.
=OPTIMAL
DISPLACEMENT
( default=off ) Calculate the vector of displacements instead of the length of this vector
SQUARED
( default=off ) This should be setted if you want MSD instead of RMSD
# Project the displacement computed above on the director of the vector that connects reference structure ref1 to refeference structure ref2 pca: MATRIX_VECTOR_PRODUCT
ARG
the input for this action is the scalar output from one or more other actions.
=eigT,rmsd.disp # Print the final CV to a file PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=pca
FILE
the name of the file on which to output these quantities
=colvar

You also project vectors of differences of arguments on reference vectors. For example, the input below can be used to look at the projection of the vector connecting the instantanous configuraiton to a reference point in CV on a reference vector that has been specified in the PDB file.

Click on the labels of the actions for more information on what each action computes
tested on master
d1: DISTANCE 
ATOMS
the pair of atom that we are calculating the distance between.
=1,2 d2: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
=3,4 d3: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
=5,6 pca: PCAVARS
ARG
if there are arguments to be used specify them here
=d1,d2,d3
REFERENCE
compulsory keyword a pdb file containing the set of reference configurations
=epath.pdb PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=pca_eig-1,pca_residual
FILE
the name of the file on which to output these quantities
=colvar

The pdb input file for this calculation might look something like this:

REMARK d1=0.1221 d2=0.0979 d3=0.1079
END
REMARK d1=0.078811 d2=-0.945732 d3=-0.315244
END

The first set of argument values in this input file are the reference values for the arguments. The second any subsquent sets of arguments give the coefficients that should be used when constructing linear combinations.

Notice, lastly, that you can also use a combination of argument values and atomic positions when specifying the reference configuration and the reference directions. If you are doing something this complicated, however, you are perhaps better working with the PLUMED input directly rather than this shortcut as you will need to take special measures to ensure that all your CVs are in the same units.

Glossary of keywords and components
Description of components
Quantity Description
eig the projections on the eigenvalues
residual the residual distance that is not projected on any of the eigenvalues
Compulsory keywords
REFERENCE a pdb file containing the set of reference configurations
TYPE ( default=OPTIMAL-FAST ) the manner in which distances are calculated. More information on the different metrics that are available in PLUMED can be found in the section of the manual on Distances from reference configurations
Options
NOPBC

( default=off ) do not use periodic boundary conditions when computing this quantity

ARG if there are arguments to be used specify them here