This is part of the mapping module
	It is only available if you configure PLUMED with ./configure –enable-modules=mapping . Furthermore, this feature is still being developed so take care when using it and report any problems on the mailing list.

Projection on principal component eigenvectors or other high dimensional linear subspace

The collective variables described in Distances from reference configurations allow one to calculate the distance between the instantaneous structure adopted by the system and some high-dimensional, reference configuration. The problem with doing this is that, as one gets further and further from the reference configuration, the distance from it becomes a progressively poorer and poorer collective variable. This happens because the ``number" of structures at a distance \(d\) from a reference configuration is proportional to \(d^N\) in an \(N\) dimensional space. Consequently, when \(d\) is small the distance from the reference configuration may well be a good collective variable. However, when \(d\) is large it is unlikely that the distance from the reference structure is a good CV. When the distance is large there will almost certainly be markedly different configuration that have the same CV value and hence barriers in transverse degrees of freedom.

For these reasons dimensionality reduction is often employed so a projection \(\mathbf{s}\) of a high-dimensional configuration \(\mathbf{X}\) in a lower dimensionality space using a function:

\[ \mathbf{s} = F(\mathbf{X}-\mathbf{X}^{ref}) \]

where here we have introduced some high-dimensional reference configuration \(\mathbf{X}^{ref}\). By far the simplest way to do this is to use some linear operator for \(F\). That is to say we find a low-dimensional projection by rotating the basis vectors using some linear algebra:

\[ \mathbf{s}_i = \sum_k A_{ik} ( X_{k} - X_{k}^{ref} ) \]

Here \(A\) is a \(d\) by \(D\) matrix where \(D\) is the dimensionality of the high dimensional space and \(d\) is the dimensionality of the lower dimensional subspace. In plumed when this kind of projection you can use the majority of the metrics detailed on Distances from reference configurations to calculate the displacement, \(\mathbf{X}-\mathbf{X}^{ref}\), from the reference configuration. The matrix \(A\) can be found by various means including principal component analysis and normal mode analysis. In both these methods the rows of \(A\) would be the principle eigenvectors of a square matrix. For PCA the covariance while for normal modes the Hessian.

Bug:: It is not possible to use the DRMSD metric with this variable. You can get around this by listing the set of distances you wish to calculate for your DRMSD in the plumed file explicitly and using the EUCLIDEAN metric. MAHALONOBIS and NORM-EUCLIDEAN also do not work with this variable but using these options makes little sense when projecting on a linear subspace.

Examples

The following input calculates a projection on a linear subspace where the displacements from the reference configuration are calculated using the OPTIMAL metric. Consequently, both translation of the center of mass of the atoms and rotation of the reference frame are removed from these displacements. The matrix \(A\) and the reference configuration \(R^{ref}\) are specified in the pdb input file reference.pdb and the value of all projections (and the residual) are output to a file called colvar2.

Click on the labels of the actions for more information on what each action computes

pca2: PCAVARS REFERENCEcompulsory keyword 
a pdb file containing the set of reference configurations 
=reference.pdb TYPEcompulsory keyword ( default=OPTIMAL-FAST )
the manner in which distances are calculated. 
=OPTIMAL  You cannot view the components that are calculated by each action for this input file. Sorry 
PRINT ARGthe input for this action is the scalar output from one or more other actions. 
=pca2.* FILEthe name of the file on which to output these quantities 
=colvar2  You cannot view the components that are calculated by each action for this input file. Sorry

The reference configurations can be specified using a pdb file. The first configuration that you provide is the reference configuration, which is referred to in the above as \(X^{ref}\) subsequent configurations give the directions of row vectors that are contained in the matrix \(A\) above. These directions are specified by giving a second configuration that describes the components of \(A\) explicitly.

ATOM      2  CH3 ACE     1      12.932 -14.718  -6.016  1.00  1.00
ATOM      5  C   ACE     1      21.312  -9.928  -5.946  1.00  1.00
ATOM      9  CA  ALA     2      19.462 -11.088  -8.986  1.00  1.00
ATOM     13  HB2 ALA     2      21.112 -10.688 -12.476  1.00  1.00
ATOM     15  C   ALA     2      19.422   7.978 -14.536  1.00  1.00
ATOM     20 HH31 NME     3      20.122  -9.928 -17.746  1.00  1.00
ATOM     21 HH32 NME     3      18.572 -13.148 -16.346  1.00  1.00
END
REMARK
ATOM      2  CH3 ACE     1      0.1414  0.3334 -0.0302  1.00  0.00
ATOM      5  C   ACE     1      0.0893 -0.1095 -0.1434  1.00  0.00
ATOM      9  CA  ALA     2      0.0207 -0.321   0.0321  1.00  0.00
ATOM     13  HB2 ALA     2      0.0317 -0.6085  0.0783  1.00  0.00
ATOM     15  C   ALA     2      0.1282 -0.4792  0.0797  1.00  0.00
ATOM     20 HH31 NME     3      0.0053 -0.465   0.0309  1.00  0.00
ATOM     21 HH32 NME     3     -0.1019 -0.4261 -0.0082  1.00  0.00
END

Notice that the PCAVARS command is a shortcut. If you look at how the shortcut in the above input is expanded you should be able to see how the command works by calculating the RMSD displacement between the instantaneous and reference configuration and how those displacements are then projected on the eigenvector that was specified in the second frame of the pdb input above. Understanding the expanded version of this shortcut command allows you to recognise that you can project the displacement vector on any arbitrary vector. For example in the input below two reference structures are provided in the pdb file. PLUMED calculates the RMSD distance between these two reference configurations during setup and sets up a unit vector called eig that points along the director connecting the two RMSD structure. During the calculation the vector connecting the instantaneous configuration and the first of the two reference configurations in computed. This vector is then projected on the unit vector connecting the two initial structures:

Click on the labels of the actions for more information on what each action computes

# Read in two reference configuratioms from PDB file
ref1: PDB2CONSTANT REFERENCEcompulsory keyword 
a file in pdb format containing the reference structure 
=two-frames.pdb NUMBERcompulsory keyword ( default=0 )
if there are multiple structures in the pdb file you can specify that you want the
RMSD from a specific structure by specifying its place in the file here. 
=1  You cannot view the components that are calculated by each action for this input file. Sorry 
ref1T: TRANSPOSE ARGthe input for this action is the scalar output from one or more other actions. 
=ref1  You cannot view the components that are calculated by each action for this input file. Sorry 
ref2: PDB2CONSTANT REFERENCEcompulsory keyword 
a file in pdb format containing the reference structure 
=two-frames.pdb NUMBERcompulsory keyword ( default=0 )
if there are multiple structures in the pdb file you can specify that you want the
RMSD from a specific structure by specifying its place in the file here. 
=2  You cannot view the components that are calculated by each action for this input file. Sorry 
# Calculate the displacement vector that takes you from ref1 to ref2
eigu: RMSD_VECTOR ARGthe input for this action is the scalar output from one or more other actions. 
=ref1T,ref2 DISPLACEMENT( default=off ) Calculate the vector of displacements instead of the length of this
vector 
 TYPEcompulsory keyword ( default=SIMPLE )
the manner in which RMSD alignment is performed. 
=OPTIMAL  You cannot view the components that are calculated by each action for this input file. Sorry 
# Normalise the reference vector
eigu2: CUSTOM ARGthe input to this function. 
=eigu.disp FUNCcompulsory keyword 
the function you wish to evaluate 
=x*x PERIODICcompulsory keyword 
if the output of your function is periodic then you should specify the periodicity
of the function. 
=NO  You cannot view the components that are calculated by each action for this input file. Sorry 
eign2: SUM ARGthe input to this function. 
=eigu2 PERIODICcompulsory keyword 
if the output of your function is periodic then you should specify the periodicity
of the function. 
=NO  You cannot view the components that are calculated by each action for this input file. Sorry 
eig: CUSTOM ARGthe input to this function. 
=eigu.disp,eign2 FUNCcompulsory keyword 
the function you wish to evaluate 
=x/sqrt(y) PERIODICcompulsory keyword 
if the output of your function is periodic then you should specify the periodicity
of the function. 
=NO  You cannot view the components that are calculated by each action for this input file. Sorry 
eigT: TRANSPOSE ARGthe input for this action is the scalar output from one or more other actions. 
=eig  You cannot view the components that are calculated by each action for this input file. Sorry 
# Everything prior to this point is only done in setup. From here onwards we have the commands that are done during the main calculate loop.
# Calculate the RMSD displacement between the instaneous structure and the first reference structure
rmsd: RMSD REFERENCEcompulsory keyword 
a file in pdb format containing the reference structure and the atoms involved in
the CV 
=two-frames.pdb NUMBERcompulsory keyword ( default=0 )
if there are multiple structures in the pdb file you can specify that you want the
RMSD from a specific structure by specifying its place in the file here. 
=1 TYPEcompulsory keyword ( default=SIMPLE )
the manner in which RMSD alignment is performed. 
=OPTIMAL DISPLACEMENT( default=off ) Calculate the vector of displacements instead of the length of this
vector 
 SQUARED( default=off ) This should be setted if you want MSD instead of RMSD 
  You cannot view the components that are calculated by each action for this input file. Sorry 
# Project the displacement computed above on the director of the vector that connects reference structure ref1 to refeference structure ref2
pca: MATRIX_VECTOR_PRODUCT ARGthe input for this action is the scalar output from one or more other actions. 
=eigT,rmsd.disp  You cannot view the components that are calculated by each action for this input file. Sorry 
# Print the final CV to a file
PRINT ARGthe input for this action is the scalar output from one or more other actions. 
=pca FILEthe name of the file on which to output these quantities 
=colvar  You cannot view the components that are calculated by each action for this input file. Sorry

You also project vectors of differences of arguments on reference vectors. For example, the input below can be used to look at the projection of the vector connecting the instantanous configuraiton to a reference point in CV on a reference vector that has been specified in the PDB file.

Click on the labels of the actions for more information on what each action computes

d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=1,2  You cannot view the components that are calculated by each action for this input file. Sorry 
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=3,4  You cannot view the components that are calculated by each action for this input file. Sorry 
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=5,6  You cannot view the components that are calculated by each action for this input file. Sorry 
pca: PCAVARS ARGif there are arguments to be used specify them here 
=d1,d2,d3 REFERENCEcompulsory keyword 
a pdb file containing the set of reference configurations 
=epath.pdb  You cannot view the components that are calculated by each action for this input file. Sorry 
PRINT ARGthe input for this action is the scalar output from one or more other actions. 
=pca_eig-1,pca_residual FILEthe name of the file on which to output these quantities 
=colvar  You cannot view the components that are calculated by each action for this input file. Sorry

The pdb input file for this calculation might look something like this:

REMARK d1=0.1221 d2=0.0979 d3=0.1079
END
REMARK d1=0.078811 d2=-0.945732 d3=-0.315244
END

The first set of argument values in this input file are the reference values for the arguments. The second any subsquent sets of arguments give the coefficients that should be used when constructing linear combinations.

Notice, lastly, that you can also use a combination of argument values and atomic positions when specifying the reference configuration and the reference directions. If you are doing something this complicated, however, you are perhaps better working with the PLUMED input directly rather than this shortcut as you will need to take special measures to ensure that all your CVs are in the same units.

Glossary of keywords and components

Description of components

Quantity	Description
eig	the projections on the eigenvalues
residual	the residual distance that is not projected on any of the eigenvalues

Compulsory keywords

REFERENCE	a pdb file containing the set of reference configurations
TYPE	( default=OPTIMAL-FAST ) the manner in which distances are calculated. More information on the different metrics that are available in PLUMED can be found in the section of the manual on Distances from reference configurations

Options

NOPBC

( default=off ) do not use periodic boundary conditions when computing this quantity

ARG

if there are arguments to be used specify them here