Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 : Copyright (c) 2016-2023 The plumed team 3 : (see the PEOPLE file at the root of the distribution for a list of names) 4 : 5 : See http://www.plumed.org for more information. 6 : 7 : This file is part of plumed, version 2. 8 : 9 : plumed is free software: you can redistribute it and/or modify 10 : it under the terms of the GNU Lesser General Public License as published by 11 : the Free Software Foundation, either version 3 of the License, or 12 : (at your option) any later version. 13 : 14 : plumed is distributed in the hope that it will be useful, 15 : but WITHOUT ANY WARRANTY; without even the implied warranty of 16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 : GNU Lesser General Public License for more details. 18 : 19 : You should have received a copy of the GNU Lesser General Public License 20 : along with plumed. If not, see <http://www.gnu.org/licenses/>. 21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */ 22 : #include "core/ActionShortcut.h" 23 : #include "core/ActionRegister.h" 24 : #include "core/ActionWithValue.h" 25 : #include "core/ActionPilot.h" 26 : #include "core/ActionAtomistic.h" 27 : #include "core/PlumedMain.h" 28 : #include "core/ActionSet.h" 29 : 30 : //+PLUMEDOC DIMRED PCA 31 : /* 32 : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input. 33 : 34 : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of 35 : poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique 36 : [here](https://en.wikipedia.org/wiki/Principal_component_analysis) 37 : 38 : When used with molecular dynamics simulations a set of frames taken from the trajectory, $\{X_i\}$, or the values of 39 : a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your 40 : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if 41 : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that 42 : this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been 43 : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. 44 : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. 45 : These output directions are some linear combination of the $x$, $y$ and $z$ positions if the positions were used as input 46 : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input. 47 : 48 : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must 49 : calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the 50 : $x$, $y$ and $z$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the 51 : atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely, 52 : be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures 53 : to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from 54 : consideration. The resulting principal components thus describe vibrational motions of the molecule. 55 : 56 : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be 57 : used as input for the [PCAVARS](PCAVARS.md) action. 58 : 59 : ## Examples 60 : 61 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions 62 : of the first 22 atoms. The `TYPE=OPTIMAL` instruction ensures that translational and rotational degrees of freedom are removed from consideration. 63 : The average position and the first two principal components will be output to a file called `pca-comp.pdb`. Trajectory frames will be collected on every step and the PCA calculation 64 : will be performed at the end of the simulation. The `colvar` file that is output contains the projections of all the positions in the high dimensional space on these vectors. 65 : 66 : ```plumed 67 : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1 68 : pca: PCA ARG=ff NLOW_DIM=2 FILE=pca-comp.pdb 69 : DUMPVECTOR ARG=pca,pca_weights FILE=colvar STRIDE=0 70 : ``` 71 : 72 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances 73 : seen in the input file below. In this calculation the first two principal components will be output to a file called PCA-comp.pdb. 74 : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the 75 : PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames 76 : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on 77 : the current value of the instantaneous bias (see [REWEIGHT_BIAS](REWEIGHT_BIAS.md)). 78 : 79 : ```plumed 80 : d1: DISTANCE ATOMS=1,2 81 : d2: DISTANCE ATOMS=1,3 82 : d3: DISTANCE ATOMS=1,4 83 : d4: DISTANCE ATOMS=2,3 84 : d5: DISTANCE ATOMS=2,4 85 : d6: DISTANCE ATOMS=3,4 86 : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10 87 : rbias: REWEIGHT_BIAS TEMP=300 88 : 89 : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5 90 : pca: PCA ARG=ff NLOW_DIM=2 FILE=pca-comp.pdb 91 : DUMPVECTOR ARG=pca,pca_weights FILE=colvar STRIDE=1000 92 : ``` 93 : 94 : */ 95 : //+ENDPLUMEDOC 96 : 97 : namespace PLMD { 98 : namespace dimred { 99 : 100 : class PCA : public ActionShortcut { 101 : public: 102 : static void registerKeywords( Keywords& keys ); 103 : PCA( const ActionOptions& ); 104 : }; 105 : 106 : PLUMED_REGISTER_ACTION(PCA,"PCA") 107 : 108 4 : void PCA::registerKeywords( Keywords& keys ) { 109 4 : ActionShortcut::registerKeywords( keys ); 110 4 : keys.add("compulsory","ARG","the arguments that you would like to make the histogram for"); 111 4 : keys.add("compulsory","NLOW_DIM","number of low-dimensional coordinates required"); 112 4 : keys.add("compulsory","STRIDE","0","the frequency with which to perform this analysis"); 113 4 : keys.add("optional","FILE","the file on which to output the low dimensional coordinates"); 114 4 : keys.add("compulsory","FMT","%f","the format to use when outputting the low dimensional coordinates"); 115 8 : keys.setValueDescription("matrix","the projections of the input coordinates on the PCA components that were found from the covariance matrix"); 116 4 : keys.needsAction("LOGSUMEXP"); 117 4 : keys.needsAction("TRANSPOSE"); 118 4 : keys.needsAction("MATRIX_VECTOR_PRODUCT"); 119 4 : keys.needsAction("CONSTANT"); 120 4 : keys.needsAction("COLLECT"); 121 4 : keys.needsAction("OUTER_PRODUCT"); 122 4 : keys.needsAction("CUSTOM"); 123 4 : keys.needsAction("MATRIX_PRODUCT"); 124 4 : keys.needsAction("DIAGONALIZE"); 125 4 : keys.needsAction("VSTACK"); 126 4 : keys.needsAction("DUMPPDB"); 127 4 : } 128 : 129 : 130 2 : PCA::PCA(const ActionOptions&ao): 131 : Action(ao), 132 2 : ActionShortcut(ao) { 133 : // Find the argument name 134 : std::string argn; 135 2 : parse("ARG",argn); 136 2 : ActionShortcut* as = plumed.getActionSet().getShortcutActionWithLabel( argn ); 137 2 : if( !as || as->getName()!="COLLECT_FRAMES" ) { 138 0 : error("found no COLLECT_FRAMES action with label " + argn ); 139 : } 140 : // Get the final weights using the logsumexp trick 141 4 : readInputLine( getShortcutLabel() + "_weights: LOGSUMEXP ARG=" + argn + "_logweights"); 142 : // Now transpose the collected frames 143 4 : readInputLine( getShortcutLabel() + "_dataT: TRANSPOSE ARG=" + argn + "_data"); 144 : // And multiply the transpose by the weights to get the averages 145 4 : readInputLine( getShortcutLabel() + "_mean: MATRIX_VECTOR_PRODUCT ARG=" + getShortcutLabel() + "_dataT," + getShortcutLabel() + "_weights"); 146 : // Make a matrix of averages 147 4 : readInputLine( getShortcutLabel() + "_averages: OUTER_PRODUCT ARG=" + argn + "_ones," + getShortcutLabel() + "_mean"); 148 : // Make a matrix of weights 149 2 : ActionWithValue* av2 = plumed.getActionSet().selectWithLabel<ActionWithValue*>( argn + "_data" ); 150 2 : if( !av2 ) { 151 0 : error("count not find data"); 152 : } 153 2 : unsigned nones = (av2->copyOutput(0))->getShape()[1]; 154 2 : std::string ones="1"; 155 68 : for(unsigned i=1; i<nones; ++i) { 156 : ones += ",1"; 157 : } 158 4 : readInputLine( getShortcutLabel() + "_wones: CONSTANT VALUES=" + ones ); 159 4 : readInputLine( getShortcutLabel() + "_wmat: OUTER_PRODUCT ARG=" + getShortcutLabel() + "_weights," + getShortcutLabel() + "_wones"); 160 : // And compute the data substract the mean 161 4 : readInputLine( getShortcutLabel() + "_diff: CUSTOM ARG=" + argn + "_data," + getShortcutLabel() + "_averages FUNC=(x-y) PERIODIC=NO"); 162 4 : readInputLine( getShortcutLabel() + "_wdiff: CUSTOM ARG=" + getShortcutLabel() + "_wmat," + getShortcutLabel() + "_diff FUNC=sqrt(x)*y PERIODIC=NO"); 163 : // And the covariance 164 4 : readInputLine( getShortcutLabel() + "_wdiffT: TRANSPOSE ARG=" + getShortcutLabel() + "_wdiff"); 165 4 : readInputLine( getShortcutLabel() + "_covar: MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_wdiffT," + getShortcutLabel() + "_wdiff"); 166 : // Read the dimensionality of the low dimensional space 167 : unsigned ndim; 168 2 : parse("NLOW_DIM",ndim); 169 2 : std::string vecstr="1"; 170 2 : if( ndim<=0 || ndim>nones ) { 171 0 : error("cannot generate projection in space of dimension higher than input coordinates"); 172 : } 173 4 : for(unsigned i=1; i<ndim; ++i) { 174 : std::string num; 175 2 : Tools::convert( i+1, num ); 176 4 : vecstr += "," + num; 177 : } 178 4 : readInputLine( getShortcutLabel() + "_eig: DIAGONALIZE ARG=" + getShortcutLabel() + "_covar VECTORS=" + vecstr ); 179 : // Now create a matrix to hold the output data 180 2 : std::string outd = "ARG=" + getShortcutLabel() + "_mean"; 181 6 : for(unsigned i=0; i<ndim; ++i) { 182 : std::string num; 183 4 : Tools::convert( i+1, num ); 184 8 : outd += "," + getShortcutLabel() + "_eig.vecs-" + num; 185 : } 186 4 : readInputLine( getShortcutLabel() + "_pcaT: VSTACK " + outd ); 187 4 : readInputLine( getShortcutLabel() + "_pca: TRANSPOSE ARG=" + getShortcutLabel() + "_pcaT"); 188 : // And output it all 189 : std::string filename, pstride; 190 2 : parse("STRIDE",pstride); 191 4 : parse("FILE",filename); 192 2 : if( filename.length()>0 && av2->getName()=="VSTACK" ) { 193 : std::vector<std::string> argnames; 194 1 : av2->getMatrixColumnTitles( argnames ); 195 1 : std::string argname_str=argnames[0]; 196 2 : for(unsigned i=1; i<argnames.size(); ++i) { 197 2 : argname_str += "," + argnames[i]; 198 : } 199 : std::string fmt; 200 2 : parse("FMT",fmt); 201 1 : if( fmt.length()>0 ) { 202 2 : fmt=" FMT=" + fmt; 203 : } 204 2 : readInputLine("DUMPPDB DESCRIPTION=PCA ARG_NAMES=" + argname_str + " ARG=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride + fmt ); 205 1 : } else { 206 1 : if( av2->getName()!="COLLECT" ) { 207 0 : error("input data should be VSTACK if list of arguments of COLLECT if atom positions"); 208 : } 209 1 : ActionAtomistic* rmsdact = plumed.getActionSet().selectWithLabel<ActionAtomistic*>( argn + "_getposx" ); 210 1 : if( !rmsdact ) { 211 0 : error("could not find action that gets positions from trajectory for RMSD"); 212 : } 213 1 : std::vector<AtomNumber> atoms( rmsdact->getAbsoluteIndexes() ); 214 : std::string indices; 215 1 : Tools::convert( atoms[0].serial(), indices ); 216 22 : for(unsigned i=1; i<atoms.size(); ++i) { 217 : std::string jnum; 218 21 : Tools::convert( atoms[i].serial(), jnum ); 219 42 : indices += "," + jnum; 220 : } 221 2 : readInputLine("DUMPPDB DESCRIPTION=PCA ATOM_INDICES=" + indices + " ATOMS=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride ); 222 : } 223 4 : outd = "ARG=" + getShortcutLabel() + "_eig.vecs-1"; 224 4 : for(unsigned i=1; i<ndim; ++i) { 225 : std::string num; 226 2 : Tools::convert( i+1, num ); 227 4 : outd += "," + getShortcutLabel() + "_eig.vecs-" + num; 228 : } 229 4 : readInputLine( getShortcutLabel() + "_eigv: VSTACK " + outd ); 230 4 : readInputLine( getShortcutLabel() + ": MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_diff," + getShortcutLabel() + "_eigv"); 231 2 : } 232 : 233 : } 234 : }