LCOV - code coverage report
Current view: top level - dimred - PCA.cpp (source / functions) Hit Total Coverage
Test: plumed test coverage Lines: 81 86 94.2 %
Date: 2025-12-04 11:19:34 Functions: 2 3 66.7 %

          Line data    Source code
       1             : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
       2             :    Copyright (c) 2016-2023 The plumed team
       3             :    (see the PEOPLE file at the root of the distribution for a list of names)
       4             : 
       5             :    See http://www.plumed.org for more information.
       6             : 
       7             :    This file is part of plumed, version 2.
       8             : 
       9             :    plumed is free software: you can redistribute it and/or modify
      10             :    it under the terms of the GNU Lesser General Public License as published by
      11             :    the Free Software Foundation, either version 3 of the License, or
      12             :    (at your option) any later version.
      13             : 
      14             :    plumed is distributed in the hope that it will be useful,
      15             :    but WITHOUT ANY WARRANTY; without even the implied warranty of
      16             :    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      17             :    GNU Lesser General Public License for more details.
      18             : 
      19             :    You should have received a copy of the GNU Lesser General Public License
      20             :    along with plumed.  If not, see <http://www.gnu.org/licenses/>.
      21             : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
      22             : #include "core/ActionShortcut.h"
      23             : #include "core/ActionRegister.h"
      24             : #include "core/ActionWithValue.h"
      25             : #include "core/ActionPilot.h"
      26             : #include "core/ActionAtomistic.h"
      27             : #include "core/PlumedMain.h"
      28             : #include "core/ActionSet.h"
      29             : 
      30             : //+PLUMEDOC DIMRED PCA
      31             : /*
      32             : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
      33             : 
      34             : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
      35             : poorly correlated variables into a set of linearly uncorrelated variables.  You can read more about the specifics of this technique
      36             : [here](https://en.wikipedia.org/wiki/Principal_component_analysis)
      37             : 
      38             : When used with molecular dynamics simulations a set of frames taken from the trajectory, $\{X_i\}$, or the values of
      39             : a number of collective variables which are calculated from the trajectory frames are used as input.  In this second instance your
      40             : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables.  However, if
      41             : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
      42             : this input trajectory is a set of poorly correlated (high-dimensional) vectors.  After principal component analysis has been
      43             : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
      44             : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
      45             : These output directions are some linear combination of the $x$, $y$ and $z$ positions if the positions were used as input
      46             : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
      47             : 
      48             : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates.  In other words, you must
      49             : calculate the average structure and the amount the system fluctuates around this average structure.  The problem in doing so when the
      50             : $x$, $y$ and $z$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
      51             : atoms comes from the translational and rotational degrees of freedom of the molecule.  The first six principal components will thus, most likely,
      52             : be uninteresting.  Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
      53             : to be analyzed to the first frame in the trajectory.  This can be used to effectively remove translational and/or rotational motions from
      54             : consideration.  The resulting principal components thus describe vibrational motions of the molecule.
      55             : 
      56             : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
      57             : used as input for the [PCAVARS](PCAVARS.md) action.
      58             : 
      59             : ## Examples
      60             : 
      61             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
      62             : of the first 22 atoms. The `TYPE=OPTIMAL` instruction ensures that translational and rotational degrees of freedom are removed from consideration.
      63             : The average position and the first two principal components will be output to a file called `pca-comp.pdb`. Trajectory frames will be collected on every step and the PCA calculation
      64             : will be performed at the end of the simulation. The `colvar` file that is output contains the projections of all the positions in the high dimensional space on these vectors.
      65             : 
      66             : ```plumed
      67             : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
      68             : pca: PCA ARG=ff NLOW_DIM=2 FILE=pca-comp.pdb
      69             : DUMPVECTOR ARG=pca,pca_weights FILE=colvar STRIDE=0
      70             : ```
      71             : 
      72             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances
      73             : seen in the input file below.  In this calculation the first two principal components will be output to a file called PCA-comp.pdb.
      74             : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the
      75             : PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
      76             : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on
      77             : the current value of the instantaneous bias (see [REWEIGHT_BIAS](REWEIGHT_BIAS.md)).
      78             : 
      79             : ```plumed
      80             : d1: DISTANCE ATOMS=1,2
      81             : d2: DISTANCE ATOMS=1,3
      82             : d3: DISTANCE ATOMS=1,4
      83             : d4: DISTANCE ATOMS=2,3
      84             : d5: DISTANCE ATOMS=2,4
      85             : d6: DISTANCE ATOMS=3,4
      86             : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
      87             : rbias: REWEIGHT_BIAS TEMP=300
      88             : 
      89             : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
      90             : pca: PCA ARG=ff NLOW_DIM=2 FILE=pca-comp.pdb
      91             : DUMPVECTOR ARG=pca,pca_weights FILE=colvar STRIDE=1000
      92             : ```
      93             : 
      94             : */
      95             : //+ENDPLUMEDOC
      96             : 
      97             : namespace PLMD {
      98             : namespace dimred {
      99             : 
     100             : class PCA : public ActionShortcut {
     101             : public:
     102             :   static void registerKeywords( Keywords& keys );
     103             :   PCA( const ActionOptions& );
     104             : };
     105             : 
     106             : PLUMED_REGISTER_ACTION(PCA,"PCA")
     107             : 
     108           4 : void PCA::registerKeywords( Keywords& keys ) {
     109           4 :   ActionShortcut::registerKeywords( keys );
     110           4 :   keys.add("compulsory","ARG","the arguments that you would like to make the histogram for");
     111           4 :   keys.add("compulsory","NLOW_DIM","number of low-dimensional coordinates required");
     112           4 :   keys.add("compulsory","STRIDE","0","the frequency with which to perform this analysis");
     113           4 :   keys.add("optional","FILE","the file on which to output the low dimensional coordinates");
     114           4 :   keys.add("compulsory","FMT","%f","the format to use when outputting the low dimensional coordinates");
     115           8 :   keys.setValueDescription("matrix","the projections of the input coordinates on the PCA components that were found from the covariance matrix");
     116           4 :   keys.needsAction("LOGSUMEXP");
     117           4 :   keys.needsAction("TRANSPOSE");
     118           4 :   keys.needsAction("MATRIX_VECTOR_PRODUCT");
     119           4 :   keys.needsAction("CONSTANT");
     120           4 :   keys.needsAction("COLLECT");
     121           4 :   keys.needsAction("OUTER_PRODUCT");
     122           4 :   keys.needsAction("CUSTOM");
     123           4 :   keys.needsAction("MATRIX_PRODUCT");
     124           4 :   keys.needsAction("DIAGONALIZE");
     125           4 :   keys.needsAction("VSTACK");
     126           4 :   keys.needsAction("DUMPPDB");
     127           4 : }
     128             : 
     129             : 
     130           2 : PCA::PCA(const ActionOptions&ao):
     131             :   Action(ao),
     132           2 :   ActionShortcut(ao) {
     133             :   // Find the argument name
     134             :   std::string argn;
     135           2 :   parse("ARG",argn);
     136           2 :   ActionShortcut* as = plumed.getActionSet().getShortcutActionWithLabel( argn );
     137           2 :   if( !as || as->getName()!="COLLECT_FRAMES" ) {
     138           0 :     error("found no COLLECT_FRAMES action with label " + argn );
     139             :   }
     140             :   // Get the final weights using the logsumexp trick
     141           4 :   readInputLine( getShortcutLabel() + "_weights: LOGSUMEXP ARG=" + argn + "_logweights");
     142             :   // Now transpose the collected frames
     143           4 :   readInputLine( getShortcutLabel() + "_dataT: TRANSPOSE ARG=" + argn + "_data");
     144             :   // And multiply the transpose by the weights to get the averages
     145           4 :   readInputLine( getShortcutLabel() + "_mean: MATRIX_VECTOR_PRODUCT ARG=" + getShortcutLabel() + "_dataT," + getShortcutLabel() + "_weights");
     146             :   // Make a matrix of averages
     147           4 :   readInputLine( getShortcutLabel() + "_averages: OUTER_PRODUCT ARG=" + argn + "_ones," + getShortcutLabel() + "_mean");
     148             :   // Make a matrix of weights
     149           2 :   ActionWithValue* av2 = plumed.getActionSet().selectWithLabel<ActionWithValue*>( argn + "_data" );
     150           2 :   if( !av2 ) {
     151           0 :     error("count not find data");
     152             :   }
     153           2 :   unsigned nones = (av2->copyOutput(0))->getShape()[1];
     154           2 :   std::string ones="1";
     155          68 :   for(unsigned i=1; i<nones; ++i) {
     156             :     ones += ",1";
     157             :   }
     158           4 :   readInputLine( getShortcutLabel() + "_wones: CONSTANT VALUES=" + ones );
     159           4 :   readInputLine( getShortcutLabel() + "_wmat: OUTER_PRODUCT ARG=" + getShortcutLabel() + "_weights," + getShortcutLabel() + "_wones");
     160             :   // And compute the data substract the mean
     161           4 :   readInputLine( getShortcutLabel() + "_diff: CUSTOM ARG=" + argn + "_data," + getShortcutLabel() + "_averages FUNC=(x-y) PERIODIC=NO");
     162           4 :   readInputLine( getShortcutLabel() + "_wdiff: CUSTOM ARG=" + getShortcutLabel() + "_wmat," + getShortcutLabel() + "_diff FUNC=sqrt(x)*y PERIODIC=NO");
     163             :   // And the covariance
     164           4 :   readInputLine( getShortcutLabel() + "_wdiffT: TRANSPOSE ARG=" + getShortcutLabel() + "_wdiff");
     165           4 :   readInputLine( getShortcutLabel() + "_covar: MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_wdiffT," + getShortcutLabel() + "_wdiff");
     166             :   // Read the dimensionality of the low dimensional space
     167             :   unsigned ndim;
     168           2 :   parse("NLOW_DIM",ndim);
     169           2 :   std::string vecstr="1";
     170           2 :   if( ndim<=0 || ndim>nones ) {
     171           0 :     error("cannot generate projection in space of dimension higher than input coordinates");
     172             :   }
     173           4 :   for(unsigned i=1; i<ndim; ++i) {
     174             :     std::string num;
     175           2 :     Tools::convert( i+1, num );
     176           4 :     vecstr += "," + num;
     177             :   }
     178           4 :   readInputLine( getShortcutLabel() + "_eig: DIAGONALIZE ARG=" + getShortcutLabel() + "_covar VECTORS=" + vecstr );
     179             :   // Now create a matrix to hold the output data
     180           2 :   std::string outd = "ARG=" + getShortcutLabel() + "_mean";
     181           6 :   for(unsigned i=0; i<ndim; ++i) {
     182             :     std::string num;
     183           4 :     Tools::convert( i+1, num );
     184           8 :     outd += "," + getShortcutLabel() + "_eig.vecs-" + num;
     185             :   }
     186           4 :   readInputLine( getShortcutLabel() + "_pcaT: VSTACK " + outd );
     187           4 :   readInputLine( getShortcutLabel() + "_pca: TRANSPOSE ARG=" + getShortcutLabel() + "_pcaT");
     188             :   // And output it all
     189             :   std::string filename, pstride;
     190           2 :   parse("STRIDE",pstride);
     191           4 :   parse("FILE",filename);
     192           2 :   if( filename.length()>0 && av2->getName()=="VSTACK" ) {
     193             :     std::vector<std::string> argnames;
     194           1 :     av2->getMatrixColumnTitles( argnames );
     195           1 :     std::string argname_str=argnames[0];
     196           2 :     for(unsigned i=1; i<argnames.size(); ++i) {
     197           2 :       argname_str += "," + argnames[i];
     198             :     }
     199             :     std::string fmt;
     200           2 :     parse("FMT",fmt);
     201           1 :     if( fmt.length()>0 ) {
     202           2 :       fmt=" FMT=" + fmt;
     203             :     }
     204           2 :     readInputLine("DUMPPDB DESCRIPTION=PCA ARG_NAMES=" + argname_str + " ARG=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride + fmt );
     205           1 :   } else {
     206           1 :     if( av2->getName()!="COLLECT" ) {
     207           0 :       error("input data should be VSTACK if list of arguments of COLLECT if atom positions");
     208             :     }
     209           1 :     ActionAtomistic* rmsdact = plumed.getActionSet().selectWithLabel<ActionAtomistic*>( argn + "_getposx" );
     210           1 :     if( !rmsdact ) {
     211           0 :       error("could not find action that gets positions from trajectory for RMSD");
     212             :     }
     213           1 :     std::vector<AtomNumber> atoms( rmsdact->getAbsoluteIndexes() );
     214             :     std::string indices;
     215           1 :     Tools::convert( atoms[0].serial(), indices );
     216          22 :     for(unsigned i=1; i<atoms.size(); ++i) {
     217             :       std::string jnum;
     218          21 :       Tools::convert( atoms[i].serial(), jnum );
     219          42 :       indices += "," + jnum;
     220             :     }
     221           2 :     readInputLine("DUMPPDB DESCRIPTION=PCA ATOM_INDICES=" + indices + " ATOMS=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride );
     222             :   }
     223           4 :   outd = "ARG=" + getShortcutLabel() + "_eig.vecs-1";
     224           4 :   for(unsigned i=1; i<ndim; ++i) {
     225             :     std::string num;
     226           2 :     Tools::convert( i+1, num );
     227           4 :     outd += "," + getShortcutLabel() + "_eig.vecs-" + num;
     228             :   }
     229           4 :   readInputLine( getShortcutLabel() + "_eigv: VSTACK " + outd );
     230           4 :   readInputLine( getShortcutLabel() + ": MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_diff," + getShortcutLabel() + "_eigv");
     231           2 : }
     232             : 
     233             : }
     234             : }

Generated by: LCOV version 1.16