LCOV - code coverage report
Current view: top level - analysis - PCA.cpp (source / functions) Hit Total Coverage
Test: plumed test coverage Lines: 80 81 98.8 %
Date: 2018-12-19 07:49:13 Functions: 10 13 76.9 %

          Line data    Source code
       1             : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
       2             :    Copyright (c) 2016-2018 The plumed team
       3             :    (see the PEOPLE file at the root of the distribution for a list of names)
       4             : 
       5             :    See http://www.plumed.org for more information.
       6             : 
       7             :    This file is part of plumed, version 2.
       8             : 
       9             :    plumed is free software: you can redistribute it and/or modify
      10             :    it under the terms of the GNU Lesser General Public License as published by
      11             :    the Free Software Foundation, either version 3 of the License, or
      12             :    (at your option) any later version.
      13             : 
      14             :    plumed is distributed in the hope that it will be useful,
      15             :    but WITHOUT ANY WARRANTY; without even the implied warranty of
      16             :    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      17             :    GNU Lesser General Public License for more details.
      18             : 
      19             :    You should have received a copy of the GNU Lesser General Public License
      20             :    along with plumed.  If not, see <http://www.gnu.org/licenses/>.
      21             : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
      22             : #include "Analysis.h"
      23             : #include "tools/Matrix.h"
      24             : #include "reference/Direction.h"
      25             : #include "reference/MetricRegister.h"
      26             : #include "reference/ReferenceConfiguration.h"
      27             : #include "reference/ReferenceValuePack.h"
      28             : #include "core/ActionRegister.h"
      29             : 
      30             : //+PLUMEDOC DIMRED PCA
      31             : /*
      32             : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
      33             : 
      34             : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
      35             : poorly correlated variables into a set of linearly uncorrelated variables.  You can read more about the specifics of this technique
      36             : here: https://en.wikipedia.org/wiki/Principal_component_analysis
      37             : 
      38             : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
      39             : a number of collective variables which are calculated from the trajectory frames are used as input.  In this second instance your
      40             : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables.  However, if
      41             : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
      42             : this input trajectory is a set of poorly correlated (high-dimensional) vectors.  After principal component analysis has been
      43             : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
      44             : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
      45             : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
      46             : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
      47             : 
      48             : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates.  In other words, you must
      49             : calculate the average structure and the amount the system fluctuates around this average structure.  The problem in doing so when the
      50             : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
      51             : atoms comes from the translational and rotational degrees of freedom of the molecule.  The first six principal components will thus, most likely,
      52             : be uninteresting.  Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
      53             : to be analysed to the first frame in the trajectory.  This can be used to effectively remove translational and/or rotational motions from
      54             : consideration.  The resulting principal components thus describe vibrational motions of the molecule.
      55             : 
      56             : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
      57             : used as input for the \ref PCAVARS action.
      58             : 
      59             : \par Examples
      60             : 
      61             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
      62             : of the first 22 atoms.  The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
      63             : The first two principal components will be output to a file called pca-comp.pdb.  Trajectory frames will be collected on every step and the PCA calculation
      64             : will be performed at the end of the simulation.
      65             : 
      66             : \verbatim
      67             : PCA METRIC=OPTIMAL ATOMS=1-22 STRIDE=1 NLOW_DIM=2 OFILE=pca-comp.pdb
      68             : \endverbatim
      69             : 
      70             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from chnages in the six distances
      71             : seen in the previous lines.  Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alighment has to be done when calculating the various
      72             : elements of the covariance matrix from the input vectors.  In this calculation the first two principal components will be output to a file called pca-comp.pdb.
      73             : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps.  Consequently, if you run a 2000 step simulation the
      74             : PCA analysis will be performed twice.  The REWEIGHT_BIAS keyword in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
      75             : when calculating averages and covariances a reweighting should be performed based and each frames' weight in these calculations should be determined based on
      76             : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
      77             : 
      78             : \verbatim
      79             : d1: DISTANCE ATOMS=1,2
      80             : d2: DISTANCE ATOMS=1,3
      81             : d3: DISTANCE ATOMS=1,4
      82             : d4: DISTANCE ATOMS=2,3
      83             : d5: DISTANCE ATOMS=2,4
      84             : d6: DISTANCE ATOMS=3,4
      85             : 
      86             : PCA ARG=d1,d2,d3,d4,d5,d6 METRIC=EUCLIDEAN STRIDE=5 RUN=1000 NLOW_DIM=2 REWEIGHT_BIAS OFILE=pca-comp.pdb
      87             : \endverbatim
      88             : 
      89             : */
      90             : //+ENDPLUMEDOC
      91             : 
      92             : namespace PLMD {
      93             : namespace analysis {
      94             : 
      95             : class PCA : public Analysis {
      96             : private:
      97             :   unsigned ndim;
      98             : /// The position of the reference configuration (the one we align to)
      99             :   ReferenceConfiguration* myref;
     100             : /// The eigenvectors for the atomic displacements
     101             :   Matrix<Vector> atom_eigv;
     102             : /// The eigenvectors for the displacements in argument space
     103             :   Matrix<double> arg_eigv;
     104             :   std::string ofilename;
     105             : public:
     106             :   static void registerKeywords( Keywords& keys );
     107             :   explicit PCA(const ActionOptions&ao);
     108             :   ~PCA();
     109             :   void performAnalysis();
     110           0 :   void performTask( const unsigned&, const unsigned&, MultiValue& ) const { plumed_error(); }
     111             : };
     112             : 
     113        2525 : PLUMED_REGISTER_ACTION(PCA,"PCA")
     114             : 
     115           3 : void PCA::registerKeywords( Keywords& keys ) {
     116           3 :   Analysis::registerKeywords( keys );
     117           3 :   keys.add("compulsory","NLOW_DIM","number of PCA coordinates required");
     118           3 :   keys.add("compulsory","OFILE","the file on which to output the eigenvectors");
     119           3 : }
     120             : 
     121           2 : PCA::PCA(const ActionOptions&ao):
     122           2 :   PLUMED_ANALYSIS_INIT(ao)
     123             : {
     124             :   // Setup reference configuration
     125           2 :   log.printf("  performing PCA analysis using %s metric \n", getMetricName().c_str() );
     126           2 :   myref = metricRegister().create<ReferenceConfiguration>( getMetricName() );
     127           2 :   std::vector<std::string> argnames( getNumberOfArguments() );
     128           4 :   for(unsigned i=0; i<argnames.size(); ++i) {
     129           2 :     if( getArguments()[i]->isPeriodic() ) error("cannot run PCA with periodic variables");
     130           2 :     argnames[i] = getArguments()[i]->getName();
     131             :   }
     132           2 :   myref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argnames );
     133             : 
     134           2 :   parse("NLOW_DIM",ndim);
     135           2 :   if( getNumberOfAtoms()>0 ) atom_eigv.resize( ndim, getNumberOfAtoms() );
     136           2 :   if( getNumberOfArguments()>0 ) arg_eigv.resize( ndim, getNumberOfArguments() );
     137             : 
     138             :   // Read stuff for output file
     139           2 :   parseOutputFile("OFILE",ofilename);
     140           2 :   checkRead();
     141           2 : }
     142             : 
     143           8 : PCA::~PCA() {
     144           2 :   delete myref;
     145           6 : }
     146             : 
     147           2 : void PCA::performAnalysis() {
     148             :   // Align everything to the first frame
     149           2 :   MultiValue myval( 1, getNumberOfArguments() + 3*getNumberOfAtoms() + 9 );
     150           4 :   ReferenceValuePack mypack( getNumberOfArguments(), getNumberOfAtoms(), myval );
     151           2 :   for(unsigned i=0; i<getNumberOfAtoms(); ++i) mypack.setAtomIndex( i, i );
     152             :   // Setup some PCA storage
     153           2 :   data[0]->setupPCAStorage ( mypack );
     154             : 
     155             :   // Create some arrays to store the average position
     156           4 :   std::vector<double> sarg( getNumberOfArguments(), 0 );
     157           4 :   std::vector<Vector> spos( getNumberOfAtoms() );
     158           2 :   for(unsigned i=0; i<getNumberOfAtoms(); ++i) spos[i].zero();
     159             : 
     160             :   // Calculate the average displacement from the first frame
     161           2 :   double norm=getWeight(0);
     162        1092 :   for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
     163        1090 :     double d = data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
     164             :     // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
     165        1090 :     for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
     166             :     // Accumulate average displacement of position
     167        1090 :     for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j];
     168        1090 :     norm += getWeight(i);
     169             :   }
     170             :   // Now normalise the displacements to get the average and add these to the first frame
     171           2 :   double inorm = 1.0 / norm ;
     172           2 :   for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] = inorm*sarg[j] + data[0]->getReferenceArguments()[j];
     173           2 :   for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] = inorm*spos[j] + data[0]->getReferencePositions()[j];
     174             :   // And set the reference configuration
     175           4 :   std::vector<double> empty( getNumberOfArguments(), 1.0 ); myref->setReferenceConfig( spos, sarg, empty );
     176             : 
     177             :   // Now accumulate the covariance
     178           2 :   unsigned narg=getNumberOfArguments();
     179           4 :   Matrix<double> covar( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() ); covar=0;
     180        1094 :   for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
     181             :     // double d = data[i]->calc( spos, getPbc(), getArguments(), sarg, mypack, true );
     182        1092 :     double d = data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
     183        2184 :     for(unsigned jarg=0; jarg<getNumberOfArguments(); ++jarg) {
     184             :       // Need sorting for PBC with GAT
     185        1092 :       double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + data[0]->getReferenceArguments()[jarg] - sarg[jarg];
     186        3276 :       for(unsigned karg=0; karg<getNumberOfArguments(); ++karg) {
     187             :         // Need sorting for PBC with GAT
     188        2184 :         double karg_d = 0.5*mypack.getArgumentDerivative(karg) + data[0]->getReferenceArguments()[karg] - sarg[karg];
     189        2184 :         covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
     190             :       }
     191             :     }
     192       13104 :     for(unsigned jat=0; jat<getNumberOfAtoms(); ++jat) {
     193       48048 :       for(unsigned jc=0; jc<3; ++jc) {
     194       36036 :         double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] + data[0]->getReferencePositions()[jat][jc] - spos[jat][jc];
     195      828828 :         for(unsigned kat=0; kat<getNumberOfAtoms(); ++kat) {
     196     3171168 :           for(unsigned kc=0; kc<3; ++kc) {
     197     2378376 :             double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] + data[0]->getReferencePositions()[kat][kc] - spos[kat][kc];
     198     2378376 :             covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
     199             :           }
     200             :         }
     201             :       }
     202             :     }
     203             :   }
     204             :   // Normalise
     205          70 :   for(unsigned i=0; i<covar.nrows(); ++i) {
     206          68 :     for(unsigned j=0; j<covar.ncols(); ++j) covar(i,j) *= inorm;
     207             :   }
     208             : 
     209             :   // Diagonalise the covariance
     210           4 :   std::vector<double> eigval( getNumberOfArguments()+3*getNumberOfAtoms() );
     211           4 :   Matrix<double> eigvec( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() );
     212           2 :   diagMat( covar, eigval, eigvec );
     213             : 
     214             :   // Open an output file
     215           4 :   OFile ofile; ofile.link(*this); ofile.setBackupString("analysis");
     216           2 :   ofile.open( ofilename );
     217             :   // Output the reference configuration
     218           2 :   myref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
     219             : 
     220             :   // Store and print the eigenvectors
     221           4 :   std::vector<Vector> tmp_atoms( getNumberOfAtoms() );
     222           4 :   std::vector<double> tmp_args( getNumberOfArguments() );
     223           2 :   Direction* tref = metricRegister().create<Direction>( "DIRECTION" );
     224           2 :   tref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argument_names );
     225           6 :   for(unsigned dim=0; dim<ndim; ++dim) {
     226           4 :     unsigned idim = covar.ncols() - 1 - dim;
     227           4 :     for(unsigned i=0; i<getNumberOfArguments(); ++i) tmp_args[i]=arg_eigv(dim,i)=eigvec(idim,i);
     228          48 :     for(unsigned i=0; i<getNumberOfAtoms(); ++i) {
     229          44 :       for(unsigned k=0; k<3; ++k) tmp_atoms[i][k]=atom_eigv(dim,i)[k]=eigvec(idim,narg+3*i+k);
     230             :     }
     231           4 :     tref->setDirection( tmp_atoms, tmp_args );
     232           4 :     tref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
     233             :   }
     234             :   // Close the output file
     235           4 :   delete tref; ofile.close();
     236           2 : }
     237             : 
     238             : }
     239        2523 : }

Generated by: LCOV version 1.13