Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2016-2018 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "Analysis.h"
23 : #include "tools/Matrix.h"
24 : #include "reference/Direction.h"
25 : #include "reference/MetricRegister.h"
26 : #include "reference/ReferenceConfiguration.h"
27 : #include "reference/ReferenceValuePack.h"
28 : #include "core/ActionRegister.h"
29 :
30 : //+PLUMEDOC DIMRED PCA
31 : /*
32 : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
33 :
34 : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
35 : poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique
36 : here: https://en.wikipedia.org/wiki/Principal_component_analysis
37 :
38 : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
39 : a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your
40 : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if
41 : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
42 : this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been
43 : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
44 : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
45 : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
46 : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
47 :
48 : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must
49 : calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the
50 : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
51 : atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely,
52 : be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
53 : to be analysed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from
54 : consideration. The resulting principal components thus describe vibrational motions of the molecule.
55 :
56 : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
57 : used as input for the \ref PCAVARS action.
58 :
59 : \par Examples
60 :
61 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
62 : of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
63 : The first two principal components will be output to a file called pca-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation
64 : will be performed at the end of the simulation.
65 :
66 : \verbatim
67 : PCA METRIC=OPTIMAL ATOMS=1-22 STRIDE=1 NLOW_DIM=2 OFILE=pca-comp.pdb
68 : \endverbatim
69 :
70 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from chnages in the six distances
71 : seen in the previous lines. Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alighment has to be done when calculating the various
72 : elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called pca-comp.pdb.
73 : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the
74 : PCA analysis will be performed twice. The REWEIGHT_BIAS keyword in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
75 : when calculating averages and covariances a reweighting should be performed based and each frames' weight in these calculations should be determined based on
76 : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
77 :
78 : \verbatim
79 : d1: DISTANCE ATOMS=1,2
80 : d2: DISTANCE ATOMS=1,3
81 : d3: DISTANCE ATOMS=1,4
82 : d4: DISTANCE ATOMS=2,3
83 : d5: DISTANCE ATOMS=2,4
84 : d6: DISTANCE ATOMS=3,4
85 :
86 : PCA ARG=d1,d2,d3,d4,d5,d6 METRIC=EUCLIDEAN STRIDE=5 RUN=1000 NLOW_DIM=2 REWEIGHT_BIAS OFILE=pca-comp.pdb
87 : \endverbatim
88 :
89 : */
90 : //+ENDPLUMEDOC
91 :
92 : namespace PLMD {
93 : namespace analysis {
94 :
95 : class PCA : public Analysis {
96 : private:
97 : unsigned ndim;
98 : /// The position of the reference configuration (the one we align to)
99 : ReferenceConfiguration* myref;
100 : /// The eigenvectors for the atomic displacements
101 : Matrix<Vector> atom_eigv;
102 : /// The eigenvectors for the displacements in argument space
103 : Matrix<double> arg_eigv;
104 : std::string ofilename;
105 : public:
106 : static void registerKeywords( Keywords& keys );
107 : explicit PCA(const ActionOptions&ao);
108 : ~PCA();
109 : void performAnalysis();
110 0 : void performTask( const unsigned&, const unsigned&, MultiValue& ) const { plumed_error(); }
111 : };
112 :
113 2525 : PLUMED_REGISTER_ACTION(PCA,"PCA")
114 :
115 3 : void PCA::registerKeywords( Keywords& keys ) {
116 3 : Analysis::registerKeywords( keys );
117 3 : keys.add("compulsory","NLOW_DIM","number of PCA coordinates required");
118 3 : keys.add("compulsory","OFILE","the file on which to output the eigenvectors");
119 3 : }
120 :
121 2 : PCA::PCA(const ActionOptions&ao):
122 2 : PLUMED_ANALYSIS_INIT(ao)
123 : {
124 : // Setup reference configuration
125 2 : log.printf(" performing PCA analysis using %s metric \n", getMetricName().c_str() );
126 2 : myref = metricRegister().create<ReferenceConfiguration>( getMetricName() );
127 2 : std::vector<std::string> argnames( getNumberOfArguments() );
128 4 : for(unsigned i=0; i<argnames.size(); ++i) {
129 2 : if( getArguments()[i]->isPeriodic() ) error("cannot run PCA with periodic variables");
130 2 : argnames[i] = getArguments()[i]->getName();
131 : }
132 2 : myref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argnames );
133 :
134 2 : parse("NLOW_DIM",ndim);
135 2 : if( getNumberOfAtoms()>0 ) atom_eigv.resize( ndim, getNumberOfAtoms() );
136 2 : if( getNumberOfArguments()>0 ) arg_eigv.resize( ndim, getNumberOfArguments() );
137 :
138 : // Read stuff for output file
139 2 : parseOutputFile("OFILE",ofilename);
140 2 : checkRead();
141 2 : }
142 :
143 8 : PCA::~PCA() {
144 2 : delete myref;
145 6 : }
146 :
147 2 : void PCA::performAnalysis() {
148 : // Align everything to the first frame
149 2 : MultiValue myval( 1, getNumberOfArguments() + 3*getNumberOfAtoms() + 9 );
150 4 : ReferenceValuePack mypack( getNumberOfArguments(), getNumberOfAtoms(), myval );
151 2 : for(unsigned i=0; i<getNumberOfAtoms(); ++i) mypack.setAtomIndex( i, i );
152 : // Setup some PCA storage
153 2 : data[0]->setupPCAStorage ( mypack );
154 :
155 : // Create some arrays to store the average position
156 4 : std::vector<double> sarg( getNumberOfArguments(), 0 );
157 4 : std::vector<Vector> spos( getNumberOfAtoms() );
158 2 : for(unsigned i=0; i<getNumberOfAtoms(); ++i) spos[i].zero();
159 :
160 : // Calculate the average displacement from the first frame
161 2 : double norm=getWeight(0);
162 1092 : for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
163 1090 : double d = data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
164 : // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
165 1090 : for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
166 : // Accumulate average displacement of position
167 1090 : for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j];
168 1090 : norm += getWeight(i);
169 : }
170 : // Now normalise the displacements to get the average and add these to the first frame
171 2 : double inorm = 1.0 / norm ;
172 2 : for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] = inorm*sarg[j] + data[0]->getReferenceArguments()[j];
173 2 : for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] = inorm*spos[j] + data[0]->getReferencePositions()[j];
174 : // And set the reference configuration
175 4 : std::vector<double> empty( getNumberOfArguments(), 1.0 ); myref->setReferenceConfig( spos, sarg, empty );
176 :
177 : // Now accumulate the covariance
178 2 : unsigned narg=getNumberOfArguments();
179 4 : Matrix<double> covar( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() ); covar=0;
180 1094 : for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
181 : // double d = data[i]->calc( spos, getPbc(), getArguments(), sarg, mypack, true );
182 1092 : double d = data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
183 2184 : for(unsigned jarg=0; jarg<getNumberOfArguments(); ++jarg) {
184 : // Need sorting for PBC with GAT
185 1092 : double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + data[0]->getReferenceArguments()[jarg] - sarg[jarg];
186 3276 : for(unsigned karg=0; karg<getNumberOfArguments(); ++karg) {
187 : // Need sorting for PBC with GAT
188 2184 : double karg_d = 0.5*mypack.getArgumentDerivative(karg) + data[0]->getReferenceArguments()[karg] - sarg[karg];
189 2184 : covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
190 : }
191 : }
192 13104 : for(unsigned jat=0; jat<getNumberOfAtoms(); ++jat) {
193 48048 : for(unsigned jc=0; jc<3; ++jc) {
194 36036 : double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] + data[0]->getReferencePositions()[jat][jc] - spos[jat][jc];
195 828828 : for(unsigned kat=0; kat<getNumberOfAtoms(); ++kat) {
196 3171168 : for(unsigned kc=0; kc<3; ++kc) {
197 2378376 : double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] + data[0]->getReferencePositions()[kat][kc] - spos[kat][kc];
198 2378376 : covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
199 : }
200 : }
201 : }
202 : }
203 : }
204 : // Normalise
205 70 : for(unsigned i=0; i<covar.nrows(); ++i) {
206 68 : for(unsigned j=0; j<covar.ncols(); ++j) covar(i,j) *= inorm;
207 : }
208 :
209 : // Diagonalise the covariance
210 4 : std::vector<double> eigval( getNumberOfArguments()+3*getNumberOfAtoms() );
211 4 : Matrix<double> eigvec( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() );
212 2 : diagMat( covar, eigval, eigvec );
213 :
214 : // Open an output file
215 4 : OFile ofile; ofile.link(*this); ofile.setBackupString("analysis");
216 2 : ofile.open( ofilename );
217 : // Output the reference configuration
218 2 : myref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
219 :
220 : // Store and print the eigenvectors
221 4 : std::vector<Vector> tmp_atoms( getNumberOfAtoms() );
222 4 : std::vector<double> tmp_args( getNumberOfArguments() );
223 2 : Direction* tref = metricRegister().create<Direction>( "DIRECTION" );
224 2 : tref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argument_names );
225 6 : for(unsigned dim=0; dim<ndim; ++dim) {
226 4 : unsigned idim = covar.ncols() - 1 - dim;
227 4 : for(unsigned i=0; i<getNumberOfArguments(); ++i) tmp_args[i]=arg_eigv(dim,i)=eigvec(idim,i);
228 48 : for(unsigned i=0; i<getNumberOfAtoms(); ++i) {
229 44 : for(unsigned k=0; k<3; ++k) tmp_atoms[i][k]=atom_eigv(dim,i)[k]=eigvec(idim,narg+3*i+k);
230 : }
231 4 : tref->setDirection( tmp_atoms, tmp_args );
232 4 : tref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
233 : }
234 : // Close the output file
235 4 : delete tref; ofile.close();
236 2 : }
237 :
238 : }
239 2523 : }
|