Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2016-2023 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "PCA.h"
23 : #include "tools/Matrix.h"
24 : #include "reference/MetricRegister.h"
25 : #include "reference/ReferenceValuePack.h"
26 : #include "analysis/ReadAnalysisFrames.h"
27 : #include "core/ActionRegister.h"
28 :
29 : //+PLUMEDOC DIMRED PCA
30 : /*
31 : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
32 :
33 : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
34 : poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique
35 : here: https://en.wikipedia.org/wiki/Principal_component_analysis
36 :
37 : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
38 : a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your
39 : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if
40 : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
41 : this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been
42 : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
43 : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
44 : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
45 : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
46 :
47 : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must
48 : calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the
49 : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
50 : atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely,
51 : be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
52 : to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from
53 : consideration. The resulting principal components thus describe vibrational motions of the molecule.
54 :
55 : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
56 : used as input for the \ref PCAVARS action.
57 :
58 : \par Examples
59 :
60 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
61 : of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
62 : The first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation
63 : will be performed at the end of the simulation.
64 :
65 : \plumedfile
66 : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
67 : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
68 : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb
69 : \endplumedfile
70 :
71 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances
72 : seen in the previous lines. Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various
73 : elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called PCA-comp.pdb.
74 : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the
75 : PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
76 : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on
77 : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
78 :
79 : \plumedfile
80 : d1: DISTANCE ATOMS=1,2
81 : d2: DISTANCE ATOMS=1,3
82 : d3: DISTANCE ATOMS=1,4
83 : d4: DISTANCE ATOMS=2,3
84 : d5: DISTANCE ATOMS=2,4
85 : d6: DISTANCE ATOMS=3,4
86 : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
87 : rbias: REWEIGHT_BIAS TEMP=300
88 :
89 : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
90 : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
91 : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb
92 : \endplumedfile
93 :
94 : */
95 : //+ENDPLUMEDOC
96 :
97 : namespace PLMD {
98 : namespace dimred {
99 :
100 13791 : PLUMED_REGISTER_ACTION(PCA,"PCA")
101 :
102 7 : void PCA::registerKeywords( Keywords& keys ) {
103 7 : DimensionalityReductionBase::registerKeywords( keys );
104 7 : keys.use("ARG");
105 14 : keys.reset_style("ARG","optional");
106 14 : keys.add("compulsory","METRIC","EUCLIDEAN","the method that you are going to use to measure the distances between points");
107 14 : keys.add("atoms","ATOMS","the list of atoms that you are going to use in the measure of distance that you are using");
108 7 : }
109 :
110 3 : PCA::PCA(const ActionOptions&ao):
111 : Action(ao),
112 3 : DimensionalityReductionBase(ao) {
113 : // Get the input PDB file from the underlying ReadAnalysisFrames object
114 3 : analysis::ReadAnalysisFrames* myframes = dynamic_cast<analysis::ReadAnalysisFrames*>( my_input_data );
115 3 : if( !myframes ) {
116 0 : error("input to PCA should be ReadAnalysisFrames object");
117 : }
118 6 : parse("METRIC",mtype);
119 : std::vector<AtomNumber> atoms;
120 3 : log.printf(" performing PCA analysis using %s metric \n", mtype.c_str() );
121 3 : if( my_input_data->getNumberOfAtoms()>0 ) {
122 2 : parseAtomList("ATOMS",atoms);
123 1 : if( atoms.size()!=0 ) {
124 0 : mypdb.setAtomNumbers( atoms );
125 0 : for(unsigned i=0; i<atoms.size(); ++i) {
126 : bool found=false;
127 0 : for(unsigned j=0; j<my_input_data->getAtomIndexes().size(); ++j) {
128 0 : if( my_input_data->getAtomIndexes()[j]==atoms[i] ) {
129 : found=true;
130 : break;
131 : }
132 : }
133 0 : if( !found ) {
134 : std::string num;
135 0 : Tools::convert( atoms[i].serial(), num );
136 0 : error("atom number " + num + " is not stored in any action that has been input");
137 : }
138 : }
139 0 : mypdb.addBlockEnd( atoms.size() );
140 1 : } else if( getNumberOfArguments()==0 ) {
141 1 : mypdb.setAtomNumbers( my_input_data->getAtomIndexes() );
142 1 : mypdb.addBlockEnd( my_input_data->getAtomIndexes().size() );
143 1 : if( mtype=="EUCLIDEAN" ) {
144 : mtype="OPTIMAL";
145 : }
146 : }
147 : }
148 3 : if( my_input_data->getArgumentNames().size()>0 ) {
149 2 : if( getNumberOfArguments()==0 && atoms.size()==0 ) {
150 2 : std::vector<std::string> argnames( my_input_data->getArgumentNames() );
151 2 : mypdb.setArgumentNames( argnames );
152 2 : requestArguments( my_input_data->getArgumentList() );
153 2 : } else {
154 0 : std::vector<Value*> myargs( getArguments() );
155 0 : std::vector<std::string> inargnames( my_input_data->getArgumentNames() );
156 0 : std::vector<std::string> argnames( myargs.size() );
157 0 : for(unsigned i=0; i<myargs.size(); ++i) {
158 0 : argnames[i]=myargs[i]->getName();
159 : bool found=false;
160 0 : for(unsigned j=0; j<inargnames.size(); ++j) {
161 0 : if( argnames[i]==inargnames[j] ) {
162 : found=true;
163 : break;
164 : }
165 : }
166 0 : if( !found ) {
167 0 : error("input named " + my_input_data->getLabel() + " does not store/calculate quantity named " + argnames[i] );
168 : }
169 : }
170 0 : mypdb.setArgumentNames( argnames );
171 0 : requestArguments( myargs );
172 0 : }
173 : }
174 3 : if( nlow==0 ) {
175 0 : error("dimensionality of output not set");
176 : }
177 3 : checkRead();
178 3 : }
179 :
180 3 : void PCA::performAnalysis() {
181 : // Align everything to the first frame
182 3 : my_input_data->getStoredData( 0, false ).transferDataToPDB( mypdb );
183 3 : auto myconf0=metricRegister().create<ReferenceConfiguration>(mtype, mypdb);
184 3 : MultiValue myval( 1, myconf0->getNumberOfReferenceArguments() + 3*myconf0->getNumberOfReferencePositions() + 9 );
185 3 : ReferenceValuePack mypack( myconf0->getNumberOfReferenceArguments(), myconf0->getNumberOfReferencePositions(), myval );
186 25 : for(unsigned i=0; i<myconf0->getNumberOfReferencePositions(); ++i) {
187 : mypack.setAtomIndex( i, i );
188 : }
189 : // Setup some PCA storage
190 3 : myconf0->setupPCAStorage ( mypack );
191 3 : std::vector<double> displace( myconf0->getNumberOfReferencePositions() );
192 3 : if( myconf0->getNumberOfReferencePositions()>0 ) {
193 1 : ReferenceAtoms* at = dynamic_cast<ReferenceAtoms*>( myconf0.get() );
194 1 : displace = at->getDisplace();
195 : }
196 :
197 : // Create some arrays to store the average position
198 3 : std::vector<double> sarg( myconf0->getNumberOfReferenceArguments(), 0 );
199 3 : std::vector<Vector> spos( myconf0->getNumberOfReferencePositions() );
200 25 : for(unsigned i=0; i<myconf0->getNumberOfReferencePositions(); ++i) {
201 22 : spos[i].zero();
202 : }
203 :
204 : // Calculate the average displacement from the first frame
205 3 : double norm=getWeight(0);
206 3 : std::vector<double> args( getNumberOfArguments() );
207 1638 : for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
208 1635 : my_input_data->getStoredData( i, false ).transferDataToPDB( mypdb );
209 3815 : for(unsigned j=0; j<getArguments().size(); ++j) {
210 2180 : mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
211 : }
212 1635 : myconf0->calc( mypdb.getPositions(), getPbc(), getArguments(), args, mypack, true );
213 : // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
214 3815 : for(unsigned j=0; j<myconf0->getNumberOfReferenceArguments(); ++j) {
215 2180 : sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
216 : }
217 : // Accumulate average displacement of position
218 13625 : for(unsigned j=0; j<myconf0->getNumberOfReferencePositions(); ++j) {
219 11990 : spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j] / displace[j];
220 : }
221 1635 : norm += getWeight(i);
222 : }
223 : // Now normalise the displacements to get the average and add these to the first frame
224 3 : double inorm = 1.0 / norm ;
225 7 : for(unsigned j=0; j<myconf0->getNumberOfReferenceArguments(); ++j) {
226 4 : sarg[j] = inorm*sarg[j] + myconf0->getReferenceArguments()[j];
227 : }
228 25 : for(unsigned j=0; j<myconf0->getNumberOfReferencePositions(); ++j) {
229 22 : spos[j] = inorm*spos[j] + myconf0->getReferencePositions()[j];
230 : }
231 : // Now accumulate the covariance
232 3 : unsigned narg=myconf0->getNumberOfReferenceArguments(), natoms=myconf0->getNumberOfReferencePositions();
233 3 : Matrix<double> covar( narg+3*natoms, narg+3*natoms );
234 : covar=0;
235 1641 : for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
236 1638 : my_input_data->getStoredData( i, false ).transferDataToPDB( mypdb );
237 3822 : for(unsigned j=0; j<getArguments().size(); ++j) {
238 2184 : mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
239 : }
240 1638 : myconf0->calc( mypdb.getPositions(), getPbc(), getArguments(), args, mypack, true );
241 3822 : for(unsigned jarg=0; jarg<narg; ++jarg) {
242 : // Need sorting for PBC with GAT
243 2184 : double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + myconf0->getReferenceArguments()[jarg] - sarg[jarg];
244 6552 : for(unsigned karg=0; karg<narg; ++karg) {
245 : // Need sorting for PBC with GAT
246 4368 : double karg_d = 0.5*mypack.getArgumentDerivative(karg) + myconf0->getReferenceArguments()[karg] - sarg[karg];
247 4368 : covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
248 : }
249 : }
250 13650 : for(unsigned jat=0; jat<natoms; ++jat) {
251 48048 : for(unsigned jc=0; jc<3; ++jc) {
252 36036 : double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] / displace[jat] + myconf0->getReferencePositions()[jat][jc] - spos[jat][jc];
253 828828 : for(unsigned kat=0; kat<natoms; ++kat) {
254 3171168 : for(unsigned kc=0; kc<3; ++kc) {
255 2378376 : double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] /displace[kat] + myconf0->getReferencePositions()[kat][kc] - spos[kat][kc];
256 2378376 : covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
257 : }
258 : }
259 : }
260 : }
261 : }
262 : // Normalise
263 73 : for(unsigned i=0; i<covar.nrows(); ++i) {
264 4434 : for(unsigned j=0; j<covar.ncols(); ++j) {
265 4364 : covar(i,j) *= inorm;
266 : }
267 : }
268 :
269 : // Diagonalise the covariance
270 3 : std::vector<double> eigval( narg+3*natoms );
271 : Matrix<double> eigvec( narg+3*natoms, narg+3*natoms );
272 3 : diagMat( covar, eigval, eigvec );
273 :
274 : // Output the reference configuration
275 3 : mypdb.setAtomPositions( spos );
276 7 : for(unsigned j=0; j<sarg.size(); ++j) {
277 4 : mypdb.setArgumentValue( getArguments()[j]->getName(), sarg[j] );
278 : }
279 : // Reset the reference configuration
280 6 : myref = metricRegister().create<ReferenceConfiguration>( mtype, mypdb );
281 :
282 : // Store and print the eigenvectors
283 3 : std::vector<Vector> tmp_atoms( natoms );
284 9 : for(unsigned dim=0; dim<nlow; ++dim) {
285 6 : unsigned idim = covar.ncols() - 1 - dim;
286 14 : for(unsigned i=0; i<narg; ++i) {
287 8 : mypdb.setArgumentValue( getArguments()[i]->getName(), eigvec(idim,i) );
288 : }
289 50 : for(unsigned i=0; i<natoms; ++i) {
290 176 : for(unsigned k=0; k<3; ++k) {
291 132 : tmp_atoms[i][k]=eigvec(idim,narg+3*i+k);
292 : }
293 : }
294 6 : mypdb.setAtomPositions( tmp_atoms );
295 : // Create a direction object so that we can calculate other PCA components
296 12 : directions.push_back( Direction(ReferenceConfigurationOptions("DIRECTION")));
297 6 : directions[dim].read( mypdb );
298 : }
299 3 : }
300 :
301 0 : void PCA::getProjection( const unsigned& idata, std::vector<double>& point, double& weight ) {
302 0 : if( point.size()!=nlow ) {
303 0 : point.resize( nlow );
304 : }
305 : // Retrieve the weight
306 0 : weight = getWeight(idata);
307 : // Retrieve the data point
308 0 : getProjection( my_input_data->getStoredData( idata, false ), point );
309 0 : }
310 :
311 546 : void PCA::getProjection( analysis::DataCollectionObject& myidata, std::vector<double>& point ) {
312 546 : myidata.transferDataToPDB( mypdb );
313 546 : std::vector<double> args( getArguments().size() );
314 1638 : for(unsigned j=0; j<getArguments().size(); ++j) {
315 1092 : mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
316 : }
317 : // Create some storage space
318 546 : MultiValue myval( 1, 3*mypdb.getPositions().size() + args.size() + 9);
319 546 : ReferenceValuePack mypack( args.size(), mypdb.getPositions().size(), myval );
320 546 : for(unsigned i=0; i<mypdb.getPositions().size(); ++i) {
321 : mypack.setAtomIndex( i, i );
322 : }
323 546 : myref->setupPCAStorage( mypack );
324 : // And calculate
325 546 : myref->calculate( mypdb.getPositions(), getPbc(), getArguments(), mypack, true );
326 1638 : for(unsigned i=0; i<nlow; ++i) {
327 1092 : point[i]=myref->projectDisplacementOnVector( directions[i], getArguments(), args, mypack );
328 : }
329 1092 : }
330 :
331 : }
332 : }
|