Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2015-2020 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "core/ActionRegister.h"
23 : #include "core/ActionShortcut.h"
24 : #include "multicolvar/MultiColvarShortcuts.h"
25 : #include "tools/IFile.h"
26 : #include "core/ActionSetup.h"
27 :
28 : //+PLUMEDOC MCOLVARF PAMM
29 : /*
30 : Probabilistic analysis of molecular motifs.
31 :
32 : Probabilistic analysis of molecular motifs (PAMM) was introduced in this paper \cite pamm.
33 : The essence of this approach involves calculating some large set of collective variables
34 : for a set of atoms in a short trajectory and fitting this data using a Gaussian Mixture Model.
35 : The idea is that modes in these distributions can be used to identify features such as hydrogen bonds or
36 : secondary structure types.
37 :
38 : The assumption within this implementation is that the fitting of the Gaussian mixture model has been
39 : done elsewhere by a separate code. You thus provide an input file to this action which contains the
40 : means, covariance matrices and weights for a set of Gaussian kernels, \f$\{ \phi \}\f$. The values and
41 : derivatives for the following set of quantities is then computed:
42 :
43 : \f[
44 : s_k = \frac{ \phi_k}{ \sum_i \phi_i }
45 : \f]
46 :
47 : Each of the \f$\phi_k\f$ is a Gaussian function that acts on a set of quantities calculated within
48 : a \ref mcolv . These might be \ref TORSIONS, \ref DISTANCES, \ref ANGLES or any one of the many
49 : symmetry functions that are available within \ref mcolv actions. These quantities are then inserted into
50 : the set of \f$n\f$ kernels that are in the the input file. This will be done for multiple sets of values
51 : for the input quantities and a final quantity will be calculated by summing the above \f$s_k\f$ values or
52 : some transformation of the above. This sounds less complicated than it is and is best understood by
53 : looking through the example given below.
54 :
55 : \warning Mixing \ref mcolv actions that are periodic with variables that are not periodic has not been tested
56 :
57 : \par Examples
58 :
59 : In this example I will explain in detail what the following input is computing:
60 :
61 : \plumedfile
62 : #SETTINGS MOLFILE=regtest/basic/rt32/helix.pdb
63 : MOLINFO MOLTYPE=protein STRUCTURE=M1d.pdb
64 : psi: TORSIONS ATOMS1=@psi-2 ATOMS2=@psi-3 ATOMS3=@psi-4
65 : phi: TORSIONS ATOMS1=@phi-2 ATOMS2=@phi-3 ATOMS3=@phi-4
66 : p: PAMM DATA=phi,psi CLUSTERS=clusters.pamm MEAN1={COMPONENT=1} MEAN2={COMPONENT=2}
67 : PRINT ARG=p.mean-1,p.mean-2 FILE=colvar
68 : \endplumedfile
69 :
70 : The best place to start our explanation is to look at the contents of the clusters.pamm file
71 :
72 : \auxfile{clusters.pamm}
73 : #! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi
74 : #! SET multivariate von-misses
75 : #! SET kerneltype gaussian
76 : 2.97197455E-0001 -1.91983118E+0000 2.25029540E+0000 2.45960237E-0001 -1.30615381E-0001 -1.30615381E-0001 2.40239117E-0001
77 : 2.29131448E-0002 1.39809354E+0000 9.54585380E-0002 9.61755708E-0002 -3.55657919E-0002 -3.55657919E-0002 1.06147253E-0001
78 : 5.06676398E-0001 -1.09648066E+0000 -7.17867907E-0001 1.40523052E-0001 -1.05385552E-0001 -1.05385552E-0001 1.63290557E-0001
79 : \endauxfile
80 :
81 : This files contains the parameters of two two-dimensional Gaussian functions. Each of these Gaussian kernels has a weight, \f$w_k\f$,
82 : a vector that specifies the position of its center, \f$\mathbf{c}_k\f$, and a covariance matrix, \f$\Sigma_k\f$. The \f$\phi_k\f$ functions that
83 : we use to calculate our PAMM components are thus:
84 :
85 : \f[
86 : \phi_k = \frac{w_k}{N_k} \exp\left( -(\mathbf{s} - \mathbf{c}_k)^T \Sigma^{-1}_k (\mathbf{s} - \mathbf{c}_k) \right)
87 : \f]
88 :
89 : In the above \f$N_k\f$ is a normalization factor that is calculated based on \f$\Sigma\f$. The vector \f$\mathbf{s}\f$ is a vector of quantities
90 : that are calculated by the \ref TORSIONS actions. This vector must be two dimensional and in this case each component is the value of a
91 : torsion angle. If we look at the two \ref TORSIONS actions in the above we are calculating the \f$\phi\f$ and \f$\psi\f$ backbone torsional
92 : angles in a protein (Note the use of \ref MOLINFO to make specification of atoms straightforward). We thus calculate the values of our
93 : 2 \f$ \{ \phi \} \f$ kernels 3 times. The first time we use the \f$\phi\f$ and \f$\psi\f$ angles in the second residue of the protein,
94 : the second time it is the \f$\phi\f$ and \f$\psi\f$ angles of the third residue of the protein and the third time it is the \f$\phi\f$ and \f$\psi\f$ angles
95 : of the fourth residue in the protein. The final two quantities that are output by the print command, p.mean-1 and p.mean-2, are the averages
96 : over these three residues for the quantities:
97 : \f[
98 : s_1 = \frac{ \phi_1}{ \phi_1 + \phi_2 }
99 : \f]
100 : and
101 : \f[
102 : s_2 = \frac{ \phi_2}{ \phi_1 + \phi_2 }
103 : \f]
104 : There is a great deal of flexibility in this input. We can work with, and examine, any number of components, we can use any set of collective variables
105 : and compute these PAMM variables and we can transform the PAMM variables themselves in a large number of different ways when computing these sums.
106 : */
107 : //+ENDPLUMEDOC
108 :
109 : namespace PLMD {
110 : namespace pamm {
111 :
112 : class PAMM : public ActionShortcut {
113 : public:
114 : static void registerKeywords( Keywords& keys );
115 : explicit PAMM(const ActionOptions&);
116 : };
117 :
118 : PLUMED_REGISTER_ACTION(PAMM,"PAMM")
119 :
120 4 : void PAMM::registerKeywords( Keywords& keys ) {
121 4 : ActionShortcut::registerKeywords( keys );
122 8 : keys.add("compulsory","ARG","the vectors from which the pamm coordinates are calculated");
123 8 : keys.add("compulsory","CLUSTERS","the name of the file that contains the definitions of all the clusters");
124 8 : keys.add("compulsory","REGULARISE","0.001","don't allow the denominator to be smaller then this value");
125 8 : keys.add("compulsory","KERNELS","all","which kernels are we computing the PAMM values for");
126 4 : multicolvar::MultiColvarShortcuts::shortcutKeywords( keys );
127 4 : keys.needsAction("KERNEL");
128 4 : keys.needsAction("COMBINE");
129 4 : }
130 :
131 2 : PAMM::PAMM(const ActionOptions& ao) :
132 : Action(ao),
133 2 : ActionShortcut(ao) {
134 : // Must get list of input value names
135 : std::vector<std::string> valnames;
136 4 : parseVector("ARG",valnames);
137 : // Create input values
138 2 : std::string argstr=" ARG=" + valnames[0];
139 3 : for(unsigned j=1; j<valnames.size(); ++j) {
140 2 : argstr += "," + valnames[j];
141 : }
142 :
143 : // Create actions to calculate all pamm kernels
144 : unsigned nkernels = 0;
145 : double h;
146 : std::string fname;
147 2 : parse("CLUSTERS",fname);
148 2 : IFile ifile;
149 2 : ifile.open(fname);
150 2 : ifile.allowIgnoredFields();
151 : for(unsigned k=0;; ++k) {
152 22 : if( !ifile.scanField("height",h) ) {
153 : break;
154 : }
155 : // Create a kernel for this cluster
156 : std::string num, wstr, ktype;
157 9 : Tools::convert( k+1, num );
158 9 : Tools::convert(h,wstr);
159 9 : ifile.scanField("kerneltype",ktype);
160 18 : readInputLine( getShortcutLabel() + "_kernel-" + num + ": KERNEL NORMALIZED" + argstr + " NUMBER=" + num + " REFERENCE=" + fname + " WEIGHT=" + wstr + " TYPE=" + ktype );
161 9 : nkernels++;
162 9 : ifile.scanField();
163 9 : }
164 2 : ifile.close();
165 :
166 : // And add on the regularization
167 : std::string regparam;
168 4 : parse("REGULARISE",regparam);
169 : // Now combine all the PAMM objects with the regparam
170 2 : std::string paramstr, cinput = getShortcutLabel() + "_ksum: COMBINE PERIODIC=NO";
171 11 : for(unsigned k=0; k<nkernels; ++k) {
172 : std::string num;
173 9 : Tools::convert( k+1, num );
174 9 : if( k==0 ) {
175 : cinput += " ARG=";
176 4 : paramstr=" PARAMETERS=-" + regparam;
177 : } else {
178 : cinput += ",";
179 : paramstr += ",0";
180 : }
181 18 : cinput += getShortcutLabel() + "_kernel-" + num;
182 : }
183 4 : readInputLine( cinput + paramstr );
184 :
185 : // And now compute all the pamm kernels
186 : std::string kchoice;
187 4 : parse("KERNELS",kchoice);
188 : std::map<std::string,std::string> keymap;
189 2 : multicolvar::MultiColvarShortcuts::readShortcutKeywords( keymap, this );
190 2 : if( kchoice=="all" ) {
191 11 : for(unsigned k=0; k<nkernels; ++k) {
192 : std::string num;
193 9 : Tools::convert( k+1, num );
194 18 : readInputLine( getShortcutLabel() + "-" + num + ": CUSTOM ARG=" + getShortcutLabel() + "_kernel-" + num + "," + getShortcutLabel() + "_ksum FUNC=x/y PERIODIC=NO");
195 18 : multicolvar::MultiColvarShortcuts::expandFunctions( getShortcutLabel() + "-" + num, getShortcutLabel() + "-" + num, "", keymap, this );
196 : }
197 : } else {
198 0 : std::vector<std::string> awords=Tools::getWords(kchoice,"\t\n ,");
199 0 : Tools::interpretRanges( awords );
200 0 : for(unsigned k=0; k<awords.size(); ++k) {
201 0 : readInputLine( getShortcutLabel() + "-" + awords[k] + ": CUSTOM ARG=" + getShortcutLabel() + "_kernel-" + awords[k] + "," + getShortcutLabel() + "_ksum FUNC=x/y PERIODIC=NO");
202 0 : multicolvar::MultiColvarShortcuts::expandFunctions( getShortcutLabel() + "-" + awords[k], getShortcutLabel() + "-" + awords[k], "", keymap, this );
203 : }
204 0 : }
205 4 : }
206 :
207 : }
208 : }
|