All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Modules Pages
CS2Backbone.cpp
Go to the documentation of this file.
1 /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2  Copyright (c) 2013 The plumed team
3  (see the PEOPLE file at the root of the distribution for a list of names)
4 
5  See http://www.plumed-code.org for more information.
6 
7  This file is part of plumed, version 2.0.
8 
9  plumed is free software: you can redistribute it and/or modify
10  it under the terms of the GNU Lesser General Public License as published by
11  the Free Software Foundation, either version 3 of the License, or
12  (at your option) any later version.
13 
14  plumed is distributed in the hope that it will be useful,
15  but WITHOUT ANY WARRANTY; without even the implied warranty of
16  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17  GNU Lesser General Public License for more details.
18 
19  You should have received a copy of the GNU Lesser General Public License
20  along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 #ifdef __PLUMED_HAS_ALMOST
23 
24 #include "Colvar.h"
25 #include "ActionRegister.h"
26 #include "core/PlumedMain.h"
27 #include "core/Atoms.h"
28 #include "tools/Communicator.h"
29 
30 #include <string>
31 #include <cmath>
32 #include <cassert>
33 
34 #include <almost/mdb.h>
35 #include <almost/pdb.h>
36 #include <almost/forcefield/const/camshift2.h>
37 #include <almost/io/formostream.h>
38 
39 using namespace std;
40 using namespace Almost;
41 
42 namespace PLMD{
43 
44 //+PLUMEDOC COLVAR CS2BACKBONE
45 /*
46 This collective variable calculates a scoring function based on the comparison of backcalculated and
47 experimental backbone chemical shifts for a protein (CA, CB, C', H, HA, N).
48 
49 CamShift \cite Kohlhoff:2009us is employed to back calculate the chemical shifts that are then compared
50 with a set of experimental values to generate a score \cite Robustelli:2010dn \cite Granata:2013dk.
51 
52 It is also possible to backcalculate the chemical shifts from multiple replicas and then average them
53 to perform Replica-Averaged Restrained MD simulations \cite Camilloni:2012je \cite Camilloni:2013hs.
54 
55 HOW TO COMPILE IT
56 
57 In general the system for which chemical shifts are to be calculated must be completly included in
58 ATOMS. It should also be made WHOLE before the the backcalculation. CamShift is included in the
59 free package ALMOST v.2.1 that can be dowload via SVN (svn checkout svn://svn.code.sf.net/p/almost/code/ almost-code).
60 ALMOST 2.1 can be found in branches/almost-2.1/ and it can be compiled:
61 
62 \verbatim
63 ./configure
64 make
65 \endverbatim
66 
67 Once the code is compiled you should see the ALMOST library libAlm.a in src/lib/
68 
69 PLUMED 2 must then be compiled by linking ALMOST:
70 
71 \verbatim
72 in DYNAMIC_LIBS the following paths should be added:
73 
74 -L/ALMOST_BASE_PATH/branches/almost-2.1/src/lib -lAlm \
75 -L/ALMOST_BASE_PATH/branches/almost-2.1/lib/sqlite-3.6.23.1 -lsqlite3 -lz -lbz2 \
76 -L/ALMOST_BASE_PATH/branches/almost-2.1/src/forcefield -lnbimpl \
77 -L/ALMOST_BASE_PATH/branches/almost-2.1/src/lib/modules -lshx
78 
79 and in CPPFLAGS
80 
81 -I/ALMOST_BASE_PATH/almost/branches/almost-2.1/include \
82 -I/ALMOST_BASE_PATH/almost/branches/almost-2.1/include/almost \
83 -I/ALMOST_BASE_PATH/almost/branches/almost-2.1/lib/sqlite-3.6.23.1 -D__PLUMED_HAS_ALMOST
84 
85 with ALMOST_BASE_PATH the full path to the ALMOST folder
86 \endverbatim
87 
88 HOW TO USE IT
89 
90 To use CamShift as a restraint or as collective variable or as a replica-averaged restraint
91 first of all the experimental data are needed. CamShift uses backbone and Cb chemical shifts
92 that must be provided as text files:
93 
94 \verbatim
95 CAshifts.dat:
96 CBshifts.dat:
97 Cshifts.dat:
98 Hshifts.dat:
99 HAshifts.dat:
100 Nshifts.dat:
101 #1 0.0
102 2 55.5
103 3 58.4
104 .
105 .
106 #last 0.0
107 #last+1 (first) of second chain
108 .
109 #last of second chain
110 \endverbatim
111 
112 All of them must always be there. If a chemical shift for an atom of a residue is not available 0.0 must be used.
113 So if for example all the Cb are not available all the chemical shifts for all the residues should be set as 0.0.
114 
115 A template.pdb file is needed to the generate a topology of the protein within ALMOST. For histidines in protonation
116 states different from D the HIE/HIP name should be used in the template.pdb. GLH and ASH can be used for the alternative
117 protonation of GLU and ASP. Non-standard amino acids and other molecules are not yet supported! If multiple chains are
118 present the chain identifier must be in the standard PDB format, together with the TER keyword at the end of each chain.
119 Residues numbering should always go from 1 to N in both the chemical shifts files as well as in the template.pdb file.
120 Two more keywords can be used to setup the topology: CYS-DISU to tell ALMOST to look for disulphide bridges and TERMINI
121 to define the protonation state of the the chain's termini (currently only DEFAULT (NH3+, CO2-) and NONE (NH, CO)).
122 
123 Two more standard files are also needed: an ALMOST force-field file, corresponding to the force-field family used in
124 the MD code, (a03_cs2_gromacs.mdb for the amber family or all22_gromacs.mdb for the charmm family) and camshift.db,
125 both these files can be copied from almost/branches/almost-2.1/toppar.
126 
127 All the above files must be in a single folder that must be specified with the keyword DATA.
128 
129 \par Examples
130 
131 case 1:
132 
133 \verbatim
134 WHOLEMOLECULES ENTITY0=1-174
135 cs: CS2BACKBONE ATOMS=1-174 DATA=data/ FF=a03_gromacs.mdb FLAT=0.0 NRES=13 ENSEMBLE
136 cse: RESTRAINT ARG=cs SLOPE=24 KAPPA=0 AT=0.
137 PRINT ARG=cs,cse.bias
138 \endverbatim
139 
140 case 2:
141 
142 \verbatim
143 WHOLEMOLECULES ENTITY0=1-174
144 cs: CS2BACKBONE ATOMS=1-174 DATA=data/ FF=a03_gromacs.mdb FLAT=1.0 NRES=13 TERMINI=DEFAULT,NONE CYS-DISU WRITE_CS=1000
145 PRINT ARG=cs
146 \endverbatim
147 
148 (See also \ref WHOLEMOLECULES, \ref RESTRAINT and \ref PRINT)
149 
150 */
151 //+ENDPLUMEDOC
152 
153 class CS2Backbone : public Colvar {
154  vector<CamShift2> cam_list;
155  Molecules molecules;
156  int numResidues;
157  int pperiod;
158  int ens_dim;
159  bool ensemble;
160  bool serial;
161  double **sh;
162  double ene_pl2alm;
163  double len_pl2alm;
164  double for_pl2alm;
165 public:
166  CS2Backbone(const ActionOptions&);
167  ~CS2Backbone();
168  static void registerKeywords( Keywords& keys );
169  virtual void calculate();
170 };
171 
172 PLUMED_REGISTER_ACTION(CS2Backbone,"CS2BACKBONE")
173 
174 void CS2Backbone::registerKeywords( Keywords& keys ){
175  Colvar::registerKeywords( keys );
176  keys.addFlag("SERIAL",false,"Perform the calculation in serial - for debug purpose.");
177  keys.add("atoms","ATOMS","The atoms to be included in the calculatios, e.g. the whole protein.");
178  keys.add("compulsory","DATA","data/","The folder with the experimental chemical shifts.");
179  keys.add("compulsory","FF","a03_gromacs.mdb","The ALMOST force-field to map the atoms' names.");
180  keys.add("compulsory","FLAT","1.0","Flat region in the scoring function.");
181  keys.add("compulsory","NEIGH_FREQ","10","Period in step for neighbour list update.");
182  keys.add("compulsory","WRITE_CS","0","Write the back-calculated chemical shifts every # steps.");
183  keys.add("compulsory","NRES","Number of residues, corresponding to the number of chemical shifts.");
184  keys.add("optional","TERMINI","Defines the protonation states of the chain-termini.");
185  keys.addFlag("CYS-DISU",false,"Set to TRUE if your system has disulphide bridges.");
186  keys.addFlag("ENSEMBLE",false,"Set to TRUE if you want to average over multiple replicas.");
187  keys.remove("NOPBC");
188 }
189 
190 CS2Backbone::CS2Backbone(const ActionOptions&ao):
192 {
193  string stringadb;
194  string stringamdb;
195  string stringapdb;
196 
197  serial=false;
198  parseFlag("SERIAL",serial);
199 
200  string stringa_data;
201  parse("DATA",stringa_data);
202 
203  string stringa_forcefield;
204  parse("FF",stringa_forcefield);
205 
206  bool disu=false;
207  parseFlag("CYS-DISU",disu);
208 
209  double grains=1.0;
210  parse("FLAT", grains);
211 
212  int neigh_f=10;
213  parse("NEIGH_FREQ", neigh_f);
214 
215  int w_period=0;
216  parse("WRITE_CS", w_period);
217  pperiod=w_period;
218 
219  parse("NRES", numResidues);
220 
221  ensemble=false;
222  parseFlag("ENSEMBLE",ensemble);
223  if(ensemble&&comm.Get_rank()==0) {
224  if(multi_sim_comm.Get_size()<2) plumed_merror("You CANNOT run Replica-Averaged simulations without running multiple replicas!\n");
225  else ens_dim=multi_sim_comm.Get_size();
226  } else ens_dim=0;
227  if(ensemble) comm.Sum(&ens_dim, 1);
228 
229  stringadb = stringa_data + string("/camshift.db");
230  stringamdb = stringa_data + string("/") + stringa_forcefield;
231  stringapdb = stringa_data + string("/template.pdb");
232 
233  log.printf(" loading force-field %s\n", stringamdb.c_str()); log.flush();
234  Almost::MDB mdb((char*)stringamdb.c_str());
235  log.printf(" loading template %s\n", stringapdb.c_str()); log.flush();
236  Almost::PDB pdb((char*)stringapdb.c_str());
237 
238  vector<string> termini;
239  string stringa_mol;
240  parse("TERMINI",stringa_mol);
241  if(stringa_mol.length()>0) {
242  int num_chains = pdb[0].size();
243  vector<string> data=Tools::getWords(stringa_mol,",");
244  if(data.size()!=2*num_chains) plumed_merror("You have to define both the NTerm and the CTerm for each chain of your system!\n");
245  for(unsigned i=0;i<data.size();i++) termini.push_back(data[i]);
246  } else {
247  int num_chains = pdb[0].size();
248  for(unsigned i=0;i<(2*num_chains);i++) termini.push_back("DEFAULT");
249  }
250 
251  log.printf(" building molecule ..."); log.flush();
252  for(unsigned i=0;i<pdb[0].size();i++){
253  unsigned j=2*i;
254  string str;
255  str +='A'+i;
256  Protein p(str);
257  p.build_missing(pdb[0][i],mdb,termini[j],termini[j+1]);
258  if(disu) p.auto_disu_bonds(2.9,mdb);
259  molecules.add_protein(p);
260  }
261  log.printf(" done!\n"); log.flush();
262  log.printf(" Writing converted template.pdb ...\n"); log.flush();
263  mol2pdb(molecules,"converted-template.pdb");
264 
265  log.printf(" Initialization of the predictor ...\n"); log.flush();
266  CamShift2 a = CamShift2(molecules, stringadb);
267 
268  log.printf(" Reading experimental data ...\n"); log.flush();
269  stringadb = stringa_data + string("/CAshifts.dat");
270  log.printf(" Initializing CA shifts %s\n", stringadb.c_str()); log.flush();
271  a.read_cs(stringadb, "CA");
272  stringadb = stringa_data + string("/CBshifts.dat");
273  log.printf(" Initializing CB shifts %s\n", stringadb.c_str()); log.flush();
274  a.read_cs(stringadb, "CB");
275  stringadb = stringa_data + string("/Cshifts.dat");
276  log.printf(" Initializing C' shifts %s\n", stringadb.c_str()); log.flush();
277  a.read_cs(stringadb, "C");
278  stringadb = stringa_data + string("/HAshifts.dat");
279  log.printf(" Initializing HA shifts %s\n", stringadb.c_str()); log.flush();
280  a.read_cs(stringadb, "HA");
281  stringadb = stringa_data + string("/Hshifts.dat");
282  log.printf(" Initializing H shifts %s\n", stringadb.c_str()); log.flush();
283  a.read_cs(stringadb, "H");
284  stringadb = stringa_data + string("/Nshifts.dat");
285  log.printf(" Initializing N shifts %s\n", stringadb.c_str()); log.flush();
286  a.read_cs(stringadb, "N");
287  /* this is a workaround for those chemical shifts that can result in too large forces */
288  a.remove_problematic("GLN","CB");
289  a.remove_problematic("ILE","CB");
290  /* this is a workaround for those chemical shifts that are not parameterized */
291  a.remove_problematic("HIE", "HA"); a.remove_problematic("HIP", "HA"); a.remove_problematic("HSP", "HA");
292  a.remove_problematic("HIE", "H"); a.remove_problematic("HIP", "H"); a.remove_problematic("HSP", "H");
293  a.remove_problematic("HIE", "N"); a.remove_problematic("HIP", "N"); a.remove_problematic("HSP", "N");
294  a.remove_problematic("HIE", "CA"); a.remove_problematic("HIP", "CA"); a.remove_problematic("HSP", "CA");
295  a.remove_problematic("HIE", "CB"); a.remove_problematic("HIP", "CB"); a.remove_problematic("HSP", "CB");
296  a.remove_problematic("HIE", "C"); a.remove_problematic("HIP", "C"); a.remove_problematic("HSP", "C");
297  a.remove_problematic("GLH", "HA"); a.remove_problematic("ASH", "HA"); a.remove_problematic("HSE", "HA");
298  a.remove_problematic("GLH", "H"); a.remove_problematic("ASH", "H"); a.remove_problematic("HSE", "H");
299  a.remove_problematic("GLH", "N"); a.remove_problematic("ASH", "N"); a.remove_problematic("HSE", "N");
300  a.remove_problematic("GLH", "CA"); a.remove_problematic("ASH", "CA"); a.remove_problematic("HSE", "CA");
301  a.remove_problematic("GLH", "CB"); a.remove_problematic("ASH", "CB"); a.remove_problematic("HSE", "CB");
302  a.remove_problematic("GLH", "C"); a.remove_problematic("ASH", "C"); a.remove_problematic("HSE", "C");
303  if(disu) {
304  a.remove_problematic("CYS", "HA");
305  a.remove_problematic("CYS", "H");
306  a.remove_problematic("CYS", "N");
307  a.remove_problematic("CYS", "CA");
308  a.remove_problematic("CYS", "CB");
309  a.remove_problematic("CYS", "C");
310  }
311  /* done */
312 
313  log.printf(" Setting parameters ...\n"); log.flush();
314  unsigned stride=comm.Get_size();
315  unsigned rank=comm.Get_rank();
316  if(serial) {stride=1; rank=0;}
317  if(stride>1) log.printf(" Parallelized over %u processors\n", stride);
318  a.set_mpi(stride, rank);
319 
320  if(ensemble) { log.printf(" ENSEMBLE averaging over %i replicas\n", ens_dim); }
321 
322  a.set_flat_bottom_const(grains);
323  a.set_box_nupdate(neigh_f);
324  a.set_lambda(1);
325  cam_list.push_back(a);
326 
327  sh = new double*[numResidues];
328  sh[0] = new double[numResidues*6];
329  for (int i=1; i<numResidues; i++) sh[i]=sh[i-1]+6;
330 
331  /* Energy and Lenght conversion */
332  ene_pl2alm = 4.186/plumed.getAtoms().getUnits().getEnergy();
333  len_pl2alm = 10.00*plumed.getAtoms().getUnits().getLength();
334  for_pl2alm = ene_pl2alm*len_pl2alm;
335  log.printf(" Conversion table from plumed to Almost:\n");
336  log.printf(" Energy %lf\n", ene_pl2alm);
337  log.printf(" Length %lf\n", len_pl2alm);
338 
339  vector<AtomNumber> atoms;
340  parseAtomList("ATOMS",atoms);
341  checkRead();
342 
343  log<<" Bibliography "
344  <<plumed.cite("Kohlhoff K, Robustelli P, Cavalli A, Salvatella A, Vendruscolo M, J. Am. Chem. Soc. 131, 13894 (2009)")
345  <<plumed.cite("Camilloni C, Robustelli P, De Simone A, Cavalli A, Vendruscolo M, J. Am. Chem. Soc. 134, 3968 (2012)") <<"\n";
346 
348  setNotPeriodic();
349  requestAtoms(atoms);
350  log.printf(" DONE!\n"); log.flush();
351 }
352 
353 CS2Backbone::~CS2Backbone()
354 {
355  delete[] sh[0];
356  delete[] sh;
357 }
358 
359 
360 void CS2Backbone::calculate()
361 {
362  double energy=0.;
363  Tensor virial;
364  virial.zero();
365  vector<Vector> deriv(getNumberOfAtoms());
366  int N = getNumberOfAtoms();
367  Coor<double> coor(N);
368  Coor<double> forces(N);
369 
370  forces.clear();
371  for(int i=0; i<numResidues; i++) for(unsigned j=0; j<6; j++) sh[i][j]=0.;
372 
373  for (int i = 0; i < N; i++) {
374  int ipos = 4 * i;
375  Vector Pos = getPosition(i);
376  coor.coor[ipos] = len_pl2alm*Pos[0];
377  coor.coor[ipos+1] = len_pl2alm*Pos[1];
378  coor.coor[ipos+2] = len_pl2alm*Pos[2];
379  }
380  cam_list[0].ens_return_shifts(coor, sh);
381  if(!serial) comm.Sum(&sh[0][0], numResidues*6);
382 
383  bool printout=false;
384  if(pperiod>0&&comm.Get_rank()==0) printout = (!(getStep()%pperiod));
385  if(printout) {
386  string csfile;
387  char tmps1[21], tmps2[21];
388  // add to the name the label of the cv in such a way to have different files
389  // when there is more than one defined variable
390  sprintf(tmps1, "%li", getStep());
391  if(ensemble) {
392  sprintf(tmps2, "%i", multi_sim_comm.Get_rank());
393  csfile = string("cs")+tmps2+"-"+tmps1+string(".dat");
394  } else csfile = string("cs")+tmps1+string(".dat");
395  cam_list[0].printout_chemical_shifts(csfile.c_str());
396  }
397 
398  double fact=1.0;
399  if(ensemble) {
400  fact = 1./((double) ens_dim);
401  if(comm.Get_rank()==0) { // I am the master of my replica
402  // among replicas
403  multi_sim_comm.Sum(&sh[0][0], numResidues*6);
404  multi_sim_comm.Barrier();
405  for(unsigned i=0;i<6;i++) for(int j=0;j<numResidues;j++) sh[j][i] *= fact;
406  } else for(unsigned i=0;i<6;i++) for(int j=0;j<numResidues;j++) sh[j][i] = 0.;
407  // inside each replica
408  comm.Sum(&sh[0][0], numResidues*6);
409  }
410 
411  energy = cam_list[0].ens_energy_force(coor, forces, sh);
412  if(!serial) comm.Sum(&forces[0][0], N*4);
413 
414  for (int i = 0; i < N; i++)
415  {
416  Vector For;
417  int ipos = 4 * i;
418  For[0] = forces.coor[ipos];
419  For[1] = forces.coor[ipos+1];
420  For[2] = forces.coor[ipos+2];
421  deriv[i] = fact*for_pl2alm*For;
422  virial=virial+(-1.*Tensor(getPosition(i),deriv[i]));
423  }
424 
425  for(unsigned i=0;i<getNumberOfAtoms();++i) setAtomsDerivatives(i,deriv[i]);
426  setValue (ene_pl2alm*energy);
427  setBoxDerivatives (virial);
428 }
429 
430 }
431 #endif
void zero()
set it to zero
Definition: Tensor.h:198
void parseFlag(const std::string &key, bool &t)
Parse one keyword as boolean flag.
Definition: Action.cpp:104
void setNotPeriodic()
Set your default value to have no periodicity.
Log & log
Reference to the log stream.
Definition: Action.h:93
void checkRead()
Check if Action was properly read.
Definition: Action.cpp:161
STL namespace.
void parseAtomList(const std::string &key, std::vector< AtomNumber > &t)
Parse a list of atoms without a numbered keyword.
Communicator & comm
Definition: Action.h:158
void addValueWithDerivatives()
Add a value with the name label that has derivatives.
void requestAtoms(const std::vector< AtomNumber > &a)
Definition: Colvar.cpp:44
#define PLUMED_COLVAR_INIT(ao)
Definition: Colvar.h:29
void parse(const std::string &key, T &t)
Parse one keyword as generic type.
Definition: Action.h:273
Communicator & multi_sim_comm
Definition: Action.h:159
FileBase & flush()
Flushes the file to disk.
Definition: FileBase.cpp:70
int Get_rank() const
Obtain the rank of the present process.
static std::vector< std::string > getWords(const std::string &line, const char *sep=NULL, int *parlevel=NULL, const char *parenthesis="{")
Split the line in words using separators.
Definition: Tools.cpp:112
int printf(const char *fmt,...)
Formatted output with explicit format - a la printf.
Definition: OFile.cpp:82
Main plumed object.
Definition: Plumed.h:201
Tensor3d Tensor
Definition: Tensor.h:425
int Get_size() const
Obtain the number of processes.
void const char const char int double * a
Definition: Matrix.h:42
Vector3d Vector
Alias for three dimensional vectors.
Definition: Vector.h:351
void Sum(T *, int)
Wrapper for MPI_Allreduce with MPI_SUM.
Definition: Communicator.h:124