Belfast tutorial: Analyzing CVs

Aims

The aim of this tutorial is to introduce the users to the plumed syntax. We will go through the writing of simple collective variable and we will use them to analyze a trajectory in terms of probability distributions and free energy.

Learning Outcomes

Once this tutorial is completed students will:

  • Know how to write a simple plumed input file
  • Know how to analyze a trajectory using plumed

Resources

The tarball for this project contains the following files:

  • trajectory-short.xyz : a (short) trajectory for a 16 residue protein in xyz format. All calculations with plumed driver use this trajectory.
  • template.pdb : a single frame from the trajectory that can be used in conjunction with the MOLINFO command

Instructions

PLUMED is a library that can be accessed by multiple codes adding a relatively simple and well documented interface. Once PLUMED is installed you can run a plumed executable that can be used for multiple purposes:

plumed --help 

some of the listed options report about features that are available in plumed, others can be used to tell plumed to do something: from analyzing a trajectory to patch the source code of a MD code and so on. All the commands have further options that can be seen using plumed command –help, i.e.:

plumed driver --help

In the following we are going to see how to write an input file for plumed2 that can be used to analyze a trajectory.

A note on units

By default the PLUMED inputs and outputs quantities in the following units:

  • Energy - kJ/mol
  • Length - nanometers
  • Time - picoseconds

If you want to change these units you can do this using the UNITS keyword.

Introduction to the PLUMED input file

A typical input file for PLUMED input is composed by specification of one or more CVs, the printout frequency and a termination line. Comments are denoted with a # and the termination of the input for PLUMED is marked with the keyword ENDPLUMED. Whatever it follows is ignored by PLUMED. You can introduce blank lines. They are not interpreted by PLUMED.

In the following input we will analyze the DISTANCE between the two terminal carbons of a 16 residues peptide, and we will PRINT the results in file named COLVAR.

#my first plumed input:
DISTANCE ATOMS=2,253 LABEL=e2edist

#printout frequency
PRINT ARG=e2edist STRIDE=1 FILE=COLVAR 

#endofinput 
ENDPLUMED
here I can write what I want it won't be read.

Now we can use this simple input file to analyze the trajectory included in the RESOURCES:

plumed driver --plumed plumed.dat --ixyz trajectory-short.xyz --length-units 0.1

NOTE: –length-units 0.1, xyz files, as well as pdb files, are in Angstrom.

You should have a file COLVAR, if you look at it (i.e. more COLVAR) the first two lines should be:

#! FIELDS time e2edist 
 0.000000 2.5613161

NOTE: the first line of the file COLVAR tells you what is the content of each column.

In PLUMED the commands defined in the input files are executed in the same order in which they are written, this means that the following input file is wrong:

#printout frequency
PRINT ARG=e2edist STRIDE=1 FILE=COLVAR 
#my first plumed input:
DISTANCE ATOMS=2,253 LABEL=e2edist
#endofinput 
ENDPLUMED
here I can write what I want it won't be read.

Try to run it.

Sometimes, when calculating a collective variable, you may not want to use the positions of a number of atoms directly. Instead you may wish to use the position of a virtual atom whose position is generated based on the positions of a collection of other atoms. For example you might want to use the center of a group of atoms (CENTER):

Since PLUMED executes the input in order you need to define the new Virtual Atom before using it:

first: CENTER ATOMS=1,2,3,4,5,6
last: CENTER ATOMS=251-256

e2edist: DISTANCE ATOMS=2,253
comdist: DISTANCE ATOMS=first,last

PRINT ARG=e2edist,comdist STRIDE=1 FILE=COLVAR 

ENDPLUMED

NOTE: an action (i.e. CENTER or DISTANCE here) can be either labeled using LABEL as we did before or as label: ACTION as we have just done here.

With the above input this is what happen inside PLUMED with a STRIDE=1:

  1. calculates the position of the Virtual Atom 'first' as the CENTER of atoms from 1 to 6;
  2. calculates the position of the Virtual Atom 'last' as the CENTER of atoms from 251 to 256;
  3. calculates the distance between atoms 2 and 253 and saves it in 'e2edist';
  4. calculates the distance between the two atoms 'first' and 'last' and saves it in 'comdist';
  5. print the content of 'e2edist' and 'comdist' in the file COLVAR

In the above input we have used two different ways of writing the atoms used in CENTER calculation:

  1. ATOMS=1,2,3,4,5,6 is the explicit list of the atoms we need
  2. ATOMS=251-256 is the range of atoms needed

ranges of atoms can be defined with a stride which can also be negative:

  1. ATOMS=from,to:by (i.e.: 251-256:2)
  2. ATOMS=to,from:-by (i.e.: 256-251:-2)

Now by plotting the content of the COLVAR file we can compare the behavior in this trajectory of both the terminal carbons as well as of the center of masses of the terminal residues.

gnuplot

What do you expect to see now by looking at the trajectory? Let's have a look at it

vmd template.pdb trajectory-short.xyz 

Virtual atoms can be used in place of standard atoms everywhere an atom can be given as input, they can also be used together with standard atoms. So for example we can analyze the TORSION angle for a set of Virtual and Standard atoms:

first: CENTER ATOMS=1-6
last: CENTER ATOMS=251-256
cvtor: TORSION ATOMS=first,102,138,last

PRINT ARG=cvtor STRIDE=1 FILE=COLVAR 

ENDPLUMED

The above CV don't look smart to learn something about the system we are looking at. In principle CV are used to reduce the complexity of a system by looking at a small number of properties that could be enough to rationalize its behavior.

Now try to write a collective variable that measures the Radius of Gyration of the system: GYRATION.

NOTE: if what you need for one or more variables is a long list of atoms and not a virtual atom one can use the keyword GROUP. A GROUP can be defined using ATOMS in the same way we saw before, in addition it is also possible to define a GROUP by reading a GROMACS index file.

ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238

Now 'ca' is not a virtual atom but a simple list of atoms.

MULTICOLVAR

Sometimes it can be useful to calculate properties of many similar collective variables at the same time, for example one can be interested in calculating the properties of the distances between a group of atoms, or properties linked to the distribution of the dihedral angles of a chain and so on. In PLUMED this kind of collective variables fall under the name of MULTICOLVAR (cf. MultiColvar.) Here we are going to analyze the distances between CA carbons along the chain:

ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
dd: DISTANCES GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2

PRINT ARG=dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR 

ENDPLUMED

The above input tells PLUMED to calculate all the distances between CA carbons and then look for the mean distance, the minimum distance, the maximum distance and the variance. In this way we have defined four collective variables that are calculated using the distances. These four collective variables are stored as components of the defined action 'dd': dd.mean, dd.min, dd.max, dd.moment-2.

The infrastructure of multicolvar has been used to develop many PLUMED collective variables as for example the set of Secondary Structure CVs (ANTIBETARMSD, PARABETARMSD and ALPHARMSD).

MOLINFO STRUCTURE=template.pdb
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1

PRINT ARG=abeta.lessthan STRIDE=1 FILE=COLVAR 

ENDPLUMED

We have now seen how to write the input some of the many CVs available in PLUMED. More complex CVs will be discussed in the next workshop, Belfast tutorial: Adaptive variables I.

Analysis of Collective Variables

Collective variables are usually used to visualize the Free Energy of a system. Given a system evolving at fixed temperature, fixed number of particles and fixed volume, it will explore different conformations with a probability

\[ P(q)\propto e^{-\frac{U(q)}{kb_BT}} \]

where \( q \) are the microscopic coordinates and \( k_B \) is the Boltzmann constant.

It is possible to analyze the above probability as a function of one or more collective variable \( s(q)\):

\[ P(s)\propto \int dq e^{-\frac{U(q)}{kb_BT}} \delta(s-s(q)) \]

where the \( \delta \) function means that to for a given value \( s\) of the collective variable are counted only those conformations for which the CV is \( s\). The probability can be recast to a free energy by taking its logarithm:

\[ F(s)=-k_B T \log P(s) \]

This means that by estimating the probability distribution of a CV it is possible to know the free energy of a system along that CV. Estimating the probability distribution of the conformations of a system is what is called 'sampling'.

In order to estimate a probability distribution one needs to make HISTOGRAM from the calculated CVs. PLUMED includes the possibility of constructing a histogram from data both on the fly as well as a posteriori as we are going to do now.

MOLINFO STRUCTURE=template.pdb
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1
ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
DISTANCES ...
GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2 LABEL=dd
... DISTANCES 

PRINT ARG=abeta.lessthan,dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR 

HISTOGRAM ...
ARG=abeta.lessthan,dd.mean
LABEL=hh
KERNEL=DISCRETE
GRID_MIN=0,0.8
GRID_MAX=4,1.2
GRID_BIN=40,40
... HISTOGRAM

DUMPGRID GRID=hh FILE=histo

ENDPLUMED

NOTE: HISTOGRAM ... means that what follow is part of the HISTOGRAM function, the same can be done for any action in PLUMED.

The above input tells PLUMED to accumulate the two collective variables on a GRID. In addition the probability can be converted to a free-energy using the CONVERT_TO_FES method and then use DUMPGRID to write it. Histograms can be accumulated in a smoother way by using a KERNEL function, a kernel is a normalized function, for example a normalized gaussian is the default kernel in PLUMED, that is added to the histogram centered in the position of the data. Estimating a probability density using kernels can in principle give more accurate results, on the other hand in addition to the choice of the binning one has to choose a parameter that is the WIDTH of the kernel function. As a rule of thumb: the grid spacing should be smaller (i.e. one half or less) than the BANDWIDTH and the BANDWIDTH should be smaller (i.e. one order of magnitude) than the variance observed/expected for the variable.

HISTOGRAM ...
LABEL=hh
ARG=abeta.lessthan,dd.mean
GRID_MIN=0,0.8
GRID_MAX=4,1.2
GRID_SPACING=0.04,0.004
BANDWIDTH=0.08,0.008
... HISTOGRAM

DUMPGRID GRID=hh FILE=histo

ENDPLUMED

If you have time less at the end of the session read the manual and look for alternative collective variables to analyze the trajectory. Furthermore try to play with the HISTOGRAM parameters to see the effect of using KERNEL in analyzing data.