MARVEL tutorial: Analyzing CVs

Aims

The aim of this tutorial is to introduce you to the PLUMED syntax. We will go through the writing of input files to calculate and print simple collective variables. We will then discuss how we can use PLUMED to analyze a trajectory by calculating ensemble averages, histograms and free energy surfaces.

Learning Outcomes

Once this tutorial is completed students will:

  • Know how to write PLUMED input files that can be used to calculate and print collective variables.
  • Be able to calculate a collective variable that take the position of center of mass as input.
  • Know how to write a PLUMED input file that can be used to calculate an ensemble average.
  • Know how to write a PLUMED input file that can be used to calculate a histogram. Students will also learn how this histogram can be converted into a free energy surface.

Resources

The tarball for this project contains the following files:

  • trajectory-short.xyz : a (short) trajectory for a 16 residue protein in xyz format. All the calculations with plumed driver that will be performed during this tutorial will use this trajectory.
  • template.pdb : a single frame from the trajectory that can be used in conjunction with the MOLINFO command
  • in : An input file for the simplemd code that forms part of PLUMED
  • input.xyz : A configuration file for Lennard-Jones solid with an FCC solid structure

Instructions

PLUMED is a library that can be incorporated into many MD codes by adding a relatively simple and well documented interface. Once it is incorporated you can use PLUMED to perform a variety of different analyzes on the fly and to bias the sampling in the molecular dynamics engine. PLUMED can also, however, be used as a standalone code for analyzing trajectories. If you are using the code in this way you can, once PLUMED is compiled, run the plumed executable by issuing the command:

plumed <instructions>

Let's start by getting a feel for the range of calculations that we can use PLUMED to perform. Issue the following command now:

plumed --help 

What is output when this command is run is a list of the tasks you can use PLUMED to perform. There are commands that allow you to patch an MD code, commands that allow you to run molecular dynamics and commands that allow you to build the manual. In this tutorial we will mostly be using PLUMED to analyze trajectories, however. As such most of the calculations we will perform will be performed using the driver tool. Let's look at the options we can issue to plumed driver by issuing the following command:

plumed driver --help

As you can see we can do a number of things with plumed driver. For all of these options, however, we are going to need to write a PLUMED input file. Before we get on to writing input files for PLUMED there is information Codes interfaced with PLUMED here which provides details on what the other PLUMED tools do and instructions for how to interface PLUMED with an MD code. You may like to look at this information now or you might prefer to return after you have finished the exercises here.

The PLUMED internal units

By default the PLUMED inputs and outputs quantities in the following units:

  • Energy - kJ/mol
  • Length - nanometers
  • Time - picoseconds

If you want to change these units you can do this using the UNITS keyword.

Introduction to the PLUMED input file

Many input files for PLUMED provides specifications for one or more CVs. These specifications are then followed by an instruction to PLUMED to PRINT these CVs and a termination line. Comments are denoted with a # and the termination of the input for PLUMED is marked with the keyword ENDPLUMED. Any words that follow the ENDPLUMED line are ignored by PLUMED. You can also introduce blank lines as these will not be interpreted by PLUMED.

The following input can be used analyze the DISTANCE between the two terminal carbons of a 16 residues peptide. The PRINT command after the DISTANCE command ensures that the results (i.e. the distances calculated) are printed into the file named COLVAR.

#my first PLUMED input:
e2edist: DISTANCE ATOMS=2,253 

#printout frequency
PRINT ARG=e2edist STRIDE=1 FILE=COLVAR 

#endofinput 
ENDPLUMED
here I can write what I want it won't be read.

Let's use this simple input file to analyze the trajectory included in the RESOURCES. To do so copy the input above into a file called plumed.dat and then issue the command below:

plumed driver --plumed plumed.dat --ixyz trajectory-short.xyz --length-units 0.1

Notice the –length-units 0.1 flag here. This tells PLUMED to convert the positions in the xyz file here, which are in Angstroms, into nm, which remember are The PLUMED internal units
When this command finishes running you should have a file called COLVAR. If you look at it's contents (using the command more COLVAR for instance) you will find that the first two lines read:

#! FIELDS time e2edist 
 0.000000 2.5613161

The first line of the COLVAR files tells you what values are in each of the columns. The remaining lines then tell you the values these quantities took in the various trajectory frames. We can plot this data using gnuplot (or our favorite graphing program) by issuing the following commands:

gnuplot
p 'COLVAR' u 1:2 w l

What this graph shows is the value of the distance that we calculated using PLUMED as a function of time. As you can see the distance fluctuates about as the atoms in our system move about in accordance with the various forces that act upon them.

Right so hopefully that wasn't too hard. What we are going to next is we are going to try to understand the input file that we have written a bit better.

The PLUMED input syntax

The input file that we issued in the last section looked something like this:

e2edist: DISTANCE ATOMS=2,253
PRINT ARG=e2edist STRIDE=1 FILE=COLVAR

What happens if we reverse the order of the two commands in the input file. In other words, what would have happened if we had run with an input file that looked like this:

PRINT ARG=e2edist STRIDE=1 FILE=COLVAR
e2edist: DISTANCE ATOMS=2,253

Run the input file above using the commands described in the previous section to find out.

If everything is working correctly the input above should crash with the following error message:

+++ Internal PLUMED error
+++ file Action.cpp, line 234
+++ message: ERROR in input to action PRINT with label @0 : cannot find action named e2edist (hint! the actions in this ActionSet are: )

Take a moment to read that error message and to try to think what it might mean before reading on.

To understand the error lets look at the correct input again:

e2edist: DISTANCE ATOMS=2,253
PRINT ARG=e2edist STRIDE=1 FILE=COLVAR

You should think of the PLUMED input syntax as a kind of scripting language with commands and variables. The first line in the above file thus tells PLUMED to calculate the distance between atoms 2 and 253 and to store the value of this distance in a variable called e2edist . The fact that this quantity is stored in a variable is important as it ensures that we can access this quantity when we come to issue commands later in the input file. So in the second line above we print a quantity. What quantity should be printed? e2edist - the distance between atom 2 and atom 253. Easy!

So why does the second input, that has the order of these two commands reversed, not work? Well the variable e2edist has to be defined before it can be used because in PLUMED the commands are executed in the same order as they are defined in the input file. We thus cannot do anything with e2edist (i.e. PRINT it) without first explaining how it is calculated.

This input demonstrates the key idea of the PLUMED syntax. Quantities calculated by commands such as DISTANCE are given labels (e2edist) so that they can be reused when performing other commands. This idea is discussed in more depth in the following video https://www.youtube.com/watch?v=PxJP16qNCYs.
If you understand this idea though you are 90% of the way to understanding how to used PLUMED. Well done.

Center of mass positions

When calculating many collective variables it is useful to not think in terms of calculating them directly based on the positions of a number of atoms. It is useful to instead think of them as being calculated from the position of one or more virtual atoms whose positions are generated based on the position of a collection of other atoms. For example you might want to calculate the distance between the center of masses of two molecules. In this case it is useful to calculate the two positions of the centers of mass first and to then calculate the distance between the centers of mass. The PLUMED input that you would use for such a calculation reflects this way of thinking. An example of a PLUMED input that can be used to perform this sort of calculation is shown below:

first: CENTER ATOMS=1,2,3,4,5,6
last: CENTER ATOMS=251-256

e2edist: DISTANCE ATOMS=2,253
comdist: DISTANCE ATOMS=first,last

PRINT ARG=e2edist,comdist STRIDE=1 FILE=COLVAR 

ENDPLUMED

Make a PLUMED input containing the above input and execute it on the trajectory that you downloaded at the start of the exercise by making use of the commands in section Introduction to the PLUMED input file

Before we turn to analyzing what is output from this calculation there are a few things to note about this input file. Firstly, I should describe what this file instructs PLUMED to do. It tells PLUMED to:

  1. calculate the position of the Virtual Atom 'first' as the CENTER of atoms from 1 to 6;
  2. calculate the position of the Virtual Atom 'last' as the CENTER of atoms from 251 to 256;
  3. calculate the distance between atoms 2 and 253 and saves it in 'e2edist';
  4. calculate the distance between the two atoms 'first' and 'last' and saves it in 'comdist';
  5. print the content of 'e2edist' and 'comdist' in the file COLVAR

Notice that in the input above we have used two different ways of writing the atoms used in the CENTER calculation:

  1. ATOMS=1,2,3,4,5,6 is the explicit list of the atoms we need
  2. ATOMS=251-256 is the range of atoms needed

Notice also that ranges of atoms can be defined with a stride which can also be negative as shown by the commands below, which are both equivalent:

  1. ATOMS=from,to:by (i.e.: 251-256:2)
  2. ATOMS=to,from:-by (i.e.: 256-251:-2)

Lets now return to the business of analyzing what was output by the calculation. Lets begin by looking at the contents of the COLVAR file that was output. When you do so you should find that the first few lines of this file read:

#! FIELDS time e2edist comdist
 0.000000 2.516315 2.464043

Notice that at variance with the file that was output in the previous section we now have three columns of numbers in the output file. Given the PRINT command that we used in the input to this calculation though this new behavior makes a lot of sense.

Lets now plot contents of the COLVAR file so we can compare the behavior of the distance between the two terminal carbons and the distance between the centers of the mass of the two terminal residues in this trajectory (these two distances are what the above input is calculating). To plot this data issue the following commands:

gnuplot
p 'COLVAR' u 1:2 w l, '' u 1:3 w l

Given what you observe for the behavior of these two distance what do you now expect to see in the trajectory? Let's look at the trajectory to see if we are right. To look at the trajectory issue the following commands:

vmd template.pdb trajectory-short.xyz 

Lets summarize what we have learned from these sections thus far. We have seen that:

  • PLUMED provides a simple syntax that can be used to calculate the DISTANCE between any pair of atoms in our system.
  • PLUMED also allows us to calculate the positions of virtual atom (e.g. CENTER) and that we can calculate the DISTANCE between these quantities.
  • Calculating these quantities is useful because it allows us to simplify the high-dimensional information contained in a trajectory.

Now, obviously, PLUMED can do much more than calculate the distances between pairs of atoms as we will start to see that in the following sections.

Calculating torsion angles

In the previous sections we have seen how we can use PLUMED to calculate distances and how by plotting these distances we can begin to simplify the high dimensional data contained in a trajectory. Obviously, calculating a DISTANCE is not always the best way to simplify the information contained in a trajectory and we often find we have to work with other more-complex quantities. PLUMED thus started as a library that was used to gather all the various implementations people had written for different collective variables (CVs) that people had used to "analyze" trajectories over the years (analyze is in inverted commas here because, as you will see if you continue to use PLUMED, we use CVs to do much more than simply analyze trajectories).
Now we will not have time to go over all the quantities that can be calculated in this tutorial. Once you understand the basic principles, however, you should be able to use the manual to work out how to calculate other quantities of interest. With this in mind then lets learn how to calculate a TORSION. As with DISTANCE the atoms that we specify in our TORSION command can be real or virtual. In the example below two real atoms and a virtual atom are used:

first: CENTER ATOMS=1-6
last: CENTER ATOMS=251-256
cvtor: TORSION ATOMS=first,102,138,last

PRINT ARG=cvtor STRIDE=1 FILE=COLVAR 

ENDPLUMED

Copy this input to a PLUMED input file and use it to analyze the trajectory you downloaded at the start of this exercise by using the commands described in section Introduction to the PLUMED input file then plot the CV output using gnuplot.

As you can hopefully see calculating TORSION values and other CVs is no more difficult than calculating DISTANCE values. In fact it is easier as generally when you calculate the torsion angles of a protein you often wish to calculate particular, named torsion angles (i.e. the \(\phi\) and \(\psi\) angles). The MOLINFO command makes it particularly easy to do this. For instance suppose that you want to calculate and print the \(\phi\) angle in the sixth residue of the protein and the \(\psi\) angle in the eighth residue of the protein. You can do so using the following input:

#SETTINGS MOLFILE=user-doc/tutorials/marvel-1/template.pdb
MOLINFO STRUCTURE=template.pdb
phi6: TORSION ATOMS=@phi-6
psi8: TORSION ATOMS=@psi-8
PRINT ARG=phi6,psi8 FILE=colvar

Copy this input to a PLUMED input file and use it to analyze the trajectory you downloaded at the start of this exercise by using the commands described in section Introduction to the PLUMED input file then plot the CV output using gnuplot. Notice that you will need the template.pdb file you downloaded at the start of this exercise in order for this to run.

An exercise with the radius of gyration

Lets now see if you can use everything you have learned to setup a PLUMED input file of your own. What I would like you to do is to write a PLUMED input file that measures the Radius of Gyration GYRATION for the configurations in each of the frames in the trajectory that you downloaded at the start of this exercise. This radius of gyration will be calculated using the positions of all the atoms in that trajectory.

NOTE: if what you need for one or more variables is a long list of atoms and not a virtual atom you can use the keyword GROUP. A GROUP can be defined using ATOMS in the same way we saw before, in addition it is also possible to define a GROUP by reading a GROMACS index file.

ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238

Now 'ca' is not a virtual atom but a simple list of atoms.

Coordination numbers

In the previous sections we have learned how PLUMED can be used to calculate simple functions of atomic positions such as the DISTANCE between pairs of atoms. As discussed here (https://www.youtube.com/watch?v=iDvZmbWE5ps) many of the more complicated collective variables that we use to analyze simulations can be calculated by computing these simple quantities multiple times for different sets of atoms. That is to say we can calculate many more complicated collective variables using:

\[ s = \sum_i g[ f(\{\mathbf{X}_i\}) ] \]

Here \(g\) is a function of a scalar and \(f\) is a function that takes in the positions of a set of atoms \(\{\mathbf{X}\}\) and outputs a scalar. The sum then runs over the different sets of atoms from which the quantity \(f\) can be calculated. This is all rather abstract so lets make it more concrete by considering an example. Suppose that we want to calculate the coordination number of atom \(k\). What we need to do is:

  1. We need to calculate the distance between atom \(k\) and every atom in the system that is not atom \(k\). This will be the set of sets of atoms that we have to perform the sum above on. Furthermore, the function \(f\) in the above will be Pythagoras theorem, which is the function we use to calculate the distance between a pair of atoms.
  2. We need to transform the distances calculated by a switching function ( \(f\) in the above) that is equal to one if the distance is less than or equal to the typical length of a chemical bond and zero otherwise.

Lets thus use PLUMED to calculate the coordination number of atom 9. We can do this as follows:

d1: DISTANCES GROUPA=9 GROUPB=1-8,10-256 LESS_THAN={RATIONAL D_0=0.16 R_0=0.01 D_MAX=0.2}
PRINT ARG=d1.* FILE=colvar


Copy this input file to a PLUMED input file. Before using it to analyze the trajectory that you downloaded at the start of the exercise using the commands described in section Introduction to the PLUMED input file try to guess what value this coordination number will take. Hint: what element is atom 9?

Now see if you can adjust the above input to calculate the coordination number of atom 5. What is the coordination number of this atom and why does it take this value?

Multicolvar

In the previous section we exploited a feature of PLUMED known as multicolvar when calculating the coordination number. When using this feature we are not confined to simply calculating coordination numbers. For instance the input below allows us to calculate a number of distances and to then calculate the mean of the distribution of distances, the minimum distance in the distribution, the maximum distance in the distribution and the second moment of the distribution (the variance).

ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
dd: DISTANCES GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2

PRINT ARG=dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR 

ENDPLUMED

Try to copy this input now and to use it to analyze the trajectory you downloaded at the start of the exercise using the commands described in section Introduction to the PLUMED input file.
Multicolvar is not just for DISTANCES though. The infrastructure of multicolvar has been used to develop many PLUMED collective variables. One interesting example is the set of Secondary Structure CVs (ANTIBETARMSD, PARABETARMSD and ALPHARMSD). You can use the input below to calculate the degree of anti-beta secondary structure in each of the trajectory frames by copying this input to a PLUMED input file and by exploiting the commands to run driver that were described in section Introduction to the PLUMED input file.

#SETTINGS MOLFILE=user-doc/tutorials/marvel-1/template.pdb
MOLINFO STRUCTURE=template.pdb
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1

PRINT ARG=abeta.lessthan STRIDE=1 FILE=COLVAR 

ENDPLUMED

We can do a large number of other things with multicolvar. If you are interested this topic is described in more detail in the tutorial: Belfast tutorial: Steinhardt Parameters.

Understanding the need for ensemble averages

In the previous sections we have learned how we can use PLUMED to calculate collective variables from simulation trajectories and have seen how, by plotting how these collective variables change as a function of time, we can get a handle on what happens during the trajectory. Generally this level of analysis is not sufficient for us to publish a paper about the simulations we have run, however. In this section we are going to run some molecular dynamics simulations in order to understand why.

You are going to need to do the following set of things in order to do this exercise:

  1. Take the two files you downloaded at the start of this exercise that are called called in and input.xyz and place them in a directory.
  2. In the same directory write an input file for PLUMED called plumed.dat that calculates and prints the distance between atoms 2 and 3 to a file called colvar.
  3. Run simplemd by issuing the command:
plumed simplemd < in
  1. Open the file called input.xyz and modify the z-coordinate of the 1st atom. It should currently be equal to zero. Set it equal to 0.1.
  2. Run simplemd again using the command above.
  3. Plot the colvar files output during the two calculations using:
gnuplot
p 'colvar' w l, 'bck.0.colvar' w l

If you have done everything correctly you should see that the values of the distance in the early parts of the simulation are similar but that the values of the distance in the two simulations are very different by the end of the simulations.

Allow me to explain what we have just done. simplemd is a tool for running molecular dynamics using the Lennard-Jones potential. We have thus just run two very similar molecular dynamics calculations. The only difference between these two calculations was in the z-coordinate of the first atom in the input structure. The initial velocities of all the atoms were the same and the initial positions of all other atoms were identical. In spite of these similarities, however, the trajectories that we obtain in the two calculations are very different. This is important as it tells us that the time series of CV values that we get from a single MD simulation is (in and of itself) not particularly valuable as we would have got a completely different time series of values if we had run the calculation with only a slightly different input structure. We should, therefore, think of the trajectory output from a molecular dynamics simulations as a time series of random variables and analyze it accordingly.

The justification for thinking of a trajectory as a time series of random variables is based on statistical mechanics which tells us that a system with a constant number of atoms and a constant volume evolving at a fixed temperature has a probability:

\[ P(q)\propto e^{-\frac{U(q)}{k_B T}} \]


of being in a microstate \(q\) with internal energy \(U(q)\) as discussed and explained in the videos that can be found on the following pages:

Now, and as already explained above, given that each microstate is sampled with this particular probability it makes sense to think of the microstates, and by extension the collective variables values that we calculate from these microstates, as random variables. We can thus define the ensemble average for a particular collective variable, \(A\), using:

\[ \langle A \rangle = \int A(q) P(q) \textrm{d}q \]

where the integral here runs over all possible microstates, \(A(q)\) is the function that allows us to calculate the value of the CV from the positions of the atoms in microstate \(q\) and where \(P(q)\) is the "probability" (as it appears in an integral it is, strictly speaking, a probability density) of being in microstate \(q\) which was introduced above. For more information on this business of random variables and ensemble averages (or expectations) check out the videos that can be found on the following pages:

Instead of using the formula for the ensemble average given above we can estimate an ensemble averages by taking a time series of random variables (like our trajectory) adding all the values of the random variable together (in our case the CV values in each of the trajectory frames) and dividing by the number of random variables that were added together. This works because of results known as the law of large numbers and the central limit theorem, which you can find videos on here:

Alternatively, we can justify this way of analyzing our trajectory by thinking of the set of sampled frames as random variables generated from Markov chain.
These mathematical objects are discussed here:

Regardless, however, and as we will see in the following section we can estimate ensemble averages of collective variables using:

\[ \langle A \rangle \approx \frac{1}{T} \sum_{t} A[q(t)] \]

where the sum runs over the set of \(T\) microstates, \(q(t)\), in our trajectory and where \(A[q(t)]\) is the function that is used to calculate the value of the collective variable from the positions of the atoms in the microstate. When we do so we find that the values of the ensemble averages from different trajectories are reasonably consistent and certainly much more consistent than the set of instantaneous CV values.

Calculating ensemble averages using PLUMED

Repeat the steps from the previous section that were used to run the two MD calculations with slightly different input configurations. This time, however, your PLUMED input should look like this:

d1: DISTANCE ATOMS=2,3
d1a: AVERAGE ARG=d1 STRIDE=10
PRINT ARG=d1,d1a FILE=colvar STRIDE=10

If you now plot the values of the CV output from these two calculations together with the estimates of the ensemble averages using something like the commands below:

gnuplot
p 'colvar' u 1:2 w l, '' u 1:3 w l, 'bck.0.colvar' u 1:2 w l, '' u 1:3 w l

You should see that, although the instantaneous values of the CVs differ greatly in the two simulations, the average values of the CVs are relatively similar for both simulations.
Lets discuss the AVERAGE command that we have added here and what this does. In essence it accumulates the sum of all the distances calculated by the DISTANCE command labelled d1. When it comes time to output this accumulated average (every 10 trajectory steps) this accumulated sum is divided by the number of trajectory frames that have been summed in calculated the current sum. In other words, the average allows us to compute the following quantity:

\[ a = \frac{1}{N} \sum_{i=1}^N d(t_i) \]

where \(d(t_i)\) is the value of the distance at time \(t_i\). Now you may reasonably ask: what the keyword STRIDE=10 is doing in this command?
Well this is telling PLUMED to only add distances into the accumulated average every 10 steps. In other words, when calculating the average we are disregarding the value of the distance in frames 1,2,3,4,5,6,7,8 and 9. Why are we disregarding this information though? Well because there are correlations between the values the CV takes in two adjacent trajectory frames. Ideally, we want the values of the distance we are averaging to be independent and identical random variables.
Before we move on to calculating histograms and free energy surfaces I have a little challenge for you. If you are able to answer the following question then you really understand what the AVERAGE command is doing. Try running the calculation above using the following input:

d1: DISTANCE ATOMS=2,3
d1a: AVERAGE ARG=d1 STRIDE=10 CLEAR=10
PRINT ARG=d1,d1a FILE=colvar STRIDE=10

Based on what you see when you plot the colvar file and the information on the page about the AVERAGE command what is the CLEAR=10 keyword telling PLUMED to do?

Calculating histograms

Most of the time, we are not really interested in calculating ensemble averages for particular collective variables. What we would really like is the probability that the collective variable takes a particular value or set of values. In other words, and as discussed in the following video, we would like the probability as a function of collective variable value:

We can calculate approximate histograms like these using PLUMED. Furthermore, and as discussed in the video below, when we do this what we are really doing is we are calculating multiple ensemble averages at once:

To calculate these ensemble averages we must make use of the HISTOGRAM command. An input file for a calculation of this sort is provided below. Use this input now and the input files for simple MD from the previous two sections to calculate the histogram of DISTANCE values that are explored in these trajectories.

d1: DISTANCE ATOMS=2,3
histogram: HISTOGRAM ARG=d1 BANDWIDTH=0.05 GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=200 STRIDE=10
DUMPGRID GRID=histogram FILE=histo STRIDE=25000

You can plot the histogram output from this simulation using:

gnuplot
p 'histo' w l

You should see that there is a large peak in the histogram at around 2.0 indicating that this is the value of the distance that the atoms were most likely to have during the course of the simulation. N.B. The histogram is unlikely to be converged with a trajectory this short.

Lets now look at the syntax of the HISTOGRAM command. The first thing to note is the similarities between what is done with this command and what is done with the AVERAGE command. Once again we have a STRIDE keyword for HISTOGRAM (and a CLEAR keyword for that matter) that tells us how often data should be added to to the accumulated averages. Furthermore, in both these commands we have an additional instruction with its own STRIDE keyword (PRINT for AVERAGE and DUMPGRID for HISTOGRAM) that tells us how frequently the accumulated averages should be output to human readable files.

A substantial difference between these two input files is the object the label histogram refers to. In all previous examples the label of an action has referred to a scalar quantity that was calculated by that action. In the example above, however, the label histogram refers to the function accumulated on a grid that is evaluated by the HISTOGRAM command. This is why we cannot print this quantity using PRINT - it is not a simple scalar valued function anymore. As you become more familiar with PLUMED you will find that these labels can refer to a range of different types of mathematical object.
Lets now see if we can bring together everything we have learned in this tutorial in order to analyze the protein trajectory that was downloaded at the start of the exercise.

A histogram for the protein trajectory

We are going to calculate the HISTOGRAM from our protein trajectory as a function of two different collective variables: ANTIBETARMSD and the average distance between the ca atoms of our protein backbone. The input that allows us to calculate perform this analysis is shown below:

#SETTINGS MOLFILE=user-doc/tutorials/marvel-1/template.pdb
# Read in protein structure template
MOLINFO STRUCTURE=template.pdb
# Calculate collective variables
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1
ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
DISTANCES GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2 LABEL=dd
# Print instaneous values of collective variables
PRINT ARG=abeta.lessthan,dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR 
# Accumulate histogram of collective variables
hh: HISTOGRAM ARG=abeta.lessthan,dd.mean KERNEL=DISCRETE GRID_MIN=0,0.8 GRID_MAX=4,1.2 GRID_BIN=40,40
# Output histogram - N.B. when we are running with driver we can not provide a STRIDE.  
# The histogram will then only be output once all the trajectory frames have been read in and analyzed
DUMPGRID GRID=hh FILE=histo

Try running the input above on the trajectory that you downloaded at the start of this exercise by using the commands detailed in section Introduction to the PLUMED input file. You can plot the two dimensional histogram output using the following commands:

gnuplot
set pm3d map
sp 'histo' w pm3d

If you do so though, you will probably find that the structure in the histogram, \(H(s)\), is a bit difficult to visualize because the probability changes from point to point by multiple orders of magnitude. This is why we often convert the histogram, \(H(s)\), to a free energy surface, \(F(s)\), using:

\[ F(s) = -k_B T \ln H(s) \]

If you want to use PLUMED to output the free energy rather than the histogram you need to use the CONVERT_TO_FES command as shown below:

#SETTINGS MOLFILE=user-doc/tutorials/marvel-1/template.pdb
# Read in protein structure template
MOLINFO STRUCTURE=template.pdb
# Calculate collective variables
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1
ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
DISTANCES GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2 LABEL=dd
# Print instaneous values of collective variables
PRINT ARG=abeta.lessthan,dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR
# Accumulate histogram of collective variables
hh: HISTOGRAM ARG=abeta.lessthan,dd.mean KERNEL=DISCRETE GRID_MIN=0,0.8 GRID_MAX=4,1.2 GRID_BIN=40,40
fes: CONVERT_TO_FES GRID=hh TEMP=300
# Output free energy - N.B. when we are running with driver we can not provide a STRIDE.
# The histogram will then only be output once all the trajectory frames have been read in and analyzed
DUMPGRID GRID=fes FILE=fes.dat

Notice though that even when we do this complicated looking calculation we are still, underneath it all, calculating functions of a large number of ensemble averages.

Conclusions and further work

If you have worked through all of this tutorial make sure that you have understood it by ensuring that you understand what the list of learning outcomes in section Learning Outcomes means and that you can use PLUMED to perform all these tasks. In terms of further work you should investigate issues related to the convergence of estimates of ensemble averages such as block averaging. You might like to investigate how long your simulations have to be in order to obtain reliable estimates of the ensemble average for a collective variable and reliable estimates for the free energy as a function of a collective variable. Alternatively, you might like to explore other collective variables that could be used to analyze the protein trajectory that you have worked on in this tutorial.