Belfast tutorial: Adaptive variables II

The aim of this tutorial is to consolidate the material that was covered during Belfast tutorial: Analyzing CVs and Belfast tutorial: Adaptive variables I on analysing trajectories using collective variables and path collective variables. We will then build on this material by showing how you can use the multidimensional scaling algorithm to automate the process of finding collective variables.

Once this tutorial is completed students will:

- Know how to load colvar data into the GISMO plugin
- Know how to run the multidimensional scaling algorithms on a trajectory
- Be able to explain how we can automate the process of finding collective variables by seeking out an isometry between a high-dimensional and low-dimensional space

The tarball for this project contains the following files:

- trajectory-short.xyz : a (short) trajectory for a 16 residue protein in xyz format. All calculations with plumed driver use this trajectory.
- trajectory-short.pdb : the same trajectory in pdb format, this can be loaded with VMD
- template.pdb : a single frame from the trajectory that can be used in conjuction with the MOLINFO command

The aim of this tutorial is to understand the data contained in the trajectory called trajectory-short.pdb. This file contains some frames from a simulation from a 16 residue protein. As a start point then let load this trajectory with vmd and have a look at it. Type the following command into the command line:

vmd trajectory-short.pdb

Look at it with the various representations that vmd offers. If you at are at the plumed tutorial try typing the letter m on the keyboard - we have made the new cartoon representation will update automatically for each frame of the trajectory - cool huh! What are your impressions about this trajectory based on looking at it with VMD? How many basins in the free energy landscape is this trajectory sampling from? What can we tell from looking at this trajectory that we could perhaps put in a paper?

If your answers to the questions at the end of the above paragraph are I don't know that is good. We can tell very little by just looking at a trajectory. In fact the whole point of today has been to find ways of analyzing trajectories precisely so that we are not put in this position of staring at trajetories mystified!

Right so lets come up with some CVs to analyse this trajectory with. As some of you may know we can understand the conformation of proteins by looking at the Ramachandran angles. For those of you who don't know here is a Wikkepedia article:

http://en.wikipedia.org/wiki/Ramachandran_plot

Our protein has 32 ramachandran angles. We'll come back to that. For the time being pick out a couple that you think might be useful and construct a plumed input to calculate them and print them to a file. You will need to use the TORSION and PRINT commands in order to do this. Once you have created your plumed input use driver to calculate the torsional angles using the following command:

plumed driver --plumed plumed.dat --ixyz trajectory.xyz

If you have done this correctly you should have an output file containing your torsional angles. We can use vmd+GISMO to visualise the relationship between the ramachandran angles and the atomic configurations. To do this first load the trajectory in VMD:

vmd trajectory-short.pdb

Then click on Extensions>Analysis>GISMO. A new window should open in this window click on File>Load colvars. You will be asked to select a colvar file. Select the file that was output by the plumed calculation above. Once the file is loaded you should be able to select the labels that you gave to the Ramachandran angles you calculated with plumed. If you do so you will see that this data is plotted in the GISMO window so that you can interact with it and the trajectory.

What can you conclude from this exercise. Do the CV values of the various frames appear in clusters in the plane? Do points in different clusters correspond to structures that look the same or different? Are there similar looking structures clustered together or are they always far apart? What can we conclude about the various basins in the free energy landscape that have been explored in this trajectory? How many are there? Would your estimate be the same if you tried the above estimate with a different pair of ramachandran angles?

What we have done for most of today is seek out a function that takes as input the position of all the atoms in the system - a \(3N\) dimensional vector, where \(N\) is the number of atoms. This function then outputs a single number - the value of the collective variable - that tells us where in a low dimensional space we should project that configuration. Problems can arise because this collective-variable function is many-to-one. As you have hopefully seen in the previous exercise markedly different confifgurations of the protein can actually have quite similar values of a particular ramachandran angle.

We are going to spend the rest of this session introducing an alternative approach to this bussiness of finding collective variables. In this alternative approach we are going to stop trying to seek out a function that can take any configuration of the atoms (any \(3N\)-dimensional vector) and find it's low dimensional proejection on the collective variable axis. Instead we are going to take a set of configurations of the atoms (a set of \(3N\)-dimensional vectors of atom positions) and try to find a sensible set of projections for these configurations. We already touched on this idea earlier when we looked at paths. Our assumption, when we introduced this idea, was that we could find an ordered set of high-dimensional configurations that represented the transtion pathway the system takes as it crossed a barrier and changed between two particularly interesting configurations. Lets say we have a path defined by four reference configurations - this implies that to travel between the configurations at the start and the end of this path you have to pass through configuration 1, then configuration 2, then configuration 3 and then configuration 4. This ordering means that the numbers 1 through 4 constitute sensible projections of these high-dimensional configurations. The numbers 1 through 4 all lie on a single cartesian axis - a low-dimensional space.

The problem when it comes to applying this idea to the data that we have in the trajectory-short trajectory is that we have no information on the ``order" of these points. We have not been told that this trajectory represents the transition between two interesting points in phase space and thus we cannot apply the logic of paths. Hence, to seek out a low dimensional representation we are going to try and find a representation of this data we are going to seek out an isometry between the space containing the \(3N\)-dimensional vectors of atom positions and some lower-dimensional space. This idea is explained in more detail in the following video .

Let's now generate our isometric embedding. You will need to create a plumed input file that contains the following instructions:

CLASSICAL_MDS ... ATOMS=1-256 METRIC=OPTIMAL-FAST USE_ALL_DATA NLOW_DIM=2 OUTPUT_FILE=rmsd-embed ... CLASSICAL_MDS

You should then run this calculation using the following command:

plumed driver --ixyz trajectory-short.xyz --plumed plumed.dat

This should generate an output file called rmsd-embed. You should now be able to use VMD+GISMO to visualise this output. Do the CV values of the various frames appear in clusters in the plane? Do points in different clusters correspond to structures that look the same or different? Are there similar looking structures clustered together or are they always far apart? What can we conclude about the various basins in the free energy landscape that have been explored in this trajectory? How many are there? Do you think this gives you a fuller picture of the trajectory than the ones you obtained by considering ramachandran angles?

As discussed in the previous section this approach to trajectory analysis works by calcalating distances between pairs of atomic configurations. Projections corresponding to these configurations are then generated in the low dimensional space in a way that tries to preserve these pairwise distances. There are, however, an infinite number of ways of calculating the distance between two high-dimensional configurations. In the previous section we used the RMSD distance but you could equally use the DRMSD distance. You could even try calculating a large number of collective variables for each of the high-dimensional points and seeing how much these all changed. You can use these different types of distances with the CLASSICAL_MDS action that was introduced in the previous section. If you have time less at the end of the session read the manual for the CLASSICAL_MDS action and see if you can calculate an MDS projection using an alternative defintion of the distances between configurations. Some suggestions to try in order of increasing difficulty: DRMSD, how much the full set of 32 ramachandran angles change and the change in the contact map

There is a growing community of people using these ideas to analyse trajectory data. Some start points for investigating their work in more detail are: