PLUMED Masterclass 21.1: PLUMED syntax and analysis
Authors
Max Bonomi
Date
January 18, 2021

Aims

The aim of this Masterclass is to introduce users to the PLUMED syntax and to illustrate how PLUMED can be used to analyze pre-existing molecular dynamics trajectories.

Objectives

Once this Masterclass is completed, users will be able to:

  • Write a simple PLUMED input file and use it with the driver utility to analyze a trajectory.
  • Use advanced selection tools with MOLINFO.
  • Define and use virtual atoms, such as CENTER.
  • Deal with discontinuities in the trajectory due to periodic boundary conditions.
  • Use RMSD to measure protein conformational changes.
  • Align the system to a template with FIT_TO_TEMPLATE.

Setting up the software

Installation

We will use conda to install all the software needed in this Masterclass:

First, make sure conda is installed by typing:

conda

If the command is not found, please refer to these instructions to install conda on your machine. Alternatively, if you use the Homebrew package manager, you can install conda with:

brew install --cask anaconda
# add this line to your .bashrc
export PATH="/usr/local/anaconda3/bin:$PATH"

Now we can create a conda environment for the PLUMED Masterclass:

conda create --name plumed-masterclass 

and activate it with:

conda activate plumed-masterclass

Finally, we can proceed with the installation of the required software:

conda install -c conda-forge plumed py-plumed numpy pandas matplotlib notebook mdtraj mdanalysis git
Note
Do not forget to activate the plumed-masterclass environment every time you open a new terminal/shell.

PLUMED overview

PLUMED is a library that can be incorporated into many molecular dynamics (MD) codes by adding a relatively simple and well documented interface. Once it is incorporated you can use PLUMED to perform on-the-fly a variety of different analyses and to bias the sampling in MD simulations. Additionally, PLUMED can be used as a standalone code for analyzing trajectories. If you want to use the code in this way, you can run the PLUMED executable by issuing the command:

plumed <instructions>

Let's start by getting a feel for the range of calculations that PLUMED can do. Issue the following command now:

plumed --help 

The output of this command is the list of Command Line Tools included in PLUMED. Among these, there are commands that allow you to patch an MD code, postprocess metadynamics simulations, and build the manual. In this class we will use PLUMED to analyze trajectories. In order to do so, we will learn how to use the driver command line tool. Let's look at the options of PLUMED driver by issuing the following command:

plumed driver --help

As you can see we can do a number of things with driver. For all of these options, however, we are going to need to write a PLUMED input file.

Resources

The data needed to complete the exercises of this Masterclass can be found on GitHub. You can clone this repository locally on your machine using the following command:

git clone https://github.com/plumed/masterclass-21-1.git

The MD trajectory that we will analyze can be found in the folder called data:

  • 5-HT1B.pdb: reference conformation of the 5-HT1B receptor with the serotonin ligand.
  • 5-HT1B.xtc: trajectory of the 5-HT1B receptor with the serotonin ligand.

A Jupyter notebook to be used as template for the analysis of the PLUMED output can be found in the folder called notebooks. In this directory, you will also find the complete solution to all exercises in the notebook solution.ipynb.

Please note that:

  • This is a simulation of a membrane receptor, but water, lipids, and ions have been stripped out of the trajectory.
  • This is the raw trajectory generated by GROMACS. Therefore it is discontinuous due to periodic boundaries conditions (PBCs).
  • The students are invited to solve the exercises by themselves after completing the PLUMED input file templates provided below. In case of problems, students can rely on the solution notebook provided in the GitHub repository.

To keep things clean, it is recommended to run each exercise in a separate sub-directory (i.e. Exercise-1, Exercise-2, ...), which you can create inside the root directory masterclass-21-1.

Note
All the exercises have been tested with PLUMED version 2.7.0.

A sample PLUMED input file

The main goal of PLUMED is to compute collective variables (or CVs), which are complex descriptors of the system that can be used to describe the conformational change of a protein or a chemical reaction. This can be done either on-the-fly during a molecular dynamics simulations or a posteriori on a pre-calculated trajectory using PLUMED as a post-processing tool. In both cases, you should create an input file with a specific PLUMED syntax. Have a look at the sample input file below:

Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Compute distance between atoms 1 and 10.
# Atoms are ordered as in the trajectory files and their numbering starts from 1.
# The distance is called "d" for future reference.
d: DISTANCE 
ATOMS
the pair of atom that we are calculating the distance between.
=1,10 # Compute the torsional angle between atoms 1, 10, 20, and 30. # The angle is called "phi1" for future reference. phi1: TORSION
ATOMS
the four atoms involved in the torsional angle
=1,10,20,30 # The same CV defined above can be split into multiple lines # The angle is called "phi2" for future reference. phi2: TORSION ...
ATOMS
the four atoms involved in the torsional angle
=1,10,20,30 ... # Print "d" on a file named "COLVAR1" every 10 steps. PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=d
FILE
the name of the file on which to output these quantities
=COLVAR1
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=10 # Print "phi1" and "phi2" on another file named "COLVAR2" every 100 steps. PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=phi1,phi2
FILE
the name of the file on which to output these quantities
=COLVAR2
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=100

In the input file above, each line defines a so-called action. In this simple example, actions are used to compute a distance, a dihedral angle, or print some values on a file. Each action supports a number of keywords, whose value is specified. Action names are highlighted in green and, by clicking on them, you can go to the corresponding page in the manual that contains a detailed description of each keyword. Actions that support the keyword STRIDE are those that determine how frequently things are done. Notice that the default value for STRIDE is always 1. In the example above, omitting STRIDE keywords the corresponding COLVAR files would have been written for every frame of the analyzed trajectory. All the other actions in the example above, i.e. DISTANCE and TORSION, do not support the STRIDE keyword and are only calculated when requested. That is, d will be computed every 10 frames, and phi1 and phi2 every 100 frames.

Variables should be given a name (in the example above, d, phi1, and phi2), which is then used to refer to these variables in subsequent actions, such as the PRINT command. A lists of atoms should be provided as comma separated numbers, with no space.

You can find more information on the PLUMED syntax at Getting Started page of the manual. The complete documentation for all the supported collective variables can be found at the Collective Variables page.

The PLUMED internal units

By default the PLUMED inputs and outputs quantities in the following units:

  • Length: nanometers
  • Energy: kJ/mol
  • Time: picoseconds
  • Mass: amu
  • Charge: e

If you want to change these units, you can do this using the UNITS keyword.

Exercises

Exercise 1: Computing and printing simple collective variables

In this exercise, we will learn how to compute and print collective variables on a pre-calculated MD trajectory. To analyze the trajectory provided here, we will:

  • create a PLUMED input file with a text editor (typically called plumed.dat);
  • run the PLUMED driver utility;
  • visualize the output with the aid of a Jupyter notebook.

Notice that you can also visualize trajectories with VMD (always a good idea!). For example, the trajectory 5-HT1B.xtc can be visualized with the command:

vmd 5-HT1B.pdb 5-HT1B.xtc 

When you try this, you will notice that this trajectory is discontinuous due to PBCs. We need to keep this in mind in our analysis.

Let's now prepare a PLUMED input file to calculate:

  • the gyration radius of the CA protein atoms (GYRATION) of the first 40 N-terminal residues;
  • the distance (DISTANCE) between CA atoms of residues 1 and 40.

The first 40 residues of the 5-HT1B receptor correspond to an extracellular flexible loop of which we want to characterize the dynamics during our MD simulation. Below you can find a sample plumed.dat file that you can use as a template. Whenever you see an highlighted FILL string, be careful because this is a string that you must replace. To retrieve the atom indexes that you need to include in the input file, you can have a look at 5-HT1B.pdb. The atoms indexes are contained in the second column. Keep in mind that numbering scheme in PLUMED starts from 1.

Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Compute gyration radius on CA atoms of the first 40 N-terminal residues:
r: GYRATION 
ATOMS
the group of atoms that you are calculating the Gyration Tensor for.
=__FILL__ # Compute distance between CA atoms of residues 1 and 40 d: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
=__FILL__ # Print the two collective variables on COLVAR file every step PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=__FILL__
FILE
the name of the file on which to output these quantities
=COLVAR
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=__FILL__

Once your plumed.dat file is complete, you can run the PLUMED driver as follows:

plumed driver --plumed plumed.dat --mf_xtc 5-HT1B.xtc

Scroll in your terminal to read the PLUMED log. As you can see, PLUMED gives a lot of feedback about the input that it is reading and the actions that it will execute. Please take your time to inspect the log file and check if PLUMED is actually doing what you intend to do.

The command above will create a COLVAR file. The first lines should be identical to these ones:

#! FIELDS time r d
 0.000000 1.271069 2.782998
 1.000000 1.263125 2.388697
 2.000000 1.348965 2.606062
 3.000000 1.291011 2.204363
 4.000000 1.280714 2.411836
 5.000000 1.257692 2.334839

Notice that the first line informs you about the content of each column.

In case you obtain different numbers, check your input, you might have made some mistakes!

This COLVAR file can be analyzed using the Jupyter notebook plumed-pandas.ipynb provided in the folder notebooks. You can use the following command to open the notebook:

jupyter notebook plumed-pandas.ipynb

This notebook allows you to import the COLVAR file produced by PLUMED and to generate the desired figures using the matplotlib library:

# import python modules
import plumed
import matplotlib.pyplot as plt
# import COLVAR file as pandas dataset
# set the right path to the COLVAR file
data=plumed.read_as_pandas("../Exercise-1/COLVAR")
# print pandas dataset
data
# plot time serie of gyration radius (r) and distance (d)
plt.plot(data.time,data.r, label="gyration radius")
plt.plot(data.time,data.d, label="distance")
# x-y axis labels
plt.xlabel("MD frame")
plt.ylabel("r/d [nm]")
plt.legend()
# plot gyration radius vs distance
plt.plot(data.r,data.d, 'o')
# x-y axis labels
plt.xlabel("gyration radius [nm]")
plt.ylabel("distance [nm]")

What can you deduce about the dynamics of this region of the 5-HT1B receptors? Are the two CVs both providing useful information or are they quite correlated? To answer to the latter question, you can inspect the plot of one CV against the other.

Exercise 2: Mastering advanced selection tools

PLUMED provides some shortcuts to select atoms with specific properties. To use this feature, you should specify the MOLINFO action along with a reference PDB file. This command is used to provide information on the molecules that are present in your system.

Let's try to use this functionality to calculate the backbone dihedral angle \( \phi \) (phi) of residue 2 of the 5-HT1B receptor. This CV is defined by the action TORSION and a set of 4 atoms. For residue i, the dihedral \( \phi \) is defined by these atoms: C(i-1),N(i),CA(i),C(i) (see Fig. masterclass-21-1-dih-fig).

Definition of backbone dihedral angles, with dihedral phi highlighted in red.

After consulting the manual and inspecting 5-HT1B.pdb, let's define the dihedral angle \( \phi \) of residue 2 in two different ways:

  1. specifying an explicit list of 4 atoms (t1).
  2. using the MOLINFO shortcut to select quadruplets for dihedral angles (t2);
Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Activate MOLINFO functionalities
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=__FILL__ # Define the dihedral phi of residue 2 as an explicit list of 4 atoms t1: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ # Define the same dihedral using MOLINFO shortcuts t2: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ # Print the two collective variables on COLVAR file every step PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=__FILL__
FILE
the name of the file on which to output these quantities
=COLVAR
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=__FILL__

After completing the PLUMED input file above, let's use it to analyze the trajectory 5-HT1B.xtc using the driver tool:

plumed driver --plumed plumed.dat --mf_xtc 5-HT1B.xtc

You can modify the Jupyter notebook used in Exercise 1: Computing and printing simple collective variables to visualize the trajectory of the two CVs calculated with the PLUMED input file above and written in the COLVAR file. If you executed this exercise correctly, these two trajectories should be identical.

As a second example of MOLINFO capabilities, we will use the advanced atom selection tools provided by the MDAnalysis and MDTraj libraries. Let's redo Exercise 1: Computing and printing simple collective variables, this time using MOLINFO shortcuts to select CA atoms. You need to complete the following template PLUMED input file using the appropriate selection syntax for the corresponding library used. Please consult the MDAnalysis and MDTraj documentations if you are not familiar with these libraries and their syntax.

Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Activate MOLINFO functionalities
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=__FILL__ # To use MDAnalysis selection tools: r1: GYRATION
ATOMS
the group of atoms that you are calculating the Gyration Tensor for.
={@mda:{resid __FILL__ and name __FILL__}} d1: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
={@mda:{resid __FILL__ and name __FILL__}} # To use MDTraj selection tools: r2: GYRATION
ATOMS
the group of atoms that you are calculating the Gyration Tensor for.
={@mdt:{resid __FILL__ and name __FILL__}} d2: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
={@mdt:{resid __FILL__ and name __FILL__}} # Print all the collective variables on COLVAR file every step PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=__FILL__
FILE
the name of the file on which to output these quantities
=COLVAR
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=__FILL__

Now, you can compare the COLVAR file obtained with the one of Exercise 1: Computing and printing simple collective variables: they should be identical!

Exercise 3: Using virtual atoms

Sometimes, when calculating a CV, you may not want to use the positions of a number of atoms directly. Instead you may want to define a virtual atom whose position is generated based on the positions of a collection of other atoms. For example you might want to use the center of mass (COM) or the geometric center (CENTER) of a group of atoms.

In this exercise, you will learn how to specify virtual atoms and later use them to define a CV. The objective is to calculate the distances between the geometric center of the serotonin ligand (indicated as residue LIG in 5-HT1B.pdb) and the geometric centers of the two glycans located at position N24 and N32. Glycans are carbohydrate-based polymers that are sometimes linked to certain protein aminoacids. If you examine 5-HT1B.pdb, you will find the two glycans defined after the end of the protein, i.e. after residue SER-390. These two glycans have different length:

  • the glycan attached at position N24 ranges from residue BGLC-1 to AFUC-9
  • the glycan attached at position N32 ranges from residue BGLC-1 to AFUC-10

Let's complete the PLUMED input file below. You can use the advanced selection tools learned in Exercise 2: Mastering advanced selection tools to specify the atoms belonging to the ligand and to the two glycans:

Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Geometric center of the ligand
lig: CENTER 
ATOMS
the list of atoms which are involved the virtual atom's definition.
=__FILL__ # Geometric center of the first glycan g1: CENTER
ATOMS
the list of atoms which are involved the virtual atom's definition.
=__FILL__ # Geometric center of the second glycan g2: CENTER
ATOMS
the list of atoms which are involved the virtual atom's definition.
=__FILL__ # Distance between ligand and first glycan d1: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
=__FILL__ # Distance between ligand and second glycan d2: DISTANCE
ATOMS
the pair of atom that we are calculating the distance between.
=__FILL__ # Print the two distances on COLVAR file every step PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=__FILL__
FILE
the name of the file on which to output these quantities
=COLVAR
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=__FILL__

Once you have prepared a PLUMED input file containing the above instructions, you can execute it on the trajectory 5-HT1B.xtc by making use of the following command:

plumed driver --mf_xtc 5-HT1B.xtc --plumed plumed.dat

Let's now analyze the output of the calculation by plotting the time series of the two CVs. Can you say if the ligand is overall staying closer to the first or second glycan?

Exercise 4: Fixing PBCs discontinuities

As mentioned above, 5-HT1B.xtc is the raw trajectory generated by the GROMACS MD code. Therefore, it typically presents discontinuities due to PBCs. Many of the CVs used so far, such as CENTER or DISTANCE, take care of these discontinuities automatically. However, other CVs need a special command, called WHOLEMOLECULES, to fix PBCs discontinuities before the calculation of the CV. In this exercise, you will learn how to use this action.

We have seen that the first 40 N-terminal residues of the 5-HT1B receptor are quite flexibile. In this exercise, we want to estimate the secondary structure content (alpha-helix and beta-sheet) of this fragment during the course of the MD simulations. In order to do so, we can use the following 3 CVs:

  • ALPHARMSD to measure the alpha-helical content of a protein structure.
  • PARABETARMSD to measure the parallel beta-sheet content.
  • ANTIBETARMSD to measure the antiparallel beta-sheet content.

Let's first try to complete the following PLUMED input file:

Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Activate MOLINFO functionalities
MOLINFO __FILL__
# make the first 40 N-terminal residues whole
WHOLEMOLECULES __FILL__
# alpha-helix content of residues 1-40
h: ALPHARMSD __FILL__
TYPE
compulsory keyword ( default=DRMSD ) the manner in which RMSD alignment is performed.
=OPTIMAL # parallel beta-sheet content of residues 1-40 pb: PARABETARMSD __FILL__
TYPE
compulsory keyword ( default=DRMSD ) the manner in which RMSD alignment is performed.
=OPTIMAL # antiparallel beta sheet-content of residues 1-40 ab: ANTIBETARMSD __FILL__
TYPE
compulsory keyword ( default=DRMSD ) the manner in which RMSD alignment is performed.
=OPTIMAL # now we create a new CV that sums parallel and antiparallel beta-sheet contents b: COMBINE __FILL__
PERIODIC
compulsory keyword if the output of your function is periodic then you should specify the periodicity of the function.
=NO # Print the alpha-helical content and the *total* beta-sheet content on COLVAR file every step PRINT __FILL__

and use it to analyze the trajectory 5-HT1B.xtc. Can you say if the first 40 N-terminal residues tend to populate more alpha-helical or beta-sheet conformations?

Exercise 5: Using RMSD to measure conformational changes

As previously mentioned, the first 40 N-terminal residues of the 5-HT1B receptor are quite flexible, while the rest of the protein remains more stable during the course of the simulation. In this exercise, you will learn how to use RMSD to measure deviations from a reference structure. To use this CV, you need to keep in mind that you must specify in the PLUMED input a PDB file in which you mark the atoms that you want to use to:

  • optimally align a conformation to the reference;
  • calculate the displacement from the reference conformation after optimal alignment.

Keep in mind that these two sets of atoms might be different! In fact, the objective of this exercise is to calculate:

  • the RMSD of the backbone atoms of the first 40 N-terminal residues after aligning the system on the backbone atoms of residues 41 to 390;
  • the RMSD of the backbone atoms of residues 41 to 390 after aligning the system on the same set of atoms.

To create the two PDB files needed to define the two RMSD CVs, you can start from the provided PDB file 5-HT1B.pdb. Please consult the manual at the RMSD page, in order to:

  • create the PLUMED input file to calculate the two CVs defined above (use TYPE=OPTIMAL);
  • learn how to mark atoms for alignment and displacement in the PDB files;
  • check whether PBCs are automatically taken care of or you need to use the WHOLEMOLECULES action.

After analyzing 5-HT1B.xtc with the driver tool, can you say which part of the receptor is more flexible and deviates more from the starting conformation during the course of the simulation?

Exercise 6: Aligning conformations to a template

In this exercise, we will learn how to align a MD trajectory to a reference conformation, after fixing possible discontinuities due to PBCs. The goal is to compute the vertical position of the serotonin ligand with respect to the lipid bilayer. In the simulations of membrane proteins, typically the initial conformation is oriented so that the lipid bilayer is parallel to the xy plane (look for example at 5-HT1B.pdb). Therefore, initially one could use for example the coordinate z of the geometric center of the ligand to measure how far it is from the membrane bilayer. However, during the simulation:

  • the system can translate from its original position;
  • the system can be broken by PBCs;

therefore one could not use an absolute position to keep track of the location of the ligand. To solve this problem, there are several PLUMED actions that can be used to make sure the system is not broken by PBCs, to re-align it to a reference conformation, and thus to use absolute positions safely.

To complete this exercise, the users will need to make heavy use of the PLUMED manual to prepare the input file on their own. In the following, some suggestions will be given:

  • first, make sure the entire protein is not broken by PBCs using WHOLEMOLECULES;
  • then, make sure the ligand is not broken by PBCs and in the same cell as the protein, using the WRAPAROUND action and the GROUPBY option;
  • to align the stable protein residues (as defined in Exercise 5: Using RMSD to measure conformational changes, i.e. residues 41 to 390) to the template 5-HT1B.pdb, you can use FIT_TO_TEMPLATE;
  • at this point, you can safely define the position of the geometric center of the ligand using the POSITION CV with the option NOPBC;
  • the requested CV is the z component of the POSITION CV.

You can check that your PLUMED input file is correct in two ways. First, you can print out the conformations of the system after the transformations done by WHOLEMOLECULES, WRAPAROUND, and FIT_TO_TEMPLATE by modifying your input file as follows:

Click on the labels of the actions for more information on what each action computes
tested on v2.9
# Activate MOLINFO functionalities
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=__FILL__ # Here put your PLUMED input file # # # Write coordinates of all the atoms to file after PLUMED transformations DUMPATOMS
FILE
compulsory keyword file on which to output coordinates; extension is automatically detected
=5-HT1B_aligned.gro
ATOMS
the atom indices whose positions you would like to print out.
=__FILL__

Now, you can visualize the 5-HT1B_aligned.gro file using VMD. Second, if you inspect the time series of your CV, this should be a continuous trajectory that spans the range from 9 nm to 18 nm.

Exercise 7: Estimating binding propensity

In this last exercise, we want to determine the propensity of the serotonin ligand to bind the first 40 N-terminal flexible residues of the 5-HT1B receptor and if there are hot-spots where binding is more favorite. In order to answer to these questions, the user will:

  • compute the fraction of bound conformations over the total number of frames in the MD trajectory;
  • for each individual residue and glycan, compute the fraction of bound conformation per residue/glycan;
  • dump all bound conformations to a gro file, after fixing PBCs as in Exercise 6: Aligning conformations to a template;
  • for each individual residue and glycan, dump all bound conformations to a separate gro file, after fixing PBCs.

To solve this exercise, no template PLUMED input file nor any indication of the procedure to follow will be given. The users should only keep in mind that:

  • we arbitrarily define as bound a conformation in which at least one pair of atoms of the ligand and of the protein/residue/glycan is closer than 0.4 nm;
  • any pre-exisiting CV defined in the PLUMED manual can be used;
  • any CV defined directly by the user in the PLUMED input file via the CUSTOM action can be used.