Aims

This tutorial covers the use of experimental data, in particular NMR data, either as collective variables or as replica-averaged restraints in MD simulations. While the former is just a simple extension of what we have already done in previous tutorials, the latter is an approach that can be used to improve the quality of a force field in describing the properties of a specific system.

Learning Outcomes

Once this tutorial is completed students will:

know why and how to use experimental data to define a collective variable
know why and how to use experimental data as replica-averaged restraints in MD simulations

Resources

The tarball for this project contains the following:

system: the files used to generate the topol?.tpr files of the first and second example
first: an example on the use of chemical shifts as a collective variable
second: an example on the use of chemical shifts as replica-averaged restraints
third: an example on the use of RDCs (calculated with the theta-method) as replica-averaged restraints

Instructions

Experimental data as Collective Variables

Previous tutorials have often discussed the possibility of measuring a distance with respect to a structure representing some state of a system (e.g. Belfast tutorial: Out of equilibrium dynamics). An alternative is to use as a reference a set of experimental data that represent a state, and to measure the current deviation from that set. The following NMR experimental observables are currently implemented in plumed: chemical shifts (only for proteins) with CS2BACKBONE and CH3SHIFTS, NOE distances with NOE, and residual dipolar couplings with RDC. In addition, the NOE collective variable can also be used for PRE distances, and 3J couplings can be implemented using TORSION and MATHEVAL. Among the collective variables listed above, those based on chemical shifts make use of an external library, ALMOST, that must be downloaded and compiled separately, and plumed must be configured so that it links against ALMOST. Detailed instructions on how to compile PLUMED with ALMOST can be found in CS2BACKBONE.
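The idea of building a 3J-coupling observable from a torsion angle can be sketched outside plumed with a Karplus-type relation. The following Python snippet is only an illustration: the function names are hypothetical, and the coefficients A, B, C and the phase offset are illustrative placeholders that do not come from this tutorial and should be replaced by the parametrisation appropriate for the coupling of interest.

```python
import math

def karplus_3j(phi_rad, a=6.51, b=-1.76, c=1.60, offset_deg=-60.0):
    """Karplus-type relation: 3J = A cos^2(theta) + B cos(theta) + C.

    theta = phi + offset. The default coefficients are placeholders
    (an assumption of this sketch), not values given in the tutorial.
    """
    theta = phi_rad + math.radians(offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

def j_coupling_deviation(phis_rad, j_exp):
    """Sum of squared differences between calculated and experimental 3J."""
    return sum((karplus_3j(p) - j) ** 2 for p, j in zip(phis_rad, j_exp))

if __name__ == "__main__":
    phis = [math.radians(-65.0), math.radians(-120.0)]  # made-up torsions
    j_exp = [4.5, 9.0]                                  # made-up couplings
    print(j_coupling_deviation(phis, j_exp))
```

In a plumed input the torsion would be provided by TORSION and the algebraic expression by MATHEVAL; the Python version only shows the arithmetic involved.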

In the following we write the CS2BACKBONE collective variable that was used in Granata et al. (2013).

prot: GROUP ATOMS=1-862
WHOLEMOLECULES ENTITY0=prot

cs: CS2BACKBONE ATOMS=prot DATA=data FF=a03_gromacs.mdb NRES=56 FLAT=1.0 WRITE_CS=50 

PRINT ARG=cs FILE=COLVAR STRIDE=100

ENDPLUMED

In this case the chemical shifts are those measured for the native state of the protein and can be used, together with other CVs and Bias-Exchange Metadynamics, to guide the system back and forth from the native structure. The experimental chemical shifts are in six files inside the "data/" folder (see the first example in the resources tarball), one file for each nucleus. A chemical shift of 0 is used where a chemical shift does not exist (e.g. CB of GLY) or where it has not been assigned. Additionally the data folder contains:

camshift.db: the parameter file for camshift; it is a standard file needed to calculate the chemical shifts from a structure
a03_gromacs.mdb: an Amber force field in ALMOST format, used to map the atom names between plumed and almost (in this case we are using amber for our simulation)
template.pdb: a pdb file for the protein we are simulating (e.g. editconf -f conf.gro -o template.pdb), where the atoms are ordered in the same way as they are passed by the main code; again it is used to map the atoms in plumed to those in almost.
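How a deviation from a set of experimental chemical shifts can be turned into a single number, with unassigned entries (stored as 0) skipped, can be sketched as follows. This is a simplified Python illustration, not the functional form implemented in CS2BACKBONE: the function name is hypothetical, and the treatment of the flat region (deviations smaller than `flat` contribute nothing) is an assumption made for the sketch.

```python
def cs_deviation(calc, exp, flat=1.0):
    """Flat-bottom sum of squared chemical-shift deviations.

    Entries whose experimental value is 0 are treated as missing or
    unassigned and skipped. The flat-bottom form is an assumption of
    this sketch, not the exact expression used by plumed.
    """
    total = 0.0
    for c, e in zip(calc, exp):
        if e == 0.0:
            continue  # no experimental shift available for this nucleus
        excess = abs(c - e) - flat
        if excess > 0.0:
            total += excess ** 2
    return total

if __name__ == "__main__":
    calc = [8.0, 120.0, 45.0]  # made-up calculated shifts (ppm)
    exp = [8.5, 123.0, 0.0]    # made-up experimental shifts; last one missing
    print(cs_deviation(calc, exp, flat=1.0))
    print(cs_deviation(calc, exp, flat=0.0))
```

With a flat region of zero, every assigned shift contributes its full squared deviation, which is the setting used when the variable is restrained rather than merely monitored.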

This example can be executed as

mdrun_mpi -s topol -plumed plumed

Replica-Averaged Restrained Simulations

NMR data, like all equilibrium experimental data, are the result of a measurement averaged over an ensemble of structures and over time. In principle a "perfect" molecular dynamics simulation, that is, a simulation with a perfect force field and perfect sampling, can predict the outcome of an experiment quantitatively. In practice, obtaining even qualitative agreement is in most cases already a fortunate outcome. To increase the accuracy of a force field in a system-dependent manner, it is possible to add to the force field an additional term based on the agreement with a set of experimental data. This agreement is not enforced as a simple restraint, because that would require the system to agree with all the experimental data at every instant. Instead, the restraint is applied to an AVERAGED COLLECTIVE VARIABLE, where the average is performed over multiple identical simulations (replicas). In this way no single replica has to agree with the experimental data; the replicas only have to agree on average. It has been shown that this approach is equivalent to finding a modified version of the force field that reproduces the provided set of experimental data without any additional assumption on the data themselves.

Currently ENSEMBLE AVERAGING of a collective variable can be performed only using the NMR variables (CS2BACKBONE, CH3SHIFTS, NOE and RDC).

The second example included in the resources shows how the amber force field can be improved in the case of the protein domain GB3 by using the native state chemical shifts as a replica-averaged restraint. Because replica averaging requires multiple replicas simulated in parallel under the same conditions, it is easily complemented with BIAS-EXCHANGE or MULTIPLE WALKER metadynamics to enhance the sampling.

prot: GROUP ATOMS=1-862
WHOLEMOLECULES ENTITY0=prot

cs: CS2BACKBONE ATOMS=prot DATA=data FF=a03_gromacs.mdb NRES=56 FLAT=0.0 WRITE_CS=500 ENSEMBLE

cse: RESTRAINT ARG=cs AT=0. KAPPA=0. SLOPE=24

PRINT ARG=cs FILE=COLVAR STRIDE=10

ENDPLUMED

Compared with the case in which chemical shifts are used to define a standard collective variable, here the keyword ENSEMBLE tells plumed to calculate the chemical shifts for all the replicas (4 replicas in this example), average them, and only after the averaging calculate the difference with respect to the experimental ones. On this difference, which is the AVERAGED collective variable, it is possible to apply a linear RESTRAINT (linear because the variable is already a sum of squared differences); this is the new term we are adding to the underlying force field.
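The order of operations (average over replicas first, compare with experiment afterwards) is what distinguishes a replica-averaged restraint from a per-replica one. A minimal Python sketch of this arithmetic, with hypothetical function names and made-up numbers, is given here. The restraint energy follows the form used by RESTRAINT, KAPPA/2 (x-AT)^2 + SLOPE (x-AT), which with KAPPA=0 reduces to the linear term.

```python
def replica_averaged_cv(calc_per_replica, exp):
    """Average each observable over replicas, then sum squared deviations.

    calc_per_replica: list of per-replica lists of calculated shifts.
    Entries with an experimental value of 0 are skipped (unassigned).
    """
    n_rep = len(calc_per_replica)
    averaged = [sum(vals) / n_rep for vals in zip(*calc_per_replica)]
    return sum((a - e) ** 2 for a, e in zip(averaged, exp) if e != 0.0)

def restraint_energy(cv, kappa=0.0, slope=24.0, at=0.0):
    """Harmonic plus linear restraint; with kappa=0 only the linear term remains."""
    return 0.5 * kappa * (cv - at) ** 2 + slope * (cv - at)

if __name__ == "__main__":
    exp = [8.0, 120.0]                          # made-up experimental shifts
    replicas = [[7.0, 118.0], [9.0, 122.0]]     # made-up calculated shifts
    per_replica = [replica_averaged_cv([r], exp) for r in replicas]
    averaged = replica_averaged_cv(replicas, exp)
    print(per_replica, averaged, restraint_energy(averaged))
```

In this toy case each replica deviates from experiment on its own, yet the averaged variable is zero and the restraint adds no energy, which is the behaviour the ENSEMBLE keyword is meant to allow.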

This example can be executed as

mpiexec -np 4 mdrun_mpi -s topol -plumed plumed -multi 4

The third example shows how RDCs (calculated with the theta-method) can be employed in the same way, in this case to describe the native state of ubiquitin. In particular, it is possible to observe how the replica-averaged RDC restraint, applied to the correlation between the calculated and experimental N-H and CA-HA RDCs, increases the correlation of the RDCs for other bonds already on a very short time scale.

RDC ...
ENSEMBLE
CORRELATION
GYROM=-72.5388
SCALE=0.001060 
ATOMS1=20,21 COUPLING1=8.17
ATOMS2=37,38 COUPLING2=-8.271
ATOMS3=56,57 COUPLING3=-10.489
ATOMS4=76,77 COUPLING4=-9.871
#continue....

In this input the first four N-H RDCs are defined.
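To make the two ingredients of this input more concrete, the following Python sketch computes an RDC-like quantity from the angle between a bond vector and the z axis (taken here as the direction of the external field) and then the Pearson correlation between calculated and experimental values. It is an illustration under stated assumptions: the function names are hypothetical, the `prefactor` argument is a stand-in for the physical constants and for the GYROM and SCALE parameters, and the coordinates are made up.

```python
import math

def rdc_theta(r1, r2, prefactor=1.0):
    """RDC-like value from the bond orientation with respect to the z axis.

    Returns prefactor * (3 cos^2(theta) - 1) / 2 / r^3, where theta is the
    angle between the bond r2 - r1 and z. The prefactor is a placeholder.
    """
    dx, dy, dz = (b - a for a, b in zip(r1, r2))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    cos_theta = dz / r
    return prefactor * 0.5 * (3.0 * cos_theta ** 2 - 1.0) / r ** 3

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    bonds = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.6, 0.0, 0.8)]  # made up
    calc = [rdc_theta(origin, b) for b in bonds]
    exp = [8.17, -8.271, -10.489]  # first three couplings from the input above
    print(calc, pearson(calc, exp))
```

Restraining the correlation rather than the individual values means that only the relative pattern of the couplings has to be reproduced, so an overall scaling of the calculated RDCs does not affect the restraint.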

This example can be executed as

mpiexec -np 8 mdrun_mpi -s topol -plumed plumed -multi 8

Reference

Granata, D., Camilloni, C., Vendruscolo, M. & Laio, A. Characterization of the free-energy landscapes of proteins by NMR-guided metadynamics. Proc. Natl. Acad. Sci. U.S.A. 110, 6817–6822 (2013).
Cavalli, A., Camilloni, C. & Vendruscolo, M. Molecular dynamics simulations with replica-averaged structural restraints generate structural ensembles according to the maximum entropy principle. J. Chem. Phys. 138, 094112 (2013).
Camilloni, C., Cavalli, A. & Vendruscolo, M. Replica-Averaged Metadynamics. Journal of Chemical Theory … 9, 5610–5617 (2013).
Roux, B. & Weare, J. On the statistical equivalence of restrained-ensemble simulations with the maximum entropy method. J. Chem. Phys. 138, 084107 (2013).
Boomsma, W., Lindorff-Larsen, K. & Ferkinghoff-Borg, J. Combining Experiments and Simulations Using the Maximum Entropy Principle. PLoS Comput. Biol. 10, e1003406 (2014).
Camilloni, C. & Vendruscolo, M. A Tensor-Free Method for the Structural and Dynamical Refinement of Proteins using Residual Dipolar Couplings. J. Phys. Chem. B XXX (2014).