MARVEL-VES tutorial (Lugano Feb 2017): Kinetics

Aims

The aim of this tutorial is to introduce the use of VES for obtaining kinetic information of activated processes. We will learn how to set up a biased simulation using a fixed flooding potential acting on a relevant slow degree of freedom. We can then scale the accelerated MD time in a post-processing procedure.

Learning Outcomes

Once this tutorial is completed students will:

  • Optimize a bias using VES with an energy cutoff to selectively fill low regions of the free energy surface.
  • Use the optimized bias to observe several rare event transitions
  • Post-process the accelerated trajectory to obtain an unbiased estimate of the transition rate
  • Compute the empirical cumulative distribution of first passage times and compare to a theoretical model.

Resources

The tarball for this project contains the following files:

  • CH.airebo : Force field parameters for LAMMPS
  • input : LAMMPS input script
  • data.start : Starting configuration in LAMMPS format
  • plumed.dat : Example PLUMED file
  • time-reweighting.py : Python script for post-processing trajectory
  • TRAJECTORIES-1700K : A directory containing many trajectories for post processing
  • get-all-fpt.py : A python script to extract transition times from the trajectories
  • cdf.analysis.py : A python script to compute the cumulative probability distribution and perform KS test

Requirements

  • python with numpy, scipy, and statsmodels
  • LAMMPS compiled with MANYBODY package
  • VMD for visualization

Instructions

Stone Wales Transformation

Exercise 1A. Preliminary investigation using VES

As an example of an activated process in materials science, we will work with the Stone-Wales transformation in a carbon nanotube. The tarball for this project contains the inputs necessary to run a simulation in LAMMPS for a 480 atom carbon nanotube. We use the AIREBO (Adaptive Intermolecular Reactive Empirical Bond Order) force field parameters which can approximately describe C-C bond breakage and formation at reasonable computational cost. For CVs we can use the coordination number in PLUMED to measure the number of covalent bonds among different groups of atoms. The transformation involves breaking two C-C bonds and forming two alternative C-C bonds.A definition of these CVs as well as the relevant C-C bonds are depicted in Figure stone-wales. We prepare a PLUMED input file as follows.

Click on the labels of the actions for more information on what each action computes
tested on v2.7
# set distance units to angstrom, time to ps, and energy to eV
UNITS 
LENGTH
the units of lengths.
=A
TIME
the units of time.
=ps
ENERGY
the units of energy.
=eV # define two variables CV1: COORDINATION
GROUPA
First list of atoms.
=229,219
GROUPB
Second list of atoms (if empty, N*(N-1)/2 pairs in GROUPA are counted).
=238,207
R_0
could not find this keyword
=1.8
NN
compulsory keyword ( default=6 ) The n parameter of the switching function
=8
MM
compulsory keyword ( default=0 ) The m parameter of the switching function; 0 implies 2*NN
=16
PAIR
( default=off ) Pair only 1st element of the 1st group with 1st element in the second, etc
CV2: COORDINATION
GROUPA
First list of atoms.
=229,219
GROUPB
Second list of atoms (if empty, N*(N-1)/2 pairs in GROUPA are counted).
=207,238
R_0
could not find this keyword
=1.8
NN
compulsory keyword ( default=6 ) The n parameter of the switching function
=8
MM
compulsory keyword ( default=0 ) The m parameter of the switching function; 0 implies 2*NN
=16
PAIR
( default=off ) Pair only 1st element of the 1st group with 1st element in the second, etc
# the difference between variables d1: COMBINE
ARG
the input for this action is the scalar output from one or more other actions.
=CV1,CV2
COEFFICIENTS
compulsory keyword ( default=1.0 ) the coefficients of the arguments in your function
=1,-1
POWERS
compulsory keyword ( default=1.0 ) the powers to which you are raising each of the arguments in your function
=1,1
PERIODIC
compulsory keyword if the output of your function is periodic then you should specify the periodicity of the function.
=NO

In the above, the first line sets the energy units. The second and third line define the two CVs for the C-C covalent bonds. (We have chosen atoms 238,207,229 and 219; however this choice is arbitrary and other atoms could equally well have been chosen.) Lastly, we define a simple approximate reaction coordinate given by the difference between CV1 and CV2 that we can use to monitor the transition.

Next we will use VES to drive the transformation at 1700 K. We bias the formation of bonds DB and AC shown in Figure stone-wales using CV2 which changes from 0 to 2 during the transformation. (Although a more rigorous treatment would bias both CVs, for this tutorial we will simplify things and work in only one dimension). We choose a Chebyshev polynomial basis set up to order 36

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
# The basis set to use
bf1: BF_CHEBYSHEV 
ORDER
compulsory keyword The order of the basis function expansion.
=36
MINIMUM
compulsory keyword The minimum of the interval on which the basis functions are defined.
=0.0
MAXIMUM
compulsory keyword The maximum of the interval on which the basis functions are defined.
=2.0

and we tell PLUMED to use VES acting on CV2 with a free energy cutoff

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
td_uniform: TD_UNIFORM 
variational: VES_LINEAR_EXPANSION ...
   
ARG
the input for this action is the scalar output from one or more other actions.
=CV2
BASIS_FUNCTIONS
compulsory keyword the label of the one dimensional basis functions that should be used.
=bf1
TEMP
the system temperature - this is needed if the MD code does not pass the temperature to PLUMED.
=1700
BIAS_CUTOFF
cutoff the bias such that it only fills the free energy surface up to certain level F_cutoff, here you should give the value of the F_cutoff.
=15.0
BIAS_CUTOFF_FERMI_LAMBDA
the lambda value used in the Fermi switching function for the bias cutoff (BIAS_CUTOFF), the default value is 10.0.
=10.0
TARGET_DISTRIBUTION
the label of the target distribution to be used.
=td_uniform ...

Here we are biasing CV2 with the basis set defined above. The final two lines impose the cutoff at 15 eV. The cutoff is of the form

\[ \frac{1}{1+e^{\lambda [F(s)-F_c]}} \]

where \(\lambda\) (inverse energy units) controls how sharply the function goes to zero. Above we have set \(\lambda=10.0\) to ensure the cutoff goes sharply enough to zero.

The following input updates the VES bias every 200 steps, writing out the bias every 10 iterations. To enforce the energy cutoff we also need to update the target distribution which we do every 40 iterations with the TARGETDIST_STRIDE flag.

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
var-S: OPT_AVERAGED_SGD ...
   
BIAS
compulsory keyword the label of the VES bias to be optimized
=variational
STRIDE
compulsory keyword the frequency of updating the coefficients given in the number of MD steps.
=200
STEPSIZE
compulsory keyword the step size used for the optimization
=0.1
COEFFS_FILE
compulsory keyword ( default=coeffs.data ) the name of output file for the coefficients
=coeffs.dat
BIAS_OUTPUT
how often the bias(es) should be written out to file.
=10
TARGETDIST_STRIDE
stride for updating a target distribution that is iteratively updated during the optimization.
=40
TARGETDIST_OUTPUT
how often the dynamic target distribution(s) should be written out to file.
=40
COEFFS_OUTPUT
compulsory keyword ( default=100 ) how often the coefficients should be written to file.
=1 ...

Finally, we will stop the simulation when the transition occurs using the COMMITTOR in PLUMED

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
COMMITTOR 
ARG
the input for this action is the scalar output from one or more other actions.
=d1
BASIN_LL1
compulsory keyword List of lower limits for basin # You can use multiple instances of this keyword i.e.
=-2.0
BASIN_UL1
compulsory keyword List of upper limits for basin # You can use multiple instances of this keyword i.e.
=-1.0
STRIDE
compulsory keyword ( default=1 ) the frequency with which the CVs are analyzed
=600

Run the simulation using LAMMPS

lmp_mpi < input

and plot the last few bias output files (bias.variational.iter-n.data) using gnuplot. What is the difference between the maximum and minimum values of the bias obtained during the simulation? Is the cutoff value sufficient to cross the barrier? Is the cutoff value too large?

Figure Figure1A shows an example bias potential after 90 iterations. Note that a cutoff of 15 eV is too large as the system transitions before the bias reaches the prescribed cutoff. From Figure Figure1A we see a barrier height of approximately 7.3 eV.

Figure-1A

The output also produces an movie.xyz file which can be viewed in vmd. For better visualization, first change atom id from 1 to C by typing

sed "s/^1 /C /" movie.xyz > newmovie.xyz

Then load the newmovie.xyz into vmd and choose Graphics --> Representations and set Drawing Method to DynamicBonds. Create a second representation by clicking Create Rep and set the Drawing Method to VDW. Change the sphere scale to 0.3 and play the movie. You should observe the Stone-Wales transformation right before the end of the trajectory.

Exercise 1B. Set up and run a VES bias imposing a cutoff

In this exercise we will run a VES simulation to fill the FES only up to a certain cutoff. This will be the first step in order to obtain kinetic information from biased simulations. In the previous section, we observed that a cutoff of 15 eV is too strong for our purpose. Change the cutoff energy from 15.0 to 6.0 eV by setting

BIAS_CUTOFF=6.0

and rerun the simulation from Exercise 1A. [Note: In practice one can use multiple walkers during the optimization by adding the flag MULTIPLE_WALKERS]

Plot some of the bias files (bias.variational.iter-n.data) that are printed during the simulation using gnuplot. At the end of the simulation, you should be able to reproduce something like Figure Figure1B. Is the bias converging? If so, how many iterations does it require to converge?

Figure Figure1B shows the bias potential after 70,80, and 90 iteration steps. Note that the bias has reached the cutoff and goes to zero at around 1 Angstrom.

Figure-1B

Exercise 2. Using a fixed bias as a flooding potential to obtain rates

In this exercise we will use the bias obtained above as a static umbrella potential. We will set up and run a new trajectory to measure the first passage time of escape from the well.

We can extract the coefficients that we need from the final iteration in Exercise-1B above.

tail -n 47 coeffs.dat > fixed-coeffs.dat

Now create a new directory from which you will run a new simulation and copy the necessary input files (including the fixed-coeffs.dat) into this directory. Modify the PLUMED file so that the optimized coefficients are read by the VES_LINEAR_EXPANSION

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
td_uniform: TD_UNIFORM 
variational: VES_LINEAR_EXPANSION ...
   
ARG
the input for this action is the scalar output from one or more other actions.
=CV2
BASIS_FUNCTIONS
compulsory keyword the label of the one dimensional basis functions that should be used.
=bf1
TEMP
the system temperature - this is needed if the MD code does not pass the temperature to PLUMED.
=1700
BIAS_CUTOFF
cutoff the bias such that it only fills the free energy surface up to certain level F_cutoff, here you should give the value of the F_cutoff.
=6.0
BIAS_CUTOFF_FERMI_LAMBDA
the lambda value used in the Fermi switching function for the bias cutoff (BIAS_CUTOFF), the default value is 10.0.
=10.0
TARGET_DISTRIBUTION
the label of the target distribution to be used.
=td_uniform
COEFFS
read in the coefficients from files.
=fixed-coeffs.dat ...

The final line specifies the coefficients to be read from a file.

Make sure to remove the lines for the stochastic optimization (OPT_AVERAGED_SGD) as we no longer wish to update the bias.

We will also perform metadynamics with an infrequent deposition stride to ensure that the trajectory does not get stuck in any regions where the bias potential is not fully converged The following implements metadynamics on both CV1 and CV2 with a deposition stride of 4000 steps and a hill height of 0.15 eV.

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
metad: METAD ...
   
ARG
the input for this action is the scalar output from one or more other actions.
=CV1,CV2
SIGMA
compulsory keyword the widths of the Gaussian hills
=0.2,0.2
HEIGHT
the heights of the Gaussian hills.
=0.15
PACE
compulsory keyword the frequency for hill addition
=4000 ...

Again we will use the COMMITTOR to stop the trajectory after the transition.

Click on the labels of the actions for more information on what each action computes
tested on v2.7
__FILL__ 
COMMITTOR 
ARG
the input for this action is the scalar output from one or more other actions.
=d1
BASIN_LL1
compulsory keyword List of lower limits for basin # You can use multiple instances of this keyword i.e.
=-2.0
BASIN_UL1
compulsory keyword List of upper limits for basin # You can use multiple instances of this keyword i.e.
=-1.0

Now run a trajectory with the fixed bias. What is the time (biased) to escape? Plot the trajectory of the approximate reaction coordinate (column 4 in the COLVAR) in gnuplot. An example is shown in Figure Figure2.

Figure-2

Also look at the metadynamics bias in the column labeled metad.bias. How does the magnitude compare to the bias from VES in column variational.bias?

The crossing time for a single event doesn't tell us much because we don't have any statistics on the transition events. To obtain the mean first passage time, we have to repeat the calculation many times. To generate statistically independent samples, we have to change the seed for the random velocities that are generated in the LAMMPS input file. In a new directory, copy the necessary files and edit the following line in the input file

velocity      all create 1700.  495920

Choose a different 6 digit random number and repeat Exercise 2. How does the escape time compare to what you obtained before? Repeat the procedure several times with different velocity seeds to get a distribution of first passage times. Make sure you launch each simulation from a separate directory and keep all COLVAR files as you will need them in the next section where we will analyze the transition times.

Exercise 3. Post processing to obtain unbiased estimate for the transition rate

In the previous section you generated several trajectories with different first passage times. However, these times need to be re-weighted to correct for the bias potential. We can scale the time according to the hyperdynamics formula

\[ t^{*} = \Delta t_{MD} \sum_i^{n} e^{\beta V(s)} \]

Note that we need to add the total bias at each step, coming from both the VES bias and metadynamics. The python script time-reweighting.py will read the COLVAR from Exercise 2 and print the final reweighted time (in seconds). Open the script to make sure you understand how it works.

Run the script using

python time-reweighting.py

Note that the output first passage time is converted to seconds. What is the acceleration factor of our biased simulation? (i.e. the ratio of biased to unbiased transition times) The script also produces a time-reweighted trajectory COLVAR-RW for a specified CV (here we choose the approximate reaction coordinate, d1). Plot the reweighted COLVAR-RW in gnuplot and compare the original vs. time-reweighted trajectories. In particular, what effect does rescaling have on the time step?

Rerun the script for each of the trajectories you have run with the fixed bias and compute the mean first passage time from your data. The Stone-Wales transformation at 1700 K is estimated in fullerene to be ~10 days. How does your average time compare to this value?

The distribution of first passage times for an activated processes typically follows an exponential distribution. Instead of directly making a histogram of the first passage times, we can look at the cumulative distribution function which maps a value x to the fraction of values less than or equal to x. Since we are computing the cumulative distribution from a data set, this is called the empirical cumulative distribution (ECDF). On the other hand, the theoretical cumulative distribution (CDF) of an exponentially distributed random process is

\[ P(t) = 1-e^{-t/\tau} \]

where \(\tau\) is the mean first passage time.

In order to calculate the ECDF we need many trajectories. Several COLVAR files are included in the TRAJECTORIES-1700K directory. The script get-all-fpt.py is a modified version of time-reweighting.py and will calculate the first passage time (fpt) from all the simulation data and will output the times to a file fpt.dat. Run the script from the TRAJECTORIES-1700K directory

python get-all-fpt.py

You should obtain the output file fpt.dat which has a list of all the times. You can append your own values you obtained from Exercise 2 to the end of the list to increase the number of data points.

The script cdf-analysis.py will compute the ECDF and fit the distribution to the theoretical CDF. The script uses the statsmodels Python module. Run the script with

python cdf-analyysis.py

from the same directory where the fpt.dat file is located. The script prints both the mean first passage time of the data as well as the fit parameter \(\tau\). How do these two values compare?

Figure-3

Figure Figure3 shows an example fit of the ECDF to the theoretical CDF for a Poisson process. To test the reliability of the fit, we can generate some data set according the theoretical CDF and perform a two-sample Kolmogorov-Smirnov (KS) test. The KS test provides the probability that the two sets of data are drawn from the same underlying distribution expressed in the so-called p-value. The null hypothesis is typically rejected for p-value < 0.05. The script cdf-analysis.py also performs the KS test and prints the p-value. What is the p-value we obtain from your dataset of transition times?