Analysis

PLUMED can be used to analyze trajectories either on the fly during an MD run or via post processing a trajectory using driver. A molecular dynamics trajectory is in essence an ordered set of configurations of atoms. Trajectory analysis algorithms are methods that allow us to extract meaningful information from this extremely high-dimensionality information. In extracting this information much of the information in the trajectory will be discarded and assumed to be irrelevant to the problem at hand. For example, when we calculate a histogram from a trajectory we throw away all information on the order the frames were visited during the trajectory. We instead opt to display a time average that shows the parts of configuration space that were
visited most frequently. There are many situations in which this is a reasonable thing to do as we know that time averages are equivalent to ensemble averages in the long timescale limit and that these average probabilities of being in different parts of configuration space, $$P(s)$$, are thus related to the underlying free energy, $$F(s)$$, via:

$F(s) = - k_B T \ln P(s)$

About the simplest form of analysis that PLUMED can perform involves printing information to a file. PLUMED can output various different kinds of information to files as described below:

 COMMITTOR Does a committor analysis. DUMPATOMS Dump selected atoms on a file. DUMPDERIVATIVES Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). DUMPFORCES Dump the force acting on one of a values in a file. DUMPMASSCHARGE Dump masses and charges on a selected file. DUMPMULTICOLVAR Dump atom positions and multicolvar on a file. DUMPPROJECTIONS Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). PRINT Print quantities to a file. UPDATE_IF Conditional update of other actions.

The UPDATE_IF action allows you to do more complex things using the above print commands. As detailed in the documentation for UPDATE_IF when you put any of the above actions within an UPDATE_IF block then data will only be output to the file if colvars are within particular ranges. In other words, the above printing commands, in tandem with UPDATE_IF, allow you to identify the frames in your trajectory that satisfy some particular criteria and output information on those frames only.

Another useful command is the COMMITTOR command. As detailed in the documentation for COMMITTOR this command tells PLUMED (and the underlying MD code) to stop the calculation one some criteria is satisfied, alternatively one can use it to keep track of the number of times a criteria is satisfied.

A number of more complicated forms of analysis can be performed that take a number of frames from the trajectory as input. In all these commands the STRIDE keyword is used to tell PLUMED how frequently to collect data from the trajectory. In all these methods the output from the analysis is a form of ensemble average. If you are running with a bias it is thus likely that you may want to reweight the trajectory frames in order to remove the effect the bias has on the static behavior of the system. The following methods can thus be used to calculate weights for the various trajectory frames so that the final ensemble average is an average for the canonical ensemble at the appropriate temperature.

# Reweighting and Averaging

 REWEIGHT_BIAS Calculate weights for ensemble averages that negate the effect the bias has on the region of phase space explored REWEIGHT_METAD Calculate the weights configurations should contribute to the histogram in a simulation in which a metadynamics bias acts upon the system. REWEIGHT_TEMP_PRESS Calculate weights for ensemble averages at temperatures and/or pressures different than those used in your original simulation. REWEIGHT_WHAM Calculate the weights for configurations using the weighted histogram analysis method. WHAM_HISTOGRAM This can be used to output the a histogram using the weighted histogram technique WHAM_WEIGHTS Calculate and output weights for configurations using the weighted histogram analysis method.

You can then calculate ensemble averages using the following actions.

 AVERAGE Calculate the ensemble average of a collective variable HISTOGRAM Accumulate the average probability density along a few CVs from a trajectory. MULTICOLVARDENS Evaluate the average value of a multicolvar on a grid.

For many of the above commands data is accumulated on the grids. These grids can be further analyzed using one of the actions detailed below at some time.

 CONVERT_TO_FES Convert a histogram, $$H(x)$$, to a free energy surface using $$F(x) = -k_B T \ln H(x)$$. DUMPCUBE Output a three dimensional grid using the Gaussian cube file format. DUMPGRID Output the function on the grid to a file with the PLUMED grid format. FIND_CONTOUR Find an isocontour in a smooth function. FIND_CONTOUR_SURFACE Find an isocontour by searching along either the x, y or z direction. FIND_SPHERICAL_CONTOUR Find an isocontour in a three dimensional grid by searching over a Fibonacci sphere. FOURIER_TRANSFORM Compute the Discrete Fourier Transform (DFT) by means of FFTW of data stored on a 2D grid. GRID_TO_XYZ Output the function on the grid to an xyz file INTEGRATE_GRID Calculate the total integral of the function on the input grid INTERPOLATE_GRID Interpolate a smooth function stored on a grid onto a grid with a smaller grid spacing.

As an example the following set of commands instructs PLUMED to calculate the distance between atoms 1 and 2 for every fifth frame in the trajectory and to accumulate a histogram from this data which will be output every 100 steps (i.e. when 20 distances have been added to the histogram).

Click on the labels of the actions for more information on what each action computes
x: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2
h: HISTOGRAM ARGthe input for this action is the scalar output from one or more other actions. =x GRID_MINcompulsory keyword
the lower bounds for the grid =0.0 GRID_MAXcompulsory keyword
the upper bounds for the grid =3.0 GRID_BINthe number of bins for the grid =100 BANDWIDTHcompulsory keyword
the bandwidths for kernel density estimation =0.1 STRIDEcompulsory keyword ( default=1 )
the frequency with which the data should be collected and added to the quantity being
averaged =5
DUMPGRID GRIDcompulsory keyword
the action that creates the grid you would like to output =h FILEcompulsory keyword ( default=density )
the file on which to write the grid. =histo STRIDEcompulsory keyword ( default=0 )
the frequency with which the grid should be output to the file. =100


It is important to note when using commands such as the above the first frame in the trajectory is assumed to be the initial configuration that was input to the MD code. It is thus ignored. Furthermore, if you are running with driver and you would like to analyze the whole trajectory (without specifying its length) and then print the result you simply call DUMPGRID (or any of the commands above) without a STRIDE keyword as shown in the example below.

Click on the labels of the actions for more information on what each action computes
x: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2
h: HISTOGRAM ARGthe input for this action is the scalar output from one or more other actions. =x GRID_MINcompulsory keyword
the lower bounds for the grid =0.0 GRID_MAXcompulsory keyword
the upper bounds for the grid =3.0 GRID_BINthe number of bins for the grid =100 BANDWIDTHcompulsory keyword
the bandwidths for kernel density estimation =0.1 STRIDEcompulsory keyword ( default=1 )
the frequency with which the data should be collected and added to the quantity being
averaged =5
DUMPGRID GRIDcompulsory keyword
the action that creates the grid you would like to output =h FILEcompulsory keyword ( default=density )
the file on which to write the grid. =histo


Please note that even with this calculation the first frame in the trajectory is ignored when computing the histogram.

Notice that all the commands for calculating smooth functions described above calculate some sort of average. There are two ways that you may wish to average the data in your trajectory:

• You might want to calculate block averages in which the first $$N$$N frames in your trajectory are averaged separately to the second block of $$N$$ frames. If this is the case you should use the keyword CLEAR in the input to the action that calculates the smooth function. This keyword is used to specify how frequently you are clearing the stored data.
• You might want to calculate an accumulate an average over the whole trajectory and output the average accumulated at step $$N$$, step $$2N$$... This is what PLUMED does by default so you do not need to use CLEAR in this case.

# Diagnostic tools

PLUMED has a number of diagnostic tools that can be used to check that new Actions are working correctly:

 DUMPFORCES Dump the force acting on one of a values in a file. DUMPDERIVATIVES Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). DUMPMASSCHARGE Dump masses and charges on a selected file. DUMPPROJECTIONS Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases).

These commands allow you to test that derivatives and forces are calculated correctly within colvars and functions. One place where this is very useful is when you are testing whether or not you have implemented the derivatives of a new collective variables correctly. So for example if we wanted to do such a test on the distance CV we would employ an input file something like this:

Click on the labels of the actions for more information on what each action computes
d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2
d1n: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2 NUMERICAL_DERIVATIVES( default=off ) calculate the derivatives for these quantities numerically
DUMPDERIVATIVES ARGthe input for this action is the scalar output from one or more other actions. =d1,d1n FILEcompulsory keyword
the name of the file on which to output the derivatives =derivatives


The first of these two distance commands calculates the analytical derivatives of the distance while the second calculates these derivatives numerically. Obviously, if your CV is implemented correctly these two sets of quantities should be nearly identical.

# Storing data for analysis

All the analysis methods described in previous sections accumulate averages or output diagnostic information on the fly. That is to say these methods calculate something given the instantaneous positions of the atoms or the instantaneous values of a set of collective variables. Many methods (e.g. dimensionality reduction and clustering) will not work like this, however, as information from multiple trajectory frames is required at the point when the analysis is performed. In other words the output from these types of analysis cannot be accumulated one frame at time. When using these methods you must therefore store trajectory frames for later analysis. You can do this storing by using the following action:

 COLLECT_FRAMES Collect and store trajectory frames for later analysis with one of the methods detailed below.

# Calculating dissimilarity matrices

One of the simplest things that we can do if we have stored a set of trajectory frames using COLLECT_FRAMES is we can calculate the dissimilarity between every pair of frames we have stored. When using the dimensionality reduction algorithms described in the sections that follow the first step is to calculate this matrix. Consequently, within PLUMED the following command will collect the trajectory data as your simulation progressed and calculate the dissimilarities:

 EUCLIDEAN_DISSIMILARITIES Calculate the matrix of dissimilarities between a trajectory of atomic configurations.

By exploiting the functionality described in Distances from reference configurations you can calculate these dissimilarities in a wide variety of different ways (e.g. you can use RMSD, or you can use a collection of collective variable values see TARGET). If you wish to view this dissimilarity information you can print these quantities to a file using:

 PRINT_DISSIMILARITY_MATRIX Print the matrix of dissimilarities between a trajectory of atomic configurations.

In addition, if PLUMED does not calculate the dissimilarities you need you can read this information from an external file

 READ_DISSIMILARITY_MATRIX Read a matrix of dissimilarities between a trajectory of atomic configurations from a file.

N.B. You can only use the two commands above when you are doing post-processing.

# Landmark Selection

Many of the techniques described in the following sections are very computationally expensive to run on large trajectories. A common strategy is thus to use a landmark selection algorithm to pick a particularly-representative subset of trajectory frames and to only apply the expensive analysis algorithm on these configurations. The various landmark selection algorithms that are available in PLUMED are as follows

 LANDMARK_SELECT_FPS Select a set of landmarks using farthest point sampling. LANDMARK_SELECT_RANDOM Select a random set of landmarks from a large set of configurations. LANDMARK_SELECT_STAGED Select a set of landmarks using the staged algorithm. LANDMARK_SELECT_STRIDE Select every $$k$$th landmark from the trajectory. RESELECT_LANDMARKS This allows us to use one measure in landmark selection and a different measure in dimensionality reduction

In general most of these landmark selection algorithms must be used in tandem with a dissimilarity matrix object as as follows:

Click on the labels of the actions for more information on what each action computes
d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =3,4
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =5,6
data: COLLECT_FRAMES ARGthe input for this action is the scalar output from one or more other actions. =d1,d2,d3 STRIDEthe frequency with which data should be stored for analysis. =1
ss1: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =data
ll2: LANDMARK_SELECT_FPS USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =ss1 NLANDMARKScompulsory keyword
the number of landmarks that you would like to select =300
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =ll2 FILEcompulsory keyword
the name of the file to output to =mylandmarks


When landmark selection is performed in this way a weight is ascribed to each of the landmark configurations. This weight is calculated by summing the weights of all the trajectory frames in each of the landmarks Voronoi polyhedron (https://en.wikipedia.org/wiki/Voronoi_diagram). The weight of each trajectory frame is one unless you are reweighting using the formula described in the Reweighting and Averaging to counteract the fact of a simulation bias or an elevated temperature. If you are reweighting using these formula the weight of each of the points is equal to the exponential term in the numerator of these expressions.

# Dimensionality Reduction

Many dimensionality reduction algorithms work in a manner similar to the way we use when we make maps. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably transformed) distances between the points in your map representing each of those cities are related to the true distances between the cities.
Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane.
In other words, if we have $$M$$ $$D$$-dimensional points, $$\mathbf{X}$$, and we can calculate dissimilarities between pairs them, $$D_{ij}$$, we can, with an MDS calculation, try to create $$M$$ projections, $$\mathbf{x}$$, of the high dimensionality points in a $$d$$-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, $$d_{ij}$$, resemble the dissimilarities between the high dimensional points. In short we minimize:

$\chi^2 = \sum_{i \ne j} w_i w_j \left( F(D_{ij}) - f(d_{ij}) \right)^2$

where $$F(D_{ij})$$ is some transformation of the distance between point $$X^{i}$$ and point $$X^{j}$$ and $$f(d_{ij})$$ is some transformation of the distance between the projection of $$X^{i}$$, $$x^i$$, and the projection of $$X^{j}$$, $$x^j$$. $$w_i$$ and $$w_j$$ are the weights of configurations $$X^i$$ and $$^j$$ respectively. These weights are calculated using the reweighting and Voronoi polyhedron approaches described in previous sections. A tutorial on dimensionality reduction and how it can be used to analyze simulations can be found in the tutorial Lugano tutorial: Dimensionality reduction and in the following short video.

Within PLUMED running an input to run a dimensionality reduction algorithm can be as simple as:

Click on the labels of the actions for more information on what each action computes
d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =3,4
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =5,6
data: COLLECT_FRAMES STRIDEthe frequency with which data should be stored for analysis. =1 ARGthe input for this action is the scalar output from one or more other actions. =d1,d2,d3
ss1: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =data
mds: CLASSICAL_MDS USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =ss1 NLOW_DIMcompulsory keyword
number of low-dimensional coordinates required =2


Where we have to use the EUCLIDEAN_DISSIMILARITIES action here in order to calculate the matrix of dissimilarities between trajectory frames. We can even throw some landmark selection into this procedure and perform

Click on the labels of the actions for more information on what each action computes
d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =1,2
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =3,4
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. =5,6
data: COLLECT_FRAMES STRIDEthe frequency with which data should be stored for analysis. =1 ARGthe input for this action is the scalar output from one or more other actions. =d1,d2,d3
matrix: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =data
ll2: LANDMARK_SELECT_FPS USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =matrix NLANDMARKScompulsory keyword
the number of landmarks that you would like to select =300
mds: CLASSICAL_MDS USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =ll2 NLOW_DIMcompulsory keyword
number of low-dimensional coordinates required =2
osample: PROJECT_ALL_ANALYSIS_DATA USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object =matrix PROJECTIONcompulsory keyword
the projection that you wish to generate out-of-sample projections with =mds


Notice here that the final command allows us to calculate the projections of all the non-landmark points that were collected by the action with label matrix.

Dimensionality can be more complicated, however, because the stress function that calculates $$\chi^2$$ has to optimized rather carefully using a number of different algorithms. The various algorithms that can be used to optimize this function are described below

 CLASSICAL_MDS Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm. OUTPUT_PCA_PROJECTION This is used to output the projection calculated by principle component analysis PCA Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input. PROJECT_ALL_ANALYSIS_DATA Find projections of all non-landmark points using the embedding calculated by a dimensionality reduction optimization calculation. SKETCHMAP_CONJGRAD Optimize the sketch-map stress function using conjugate gradients. SKETCHMAP_POINTWISE Optimize the sketch-map stress function using a pointwise global optimization algorithm. SKETCHMAP_READ Read in a sketch-map projection from an input file SKETCHMAP_SMACOF Optimize the sketch-map stress function using the SMACOF algorithm. SKETCH_MAP This can be used to output the data that has been stored in an Analysis object. SMACOF_MDS Optimize the multidimensional scaling stress function using the SMACOF algorithm.

# Outputting the results from analysis algorithms

The following methods are available for printing the result output by the various analysis algorithms:

 OUTPUT_ANALYSIS_DATA_TO_COLVAR Output the results from an analysis using the PLUMED colvar file format. OUTPUT_ANALYSIS_DATA_TO_PDB Output the results from an analysis using the PDB file format.

Using the above commands to output the data from any form of analysis is important as the STRIDE with which you output the data to a COLVAR or PDB file controls how frequently the analysis is performed on the collected data . If you specified no stride on the output lines then PLUMED assumes you want to perform analysis on the entire trajectory.

If you use the above commands to output data from one of the Landmark Selection algorithms then only the second will give you information on the atomic positions in your landmark configurations and their associated weights. The first of these commands will give the values of the colvars in the landmark configurations only. If you use the above commands to output data from one of the Dimensionality Reduction algorithms then OUTPUT_ANALYSIS_DATA_TO_COLVAR will give you an output file that contains the projection for each of your input points. OUTPUT_ANALYSIS_DATA_TO_PDB will give you a PDB that contains the position of the input point, the projections and the weight of the configuration.

A nice feature of plumed is that when you use Landmark Selection algorithms or Dimensionality Reduction algorithms the output information is just a vector of variables. As such you can use HISTOGRAM to construct a histogram of the information generated by these algorithms.