PLUMED can be used to analyze trajectories either on the fly during an MD run or via post processing a trajectory using driver. A molecular dynamics trajectory is in essence an ordered set of configurations of atoms. Trajectory analysis algorithms are methods that allow us to extract meaningful information from this extremely high-dimensionality information. In extracting this information much of the information in the trajectory will be discarded and assumed to be irrelevant to the problem at hand. For example, when we calculate a histogram from a trajectory we throw away all information on the order the frames were visited during the trajectory. We instead opt to display a time average that shows the parts of configuration space that were
visited most frequently. There are many situations in which this is a reasonable thing to do as we know that time averages are equivalent to ensemble averages in the long timescale limit and that these average probabilities of being in different parts of configuration space, \(P(s)\), are thus related to the underlying free energy, \(F(s)\), via:

\[ F(s) = - k_B T \ln P(s) \]

About the simplest form of analysis that PLUMED can perform involves printing information to a file. PLUMED can output various different kinds of information to files as described below:

COMMITTOR	Does a committor analysis.
DUMPATOMS	Dump selected atoms on a file.
DUMPDERIVATIVES	Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases).
DUMPFORCES	Dump the force acting on one of a values in a file.
DUMPMASSCHARGE	Dump masses and charges on a selected file.
DUMPPDB	Output PDB file.
DUMPPROJECTIONS	Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases).
DUMPVECTOR	Print a vector to a file
PRINT	Print quantities to a file.
PRINT_NDX	Print an ndx file
SELECT_COMPONENTS	Create a new value to hold a subset of the components that are in a vector or matrix
SELECT_WITH_MASK	Use a mask to select elements of an array
UPDATE_IF	Conditional update of other actions.

The UPDATE_IF action allows you to do more complex things using the above print commands. As detailed in the documentation for UPDATE_IF when you put any of the above actions within an UPDATE_IF block then data will only be output to the file if colvars are within particular ranges. In other words, the above printing commands, in tandem with UPDATE_IF, allow you to identify the frames in your trajectory that satisfy some particular criteria and output information on those frames only.

Another useful command is the COMMITTOR command. As detailed in the documentation for COMMITTOR this command tells PLUMED (and the underlying MD code) to stop the calculation one some criteria is satisfied, alternatively one can use it to keep track of the number of times a criteria is satisfied.

A number of more complicated forms of analysis can be performed that take a number of frames from the trajectory as input. In all these commands the STRIDE keyword is used to tell PLUMED how frequently to collect data from the trajectory. In all these methods the output from the analysis is a form of ensemble average. If you are running with a bias it is thus likely that you may want to reweight the trajectory frames in order to remove the effect the bias has on the static behavior of the system. The following methods can thus be used to calculate weights for the various trajectory frames so that the final ensemble average is an average for the canonical ensemble at the appropriate temperature.

Reweighting and Averaging

COVARIANCE_MATRIX	Calculate a covariance matix
REWEIGHT_BIAS	Calculate weights for ensemble averages that negate the effect the bias has on the region of phase space explored
REWEIGHT_METAD	Calculate the weights configurations should contribute to the histogram in a simulation in which a metadynamics bias acts upon the system.
REWEIGHT_TEMP_PRESS	Calculate weights for ensemble averages at temperatures and/or pressures different than those used in your original simulation.
WHAM	Calculate the weights for configurations using the weighted histogram analysis method.
WHAM_HISTOGRAM	This can be used to output the a histogram using the weighted histogram technique
WHAM_WEIGHTS	Calculate and output weights for configurations using the weighted histogram analysis method.

You can then calculate ensemble averages using the following actions.

ACCUMULATE	Sum the elements of this value over the course of the trajectory
AVERAGE	Calculate the ensemble average of a collective variable
CUSTOM_GRID	Calculate a function of the grid or grids that are input and return a new grid
EVALUATE_FUNCTION_FROM_GRID	Calculate the function stored on the input grid at an arbitrary point
EVALUATE_FUNCTION_FROM_GRID_SCALAR	Calculate the function stored on the input grid at an arbitrary point
EVALUATE_FUNCTION_FROM_GRID_VECTOR	Calculate the function stored on the input grid for each of the points in the input vector/s
FIND_GRID_MAXIMUM	Find the point with the highest value of the function on the grid
FIND_GRID_MINIMUM	Find the point with the lowest value of the function on the grid
HISTOGRAM	Accumulate the average probability density along a few CVs from a trajectory.
INTEGRATE_GRID	Calculate the numerical integral of the function stored on the grid
MATHEVAL_GRID	Calculate a function of the grid or grids that are input and return a new grid
MULTICOLVARDENS	Evaluate the average value of a multicolvar on a grid.
REFERENCE_GRID	Setup a constant grid by either reading values from a file or definining a function in input
SUM_GRID	Sum the values of all the function at the points on a grid

For many of the above commands data is accumulated on the grids. These grids can be further analyzed using one of the actions detailed below at some time.

CONVERT_TO_FES	Convert a histogram to a free energy surface.
DUMPCONTOUR	Print the contour
DUMPCUBE	Output a three dimensional grid using the Gaussian cube file format.
DUMPGRID	Output the function on the grid to a file with the PLUMED grid format.
FIND_CONTOUR	Find an isocontour in a smooth function.
FIND_CONTOUR_SURFACE	Find an isocontour by searching along either the x, y or z direction.
FIND_SPHERICAL_CONTOUR	Find an isocontour in a three dimensional grid by searching over a Fibonacci sphere.
FOURIER_TRANSFORM	Compute the Discrete Fourier Transform (DFT) by means of FFTW of data stored on a 2D grid.
INTERPOLATE_GRID	Interpolate a smooth function stored on a grid onto a grid with a smaller grid spacing.

As an example the following set of commands instructs PLUMED to calculate the distance between atoms 1 and 2 for every fifth frame in the trajectory and to accumulate a histogram from this data which will be output every 100 steps (i.e. when 20 distances have been added to the histogram).

Click on the labels of the actions for more information on what each action computes

x: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=1,2  You cannot view the components that are calculated by each action for this input file. Sorry 
h: HISTOGRAM ARGthe quantities that are being used to construct the histogram 
=x GRID_MINcompulsory keyword ( default=auto )
the lower bounds for the grid 
=0.0 GRID_MAXcompulsory keyword ( default=auto )
the upper bounds for the grid 
=3.0 GRID_BINthe number of bins for the grid 
=100 BANDWIDTHthe bandwidths for kernel density esimtation 
=0.1 STRIDEcompulsory keyword ( default=1 )
the frequency with which to store data for averaging 
=5  You cannot view the components that are calculated by each action for this input file. Sorry 
DUMPGRID GRIDthe grid you would like to print (can also use ARG for specifying what is being printed)
=h FILEcompulsory keyword ( default=density )
the file on which to write the grid. 
=histo STRIDEcompulsory keyword ( default=0 )
the frequency with which the grid should be output to the file. 
=100  You cannot view the components that are calculated by each action for this input file. Sorry

It is important to note when using commands such as the above the first frame in the trajectory is assumed to be the initial configuration that was input to the MD code. It is thus ignored. Furthermore, if you are running with driver and you would like to analyze the whole trajectory (without specifying its length) and then print the result you simply call DUMPGRID (or any of the commands above) without a STRIDE keyword as shown in the example below.

Click on the labels of the actions for more information on what each action computes

x: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=1,2  You cannot view the components that are calculated by each action for this input file. Sorry 
h: HISTOGRAM ARGthe quantities that are being used to construct the histogram 
=x GRID_MINcompulsory keyword ( default=auto )
the lower bounds for the grid 
=0.0 GRID_MAXcompulsory keyword ( default=auto )
the upper bounds for the grid 
=3.0 GRID_BINthe number of bins for the grid 
=100 BANDWIDTHthe bandwidths for kernel density esimtation 
=0.1 STRIDEcompulsory keyword ( default=1 )
the frequency with which to store data for averaging 
=5  You cannot view the components that are calculated by each action for this input file. Sorry 
DUMPGRID GRIDthe grid you would like to print (can also use ARG for specifying what is being printed)
=h FILEcompulsory keyword ( default=density )
the file on which to write the grid. 
=histo  You cannot view the components that are calculated by each action for this input file. Sorry

Please note that even with this calculation the first frame in the trajectory is ignored when computing the histogram.

Notice that all the commands for calculating smooth functions described above calculate some sort of average. There are two ways that you may wish to average the data in your trajectory:

You might want to calculate block averages in which the first \(N\)N frames in your trajectory are averaged separately to the second block of \(N\) frames. If this is the case you should use the keyword CLEAR in the input to the action that calculates the smooth function. This keyword is used to specify how frequently you are clearing the stored data.
You might want to calculate an accumulate an average over the whole trajectory and output the average accumulated at step \(N\), step \(2N\)... This is what PLUMED does by default so you do not need to use CLEAR in this case.

Diagnostic tools

PLUMED has a number of diagnostic tools that can be used to check that new Actions are working correctly:

DUMPFORCES	Dump the force acting on one of a values in a file.
DUMPDERIVATIVES	Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases).
DUMPMASSCHARGE	Dump masses and charges on a selected file.
DUMPPROJECTIONS	Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases).

These commands allow you to test that derivatives and forces are calculated correctly within colvars and functions. One place where this is very useful is when you are testing whether or not you have implemented the derivatives of a new collective variables correctly. So for example if we wanted to do such a test on the distance CV we would employ an input file something like this:

Click on the labels of the actions for more information on what each action computes

d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=1,2 The DISTANCE action with label d1 calculates a single scalar value
d1n: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=1,2 NUMERICAL_DERIVATIVES( default=off ) calculate the derivatives for these quantities numerically 
 The DISTANCE action with label d1n calculates a single scalar value
DUMPDERIVATIVES ARGcompulsory keyword 
the labels of the values whose derivatives should be output 
=d1,d1n FILEcompulsory keyword 
the name of the file on which to output the derivatives 
=derivatives The DUMPDERIVATIVES action with label

The first of these two distance commands calculates the analytical derivatives of the distance while the second calculates these derivatives numerically. Obviously, if your CV is implemented correctly these two sets of quantities should be nearly identical.

Storing data for analysis

All the analysis methods described in previous sections accumulate averages or output diagnostic information on the fly. That is to say these methods calculate something given the instantaneous positions of the atoms or the instantaneous values of a set of collective variables. Many methods (e.g. dimensionality reduction and clustering) will not work like this, however, as information from multiple trajectory frames is required at the point when the analysis is performed. In other words the output from these types of analysis cannot be accumulated one frame at time. When using these methods you must therefore store trajectory frames for later analysis. You can do this storing by using the following action:

COLLECT_FRAMES

Collect and store trajectory frames for later analysis with one of the methods detailed below.

Calculating dissimilarity matrices

One of the simplest things that we can do if we have stored a set of trajectory frames using COLLECT_FRAMES is we can calculate the dissimilarity between every pair of frames we have stored. When using the dimensionality reduction algorithms described in the sections that follow the first step is to calculate this matrix. Consequently, within PLUMED the following command will collect the trajectory data as your simulation progressed and calculate the dissimilarities:

EUCLIDEAN_DISSIMILARITIES

Calculate the matrix of dissimilarities between a trajectory of atomic configurations.

By exploiting the functionality described in Distances from reference configurations you can calculate these dissimilarities in a wide variety of different ways (e.g. you can use RMSD, or you can use a collection of collective variable values see TARGET). If you wish to view this dissimilarity information you can print these quantities to a file using:

PRINT_DISSIMILARITY_MATRIX

Print the matrix of dissimilarities between a trajectory of atomic configurations.

In addition, if PLUMED does not calculate the dissimilarities you need you can read this information from an external file

READ_DISSIMILARITY_MATRIX

Read a matrix of dissimilarities between a trajectory of atomic configurations from a file.

N.B. You can only use the two commands above when you are doing post-processing.

Landmark Selection

Many of the techniques described in the following sections are very computationally expensive to run on large trajectories. A common strategy is thus to use a landmark selection algorithm to pick a particularly-representative subset of trajectory frames and to only apply the expensive analysis algorithm on these configurations. The various landmark selection algorithms that are available in PLUMED are as follows

FARTHEST_POINT_SAMPLING	Select a set of landmarks using farthest point sampling.
LANDMARK_SELECT_FPS	Select a of landmarks from a large set of configurations using farthest point sampling.
LANDMARK_SELECT_RANDOM	Select a random set of landmarks from a large set of configurations.
LANDMARK_SELECT_STRIDE	Select every ith frame from the stored data

In general most of these landmark selection algorithms must be used in tandem with a dissimilarity matrix object as as follows:

Click on the labels of the actions for more information on what each action computes

d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between.
=1,2 You cannot view the components that are calculated by each action for this input file. Sorry
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between.
=3,4 You cannot view the components that are calculated by each action for this input file. Sorry
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between.
=5,6 You cannot view the components that are calculated by each action for this input file. Sorry
data: COLLECT_FRAMES ARGthe labels of the values whose time series you would like to collect for later analysis
=d1,d2,d3 STRIDEcompulsory keyword ( default=1 )
the frequency with which data should be stored for analysis.
=1 You cannot view the components that are calculated by each action for this input file. Sorry
ss1: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROM could not find this keyword
=data You cannot view the components that are calculated by each action for this input file. Sorry
ll2: LANDMARK_SELECT_FPS USE_OUTPUT_DATA_FROM could not find this keyword
=ss1 NLANDMARKScompulsory keyword
the numbe rof landmarks you would like to create
=300 You cannot view the components that are calculated by each action for this input file. Sorry
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM could not find this keyword
=ll2 FILE could not find this keyword
=mylandmarks You cannot view the components that are calculated by each action for this input file. Sorry

When landmark selection is performed in this way a weight is ascribed to each of the landmark configurations. This weight is calculated by summing the weights of all the trajectory frames in each of the landmarks Voronoi polyhedron (https://en.wikipedia.org/wiki/Voronoi_diagram). The weight of each trajectory frame is one unless you are reweighting using the formula described in the Reweighting and Averaging to counteract the fact of a simulation bias or an elevated temperature. If you are reweighting using these formula the weight of each of the points is equal to the exponential term in the numerator of these expressions.

Dimensionality Reduction

Many dimensionality reduction algorithms work in a manner similar to the way we use when we make maps. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably transformed) distances between the points in your map representing each of those cities are related to the true distances between the cities.
Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane.
In other words, if we have \(M\) \(D\)-dimensional points, \(\mathbf{X}\), and we can calculate dissimilarities between pairs them, \(D_{ij}\), we can, with an MDS calculation, try to create \(M\) projections, \(\mathbf{x}\), of the high dimensionality points in a \(d\)-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, \(d_{ij}\), resemble the dissimilarities between the high dimensional points. In short we minimize:

\[ \chi^2 = \sum_{i \ne j} w_i w_j \left( F(D_{ij}) - f(d_{ij}) \right)^2 \]

where \(F(D_{ij})\) is some transformation of the distance between point \(X^{i}\) and point \(X^{j}\) and \(f(d_{ij})\) is some transformation of the distance between the projection of \(X^{i}\), \(x^i\), and the projection of \(X^{j}\), \(x^j\). \(w_i\) and \(w_j\) are the weights of configurations \(X^i\) and \(^j\) respectively. These weights are calculated using the reweighting and Voronoi polyhedron approaches described in previous sections. A tutorial on dimensionality reduction and how it can be used to analyze simulations can be found in the tutorial lugano-5 and in the following short video.

Within PLUMED running an input to run a dimensionality reduction algorithm can be as simple as:

Click on the labels of the actions for more information on what each action computes

d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=1,2  You cannot view the components that are calculated by each action for this input file. Sorry 
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=3,4  You cannot view the components that are calculated by each action for this input file. Sorry 
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between. 
=5,6  You cannot view the components that are calculated by each action for this input file. Sorry 
data: COLLECT_FRAMES STRIDEcompulsory keyword ( default=1 )
the frequency with which data should be stored for analysis. 
=1 ARGthe labels of the values whose time series you would like to collect for later analysis
=d1,d2,d3  You cannot view the components that are calculated by each action for this input file. Sorry 
ss1: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROM could not find this keyword 
=data  You cannot view the components that are calculated by each action for this input file. Sorry 
mds: CLASSICAL_MDS USE_OUTPUT_DATA_FROM could not find this keyword 
=ss1 NLOW_DIMcompulsory keyword 
number of low-dimensional coordinates required 
=2  You cannot view the components that are calculated by each action for this input file. Sorry

Where we have to use the EUCLIDEAN_DISSIMILARITIES action here in order to calculate the matrix of dissimilarities between trajectory frames. We can even throw some landmark selection into this procedure and perform

Click on the labels of the actions for more information on what each action computes

d1: DISTANCE ATOMSthe pair of atom that we are calculating the distance between.
=1,2 You cannot view the components that are calculated by each action for this input file. Sorry
d2: DISTANCE ATOMSthe pair of atom that we are calculating the distance between.
=3,4 You cannot view the components that are calculated by each action for this input file. Sorry
d3: DISTANCE ATOMSthe pair of atom that we are calculating the distance between.
=5,6 You cannot view the components that are calculated by each action for this input file. Sorry
data: COLLECT_FRAMES STRIDEcompulsory keyword ( default=1 )
the frequency with which data should be stored for analysis.
=1 ARGthe labels of the values whose time series you would like to collect for later analysis
=d1,d2,d3 You cannot view the components that are calculated by each action for this input file. Sorry
matrix: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROM could not find this keyword
=data You cannot view the components that are calculated by each action for this input file. Sorry
ll2: LANDMARK_SELECT_FPS USE_OUTPUT_DATA_FROM could not find this keyword
=matrix NLANDMARKScompulsory keyword
the numbe rof landmarks you would like to create
=300 You cannot view the components that are calculated by each action for this input file. Sorry
mds: CLASSICAL_MDS USE_OUTPUT_DATA_FROM could not find this keyword
=ll2 NLOW_DIMcompulsory keyword
number of low-dimensional coordinates required
=2 You cannot view the components that are calculated by each action for this input file. Sorry
osample: PROJECT_ALL_ANALYSIS_DATA USE_OUTPUT_DATA_FROM could not find this keyword
=matrix PROJECTION could not find this keyword
=mds You cannot view the components that are calculated by each action for this input file. Sorry

Notice here that the final command allows us to calculate the projections of all the non-landmark points that were collected by the action with label matrix.

Dimensionality can be more complicated, however, because the stress function that calculates \(\chi^2\) has to optimized rather carefully using a number of different algorithms. The various algorithms that can be used to optimize this function are described below

ARRANGE_POINTS	Arrange points in a low dimensional space so that the (transformed) distances between points in the low dimensional space match the dissimilarities provided in an input matrix.
CLASSICAL_MDS	Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm.
CREATE_MASK	Create a mask vector to use for landmark selection
PCA	Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
PROJECT_POINTS	Find the projection of a point in a low dimensional space by matching the (transformed) distance between it and a series of reference configurations that were input
SKETCHMAP	Construct a sketch map projection of the input data
SKETCHMAP_PROJECTION	Read in a sketch-map projection

Outputting the results from analysis algorithms

The following methods are available for printing the result output by the various analysis algorithms:

OUTPUT_ANALYSIS_DATA_TO_COLVAR	Output the results from an analysis using the PLUMED colvar file format.
OUTPUT_ANALYSIS_DATA_TO_PDB	Output the results from an analysis using the PDB file format.

Using the above commands to output the data from any form of analysis is important as the STRIDE with which you output the data to a COLVAR or PDB file controls how frequently the analysis is performed on the collected data . If you specified no stride on the output lines then PLUMED assumes you want to perform analysis on the entire trajectory.

If you use the above commands to output data from one of the Landmark Selection algorithms then only the second will give you information on the atomic positions in your landmark configurations and their associated weights. The first of these commands will give the values of the colvars in the landmark configurations only. If you use the above commands to output data from one of the Dimensionality Reduction algorithms then OUTPUT_ANALYSIS_DATA_TO_COLVAR will give you an output file that contains the projection for each of your input points. OUTPUT_ANALYSIS_DATA_TO_PDB will give you a PDB that contains the position of the input point, the projections and the weight of the configuration.

A nice feature of plumed is that when you use Landmark Selection algorithms or Dimensionality Reduction algorithms the output information is just a vector of variables. As such you can use HISTOGRAM to construct a histogram of the information generated by these algorithms.