It is important to note when using commands such as the above the first frame in the trajectory is assumed to be the initial configuration that was input to the MD code. It is thus ignored. Furthermore, if you are running with driver and you would like to analyze the whole trajectory (without specifying its length) and then print the result you simply call DUMPGRID (or any of the commands above) without a STRIDE keyword as shown in the example below.
Click on the labels of the actions for more information on what each action computes
x: DISTANCE =1,2 The DISTANCE action with label x calculates a single scalar value
h: HISTOGRAM =x =0.0 =3.0 =100 =0.1 =5 The HISTOGRAM action with label h calculates a single scalar value
Please note that even with this calculation the first frame in the trajectory is ignored when computing the histogram.
Notice that all the commands for calculating smooth functions described above calculate some sort of average. There are two ways that you may wish to average the data in your trajectory:
- You might want to calculate block averages in which the first \(N\)N frames in your trajectory are averaged separately to the second block of \(N\) frames. If this is the case you should use the keyword CLEAR in the input to the action that calculates the smooth function. This keyword is used to specify how frequently you are clearing the stored data.
- You might want to calculate an accumulate an average over the whole trajectory and output the average accumulated at step \(N\), step \(2N\)... This is what PLUMED does by default so you do not need to use CLEAR in this case.
Diagnostic tools
PLUMED has a number of diagnostic tools that can be used to check that new Actions are working correctly:
DUMPFORCES | Dump the force acting on one of a values in a file. |
DUMPDERIVATIVES | Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). |
DUMPMASSCHARGE | Dump masses and charges on a selected file. |
DUMPPROJECTIONS | Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). |
These commands allow you to test that derivatives and forces are calculated correctly within colvars and functions. One place where this is very useful is when you are testing whether or not you have implemented the derivatives of a new collective variables correctly. So for example if we wanted to do such a test on the distance CV we would employ an input file something like this:
Click on the labels of the actions for more information on what each action computes
d1: DISTANCE =1,2 The DISTANCE action with label d1 calculates a single scalar value
d1n: DISTANCE =1,2 The DISTANCE action with label d1n calculates a single scalar value
The first of these two distance commands calculates the analytical derivatives of the distance while the second calculates these derivatives numerically. Obviously, if your CV is implemented correctly these two sets of quantities should be nearly identical.
Storing data for analysis
All the analysis methods described in previous sections accumulate averages or output diagnostic information on the fly. That is to say these methods calculate something given the instantaneous positions of the atoms or the instantaneous values of a set of collective variables. Many methods (e.g. dimensionality reduction and clustering) will not work like this, however, as information from multiple trajectory frames is required at the point when the analysis is performed. In other words the output from these types of analysis cannot be accumulated one frame at time. When using these methods you must therefore store trajectory frames for later analysis. You can do this storing by using the following action:
COLLECT_FRAMES | Collect and store trajectory frames for later analysis with one of the methods detailed below. |
Calculating dissimilarity matrices
One of the simplest things that we can do if we have stored a set of trajectory frames using COLLECT_FRAMES is we can calculate the dissimilarity between every pair of frames we have stored. When using the dimensionality reduction algorithms described in the sections that follow the first step is to calculate this matrix. Consequently, within PLUMED the following command will collect the trajectory data as your simulation progressed and calculate the dissimilarities:
By exploiting the functionality described in Distances from reference configurations you can calculate these dissimilarities in a wide variety of different ways (e.g. you can use RMSD, or you can use a collection of collective variable values see TARGET). If you wish to view this dissimilarity information you can print these quantities to a file using:
In addition, if PLUMED does not calculate the dissimilarities you need you can read this information from an external file
N.B. You can only use the two commands above when you are doing post-processing.
Landmark Selection
Many of the techniques described in the following sections are very computationally expensive to run on large trajectories. A common strategy is thus to use a landmark selection algorithm to pick a particularly-representative subset of trajectory frames and to only apply the expensive analysis algorithm on these configurations. The various landmark selection algorithms that are available in PLUMED are as follows
In general most of these landmark selection algorithms must be used in tandem with a dissimilarity matrix object as as follows:
Click on the labels of the actions for more information on what each action computes
d1: DISTANCE =1,2 The DISTANCE action with label d1 calculates a single scalar value
d2: DISTANCE =3,4 The DISTANCE action with label d2 calculates a single scalar value
d3: DISTANCE =5,6 The DISTANCE action with label d3 calculates a single scalar value
data: COLLECT_FRAMES =d1,d2,d3 =1 The COLLECT_FRAMES action with label data calculates the following quantities:
Quantity | Description |
data.d1 | the label of this action is set by user in the input. See documentation above. |
data.d2 | the label of this action is set by user in the input. See documentation above. |
data.d3 | the label of this action is set by user in the input. See documentation above. |
ss1: EUCLIDEAN_DISSIMILARITIES =data The EUCLIDEAN_DISSIMILARITIES action with label ss1
ll2: LANDMARK_SELECT_FPS =ss1 =300 The LANDMARK_SELECT_FPS action with label ll2
When landmark selection is performed in this way a weight is ascribed to each of the landmark configurations. This weight is calculated by summing the weights of all the trajectory frames in each of the landmarks Voronoi polyhedron (https://en.wikipedia.org/wiki/Voronoi_diagram). The weight of each trajectory frame is one unless you are reweighting using the formula described in the Reweighting and Averaging to counteract the fact of a simulation bias or an elevated temperature. If you are reweighting using these formula the weight of each of the points is equal to the exponential term in the numerator of these expressions.
Dimensionality Reduction
Many dimensionality reduction algorithms work in a manner similar to the way we use when we make maps. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably transformed) distances between the points in your map representing each of those cities are related to the true distances between the cities.
Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane.
In other words, if we have \(M\) \(D\)-dimensional points, \(\mathbf{X}\), and we can calculate dissimilarities between pairs them, \(D_{ij}\), we can, with an MDS calculation, try to create \(M\) projections, \(\mathbf{x}\), of the high dimensionality points in a \(d\)-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, \(d_{ij}\), resemble the dissimilarities between the high dimensional points. In short we minimize:
\[ \chi^2 = \sum_{i \ne j} w_i w_j \left( F(D_{ij}) - f(d_{ij}) \right)^2 \]
where \(F(D_{ij})\) is some transformation of the distance between point \(X^{i}\) and point \(X^{j}\) and \(f(d_{ij})\) is some transformation of the distance between the projection of \(X^{i}\), \(x^i\), and the projection of \(X^{j}\), \(x^j\). \(w_i\) and \(w_j\) are the weights of configurations \(X^i\) and \(^j\) respectively. These weights are calculated using the reweighting and Voronoi polyhedron approaches described in previous sections. A tutorial on dimensionality reduction and how it can be used to analyze simulations can be found in the tutorial lugano-5 and in the following short video.
Within PLUMED running an input to run a dimensionality reduction algorithm can be as simple as:
Click on the labels of the actions for more information on what each action computes
d1: DISTANCE =1,2 The DISTANCE action with label d1 calculates a single scalar value
d2: DISTANCE =3,4 The DISTANCE action with label d2 calculates a single scalar value
d3: DISTANCE =5,6 The DISTANCE action with label d3 calculates a single scalar value
data: COLLECT_FRAMES =1 =d1,d2,d3 The COLLECT_FRAMES action with label data calculates the following quantities:
Quantity | Description |
data.d1 | the label of this action is set by user in the input. See documentation above. |
data.d2 | the label of this action is set by user in the input. See documentation above. |
data.d3 | the label of this action is set by user in the input. See documentation above. |
ss1: EUCLIDEAN_DISSIMILARITIES =data The EUCLIDEAN_DISSIMILARITIES action with label ss1
mds: CLASSICAL_MDS =ss1 =2 The CLASSICAL_MDS action with label mds calculates the following quantities:
Quantity | Description |
mds.coord-1 | the low-dimensional projections of the various input configurations This is the 1th of these quantities |
mds.coord-2 | the low-dimensional projections of the various input configurations This is the 2th of these quantities |
Where we have to use the EUCLIDEAN_DISSIMILARITIES action here in order to calculate the matrix of dissimilarities between trajectory frames. We can even throw some landmark selection into this procedure and perform
Click on the labels of the actions for more information on what each action computes
d1: DISTANCE =1,2 The DISTANCE action with label d1 calculates a single scalar value
d2: DISTANCE =3,4 The DISTANCE action with label d2 calculates a single scalar value
d3: DISTANCE =5,6 The DISTANCE action with label d3 calculates a single scalar value
data: COLLECT_FRAMES =1 =d1,d2,d3 The COLLECT_FRAMES action with label data calculates the following quantities:
Quantity | Description |
data.d1 | the label of this action is set by user in the input. See documentation above. |
data.d2 | the label of this action is set by user in the input. See documentation above. |
data.d3 | the label of this action is set by user in the input. See documentation above. |
matrix: EUCLIDEAN_DISSIMILARITIES =data The EUCLIDEAN_DISSIMILARITIES action with label matrix
ll2: LANDMARK_SELECT_FPS =matrix =300 The LANDMARK_SELECT_FPS action with label ll2
mds: CLASSICAL_MDS =ll2 =2 The CLASSICAL_MDS action with label mds calculates the following quantities:
Quantity | Description |
mds.coord-1 | the low-dimensional projections of the various input configurations This is the 1th of these quantities |
mds.coord-2 | the low-dimensional projections of the various input configurations This is the 2th of these quantities |
osample: PROJECT_ALL_ANALYSIS_DATA =matrix =mds The PROJECT_ALL_ANALYSIS_DATA action with label osample calculates the following quantities:
Quantity | Description |
osample.coord-1 | the low-dimensional projections of the various input configurations This is the 1th of these quantities |
osample.coord-2 | the low-dimensional projections of the various input configurations This is the 2th of these quantities |
Notice here that the final command allows us to calculate the projections of all the non-landmark points that were collected by the action with label matrix.
Dimensionality can be more complicated, however, because the stress function that calculates \(\chi^2\) has to optimized rather carefully using a number of different algorithms. The various algorithms that can be used to optimize this function are described below
CLASSICAL_MDS | Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm. |
OUTPUT_PCA_PROJECTION | This is used to output the projection calculated by principle component analysis |
PCA | Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input. |
PROJECT_ALL_ANALYSIS_DATA | Find projections of all non-landmark points using the embedding calculated by a dimensionality reduction optimization calculation. |
SKETCHMAP_CONJGRAD | Optimize the sketch-map stress function using conjugate gradients. |
SKETCHMAP_POINTWISE | Optimize the sketch-map stress function using a pointwise global optimization algorithm. |
SKETCHMAP_READ | Read in a sketch-map projection from an input file |
SKETCHMAP_SMACOF | Optimize the sketch-map stress function using the SMACOF algorithm. |
SKETCH_MAP | This can be used to output the data that has been stored in an Analysis object. |
SMACOF_MDS | Optimize the multidimensional scaling stress function using the SMACOF algorithm. |
Outputting the results from analysis algorithms
The following methods are available for printing the result output by the various analysis algorithms:
Using the above commands to output the data from any form of analysis is important as the STRIDE with which you output the data to a COLVAR or PDB file controls how frequently the analysis is performed on the collected data . If you specified no stride on the output lines then PLUMED assumes you want to perform analysis on the entire trajectory.
If you use the above commands to output data from one of the Landmark Selection algorithms then only the second will give you information on the atomic positions in your landmark configurations and their associated weights. The first of these commands will give the values of the colvars in the landmark configurations only. If you use the above commands to output data from one of the Dimensionality Reduction algorithms then OUTPUT_ANALYSIS_DATA_TO_COLVAR will give you an output file that contains the projection for each of your input points. OUTPUT_ANALYSIS_DATA_TO_PDB will give you a PDB that contains the position of the input point, the projections and the weight of the configuration.
A nice feature of plumed is that when you use Landmark Selection algorithms or Dimensionality Reduction algorithms the output information is just a vector of variables. As such you can use HISTOGRAM to construct a histogram of the information generated by these algorithms.