CLASSICAL_MDS
This is part of the dimred module
It is only available if you configure PLUMED with ./configure –enable-modules=dimred . Furthermore, this feature is still being developed so take care when using it and report any problems on the mailing list.

Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm.

Multidimensional scaling (MDS) is similar to what is done when you make a map. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably scaled) distances between the points in your map representing each of those cities are related to the true distances between the cities. Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane. In other words, if we have \(M\) \(D\)-dimensional points, \(\mathbf{X}\), and we can calculate dissimilarities between pairs them, \(D_{ij}\), we can, with an MDS calculation, try to create \(M\) projections, \(\mathbf{x}\), of the high dimensionality points in a \(d\)-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, \(d_{ij}\), resemble the dissimilarities between the high dimensional points. In short we minimize:

\[ \chi^2 = \sum_{i \ne j} \left( D_{ij} - d_{ij} \right)^2 \]

where \(D_{ij}\) is the distance between point \(X^{i}\) and point \(X^{j}\) and \(d_{ij}\) is the distance between the projection of \(X^{i}\), \(x^i\), and the projection of \(X^{j}\), \(x^j\). A tutorial on this approach can be used to analyze simulations can be found in the tutorial Lugano tutorial: Dimensionality reduction and in the following short video.

Examples

The following command instructs plumed to construct a classical multidimensional scaling projection of a trajectory. The RMSD distance between atoms 1-256 have moved is used to measure the distances in the high-dimensional space.

Click on the labels of the actions for more information on what each action computes
tested on v2.7
data: COLLECT_FRAMES 
ATOMS
the atoms whose positions we are tracking for the purpose of analyzing the data
=1-256 mat: EUCLIDEAN_DISSIMILARITIES
USE_OUTPUT_DATA_FROM
use the output of the analysis performed by this object as input to your new analysis object
=data mds: CLASSICAL_MDS
USE_OUTPUT_DATA_FROM
use the output of the analysis performed by this object as input to your new analysis object
=mat
NLOW_DIM
compulsory keyword number of low-dimensional coordinates required
=2 OUTPUT_ANALYSIS_DATA_TO_COLVAR
USE_OUTPUT_DATA_FROM
use the output of the analysis performed by this object as input to your new analysis object
=mds
FILE
compulsory keyword the name of the file to output to
=rmsd-embed

The following section is for people who are interested in how this method works in detail. A solid understanding of this material is not necessary to use MDS.

Method of optimization

The stress function can be minimized using a standard optimization algorithm such as conjugate gradients or steepest descent. However, it is more common to do this minimization using a technique known as classical scaling. Classical scaling works by recognizing that each of the distances $D_{ij}$ in the above sum can be written as:

\[ D_{ij}^2 = \sum_{\alpha} (X^i_\alpha - X^j_\alpha)^2 = \sum_\alpha (X^i_\alpha)^2 + (X^j_\alpha)^2 - 2X^i_\alpha X^j_\alpha \]

We can use this expression and matrix algebra to calculate multiple distances at once. For instance if we have three points, \(\mathbf{X}\), we can write distances between them as:

\begin{eqnarray*} D^2(\mathbf{X}) &=& \left[ \begin{array}{ccc} 0 & d_{12}^2 & d_{13}^2 \\ d_{12}^2 & 0 & d_{23}^2 \\ d_{13}^2 & d_{23}^2 & 0 \end{array}\right] \\ &=& \sum_\alpha \left[ \begin{array}{ccc} (X^1_\alpha)^2 & (X^1_\alpha)^2 & (X^1_\alpha)^2 \\ (X^2_\alpha)^2 & (X^2_\alpha)^2 & (X^2_\alpha)^2 \\ (X^3_\alpha)^2 & (X^3_\alpha)^2 & (X^3_\alpha)^2 \\ \end{array}\right] + \sum_\alpha \left[ \begin{array}{ccc} (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ \end{array}\right] - 2 \sum_\alpha \left[ \begin{array}{ccc} X^1_\alpha X^1_\alpha & X^1_\alpha X^2_\alpha & X^1_\alpha X^3_\alpha \\ X^2_\alpha X^1_\alpha & X^2_\alpha X^2_\alpha & X^2_\alpha X^3_\alpha \\ X^1_\alpha X^3_\alpha & X^3_\alpha X^2_\alpha & X^3_\alpha X^3_\alpha \end{array}\right] \nonumber \\ &=& \mathbf{c 1^T} + \mathbf{1 c^T} - 2 \sum_\alpha \mathbf{x}_a \mathbf{x}^T_a = \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T} \end{eqnarray*}

This last equation can be extended to situations when we have more than three points. In it \(\mathbf{X}\) is a matrix that has one high-dimensional point on each of its rows and \(\mathbf{X^T}\) is its transpose. \(\mathbf{1}\) is an \(M \times 1\) vector of ones and \(\mathbf{c}\) is a vector with components given by:

\[ c_i = \sum_\alpha (x_\alpha^i)^2 \]

These quantities are the diagonal elements of \(\mathbf{X X^T}\), which is a dot product or Gram Matrix that contains the dot product of the vector \(X_i\) with the vector \(X_j\) in element \(i,j\).

In classical scaling we introduce a centering matrix \(\mathbf{J}\) that is given by:

\[ \mathbf{J} = \mathbf{I} - \frac{1}{M} \mathbf{11^T} \]

where \(\mathbf{I}\) is the identity. Multiplying the equations above from the front and back by this matrix and a factor of a \(-\frac{1}{2}\) gives:

\begin{eqnarray*} -\frac{1}{2} \mathbf{J} \mathbf{D}^2(\mathbf{X}) \mathbf{J} &=& -\frac{1}{2}\mathbf{J}( \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T})\mathbf{J} \\ &=& -\frac{1}{2}\mathbf{J c 1^T J} - \frac{1}{2} \mathbf{J 1 c^T J} + \frac{1}{2} \mathbf{J}(2\mathbf{X X^T})\mathbf{J} \\ &=& \mathbf{ J X X^T J } = \mathbf{X X^T } \label{eqn:scaling} \end{eqnarray*}

The fist two terms in this expression disappear because \(\mathbf{1^T J}=\mathbf{J 1} =\mathbf{0}\), where \(\mathbf{0}\) is a matrix containing all zeros. In the final step meanwhile we use the fact that the matrix of squared distances will not change when we translate all the points. We can thus assume that the mean value, \(\mu\), for each of the components, \(\alpha\):

\[ \mu_\alpha = \frac{1}{M} \sum_{i=1}^N \mathbf{X}^i_\alpha \]

is equal to 0 so the columns of \(\mathbf{X}\) add up to 0. This in turn means that each of the columns of \(\mathbf{X X^T}\) adds up to zero, which is what allows us to write \(\mathbf{ J X X^T J } = \mathbf{X X^T }\).

The matrix of squared distances is symmetric and positive-definite we can thus use the spectral decomposition to decompose it as:

\[ \Phi= \mathbf{V} \Lambda \mathbf{V}^T \]

Furthermore, because the matrix we are diagonalizing, \(\mathbf{X X^T}\), is the product of a matrix and its transpose we can use this decomposition to write:

\[ \mathbf{X} =\mathbf{V} \Lambda^\frac{1}{2} \]

Much as in PCA there are generally a small number of large eigenvalues in \(\Lambda\) and many small eigenvalues. We can safely use only the large eigenvalues and their corresponding eigenvectors to express the relationship between the coordinates \(\mathbf{X}\). This gives us our set of low-dimensional projections.

This derivation makes a number of assumptions about the how the low dimensional points should best be arranged to minimize the stress. If you use an interactive optimization algorithm such as SMACOF you may thus be able to find a better (lower-stress) projection of the points. For more details on the assumptions made see this website.

Glossary of keywords and components
Description of components

By default the value of the calculated quantity can be referenced elsewhere in the input file by using the label of the action. Alternatively this Action can be used to calculate the following quantities by employing the keywords listed below. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity Description
coord the low-dimensional projections of the various input configurations

In addition the following quantities can be calculated by employing the keywords listed below

Quantity Keyword Description
gradient GRADIENT the gradient
vmean VMEAN the norm of the mean vector. The output component can be referred to elsewhere in the input file by using the label.vmean
vsum VSUM the norm of sum of vectors. The output component can be referred to elsewhere in the input file by using the label.vsum
spath SPATH the position on the path
gspath GPATH the position on the path calculated using trigonometry
gzpath GPATH the distance from the path calculated using trigonometry
zpath ZPATH the distance from the path
altmin ALT_MIN the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
between BETWEEN the number/fraction of values within a certain range. This is calculated using one of the formula described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
highest HIGHEST the highest of the quantities calculated by this action
lessthan LESS_THAN the number of values less than a target value. This is calculated using one of the formula described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
lowest LOWEST the lowest of the quantities calculated by this action
max MAX the maximum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
mean MEAN the mean value. The output component can be referred to elsewhere in the input file by using the label.mean
min MIN the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
moment MOMENTS the central moments of the distribution of values. The second moment would be referenced elsewhere in the input file using label.moment-2, the third as label.moment-3, etc.
morethan MORE_THAN the number of values more than a target value. This is calculated using one of the formula described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
sum SUM the sum of values
The data to analyze can be the output from another analysis algorithm
USE_OUTPUT_DATA_FROM use the output of the analysis performed by this object as input to your new analysis object
Compulsory keywords
NLOW_DIM number of low-dimensional coordinates required
Options
SERIAL ( default=off ) do the calculation in serial. Do not use MPI
LOWMEM

( default=off ) lower the memory requirements