This is part of the dimred module
	It is only available if you configure PLUMED with ./configure –enable-modules=dimred . Furthermore, this feature is still being developed so take care when using it and report any problems on the mailing list.

Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm.

Multidimensional scaling (MDS) is similar to what is done when you make a map. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably scaled) distances between the points in your map representing each of those cities are related to the true distances between the cities. Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane. In other words, if we have $M$ $D$-dimensional points, $\mathbf{X}$, and we can calculate dissimilarities between pairs them, $D_{ij}$, we can, with an MDS calculation, try to create $M$ projections, $\mathbf{x}$, of the high dimensionality points in a $d$-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, $d_{ij}$, resemble the dissimilarities between the high dimensional points. In short we minimize:

\[ \chi^2 = \sum_{i \ne j} \left( D_{ij} - d_{ij} \right)^2 \]

where $D_{ij}$ is the distance between point $X^{i}$ and point $X^{j}$ and $d_{ij}$ is the distance between the projection of $X^{i}$, $x^i$, and the projection of $X^{j}$, $x^j$. A tutorial on this approach can be used to analyze simulations can be found in the tutorial Lugano tutorial: Dimensionality reduction and in the following short video.

Examples

The following command instructs plumed to construct a classical multidimensional scaling projection of a trajectory. The RMSD distance between atoms 1-256 have moved is used to measure the distances in the high-dimensional space.

Click on the labels of the actions for more information on what each action computes

data: COLLECT_FRAMES ATOMSthe atoms whose positions we are tracking for the purpose of analyzing the data 
=1-256 The COLLECT_FRAMES action with label data
mat: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object 
=data The EUCLIDEAN_DISSIMILARITIES action with label mat
mds: CLASSICAL_MDS USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object 
=mat NLOW_DIMcompulsory keyword 
number of low-dimensional coordinates required 
=2 The CLASSICAL_MDS action with label mds calculates the following quantities:

 Quantity    Description  
mds.coord-1 the low-dimensional projections of the various input configurations  This is the 1th of these quantities
mds.coord-2 the low-dimensional projections of the various input configurations  This is the 2th of these quantities


OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROMuse the output of the analysis performed by this object as input to your new analysis
object 
=mds FILEcompulsory keyword 
the name of the file to output to 
=rmsd-embed The OUTPUT_ANALYSIS_DATA_TO_COLVAR action with label

The following section is for people who are interested in how this method works in detail. A solid understanding of this material is not necessary to use MDS.

Method of optimization

The stress function can be minimized using a standard optimization algorithm such as conjugate gradients or steepest descent. However, it is more common to do this minimization using a technique known as classical scaling. Classical scaling works by recognizing that each of the distances $D_{ij}$ in the above sum can be written as:

\[ D_{ij}^2 = \sum_{\alpha} (X^i_\alpha - X^j_\alpha)^2 = \sum_\alpha (X^i_\alpha)^2 + (X^j_\alpha)^2 - 2X^i_\alpha X^j_\alpha \]

We can use this expression and matrix algebra to calculate multiple distances at once. For instance if we have three points, $\mathbf{X}$, we can write distances between them as:

\begin{eqnarray*} D^2(\mathbf{X}) &=& \left[ \begin{array}{ccc} 0 & d_{12}^2 & d_{13}^2 \\ d_{12}^2 & 0 & d_{23}^2 \\ d_{13}^2 & d_{23}^2 & 0 \end{array}\right] \\ &=& \sum_\alpha \left[ \begin{array}{ccc} (X^1_\alpha)^2 & (X^1_\alpha)^2 & (X^1_\alpha)^2 \\ (X^2_\alpha)^2 & (X^2_\alpha)^2 & (X^2_\alpha)^2 \\ (X^3_\alpha)^2 & (X^3_\alpha)^2 & (X^3_\alpha)^2 \\ \end{array}\right] + \sum_\alpha \left[ \begin{array}{ccc} (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ \end{array}\right] - 2 \sum_\alpha \left[ \begin{array}{ccc} X^1_\alpha X^1_\alpha & X^1_\alpha X^2_\alpha & X^1_\alpha X^3_\alpha \\ X^2_\alpha X^1_\alpha & X^2_\alpha X^2_\alpha & X^2_\alpha X^3_\alpha \\ X^1_\alpha X^3_\alpha & X^3_\alpha X^2_\alpha & X^3_\alpha X^3_\alpha \end{array}\right] \nonumber \\ &=& \mathbf{c 1^T} + \mathbf{1 c^T} - 2 \sum_\alpha \mathbf{x}_a \mathbf{x}^T_a = \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T} \end{eqnarray*}

This last equation can be extended to situations when we have more than three points. In it $\mathbf{X}$ is a matrix that has one high-dimensional point on each of its rows and $\mathbf{X^T}$ is its transpose. $\mathbf{1}$ is an $M \times 1$ vector of ones and $\mathbf{c}$ is a vector with components given by:

\[ c_i = \sum_\alpha (x_\alpha^i)^2 \]

These quantities are the diagonal elements of $\mathbf{X X^T}$, which is a dot product or Gram Matrix that contains the dot product of the vector $X_i$ with the vector $X_j$ in element $i,j$.

In classical scaling we introduce a centering matrix $\mathbf{J}$ that is given by:

\[ \mathbf{J} = \mathbf{I} - \frac{1}{M} \mathbf{11^T} \]

where $\mathbf{I}$ is the identity. Multiplying the equations above from the front and back by this matrix and a factor of a $-\frac{1}{2}$ gives:

\begin{eqnarray*} -\frac{1}{2} \mathbf{J} \mathbf{D}^2(\mathbf{X}) \mathbf{J} &=& -\frac{1}{2}\mathbf{J}( \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T})\mathbf{J} \\ &=& -\frac{1}{2}\mathbf{J c 1^T J} - \frac{1}{2} \mathbf{J 1 c^T J} + \frac{1}{2} \mathbf{J}(2\mathbf{X X^T})\mathbf{J} \\ &=& \mathbf{ J X X^T J } = \mathbf{X X^T } \label{eqn:scaling} \end{eqnarray*}

The fist two terms in this expression disappear because $\mathbf{1^T J}=\mathbf{J 1} =\mathbf{0}$, where $\mathbf{0}$ is a matrix containing all zeros. In the final step meanwhile we use the fact that the matrix of squared distances will not change when we translate all the points. We can thus assume that the mean value, $\mu$, for each of the components, $\alpha$:

\[ \mu_\alpha = \frac{1}{M} \sum_{i=1}^N \mathbf{X}^i_\alpha \]

is equal to 0 so the columns of $\mathbf{X}$ add up to 0. This in turn means that each of the columns of $\mathbf{X X^T}$ adds up to zero, which is what allows us to write $\mathbf{ J X X^T J } = \mathbf{X X^T }$.

The matrix of squared distances is symmetric and positive-definite we can thus use the spectral decomposition to decompose it as:

\[ \Phi= \mathbf{V} \Lambda \mathbf{V}^T \]

Furthermore, because the matrix we are diagonalizing, $\mathbf{X X^T}$, is the product of a matrix and its transpose we can use this decomposition to write:

\[ \mathbf{X} =\mathbf{V} \Lambda^\frac{1}{2} \]

Much as in PCA there are generally a small number of large eigenvalues in $\Lambda$ and many small eigenvalues. We can safely use only the large eigenvalues and their corresponding eigenvectors to express the relationship between the coordinates $\mathbf{X}$. This gives us our set of low-dimensional projections.

This derivation makes a number of assumptions about the how the low dimensional points should best be arranged to minimize the stress. If you use an interactive optimization algorithm such as SMACOF you may thus be able to find a better (lower-stress) projection of the points. For more details on the assumptions made see this website.

Glossary of keywords and components

Description of components

By default the value of the calculated quantity can be referenced elsewhere in the input file by using the label of the action. Alternatively this Action can be used to calculate the following quantities by employing the keywords listed below. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity	Description
coord	the low-dimensional projections of the various input configurations

In addition the following quantities can be calculated by employing the keywords listed below

Quantity	Keyword	Description
gradient	GRADIENT	the gradient
vmean	VMEAN	the norm of the mean vector. The output component can be referred to elsewhere in the input file by using the label.vmean
vsum	VSUM	the norm of sum of vectors. The output component can be referred to elsewhere in the input file by using the label.vsum
spath	SPATH	the position on the path
gspath	GPATH	the position on the path calculated using trigonometry
gzpath	GPATH	the distance from the path calculated using trigonometry
zpath	ZPATH	the distance from the path
altmin	ALT_MIN	the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
between	BETWEEN	the number/fraction of values within a certain range. This is calculated using one of the formula described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
highest	HIGHEST	the highest of the quantities calculated by this action
lessthan	LESS_THAN	the number of values less than a target value. This is calculated using one of the formula described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
lowest	LOWEST	the lowest of the quantities calculated by this action
max	MAX	the maximum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
mean	MEAN	the mean value. The output component can be referred to elsewhere in the input file by using the label.mean
min	MIN	the minimum value. This is calculated using the formula described in the description of the keyword so as to make it continuous.
moment	MOMENTS	the central moments of the distribution of values. The second moment would be referenced elsewhere in the input file using label.moment-2, the third as label.moment-3, etc.
morethan	MORE_THAN	the number of values more than a target value. This is calculated using one of the formula described in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters.
sum	SUM	the sum of values

The data to analyze can be the output from another analysis algorithm

USE_OUTPUT_DATA_FROM

use the output of the analysis performed by this object as input to your new analysis object

Compulsory keywords

NLOW_DIM

number of low-dimensional coordinates required

Options

SERIAL

( default=off ) do the calculation in serial. Do not use MPI

LOWMEM

( default=off ) lower the memory requirements

Quantity	Description
mds.coord-1	the low-dimensional projections of the various input configurations This is the 1th of these quantities
mds.coord-2	the low-dimensional projections of the various input configurations This is the 2th of these quantities