CLASSICAL_MDS

This is part of the analysis module |

Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm.

Multidimensional scaling (MDS) is similar to what is done when you make a map. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably scaled) distances between the points in your map representing each of those cities are related to the true distances between the cities. Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane. In other words, if we have \(M\) \(D\)-dimensional points, \(\mathbf{X}\), and we can calculate dissimilarities between pairs them, \(D_{ij}\), we can, with an MDS calculation, try to create \(M\) projections, \(\mathbf{x}\), of the high dimensionality points in a \(d\)-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, \(d_{ij}\), resemble the dissimilarities between the high dimensional points. In short we minimize:

\[ \chi^2 = \sum_{i \ne j} \left( D_{ij} - d_{ij} \right)^2 \]

where \(D_{ij}\) is the distance between point \(X^{i}\) and point \(X^{j}\) and \(d_{ij}\) is the distance between the projection of \(X^{i}\), \(x^i\), and the projection of \(X^{j}\), \(x^j\). A tutorial on this approach can be used to analyse simulations can be found in the tutorial Belfast tutorial: Adaptive variables II and in the following short video.

- The atoms involved can be specified using

ATOMS | the atoms whose positions we are tracking for the purpose of analysing the data. For more information on how to specify lists of atoms see Groups and Virtual Atoms |

- Compulsory keywords

METRIC | ( default=EUCLIDEAN ) how are we measuring the distances between configurations |

STRIDE | ( default=1 ) the frequency with which data should be stored for analysis |

RUN | the frequency with which to run the analysis algorithm. This is not required if you specify USE_ALL_DATA |

LANDMARKS | ( default=ALL ) only use a subset of the data that was collected. For more information on the landmark selection algorithms that are available in plumed see landmarkselection. |

NLOW_DIM | number of low-dimensional coordinates required |

OUTPUT_FILE | file on which to output the final embedding coordinates |

EMBEDDING_OFILE | ( default=dont output ) file on which to output the embedding in plumed input format |

- Options

USE_ALL_DATA | ( default=off ) use the data from the entire trajectory to perform the analysis |

REWEIGHT_BIAS | ( default=off ) reweight the data using all the biases acting on the dynamics. For more information see reweighting. |

WRITE_CHECKPOINT | ( default=off ) write out a checkpoint so that the analysis can be restarted in a later run |

SERIAL | ( default=off ) do the calculation in serial. Do not parallelize |

LOWMEM | ( default=off ) lower the memory requirements |

ARG | the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears the scalars calculated by all the proceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example a DISTANCE action labelled dist may have three componets x, y and z. To take just the x component you should use dist.x, if you wish to take all three components then use dist.*.More information on the referencing of Actions can be found in the section of the manual on the PLUMED Getting started. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you you must compile PLUMED with the appropriate flag. |

FMT | the format that should be used in analysis output files |

TEMP | the system temperature. This is required if you are reweighting or doing free energies. |

REWEIGHT_TEMP | reweight data from a trajectory at one temperature and output the probability distribution at a second temperature. For more information see reweighting. This is not possible during postprocessing. |

- Examples

The following command instructs plumed to construct a classical multidimensional scaling projection of a trajectory. The RMSD distance between atoms 1-256 have moved is used to measure the distances in the high-dimensional space.

CLASSICAL_MDS ... ATOMS=1-256 METRIC=OPTIMAL-FAST USE_ALL_DATA NLOW_DIM=2 OUTPUT_FILE=rmsd-embed ... CLASSICAL_MDS

The following section is for people who are interested in how this method works in detail. A solid understanding of this material is not necessary to use MDS.

The stress function can be minimized using a standard optimization algorithm such as conjugate gradients or steepest descent. However, it is more common to do this minimization using a technique known as classical scaling. Classical scaling works by recognizing that each of the distances $D_{ij}$ in the above sum can be written as:

\[ D_{ij}^2 = \sum_{\alpha} (X^i_\alpha - X^j_\alpha)^2 = \sum_\alpha (X^i_\alpha)^2 + (X^j_\alpha)^2 - 2X^i_\alpha X^j_\alpha \]

We can use this expression and matrix algebra to calculate multiple distances at once. For instance if we have three points, \(\mathbf{X}\), we can write distances between them as:

\begin{eqnarray*} D^2(\mathbf{X}) &=& \left[ \begin{array}{ccc} 0 & d_{12}^2 & d_{13}^2 \\ d_{12}^2 & 0 & d_{23}^2 \\ d_{13}^2 & d_{23}^2 & 0 \end{array}\right] \\ &=& \sum_\alpha \left[ \begin{array}{ccc} (X^1_\alpha)^2 & (X^1_\alpha)^2 & (X^1_\alpha)^2 \\ (X^2_\alpha)^2 & (X^2_\alpha)^2 & (X^2_\alpha)^2 \\ (X^3_\alpha)^2 & (X^3_\alpha)^2 & (X^3_\alpha)^2 \\ \end{array}\right] + \sum_\alpha \left[ \begin{array}{ccc} (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\ \end{array}\right] - 2 \sum_\alpha \left[ \begin{array}{ccc} X^1_\alpha X^1_\alpha & X^1_\alpha X^2_\alpha & X^1_\alpha X^3_\alpha \\ X^2_\alpha X^1_\alpha & X^2_\alpha X^2_\alpha & X^2_\alpha X^3_\alpha \\ X^1_\alpha X^3_\alpha & X^3_\alpha X^2_\alpha & X^3_\alpha X^3_\alpha \end{array}\right] \nonumber \\ &=& \mathbf{c 1^T} + \mathbf{1 c^T} - 2 \sum_\alpha \mathbf{x}_a \mathbf{x}^T_a = \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T} \end{eqnarray*}

This last equation can be extended to situations when we have more than three points. In it \(\mathbf{X}\) is a matrix that has one high-dimensional point on each of its rows and \(\mathbf{X^T}\) is its transpose. \(\mathbf{1}\) is an \(M \times 1\) vector of ones and \(\mathbf{c}\) is a vector with components given by:

\[ c_i = \sum_\alpha (x_\alpha^i)^2 \]

These quantities are the diagonal elements of \(\mathbf{X X^T}\), which is a dot product or Gram Matrix that contains the dot product of the vector \(X_i\) with the vector \(X_j\) in element \(i,j\).

In classical scaling we introduce a centering matrix \(\mathbf{J}\) that is given by:

\[ \mathbf{J} = \mathbf{I} - \frac{1}{M} \mathbf{11^T} \]

where \(\mathbf{I}\) is the identity. Multiplying the equations above from the front and back by this matrix and a factor of a \(-\frac{1}{2}\) gives:

\begin{eqnarray*} -\frac{1}{2} \mathbf{J} \mathbf{D}^2(\mathbf{X}) \mathbf{J} &=& -\frac{1}{2}\mathbf{J}( \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T})\mathbf{J} \\ &=& -\frac{1}{2}\mathbf{J c 1^T J} - \frac{1}{2} \mathbf{J 1 c^T J} + \frac{1}{2} \mathbf{J}(2\mathbf{X X^T})\mathbf{J} \\ &=& \mathbf{ J X X^T J } = \mathbf{X X^T } \label{eqn:scaling} \end{eqnarray*}

The fist two terms in this expression disappear because \(\mathbf{1^T J}=\mathbf{J 1} =\mathbf{0}\), where \(\mathbf{0}\) is a matrix containing all zeros. In the final step meanwhile we use the fact that the matrix of squared distances will not change when we translate all the points. We can thus assume that the mean value, \(\mu\), for each of the components, \(\alpha\):

\[ \mu_\alpha = \frac{1}{M} \sum_{i=1}^N \mathbf{X}^i_\alpha \]

is equal to 0 so the columns of \(\mathbf{X}\) add up to 0. This in turn means that each of the columns of \(\mathbf{X X^T}\) adds up to zero, which is what allows us to write \(\mathbf{ J X X^T J } = \mathbf{X X^T }\).

The matrix of squared distances is symmetric and positive-definite we can thus use the spectral decomposition to decompose it as:

\[ \Phi= \mathbf{V} \Lambda \mathbf{V}^T \]

Furthermore, because the matrix we are diagonalizing, \(\mathbf{X X^T}\), is the product of a matrix and its transpose we can use this decomposition to write:

\[ \mathbf{X} =\mathbf{V} \Lambda^\frac{1}{2} \]

Much as in PCA there are generally a small number of large eigenvalues in \(\Lambda\) and many small eigenvalues. We can safely use only the large eigenvalues and their corresponding eigenvectors to express the relationship between the coordinates \(\mathbf{X}\). This gives us our set of low-dimensional projections.

This derivation makes a number of assumptions about the how the low dimensional points should best be arranged to minimise the stress. If you use an interative optimization algorithm such as SMACOF you may thus be able to find a better (lower-stress) projection of the points. For more details on the assumptions made see this website.