   PAMM
This is part of the pamm module. It is only available if you configure PLUMED with ./configure --enable-modules=pamm. Furthermore, this feature is still being developed, so take care when using it and report any problems on the mailing list.

Probabilistic analysis of molecular motifs.

Probabilistic analysis of molecular motifs (PAMM) was introduced in this paper [pamm]. The essence of this approach is to calculate a large set of collective variables for a set of atoms in a short trajectory and to fit this data using a Gaussian mixture model. The idea is that the modes of these distributions can be used to identify features such as hydrogen bonds or secondary structure types.

The assumption within this implementation is that the Gaussian mixture model has been fitted elsewhere by a separate code. You thus provide this action with an input file that contains the means, covariance matrices and weights for a set of Gaussian kernels, $$\{ \phi \}$$. The values and derivatives of the following quantities are then computed:

$s_k = \frac{ \phi_k}{ \sum_i \phi_i }$
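
Evaluated at a single set of input quantities, this normalization takes only a few lines of code. The Python function below is a minimal sketch (`pamm_components` is a hypothetical name, not part of PLUMED) that takes the kernel values $$\phi_k$$ and returns the $$s_k$$. How the denominator is guarded is an assumption based on the description of the REGULARISE keyword, not a statement of how PLUMED implements it.

```python
def pamm_components(phi, regularise=0.001):
    """Return s_k = phi_k / sum_i phi_i for the kernel values phi_k at one point.

    Assumption: the denominator is never allowed below `regularise`,
    following the description of the REGULARISE keyword.
    """
    denom = max(sum(phi), regularise)
    return [p / denom for p in phi]


print(pamm_components([1.0, 3.0]))  # [0.25, 0.75]
```

Without such a guard, the $$s_k$$ at a point far from every kernel would be a ratio of two vanishingly small numbers.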

Each of the $$\phi_k$$ is a Gaussian function that acts on a set of quantities calculated within a MultiColvar. These might be TORSIONS, DISTANCES, ANGLES or any one of the many symmetry functions that are available within MultiColvar actions. These quantities are then inserted into the set of $$n$$ kernels that are in the input file. This is done for multiple sets of values of the input quantities, and a final quantity is calculated by summing the resulting $$s_k$$ values or some transformation of them. This sounds more complicated than it is and is best understood by looking through the example given below.

Warning
Mixing MultiColvar actions that are periodic with variables that are not periodic has not been tested.
Examples

In this example I will explain in detail what the following input is computing:

MOLINFO MOLTYPE=protein STRUCTURE=M1d.pdb
psi: TORSIONS ATOMS1=@psi-2 ATOMS2=@psi-3 ATOMS3=@psi-4
phi: TORSIONS ATOMS1=@phi-2 ATOMS2=@phi-3 ATOMS3=@phi-4
p: PAMM DATA=phi,psi CLUSTERS=clusters.pamm MEAN1={COMPONENT=1} MEAN2={COMPONENT=2}
PRINT ARG=p.mean-1,p.mean-2 FILE=colvar


The best place to start our explanation is to look at the contents of the clusters.pamm file:

#! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi
#! SET multivariate von-misses
#! SET kerneltype gaussian
2.97197455E-0001     -1.91983118E+0000      2.25029540E+0000      2.45960237E-0001     -1.30615381E-0001     -1.30615381E-0001      2.40239117E-0001
2.29131448E-0002      1.39809354E+0000      9.54585380E-0002      9.61755708E-0002     -3.55657919E-0002     -3.55657919E-0002      1.06147253E-0001
5.06676398E-0001     -1.09648066E+0000     -7.17867907E-0001      1.40523052E-0001     -1.05385552E-0001     -1.05385552E-0001      1.63290557E-0001
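
The clusters file is plain text: a `#! FIELDS` line names the columns, `#! SET` lines carry metadata and each remaining line holds the parameters of one kernel. The Python sketch below is a hypothetical helper (`read_pamm_clusters` is not a PLUMED function) that reads such text into kernel heights, centres and matrices. It assumes that each `sigma_<a>_<b>` column is element (a, b) of the kernel matrix; that mapping is an assumption about the file format rather than something this page states.

```python
def read_pamm_clusters(text):
    """Parse clusters.pamm-style text (hypothetical helper, not a PLUMED API).

    Returns (kernels, settings): one dict per data line with keys 'height',
    'center' and 'sigma', and a dict of the '#! SET' key/value pairs.
    Assumption: sigma_<a>_<b> holds element (a, b) of the kernel matrix.
    """
    fields, names, settings, kernels = [], [], {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#! FIELDS"):
            fields = line.split()[2:]
            # Variable names are the fields that are neither the height nor a sigma entry.
            names = [f for f in fields if f != "height" and not f.startswith("sigma_")]
        elif line.startswith("#! SET"):
            _, _, key, value = line.split(maxsplit=3)
            settings[key] = value
        elif line.startswith("#"):
            continue  # any other comment line is ignored
        else:
            row = dict(zip(fields, map(float, line.split())))
            kernels.append({
                "height": row["height"],
                "center": [row[n] for n in names],
                "sigma": [[row[f"sigma_{a}_{b}"] for b in names] for a in names],
            })
    return kernels, settings


example = """\
#! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi
#! SET multivariate von-misses
#! SET kerneltype gaussian
2.97197455E-0001 -1.91983118E+0000 2.25029540E+0000 2.45960237E-0001 -1.30615381E-0001 -1.30615381E-0001 2.40239117E-0001
2.29131448E-0002 1.39809354E+0000 9.54585380E-0002 9.61755708E-0002 -3.55657919E-0002 -3.55657919E-0002 1.06147253E-0001
5.06676398E-0001 -1.09648066E+0000 -7.17867907E-0001 1.40523052E-0001 -1.05385552E-0001 -1.05385552E-0001 1.63290557E-0001
"""
kernels, settings = read_pamm_clusters(example)
print(len(kernels), settings["multivariate"])  # 3 von-misses
```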


The clusters.pamm file above contains the parameters of three two-dimensional Gaussian functions, one per data line. Each of these Gaussian kernels has a weight, $$w_k$$, a vector that specifies the position of its center, $$\mathbf{c}_k$$, and a covariance matrix, $$\Sigma_k$$. The $$\phi_k$$ functions that we use to calculate our PAMM components are thus:

$\phi_k = \frac{w_k}{N_k} \exp\left( -(\mathbf{s} - \mathbf{c}_k)^T \Sigma^{-1}_k (\mathbf{s} - \mathbf{c}_k) \right)$

In the above, $$N_k$$ is a normalization factor that is calculated from $$\Sigma_k$$. The vector $$\mathbf{s}$$ contains the quantities that are calculated by the TORSIONS actions. This vector must be two-dimensional to match the dimensionality of the kernels, and in this case each component is the value of a torsion angle. The two TORSIONS actions above calculate the $$\phi$$ and $$\psi$$ backbone torsional angles of a protein (note the use of MOLINFO to make the specification of atoms straightforward). We thus calculate the values of our three $$\{ \phi \}$$ kernels three times: first using the $$\phi$$ and $$\psi$$ angles of the second residue of the protein, then those of the third residue and finally those of the fourth residue. The two quantities that are output by the PRINT command, p.mean-1 and p.mean-2, are the averages over these three residues of the quantities:

$s_1 = \frac{ \phi_1}{ \phi_1 + \phi_2 + \phi_3 }$

and

$s_2 = \frac{ \phi_2}{ \phi_1 + \phi_2 + \phi_3 }$
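
The following Python sketch evaluates these kernels and averages each $$s_k$$ over three residues, mimicking what p.mean-1 and p.mean-2 report. It is an illustration under several assumptions, not PLUMED's implementation: the four sigma columns of clusters.pamm are taken to be the elements of $$\Sigma_k$$; the normalization is taken to be $$N_k = \pi \sqrt{\det \Sigma_k}$$, which makes a two-dimensional kernel of the form above integrate to $$w_k$$ over the plane; angle differences are simply wrapped into $$[-\pi, \pi)$$ instead of using the von Mises kernels that the file's `multivariate von-misses` setting refers to; and the torsion angles are made-up values.

```python
import math

# Kernel parameters copied from clusters.pamm: (height w_k, centre c_k, matrix).
# Assumption: the four sigma_* columns are the elements of Sigma_k.
KERNELS = [
    (2.97197455e-01, (-1.91983118, 2.25029540),
     ((2.45960237e-01, -1.30615381e-01), (-1.30615381e-01, 2.40239117e-01))),
    (2.29131448e-02, (1.39809354, 9.54585380e-02),
     ((9.61755708e-02, -3.55657919e-02), (-3.55657919e-02, 1.06147253e-01))),
    (5.06676398e-01, (-1.09648066, -7.17867907e-01),
     ((1.40523052e-01, -1.05385552e-01), (-1.05385552e-01, 1.63290557e-01))),
]


def wrap(delta):
    """Map an angle difference into [-pi, pi) (simplified periodic treatment)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi


def kernel_value(height, centre, sigma, s):
    """phi_k = (w_k / N_k) exp(-(s - c_k)^T Sigma_k^-1 (s - c_k)) in two dimensions."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    dx = (wrap(s[0] - centre[0]), wrap(s[1] - centre[1]))
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    norm = math.pi * math.sqrt(det)  # assumed N_k
    return height / norm * math.exp(-quad)


def pamm_means(torsions):
    """Average every s_k = phi_k / sum_i phi_i over a list of (phi, psi) pairs."""
    totals = [0.0] * len(KERNELS)
    for s in torsions:
        values = [kernel_value(w, c, sig, s) for w, c, sig in KERNELS]
        denom = sum(values)
        for k, v in enumerate(values):
            totals[k] += v / denom
    return [t / len(torsions) for t in totals]


# Made-up (phi, psi) pairs, in radians, standing in for residues 2, 3 and 4.
means = pamm_means([(-1.1, -0.7), (-1.0, -0.8), (-2.0, 2.3)])
print(means[0], means[1])  # analogues of p.mean-1 and p.mean-2
```

Because the $$s_k$$ at each residue sum to one, the means of all three components also sum to one; only the first two are printed, as in the PLUMED input.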

There is a great deal of flexibility in this input. We can work with, and examine, any number of components; we can compute these PAMM variables from any set of collective variables; and we can transform the PAMM variables themselves in many different ways when computing these sums.

Glossary of keywords and components
Description of components

When the label of this action is used as the input for a second action, you are not referring to a scalar quantity as you are with regular collective variables. The label is used to reference the full set of quantities calculated by the action. This is usual when using MultiColvar functions. Generally, when doing this, the set of PAMM variables will be referenced using the DATA keyword rather than ARG.

This Action can be used to calculate the following scalar quantities directly from the underlying set of PAMM variables. These quantities are calculated by employing the keywords listed below, and they can be referenced elsewhere in the input file by using this Action's label followed by a dot and the name of the quantity. The particular PAMM variable that should be averaged in a MEAN command or transformed by a switching function in a LESS_THAN command is specified using the COMPONENT keyword. COMPONENT=1 refers to the PAMM variable in which the first kernel in your input file is in the numerator, COMPONENT=2 refers to the PAMM variable in which the second kernel in the input file is in the numerator, and so on. The same quantity can be calculated multiple times for different PAMM components by a single PAMM action. In this case the relevant keyword must appear multiple times on the input line followed by a numerical identifier, i.e. MEAN1, MEAN2, ... The quantities calculated when a keyword appears multiple times on the input line can be referenced elsewhere in the input file by using the name of the quantity followed by a numerical identifier, e.g. label.mean-1, label.mean-2 or label.lessthan-1, label.lessthan-2 etc. Alternatively, you can customize the labels of the quantities by using the LABEL keyword in the description of the keyword.
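
The numbering convention can be illustrated with a small hypothetical Python helper (`pamm_input_line` is not part of PLUMED; it only builds strings). It writes one numbered MEAN keyword per requested component and returns the label.mean-1, label.mean-2, ... references under which the results can then be used, for example in a PRINT action.

```python
def pamm_input_line(label, data, clusters, components):
    """Hypothetical helper: one numbered MEAN keyword per requested component.

    Returns the PAMM input line and the labels of the resulting means.
    """
    keywords = [f"MEAN{i}={{COMPONENT={c}}}" for i, c in enumerate(components, start=1)]
    line = f"{label}: PAMM DATA={','.join(data)} CLUSTERS={clusters} " + " ".join(keywords)
    refs = [f"{label}.mean-{i}" for i in range(1, len(components) + 1)]
    return line, refs


line, refs = pamm_input_line("p", ["phi", "psi"], "clusters.pamm", [1, 2])
print(line)
print("PRINT ARG=" + ",".join(refs) + " FILE=colvar")
```

With these arguments the two printed lines are the PAMM and PRINT lines of the example input given earlier on this page.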

| Quantity | Keyword | Description |
|---|---|---|
| altmin | ALT_MIN | the minimum value. This is calculated using the formula given in the description of the keyword so as to make it continuous. |
| between | BETWEEN | the number/fraction of values within a certain range. This is calculated using one of the formulas given in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters. |
| highest | HIGHEST | the highest of the quantities calculated by this action |
| lessthan | LESS_THAN | the number of values less than a target value. This is calculated using one of the formulas given in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters. |
| lowest | LOWEST | the lowest of the quantities calculated by this action |
| max | MAX | the maximum value. This is calculated using the formula given in the description of the keyword so as to make it continuous. |
| mean | MEAN | the mean value. The output component can be referred to elsewhere in the input file by using label.mean |
| min | MIN | the minimum value. This is calculated using the formula given in the description of the keyword so as to make it continuous. |
| moment | MOMENTS | the central moments of the distribution of values. The second moment would be referenced elsewhere in the input file using label.moment-2, the third as label.moment-3, etc. |
| morethan | MORE_THAN | the number of values more than a target value. This is calculated using one of the formulas given in the description of the keyword so as to make it continuous. You can calculate this quantity multiple times using different parameters. |
| sum | SUM | the sum of values |
Compulsory keywords
| Keyword | Description |
|---|---|
| DATA | the multicolvars from which the pamm coordinates are calculated |
| CLUSTERS | the name of the file that contains the definitions of all the clusters |
| REGULARISE | ( default=0.001 ) don't allow the denominator to be smaller than this value |
Options
| Keyword | Description |
|---|---|
| NUMERICAL_DERIVATIVES | ( default=off ) calculate the derivatives for these quantities numerically |
| NOPBC | ( default=off ) ignore the periodic boundary conditions when calculating distances |
| SERIAL | ( default=off ) do the calculation in serial. Do not use MPI |
| LOWMEM | ( default=off ) lower the memory requirements |
| TIMINGS | ( default=off ) output information on the timings of the various parts of the calculation |
| MEAN | take the mean of these variables. The final value can be referenced using label.mean. You can use multiple instances of this keyword, i.e. MEAN1, MEAN2, MEAN3... The corresponding values are then referenced using label.mean-1, label.mean-2, label.mean-3... |
| MORE_THAN | calculate the number of variables more than a certain target value. This quantity is calculated using $$\sum_i \left[ 1 - \sigma(s_i) \right]$$, where $$\sigma(s)$$ is a switching function. The final value can be referenced using label.morethan. You can use multiple instances of this keyword, i.e. MORE_THAN1, MORE_THAN2, MORE_THAN3... The corresponding values are then referenced using label.morethan-1, label.morethan-2, label.morethan-3... |
| SUM | calculate the sum of all the quantities. The final value can be referenced using label.sum. You can use multiple instances of this keyword, i.e. SUM1, SUM2, SUM3... The corresponding values are then referenced using label.sum-1, label.sum-2, label.sum-3... |
| LESS_THAN | calculate the number of variables less than a certain target value. This quantity is calculated using $$\sum_i \sigma(s_i)$$, where $$\sigma(s)$$ is a switching function. The final value can be referenced using label.lessthan. You can use multiple instances of this keyword, i.e. LESS_THAN1, LESS_THAN2, LESS_THAN3... The corresponding values are then referenced using label.lessthan-1, label.lessthan-2, label.lessthan-3... |
| HISTOGRAM | calculate how many of the values fall in each of the bins of a histogram. This shortcut allows you to calculate NBIN quantities like BETWEEN. The final value can be referenced using label.histogram. You can use multiple instances of this keyword, i.e. HISTOGRAM1, HISTOGRAM2, HISTOGRAM3... The corresponding values are then referenced using label.histogram-1, label.histogram-2, label.histogram-3... |
| MIN | calculate the minimum value. To make this quantity continuous the minimum is calculated using $$\textrm{min} = \frac{\beta}{ \log \sum_i \exp\left( \frac{\beta}{s_i} \right) }$$. The value of $$\beta$$ in this function is specified using BETA=$$\beta$$. The final value can be referenced using label.min. You can use multiple instances of this keyword, i.e. MIN1, MIN2, MIN3... The corresponding values are then referenced using label.min-1, label.min-2, label.min-3... |
| MAX | calculate the maximum value. To make this quantity continuous the maximum is calculated using $$\textrm{max} = \beta \log \sum_i \exp\left( \frac{s_i}{\beta}\right)$$. The value of $$\beta$$ in this function is specified using BETA=$$\beta$$. The final value can be referenced using label.max. You can use multiple instances of this keyword, i.e. MAX1, MAX2, MAX3... The corresponding values are then referenced using label.max-1, label.max-2, label.max-3... |
| LOWEST | this flag allows you to recover the lowest of these variables. The final value can be referenced using label.lowest |
| HIGHEST | this flag allows you to recover the highest of these variables. The final value can be referenced using label.highest |
| ALT_MIN | calculate the minimum value. To make this quantity continuous the minimum is calculated using $$\textrm{min} = -\frac{1}{\beta} \log \sum_i \exp\left( -\beta s_i \right)$$. The value of $$\beta$$ in this function is specified using BETA=$$\beta$$. The final value can be referenced using label.altmin. You can use multiple instances of this keyword, i.e. ALT_MIN1, ALT_MIN2, ALT_MIN3... The corresponding values are then referenced using label.altmin-1, label.altmin-2, label.altmin-3... |
| BETWEEN | calculate the number of values that are within a certain range. These quantities are calculated using kernel density estimation as described on the histogrambead page. The final value can be referenced using label.between. You can use multiple instances of this keyword, i.e. BETWEEN1, BETWEEN2, BETWEEN3... The corresponding values are then referenced using label.between-1, label.between-2, label.between-3... |
| MOMENTS | calculate the moments of the distribution of collective variables. The mth moment of a distribution is calculated using $$\frac{1}{N} \sum_{i=1}^N ( s_i - \overline{s} )^m$$, where $$\overline{s}$$ is the average for the distribution. The MOMENTS keyword takes a list of integers or a range as input. Each integer is a value of $$m$$. The final calculated values can be referenced using label.moment-$$m$$. You can use the COMPONENT keyword in this action but the syntax is slightly different. If you would like the second and third moments of the third component you would use MOMENTS={COMPONENT=3 MOMENTS=2-3}. The moments would then be referred to using the labels label.moment-3-2 and label.moment-3-3. This syntax is also required if you are using numbered MOMENTS keywords, i.e. MOMENTS1, MOMENTS2... |
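
The smooth reductions in this list can be checked outside PLUMED with a few lines of Python. The sketch below transcribes the MIN, MAX, ALT_MIN, LESS_THAN, MORE_THAN and MOMENTS formulas given above; none of these function names belong to PLUMED. The switching function $$\sigma$$ is configurable in PLUMED, so the rational form used here (exponents 6 and 12, no offset) is an assumed choice for illustration only.

```python
import math


def smooth_max(s, beta):
    """MAX: beta * log(sum_i exp(s_i / beta))."""
    return beta * math.log(sum(math.exp(x / beta) for x in s))


def smooth_min(s, beta):
    """MIN: beta / log(sum_i exp(beta / s_i)); requires every s_i > 0."""
    return beta / math.log(sum(math.exp(beta / x) for x in s))


def alt_min(s, beta):
    """ALT_MIN: -(1 / beta) * log(sum_i exp(-beta * s_i))."""
    return -math.log(sum(math.exp(-beta * x) for x in s)) / beta


def rational_switch(r, r0, n=6, m=12):
    """Assumed sigma(r) = (1 - (r/r0)^n) / (1 - (r/r0)^m)."""
    x = r / r0
    if abs(x - 1.0) < 1e-9:
        return n / m  # limit of the expression at r = r0
    return (1.0 - x**n) / (1.0 - x**m)


def less_than(s, r0):
    """LESS_THAN: sum_i sigma(s_i)."""
    return sum(rational_switch(x, r0) for x in s)


def more_than(s, r0):
    """MORE_THAN: sum_i [1 - sigma(s_i)]."""
    return sum(1.0 - rational_switch(x, r0) for x in s)


def central_moment(s, m):
    """MOMENTS: (1/N) sum_i (s_i - mean)^m."""
    mean = sum(s) / len(s)
    return sum((x - mean) ** m for x in s) / len(s)


values = [0.1, 0.4, 0.8]
print(smooth_max(values, 0.01))  # close to 0.8
print(alt_min(values, 100.0))    # close to 0.1
print(smooth_min(values, 1.0))   # close to 0.1
```

As the printed values suggest, small $$\beta$$ sharpens MAX, while large $$\beta$$ sharpens MIN and ALT_MIN; very large arguments to the exponentials will overflow in this naive transcription.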