Action: METAINFERENCE

Module isdb
Calculates the Metainference energy for a set of experimental data.

Output components

This action can calculate the values in the following table when the associated keyword is included in the input for the action. These values can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the value required from the list below.

| Name | Type | Keyword | Description |
|:-----|:-----|:--------|:------------|
| bias | scalar | default | the instantaneous value of the bias potential |
| sigma | scalar | default | uncertainty parameter |
| sigmaMean | scalar | default | uncertainty in the mean estimate |
| neff | scalar | default | effective number of replicas |
| acceptSigma | scalar | default | MC acceptance for sigma values |
| acceptScale | scalar | SCALEDATA | MC acceptance for scale value |
| acceptFT | scalar | GENERIC | MC acceptance for general metainference f tilde value |
| weight | scalar | REWEIGHT | weights of the weighted average |
| biasDer | scalar | REWEIGHT | derivatives with respect to the bias |
| scale | scalar | SCALEDATA | scale parameter |
| offset | scalar | ADDOFFSET | offset parameter |
| ftilde | scalar | GENERIC | ensemble average estimator |

Input

The arguments that serve as the input for this action are specified using one or more of the keywords in the following table.

| Keyword | Type | Description |
|:--------|:-----|:------------|
| ARG | scalar | the labels of the scalars on which the bias will act |
| PARARG | scalar | reference values for the experimental data, these can be provided as arguments without derivatives |

Further details and examples

Metainference \cite Bonomi:2016ip is a Bayesian framework to model heterogeneous systems by integrating prior information with noisy, ensemble-averaged data. Metainference models a system and quantifies the level of noise in the data by considering a set of replicas of the system.

Calculated experimental data are given in input as ARG, while reference experimental values can be given either from fixed components of other actions using PARARG or as numbers using PARAMETERS. By default the data are averaged over the available replicas; if this is not wanted, the keyword NOENSEMBLE prevents this averaging.
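As a minimal sketch of switching off replica averaging, the input below reuses the RDC setup from the examples on this page and adds NOENSEMBLE. The label `mi` is hypothetical and the numeric values are copied from those examples for illustration only, not as recommended settings.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

# NOENSEMBLE: the calculated values are not averaged over the replicas
METAINFERENCE ...
ARG=rdc.*
PARAMETERS=1.9190,2.9190,3.9190,4.9190
NOENSEMBLE
NOISETYPE=MGAUSS
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
SIGMA_MEAN0=0.001
LABEL=mi
... METAINFERENCE

PRINT ARG=mi.bias FILE=BIAS STRIDE=1
\endplumedfile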

Metadynamics Metainference \cite Bonomi:2016ge, or more generally biased Metainference, requires knowledge of the biasing potential in order to calculate the weighted average. In this case the value of the bias can be provided as the last argument in ARG, together with the keyword REWEIGHT. To avoid the noise resulting from the instantaneous value of the bias, the weight of each replica can be averaged over a given time using the keyword AVERAGING.
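The sketch below shows one way REWEIGHT and AVERAGING might be combined. It assumes a multi-replica run and reuses the RDC and METAD setup from the examples on this page; the label `mi` and the stride AVERAGING=500 are hypothetical choices for illustration.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

cv1: TORSION ATOMS=1,2,3,4
cv2: TORSION ATOMS=2,3,4,5
mm: METAD ARG=cv1,cv2 HEIGHT=0.5 SIGMA=0.3,0.3 PACE=200 BIASFACTOR=8 WALKERS_MPI

# The bias mm.bias is the last entry of ARG and is used for the weights.
# AVERAGING=500 is an illustrative stride for averaging the weights.
METAINFERENCE ...
ARG=rdc.*,mm.bias
REWEIGHT
AVERAGING=500
NOISETYPE=MGAUSS
PARAMETERS=1.9190,2.9190,3.9190,4.9190
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
SIGMA_MEAN0=0.001
LABEL=mi
... METAINFERENCE

PRINT ARG=mi.bias,mi.weight FILE=BIAS STRIDE=1
\endplumedfile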

The data can be averaged over multiple replicas and weighted for a bias, if present. The functional form of Metainference can be chosen among five variants selected with NOISETYPE=GAUSS,MGAUSS,OUTLIERS,MOUTLIERS,GENERIC, which correspond to modelling the noise for the arguments as a single gaussian common to all the data points, a gaussian per data point, a single long-tailed gaussian common to all the data points, a long-tailed gaussian per data point, or two distinct noises as in the most general formulation of Metainference. In this last case the noise of the replica averaging is gaussian (one per data point), and the noise for the comparison with the experimental data can be chosen with the keyword LIKELIHOOD between gaussian and log-normal (one per data point); furthermore, the evolution of the estimated average over an infinite number of replicas is driven by DFTILDE.
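A minimal sketch of the GENERIC variant follows, again reusing the RDC setup from the examples on this page. LIKELIHOOD and DFTILDE are written out with the default values listed in the syntax table (GAUSS and 0.1); LOGN is the other listed choice for LIKELIHOOD. The label `mi` is hypothetical and the sigma settings are illustrative.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

# GENERIC: two distinct noises; ftilde evolves with moves scaled by DFTILDE
METAINFERENCE ...
ARG=rdc.*
PARAMETERS=1.9190,2.9190,3.9190,4.9190
NOISETYPE=GENERIC
LIKELIHOOD=GAUSS
DFTILDE=0.1
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
SIGMA_MEAN0=0.001
LABEL=mi
... METAINFERENCE

PRINT ARG=mi.bias,mi.acceptFT FILE=BIAS STRIDE=1
\endplumedfile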

As in Metainference theory, there are two sigma values. SIGMA_MEAN0 represents the error of calculating an average quantity using a finite set of replicas and should be set as small as possible, following the guidelines for replica-averaged simulations in the framework of the Maximum Entropy Principle. Alternatively, it can be obtained automatically using the internal sigma mean optimization introduced in \cite Lohr:2017gc (OPTSIGMAMEAN=SEM); in this second case sigma_mean is estimated from the maximum standard error of the mean, either over the whole simulation or over a defined time set with the keyword AVERAGING. SIGMA_BIAS is an uncertainty parameter, sampled by a MC algorithm in the bounded interval defined by SIGMA_MIN and SIGMA_MAX. The initial value is set by SIGMA0. The MC move is a random displacement of maximum value equal to DSIGMA. If the number of data points is too large and the acceptance rate drops, it is possible to make the MC move over mutually exclusive, random subsets of size MC_CHUNKSIZE and to run more than one move by setting MC_STEPS in such a way that MC_CHUNKSIZE*MC_STEPS covers all the data points.
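The sketch below combines the automatic sigma mean estimate with chunked MC moves, under the assumption of a multi-replica run and the four-point RDC setup used in the examples on this page. With four data points, MC_CHUNKSIZE=2 and MC_STEPS=2 give MC_CHUNKSIZE*MC_STEPS=4, so all points are covered. The label `mi` and the AVERAGING stride are hypothetical, illustrative choices.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

# OPTSIGMAMEAN=SEM: sigma_mean is estimated on the fly
# AVERAGING=200: illustrative stride for the sigma_mean estimate
# MC_CHUNKSIZE*MC_STEPS = 2*2 = 4 = number of data points
METAINFERENCE ...
ARG=rdc.*
PARAMETERS=1.9190,2.9190,3.9190,4.9190
NOISETYPE=MGAUSS
OPTSIGMAMEAN=SEM
AVERAGING=200
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
MC_CHUNKSIZE=2
MC_STEPS=2
LABEL=mi
... METAINFERENCE

PRINT ARG=mi.bias,mi.acceptSigma FILE=BIAS STRIDE=1
\endplumedfile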

Calculated and experimental data can be compared modulo a scaling factor and/or an offset using SCALEDATA and/or ADDOFFSET. The scaling factor and the offset are sampled by a MC algorithm using either a flat or a gaussian prior, selected with SCALE_PRIOR or OFFSET_PRIOR.
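As a hedged illustration, the input below samples both a scaling factor and an offset with flat priors. It reuses the RDC setup and the scale settings from the examples on this page; the offset range, the DOFFSET value, and the label `mi` are hypothetical values chosen only to show the keywords together.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

# Flat priors: the scale and the offset are sampled inside [MIN, MAX]
METAINFERENCE ...
ARG=rdc.*
PARAMETERS=1.9190,2.9190,3.9190,4.9190
NOISETYPE=MGAUSS
SCALEDATA SCALE_PRIOR=FLAT SCALE0=1 SCALE_MIN=0.1 SCALE_MAX=3 DSCALE=0.01
ADDOFFSET OFFSET_PRIOR=FLAT OFFSET0=0 OFFSET_MIN=-1 OFFSET_MAX=1 DOFFSET=0.01
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
SIGMA_MEAN0=0.001
LABEL=mi
... METAINFERENCE

PRINT ARG=mi.bias,mi.scale,mi.offset FILE=BIAS STRIDE=1
\endplumedfile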

\par Examples

In the following example we calculate a set of \ref RDC, take their replica average, and compare them with a set of experimental values. The RDCs are compared with the experimental data up to a multiplication factor SCALE that is also sampled by MC on the fly.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

METAINFERENCE ...
ARG=rdc.*
NOISETYPE=MGAUSS
PARAMETERS=1.9190,2.9190,3.9190,4.9190
SCALEDATA SCALE0=1 SCALE_MIN=0.1 SCALE_MAX=3 DSCALE=0.01
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
SIGMA_MEAN0=0.001
LABEL=spe
... METAINFERENCE

PRINT ARG=spe.bias FILE=BIAS STRIDE=1
\endplumedfile

In the following example, instead of using one uncertainty parameter per data point, we use a single uncertainty value in a long-tailed gaussian to account for outliers. Furthermore, the data are weighted for the bias applied to other variables of the system.

\plumedfile
RDC ...
LABEL=rdc
SCALE=0.0001
GYROM=-72.5388
ATOMS1=22,23
ATOMS2=25,27
ATOMS3=29,31
ATOMS4=33,34
... RDC

cv1: TORSION ATOMS=1,2,3,4
cv2: TORSION ATOMS=2,3,4,5
mm: METAD ARG=cv1,cv2 HEIGHT=0.5 SIGMA=0.3,0.3 PACE=200 BIASFACTOR=8 WALKERS_MPI

METAINFERENCE ...
#SETTINGS NREPLICAS=2
ARG=rdc.*,mm.bias
REWEIGHT
NOISETYPE=OUTLIERS
PARAMETERS=1.9190,2.9190,3.9190,4.9190
SCALEDATA SCALE0=1 SCALE_MIN=0.1 SCALE_MAX=3 DSCALE=0.01
SIGMA0=0.01 SIGMA_MIN=0.00001 SIGMA_MAX=3 DSIGMA=0.01
SIGMA_MEAN0=0.001
LABEL=spe
... METAINFERENCE
\endplumedfile

(See also \ref RDC, \ref PBMETAD).

Syntax

The following table describes the keywords and options that can be used with this action

| Keyword | Type | Default | Description |
|:--------|:-----|:--------|:------------|
| ARG | input | none | the labels of the scalars on which the bias will act |
| PARARG | input | none | reference values for the experimental data, these can be provided as arguments without derivatives |
| NOISETYPE | compulsory | MGAUSS | functional form of the noise (GAUSS,MGAUSS,OUTLIERS,MOUTLIERS,GENERIC) |
| LIKELIHOOD | compulsory | GAUSS | the likelihood for the GENERIC metainference model, GAUSS or LOGN |
| DFTILDE | compulsory | 0.1 | fraction of sigma_mean used to evolve ftilde |
| SCALE0 | compulsory | 1.0 | initial value of the scaling factor |
| SCALE_PRIOR | compulsory | FLAT | either FLAT or GAUSSIAN |
| OFFSET0 | compulsory | 0.0 | initial value of the offset |
| OFFSET_PRIOR | compulsory | FLAT | either FLAT or GAUSSIAN |
| SIGMA0 | compulsory | 1.0 | initial value of the uncertainty parameter |
| SIGMA_MIN | compulsory | 0.0 | minimum value of the uncertainty parameter |
| SIGMA_MAX | compulsory | 10. | maximum value of the uncertainty parameter |
| OPTSIGMAMEAN | compulsory | NONE | Set to NONE to set sigma mean manually, or to SEM to estimate it on the fly |
| WRITE_STRIDE | compulsory | 10000 | write the status to a file every N steps, this can be used for restart/continuation |
| NUMERICAL_DERIVATIVES | optional | false | calculate the derivatives for these quantities numerically |
| PARAMETERS | optional | not used | reference values for the experimental data |
| NOENSEMBLE | optional | false | don't perform any replica-averaging |
| REWEIGHT | optional | false | simple reweighting using the last ARG as energy |
| AVERAGING | optional | not used | Stride for calculation of averaged weights and sigma_mean |
| SCALEDATA | optional | false | Set to TRUE if you want to sample a scaling factor common to all values and replicas |
| SCALE_MIN | optional | not used | minimum value of the scaling factor |
| SCALE_MAX | optional | not used | maximum value of the scaling factor |
| DSCALE | optional | not used | maximum MC move of the scaling factor |
| ADDOFFSET | optional | false | Set to TRUE if you want to sample an offset common to all values and replicas |
| OFFSET_MIN | optional | not used | minimum value of the offset |
| OFFSET_MAX | optional | not used | maximum value of the offset |
| DOFFSET | optional | not used | maximum MC move of the offset |
| REGRES_ZERO | optional | not used | stride for regression with zero offset |
| DSIGMA | optional | not used | maximum MC move of the uncertainty parameter |
| SIGMA_MEAN0 | optional | not used | starting value for the uncertainty in the mean estimate |
| SIGMA_MAX_STEPS | optional | not used | Number of steps used to optimise SIGMA_MAX, before that the SIGMA_MAX value is used |
| TEMP | optional | not used | the system temperature - this is only needed if code doesn't pass the temperature to plumed |
| MC_STEPS | optional | not used | number of MC steps |
| MC_CHUNKSIZE | optional | not used | MC chunksize |
| STATUS_FILE | optional | not used | write a file with all the data useful for restart/continuation of Metainference |
| FMT | optional | not used | specify format for HILLS files (useful to decrease the number of digits in regtests) |
| SELECTOR | optional | not used | name of selector |
| NSELECT | optional | not used | range of values for selector [0, N-1] |
| RESTART | optional | not used | allows per-action setting of restart (YES/NO/AUTO) |