EMMI
This is part of the isdb module

Calculate the fit of a structure or ensemble of structures with a cryo-EM density map.

This action implements the multi-scale Bayesian approach to cryo-EM data fitting introduced in Ref. [55]. This method allows efficient and accurate structural modeling of cryo-electron microscopy density maps at multiple scales, from coarse-grained to atomistic resolution, by addressing the presence of random and systematic errors in the data, sample heterogeneity, data correlation, and noise correlation.

The experimental density map is fit by a Gaussian Mixture Model (GMM), which is provided as an external file specified by the keyword GMM_FILE. We are currently working on a web server to perform this operation. In the meantime, the user can request a stand-alone version of the GMM code at massimiliano.bonomi_AT_gmail.com.
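To make the role of the GMM concrete, the following Python sketch evaluates a mixture of three-dimensional Gaussians at a single point. It is only an illustration and not the code used by EMMI or by the GMM fitting tool: the function names (det3, gaussian3, gmm_density) are hypothetical, and it assumes that each weight multiplies a normalized Gaussian with a full 3x3 covariance matrix.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def gaussian3(x, mean, cov):
    """Normalized 3D Gaussian with full covariance, evaluated at point x."""
    d = [x[i] - mean[i] for i in range(3)]
    det = det3(cov)
    # Solve cov * y = d with Cramer's rule, so that d . y = d^T cov^-1 d.
    y = []
    for k in range(3):
        m = [[d[r] if c == k else cov[r][c] for c in range(3)]
             for r in range(3)]
        y.append(det3(m) / det)
    q = sum(d[i] * y[i] for i in range(3))
    norm = math.sqrt((2.0 * math.pi) ** 3 * det)
    return math.exp(-0.5 * q) / norm


def gmm_density(x, components):
    """Sum of weighted Gaussians; components are (weight, mean, cov) tuples.

    Assumption: the weight is a plain multiplicative prefactor of a
    normalized Gaussian.
    """
    return sum(w * gaussian3(x, mu, cov) for w, mu, cov in components)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    comps = [(2.0, (0.0, 0.0, 0.0), identity)]
    print(gmm_density((0.0, 0.0, 0.0), comps))
```

Cramer's rule is used here only to keep the sketch free of external dependencies; for real maps with many components a linear algebra library would be the natural choice.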

When run in single-replica mode, this action allows atomistic, flexible refinement of an individual structure into a density map. Combined with a multi-replica framework (such as the -multi option in GROMACS), the user can model an ensemble of structures using the Metainference approach [18].

Warning
To use EMMI, the user should always add a MOLINFO line and specify a PDB file of the system.
Note
To enhance sampling in single-structure refinement, one can use a Replica Exchange Method, such as Parallel Tempering. In this case, the user should add the NO_AVER flag to the input line.
EMMI can be used with both periodic and non-periodic systems. In the latter case, one should add the NOPBC flag to the input line.
Description of components

By default this Action calculates the following quantities. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity   Description
scoreb     Bayesian score

In addition, the following quantities can be calculated by employing the keywords listed below.

Quantity   Keyword      Description
acc        NOISETYPE    MC acceptance for uncertainty
scale      REGRESSION   scale factor
accscale   REGRESSION   MC acceptance for scale regression
enescale   REGRESSION   MC energy for scale regression
anneal     ANNEAL       annealing factor
The atoms involved can be specified using
ATOMS atoms for which we calculate the density map, typically all heavy atoms. For more information on how to specify lists of atoms, see Groups and Virtual Atoms.
Compulsory keywords
GMM_FILE file with the parameters of the GMM components
NL_CUTOFF The cutoff in overlap for the neighbor list
NL_STRIDE The frequency with which we are updating the neighbor list
SIGMA_MIN minimum uncertainty
RESOLUTION Cryo-EM map resolution
NOISETYPE functional form of the noise (GAUSS, OUTLIERS, MARGINAL)
Options
NUMERICAL_DERIVATIVES ( default=off ) calculate the derivatives for these quantities numerically
NOPBC ( default=off ) ignore the periodic boundary conditions when calculating distances
NO_AVER ( default=off ) don't do ensemble averaging in multi-replica mode

SIGMA0 initial value of the uncertainty
DSIGMA MC step for uncertainties
MC_STRIDE Monte Carlo stride
ERR_FILE file with experimental or GMM fit errors
OV_FILE file with experimental overlaps
NORM_DENSITY integral of the experimental density
STATUS_FILE write a file with all the data useful for restart
WRITE_STRIDE write the status to a file every N steps; this file can be used for restart
REGRESSION regression stride
REG_SCALE_MIN regression minimum scale
REG_SCALE_MAX regression maximum scale
REG_DSCALE maximum MC move of the regression scale
SCALE scale factor
ANNEAL Length of annealing cycle
ANNEAL_FACT Annealing temperature factor
TEMP temperature
PRIOR exponent of uncertainty prior
WRITE_OV_STRIDE write model overlaps every N steps
WRITE_OV write a file with model overlaps
Examples

In this example, we perform a single-structure refinement based on an experimental cryo-EM map. The map is fit with a GMM, whose parameters are listed in the file GMM_fit.dat. This file contains one line per GMM component in the following format:

#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
     0  2.9993805e+01   6.54628 10.37820 -0.92988  2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02  1
     1  2.3468312e+01   6.56095 10.34790 -0.87808  1.879859e-02 6.636049e-03 3.682865e-04 3.194490e-02 1.750524e-03 3.017100e-02  1
     ...
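As an illustration of the format above, the following Python sketch reads such a file into a list of components and rebuilds the symmetric 3x3 covariance matrix from its six independent entries. The function name parse_gmm_file and the dictionary layout are hypothetical choices made for this example; the Beta column is kept as a raw string because its meaning is not described here.

```python
SAMPLE = """\
#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
     0  2.9993805e+01   6.54628 10.37820 -0.92988  2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02  1
     1  2.3468312e+01   6.56095 10.34790 -0.87808  1.879859e-02 6.636049e-03 3.682865e-04 3.194490e-02 1.750524e-03 3.017100e-02  1
"""


def parse_gmm_file(text):
    """Parse GMM parameters from text with a '#! FIELDS' header line."""
    fields = None
    components = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#! FIELDS"):
            # Column names follow the two header tokens '#!' and 'FIELDS'.
            fields = line.split()[2:]
            continue
        if line.startswith("#"):
            continue
        values = line.split()
        if fields is None or len(values) != len(fields):
            raise ValueError("unexpected line: " + raw)
        row = dict(zip(fields, values))
        c = {k: float(row[k]) for k in fields if k.startswith("Cov_")}
        components.append({
            "id": int(row["Id"]),
            "weight": float(row["Weight"]),
            "mean": tuple(float(row["Mean_%d" % i]) for i in range(3)),
            # Only the upper triangle is stored; mirror it to fill the matrix.
            "cov": [
                [c["Cov_00"], c["Cov_01"], c["Cov_02"]],
                [c["Cov_01"], c["Cov_11"], c["Cov_12"]],
                [c["Cov_02"], c["Cov_12"], c["Cov_22"]],
            ],
            "beta": row["Beta"],  # kept as text; semantics not assumed
        })
    return components


if __name__ == "__main__":
    for comp in parse_gmm_file(SAMPLE):
        print(comp["id"], comp["weight"], comp["mean"])
```

A real file would be read with open(...).read() and passed to the same function; the two data rows in SAMPLE are the ones shown above.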

To accelerate the computation of the Bayesian score, one can:

  • use neighbor lists, specified by the keywords NL_CUTOFF and NL_STRIDE;
  • calculate the restraint only every other step (or even less frequently).

All the heavy atoms of the system are used to calculate the density map. This list can conveniently be provided using a GROMACS index file.

The input file looks as follows:

# include pdb info
MOLINFO STRUCTURE=prot.pdb

#  all heavy atoms
protein-h: GROUP NDX_FILE=index.ndx NDX_GROUP=Protein-H

# create EMMI score
gmm: EMMI NOPBC SIGMA_MIN=0.01 TEMP=300.0 NL_STRIDE=100 NL_CUTOFF=0.01 GMM_FILE=GMM_fit.dat ATOMS=protein-h

# translate into bias - apply every 2 steps
emr: BIASVALUE ARG=gmm.scoreb STRIDE=2

PRINT ARG=emr.* FILE=COLVAR STRIDE=500 FMT=%20.10f