Action: EMMI
| Module | isdb |
|---|---|
| Description | Calculate the fit of a structure or ensemble of structures with a cryo-EM density map. |
Details and examples
This action implements the multi-scale Bayesian approach to cryo-EM data fitting introduced in the first paper cited below. This method allows efficient and accurate structural modeling of cryo-electron microscopy density maps at multiple scales, from coarse-grained to atomistic resolution, by addressing the presence of random and systematic errors in the data, sample heterogeneity, data correlation, and noise correlation.
The experimental density map is fit by a Gaussian Mixture Model (GMM), which is provided as an external file specified by the keyword GMM_FILE. We are currently working on a web server to perform this operation. In the meantime, the user can request a stand-alone version of the GMM code at massimiliano.bonomi_AT_gmail.com.
When run in single-replica mode, this action allows atomistic, flexible refinement of an individual structure into a density map. Combined with a multi-replica framework (such as the -multi option in GROMACS), the user can model an ensemble of structures using the Metainference approach that is discussed in the second paper cited below.
To enhance sampling in single-structure refinement, one can use a Replica Exchange Method, such as Parallel Tempering. In this case, the user should add the NO_AVER flag to the input line. To use a replica-based enhanced sampling scheme such as Parallel-Bias Metadynamics (PBMETAD), one should use the REWEIGHT flag and pass the Metadynamics bias using the ARG keyword.
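As an illustration of what reweighting replicas by a bias can mean, here is a minimal Python sketch. It assumes the standard form in which each replica's weight is proportional to exp(V/kBT), with V the bias passed through ARG, and it uses Kish's formula for the effective number of replicas. Both are assumptions made for illustration: this page only states that `weight` and `neff` are available as outputs, not how EMMI computes them. The function names are hypothetical.

```python
import math


def replica_weights(biases, kbt):
    """Normalized weights proportional to exp(bias / kbt) (assumed form)."""
    bmax = max(biases)  # subtract the maximum for numerical stability
    unnorm = [math.exp((b - bmax) / kbt) for b in biases]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_replicas(weights):
    """Kish effective sample size; an assumption, not taken from this page."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def weighted_average(values, weights):
    """Weighted ensemble average of a per-replica quantity."""
    return sum(v * w for v, w in zip(values, weights))


kbt = 0.0083144626 * 300.0  # kJ/mol at 300 K

# Four replicas with identical bias contribute equally.
equal = replica_weights([5.0, 5.0, 5.0, 5.0], kbt)

# Two replicas whose biases differ by kbt * ln(3).
skewed = replica_weights([0.0, kbt * math.log(3.0)], kbt)
```

With equal biases every replica counts fully, so the effective number of replicas equals the number of replicas; the more the biases differ, the smaller it becomes.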
EMMI can be used in combination with periodic and non-periodic systems. In the latter case, one should add the NOPBC flag to the input line.
Examples
In this example, we perform a single-structure refinement based on an experimental cryo-EM map. The map is fit with a GMM, whose parameters are listed in the file GMM_fit.dat. This file contains one line per GMM component in the following format:
#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
0 2.9993805e+01 6.54628 10.37820 -0.92988 2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02 1
1 2.3468312e+01 6.56095 10.34790 -0.87808 1.879859e-02 6.636049e-03 3.682865e-04 3.194490e-02 1.750524e-03 3.017100e-02 1
...
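To make the file layout concrete, the following Python sketch reads text in the format above and rebuilds the symmetric 3x3 covariance matrix of each component from its six listed entries. The function name `parse_gmm` and the returned dictionary keys are hypothetical choices for this illustration; they are not part of PLUMED.

```python
SAMPLE = """\
#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
0 2.9993805e+01 6.54628 10.37820 -0.92988 2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02 1
1 2.3468312e+01 6.56095 10.34790 -0.87808 1.879859e-02 6.636049e-03 3.682865e-04 3.194490e-02 1.750524e-03 3.017100e-02 1
...
"""


def parse_gmm(text):
    """Return one dict per GMM component found in `text`."""
    fields = None
    components = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the elision marker
        if line.startswith("#! FIELDS"):
            fields = line.split()[2:]  # column names after '#! FIELDS'
            continue
        if line.startswith("#"):
            continue  # any other comment line
        if fields is None:
            raise ValueError("data line found before the '#! FIELDS' header")
        values = dict(zip(fields, (float(x) for x in line.split())))
        # Only the upper triangle is stored; mirror it to get the full matrix.
        cov = [[0.0] * 3 for _ in range(3)]
        for i in range(3):
            for j in range(i, 3):
                cov[i][j] = cov[j][i] = values[f"Cov_{i}{j}"]
        components.append({
            "id": int(values["Id"]),
            "weight": values["Weight"],
            "mean": [values[f"Mean_{k}"] for k in range(3)],
            "cov": cov,
            "beta": values["Beta"],
        })
    return components


components = parse_gmm(SAMPLE)
```

Reading the column names from the header rather than hard-coding positions keeps the sketch tolerant of reordered columns.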
To accelerate the computation of the Bayesian score, one can:
- use neighbor lists, specified by the keywords NL_CUTOFF and NL_STRIDE;
- calculate the restraint every other step (or more).
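The idea behind an overlap-based neighbor list can be sketched in Python as follows. The sketch uses the closed-form overlap integral of two normalized 3D Gaussians and keeps only the pairs whose overlap reaches a threshold. The absolute-threshold criterion and the function names are assumptions made for illustration; the exact criterion that EMMI applies for NL_CUTOFF is not described on this page and may differ.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            c = [k for k in range(3) if k != j]
            minor = m[r[0]][c[0]] * m[r[1]][c[1]] - m[r[0]][c[1]] * m[r[1]][c[0]]
            inv[j][i] = (-1) ** (i + j) * minor / d  # transposed cofactor
    return inv


def gaussian_overlap(m1, s1, m2, s2):
    """Integral of the product of two normalized 3D Gaussians."""
    s = [[s1[i][j] + s2[i][j] for j in range(3)] for i in range(3)]
    d = [m1[k] - m2[k] for k in range(3)]
    inv = inv3(s)
    q = sum(d[i] * inv[i][j] * d[j] for i in range(3) for j in range(3))
    return math.exp(-0.5 * q) / math.sqrt((2.0 * math.pi) ** 3 * det3(s))


def build_neighbor_list(data, atoms, cutoff):
    """Pairs (data index, atom index) with overlap >= cutoff (assumed rule)."""
    return [(c, a)
            for c, (mc, sc) in enumerate(data)
            for a, (ma, sa) in enumerate(atoms)
            if gaussian_overlap(mc, sc, ma, sa) >= cutoff]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
data = [([0.0, 0.0, 0.0], identity)]
atoms = [([0.0, 0.0, 0.0], identity), ([10.0, 0.0, 0.0], identity)]
pairs = build_neighbor_list(data, atoms, cutoff=1e-3)
```

Because distant pairs contribute negligibly, the full double loop only needs to run when the list is rebuilt; between rebuilds the score can be evaluated on the stored pairs alone.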
All the heavy atoms of the system are used to calculate the density map. This list can conveniently be provided using a GROMACS index file.
The input file looks as follows:
    # include pdb info
    MOLINFO STRUCTURE=prot.pdb

    # all heavy atoms
    protein-h: GROUP NDX_FILE=index.ndx NDX_GROUP=Protein-H

    # create EMMI score
    gmm: EMMI NOPBC SIGMA_MIN=0.01 TEMP=300.0 NL_STRIDE=100 NL_CUTOFF=0.01 GMM_FILE=GMM_fit.dat ATOMS=protein-h

    # translate into bias - apply every 2 steps
    emr: BIASVALUE ARG=gmm.scoreb STRIDE=2

    PRINT ARG=emr.* FILE=COLVAR STRIDE=500 FMT=%20.10f
Input
The arguments and atoms that serve as the input for this action are specified using one or more of the keywords in the following table.
| Keyword | Type | Description |
|---|---|---|
| ARG | scalar | the labels of the values from which the function is calculated |
| ATOMS | atoms | atoms for which we calculate the density map, typically all heavy atoms |
Output components
This action can calculate the values in the following table when the associated keyword is included in the input for the action. These values can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the value required from the list below.
| Name | Type | Keyword | Description |
|---|---|---|---|
| scoreb | scalar | default | Bayesian score |
| acc | scalar | NOISETYPE | MC acceptance for uncertainty |
| scale | scalar | REGRESSION | scale factor |
| accscale | scalar | REGRESSION | MC acceptance for scale regression |
| enescale | scalar | REGRESSION | MC energy for scale regression |
| anneal | scalar | ANNEAL | annealing factor |
| weight | scalar | REWEIGHT | weights of the weighted average |
| biasDer | scalar | REWEIGHT | derivatives with respect to the bias |
| sigma | scalar | NOISETYPE | uncertainty in the forward models and experiment |
| neff | scalar | default | effective number of replicas |
Full list of keywords
The following table describes the keywords and options that can be used with this action.
| Keyword | Type | Default | Description |
|---|---|---|---|
| ARG | input | none | the labels of the values from which the function is calculated |
| ATOMS | input | none | atoms for which we calculate the density map, typically all heavy atoms |
| GMM_FILE | compulsory | none | file with the parameters of the GMM components |
| NL_CUTOFF | compulsory | none | The cutoff in overlap for the neighbor list |
| NL_STRIDE | compulsory | none | The frequency with which we are updating the neighbor list |
| SIGMA_MIN | compulsory | none | minimum uncertainty |
| RESOLUTION | compulsory | none | Cryo-EM map resolution |
| NOISETYPE | compulsory | none | functional form of the noise (GAUSS, OUTLIERS, MARGINAL) |
| NUMERICAL_DERIVATIVES | optional | false | calculate the derivatives for these quantities numerically |
| NOPBC | optional | false | ignore the periodic boundary conditions when calculating distances |
| SIGMA0 | optional | not used | initial value of the uncertainty |
| DSIGMA | optional | not used | MC step for uncertainties |
| MC_STRIDE | optional | not used | Monte Carlo stride |
| ERR_FILE | optional | not used | file with experimental or GMM fit errors |
| OV_FILE | optional | not used | file with experimental overlaps |
| NORM_DENSITY | optional | not used | integral of the experimental density |
| STATUS_FILE | optional | not used | write a file with all the data useful for restart |
| WRITE_STRIDE | optional | not used | write the status to a file every N steps, this can be used for restart |
| REGRESSION | optional | not used | regression stride |
| REG_SCALE_MIN | optional | not used | regression minimum scale |
| REG_SCALE_MAX | optional | not used | regression maximum scale |
| REG_DSCALE | optional | not used | regression maximum scale MC move |
| SCALE | optional | not used | scale factor |
| ANNEAL | optional | not used | Length of annealing cycle |
| ANNEAL_FACT | optional | not used | Annealing temperature factor |
| TEMP | optional | not used | temperature |
| PRIOR | optional | not used | exponent of uncertainty prior |
| WRITE_OV_STRIDE | optional | not used | write model overlaps every N steps |
| WRITE_OV | optional | not used | write a file with model overlaps |
| AVERAGING | optional | not used | Averaging window for weights |
| NO_AVER | optional | false | don't do ensemble averaging in multi-replica mode |
| REWEIGHT | optional | false | simple REWEIGHT using the ARG as energy |
References
More information about how this action can be used is available in the following articles:
- M. Bonomi, S. Hanot, C. H. Greenberg, A. Sali, M. Nilges, M. Vendruscolo, R. Pellarin, Bayesian weighing of electron cryo-microscopy data for integrative structural modeling (2017)
- M. Bonomi, C. Camilloni, A. Cavalli, M. Vendruscolo, Metainference: A Bayesian inference method for heterogeneous systems. Science Advances. 2 (2016)