
Shortcut: CLUSTER_DISTRIBUTION

Module clusters
Description: Calculate functions of the distribution of properties in your connected components.
Usage: used in 1 tutorial and in 4 eggs

Details and examples


This action allows you to calculate the number of atoms in each of the connected components that were detected when you performed DFSCLUSTERING on an adjacency matrix computed using one of the actions in the adjmat module. The following example illustrates how you can use it to compute the number of atoms in each of the identified clusters:

tested on 2.11
cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
dfs: DFSCLUSTERING ARG=cm
clust: CLUSTER_DISTRIBUTION CLUSTERS=dfs
PRINT ARG=clust FILE=colvar STRIDE=1
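To make the first step of this input concrete, here is a minimal Python sketch of the matrix that CONTACT_MATRIX builds with a CUBIC switching function. It assumes the CUBIC form s(r) = (y - 1)^2 (1 + 2y) with y = (r - D_0)/(D_MAX - D_0), equal to 1 below D_0 and 0 above D_MAX, and it ignores periodic boundary conditions. The helper names `cubic_switch` and `contact_matrix` are hypothetical and not part of PLUMED.

```python
import math


def cubic_switch(r, d0, dmax):
    """Assumed CUBIC switching function: 1 below d0, 0 above dmax, smooth cubic in between."""
    if r <= d0:
        return 1.0
    if r >= dmax:
        return 0.0
    y = (r - d0) / (dmax - d0)
    return (y - 1.0) ** 2 * (1.0 + 2.0 * y)


def contact_matrix(positions, d0=0.45, dmax=0.55):
    """Dense symmetric contact matrix (no periodic boundaries); the diagonal is left at zero."""
    n = len(positions)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])  # Euclidean distance
            mat[i][j] = mat[j][i] = cubic_switch(r, d0, dmax)
    return mat


positions = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
for row in contact_matrix(positions):
    print([round(v, 3) for v in row])
```

Pairs closer than D_0 get a weight of 1, pairs farther than D_MAX get 0, and the cubic smoothly interpolates in between, which keeps the matrix elements differentiable with respect to the positions.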

The output from the CLUSTER_DISTRIBUTION action here is a vector with 100 elements. The first element of this vector is the number of atoms in the largest connected component, the second element of the vector is the number of atoms in the second largest connected component and so on (many elements of the vector will be zero).
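The bookkeeping behind this vector can be sketched in Python as follows. This is an illustration under stated assumptions, not PLUMED's implementation: the hypothetical `connected_components` helper treats any matrix element above a threshold (zero by default) as an edge and labels components with an iterative depth-first search, and the hypothetical `cluster_size_vector` counts the atoms per component and sorts the counts from largest to smallest, padding with zeros to one entry per atom.

```python
def connected_components(adj, threshold=0.0):
    """Return a component label for every atom using iterative depth-first search.

    Assumption: atoms i and j are connected when adj[i][j] > threshold.
    """
    n = len(adj)
    labels = [-1] * n  # -1 marks atoms not yet visited
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Explore everything reachable from `start` before opening a new component.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if adj[i][j] > threshold and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def cluster_size_vector(labels):
    """Number of atoms in each component, largest first, zero-padded to len(labels)."""
    sizes = [0] * len(labels)  # there can be at most one component per atom
    for lab in labels:
        sizes[lab] += 1
    return sorted(sizes, reverse=True)


adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(cluster_size_vector(connected_components(adj)))  # [3, 2, 0, 0, 0]
```

In this sketch an isolated atom forms a component of size 1, so the padding zeros only appear because there are fewer components than atoms.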

As illustrated in the inputs below, there are multiple shortcut keywords that allow you to probe the distribution of cluster sizes. For example, the following input calculates how many clusters with more than 10 atoms are present:

cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
dfs: DFSCLUSTERING ARG=cm
clust: CLUSTER_DISTRIBUTION CLUSTERS=dfs MORE_THAN={RATIONAL D_0=10 R_0=0.0001}
PRINT ARG=clust.morethan FILE=colvar STRIDE=1
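A rough Python sketch of what MORE_THAN={RATIONAL D_0=10 R_0=0.0001} does to the size vector follows. It assumes the RATIONAL defaults NN=6 and MM=2*NN, for which (1 - x^6)/(1 - x^12) simplifies to 1/(1 + x^6) with x = (r - D_0)/R_0, and s(r) = 1 when r <= D_0; each element then contributes 1 - s(r). Because R_0 is tiny, this behaves like a step at 10, so a cluster of exactly 10 atoms is not counted. The function names are hypothetical.

```python
def rational_switch(r, d0=0.0, r0=1.0, nn=6):
    """Assumed RATIONAL switching function with MM = 2*NN."""
    if r <= d0:
        return 1.0
    x = (r - d0) / r0
    # (1 - x**nn) / (1 - x**(2*nn)) == 1 / (1 + x**nn), which avoids 0/0 at x == 1
    return 1.0 / (1.0 + x ** nn)


def more_than(values, d0, r0):
    """Smooth count of the values that exceed d0."""
    return sum(1.0 - rational_switch(v, d0, r0) for v in values)


sizes = [12, 11, 10, 3, 0, 0]
print(more_than(sizes, d0=10, r0=0.0001))  # 2.0
```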

By similar logic, you can compute the number of clusters that were identified as follows:

cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
dfs: DFSCLUSTERING ARG=cm
clust: CLUSTER_DISTRIBUTION CLUSTERS=dfs LESS_THAN={RATIONAL R_0=0.0001}
nc: CUSTOM ARG=clust.lessthan FUNC=100-x PERIODIC=NO
PRINT ARG=nc FILE=colvar STRIDE=1

In this input, the LESS_THAN keyword counts the elements of the vector output by the CLUSTER_DISTRIBUTION action that are zero. Subtracting this number of zero elements from the size of the vector (100) gives the number of clusters.
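The same idea can be sketched in Python. Here LESS_THAN={RATIONAL R_0=0.0001}, with D_0 left at what we assume is its default of 0 and with MM=2*NN assumed as before, gives 1 for an empty (zero-sized) entry and essentially 0 for any cluster with at least one atom. Subtracting the result from the length of the vector, as nc does with FUNC=100-x, therefore leaves the number of clusters. The helper names are hypothetical.

```python
def less_than_count(values, r0=0.0001, nn=6):
    """Smooth count of the values below D_0 = 0 (assumed RATIONAL with MM = 2*NN)."""
    total = 0.0
    for v in values:
        if v <= 0.0:
            total += 1.0  # s(r) = 1 when r <= D_0
        else:
            total += 1.0 / (1.0 + (v / r0) ** nn)
    return total


def number_of_clusters(size_vector):
    """Equivalent of nc: CUSTOM FUNC=100-x, with 100 replaced by the vector length."""
    return len(size_vector) - less_than_count(size_vector)


print(number_of_clusters([3, 2, 0, 0, 0]))  # 2.0
```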

Lastly, you can calculate the number of clusters whose sizes fall within a certain range, or within each of a set of ranges, using the BETWEEN and HISTOGRAM keywords as shown below:

cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
dfs: DFSCLUSTERING ARG=cm
clust: CLUSTER_DISTRIBUTION ...
   CLUSTERS=dfs
   BETWEEN={GAUSSIAN LOWER=5 UPPER=6 SMEAR=0.5}
   HISTOGRAM={GAUSSIAN LOWER=6 UPPER=10 NBINS=4 SMEAR=0.5}
...
PRINT ARG=clust.* FILE=colvar STRIDE=1

This input will output 5 quantities:

  • clust.between tells you the number of connected components that contain between 5 and 6 atoms.
  • clust.between-1 tells you the number of connected components that contain between 6 and 7 atoms.
  • clust.between-2 tells you the number of connected components that contain between 7 and 8 atoms.
  • clust.between-3 tells you the number of connected components that contain between 8 and 9 atoms.
  • clust.between-4 tells you the number of connected components that contain between 9 and 10 atoms.
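The smoothing used by BETWEEN and HISTOGRAM can be sketched in Python as below. The sketch assumes that, with the GAUSSIAN kernel, each cluster size v contributes the fraction of a normal distribution centred on v that falls inside [LOWER, UPPER], and that the standard deviation is SMEAR times the width of the interval (for HISTOGRAM, the width of one bin). This is our reading rather than a statement of PLUMED's exact code, and the function names are hypothetical.

```python
import math


def gaussian_bead(x, lower, upper, sigma):
    """Fraction of a Gaussian centred on x with standard deviation sigma inside [lower, upper]."""
    s = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((upper - x) / s) - math.erf((lower - x) / s))


def between(sizes, lower, upper, smear=0.5):
    # Assumption: the kernel width scales with the interval width.
    sigma = smear * (upper - lower)
    return sum(gaussian_bead(v, lower, upper, sigma) for v in sizes)


def histogram(sizes, lower, upper, nbins, smear=0.5):
    """One BETWEEN-style bead per bin; the bins split [lower, upper] evenly."""
    delta = (upper - lower) / nbins
    return [between(sizes, lower + k * delta, lower + (k + 1) * delta, smear)
            for k in range(nbins)]


sizes = [9, 7, 5, 2, 0]
print(between(sizes, 5, 6))        # smoothed count of clusters with 5 to 6 atoms
print(histogram(sizes, 6, 10, 4))  # bins 6-7, 7-8, 8-9 and 9-10
```

With this smoothing, a cluster whose size lies exactly on an interval edge (such as 5 or 7 here) is shared between neighbouring intervals rather than assigned wholly to one of them.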


The input provided below shows how this action can be used to perform quite complicated calculations. The input first calculates the local q6 Steinhardt parameter on each atom. It then computes the coordination number that atoms with a high value of the local q6 Steinhardt parameter have with other atoms that also have a high value of this parameter. Next, a contact matrix is computed that measures whether atoms i and j both have a high value for this coordination number and whether they are within 3.6 nm of each other. The connected components of this matrix are then found by running the depth-first search clustering algorithm on the corresponding graph. Finally, the number of components in this graph that contain more than 27 atoms is computed. An input similar to this one was used to analyze the formation of a polycrystal of GeTe from amorphous GeTe in the paper cited below.

q6: Q6 SPECIES=1-300 SWITCH={GAUSSIAN D_0=5.29 R_0=0.01 D_MAX=5.3}
lq6: LOCAL_Q6 SPECIES=q6 SWITCH={GAUSSIAN D_0=5.29 R_0=0.01 D_MAX=5.3}
flq6: MORE_THAN ARG=lq6 SWITCH={GAUSSIAN D_0=0.19 R_0=0.01 D_MAX=0.2}
cc: COORDINATIONNUMBER SPECIES=1-300 SWITCH={GAUSSIAN D_0=3.59 R_0=0.01 D_MAX=3.6}
fcc: MORE_THAN ARG=cc SWITCH={GAUSSIAN D_0=5.99 R_0=0.01 D_MAX=6.0}
mat: CONTACT_MATRIX GROUP=1-300 SWITCH={GAUSSIAN D_0=3.59 R_0=0.01 D_MAX=3.6}
dfs: DFSCLUSTERING ARG=mat
nclust: CLUSTER_DISTRIBUTION ...
   CLUSTERS=dfs WEIGHTS=fcc
   MORE_THAN={GAUSSIAN D_0=26.99 R_0=0.01 D_MAX=27}
...
PRINT ARG=nclust.* FILE=colvar
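The flq6 and fcc lines turn continuous per-atom values into near-binary ones with a GAUSSIAN switching function. The Python sketch below assumes the form s(r) = 1 for r <= D_0 and exp(-((r - D_0)/R_0)^2 / 2) beyond it, truncated to 0 at D_MAX. PLUMED may also rescale (stretch) the function so that it reaches exactly zero at D_MAX, and this sketch ignores that rescaling. The function names are hypothetical.

```python
import math


def gaussian_switch(r, d0, r0, dmax):
    """Assumed GAUSSIAN switching function: 1 up to d0, Gaussian decay beyond, 0 from dmax."""
    if r <= d0:
        return 1.0
    if r >= dmax:
        return 0.0
    x = (r - d0) / r0
    return math.exp(-0.5 * x * x)


def more_than_vector(values, d0, r0, dmax):
    """Element-wise 1 - s(v), as MORE_THAN does for the lq6 and cc vectors."""
    return [1.0 - gaussian_switch(v, d0, r0, dmax) for v in values]


# Coordination numbers well below and well above the fcc threshold of about 6
print(more_than_vector([4.0, 8.0], d0=5.99, r0=0.01, dmax=6.0))  # [0.0, 1.0]
```

The narrow window between D_0=5.99 and D_MAX=6.0 makes this an almost sharp test of whether the coordination number exceeds 6, while keeping the output continuous.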

Notice how the WEIGHTS keyword is used here in the input to CLUSTER_DISTRIBUTION. This keyword ensures that the size of each cluster is the sum of the elements of the vector fcc for the atoms in that connected component. Each cluster size that is output is thus the number of atoms in that cluster whose COORDINATIONNUMBER is greater than 6.
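The effect of WEIGHTS can be sketched by summing a per-atom weight, rather than 1, over the atoms in each component. This Python sketch assumes component labels numbered from 0 (as a depth-first search would produce) and sorts the weighted sizes from largest to smallest as in the unweighted case; the ordering does not change the MORE_THAN count. The helper name is hypothetical.

```python
def weighted_cluster_sizes(labels, weights):
    """Sum of the weights of the atoms in each component (cf. WEIGHTS=fcc), largest first."""
    sizes = [0.0] * len(labels)  # at most one component per atom
    for lab, w in zip(labels, weights):
        sizes[lab] += w
    return sorted(sizes, reverse=True)


labels = [0, 0, 0, 1, 1]             # two components: atoms 0-2 and atoms 3-4
weights = [1.0, 1.0, 1.0, 0.0, 1.0]  # e.g. fcc: 1 if the coordination number is high
print(weighted_cluster_sizes(labels, weights))  # [3.0, 1.0, 0.0, 0.0, 0.0]
```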

Output components

This action can calculate the values in the following table when the associated keyword is included in the input for the action. These values can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the value required from the list below.

Name | Type | Keyword | Description
---- | ---- | ------- | -----------
lessthan | scalar | LESS_THAN | the number of colvars that have a value less than a threshold
morethan | scalar | MORE_THAN | the number of colvars that have a value more than a threshold
altmin | scalar | ALT_MIN | the minimum value of the cv
min | scalar | MIN | the minimum colvar
max | scalar | MAX | the maximum colvar
between | scalar | BETWEEN | the number of colvars that have a value that lies in a particular interval
highest | scalar | HIGHEST | the largest of the colvars
lowest | scalar | LOWEST | the smallest of the colvars
sum | scalar | SUM | the sum of the colvars
mean | scalar | MEAN | the mean of the colvars

Full list of keywords

The following table describes the keywords and options that can be used with this action

Keyword | Type | Default | Description
------- | ---- | ------- | -----------
CLUSTERS | compulsory | none | the label of the action that does the clustering
WEIGHTS | optional | not used | use the vector of values calculated by this action as weights rather than giving each atom a unit weight

Deprecated keywords

The keywords in the following table can still be used with this action but have been deprecated

Keyword | Description
------- | -----------
LESS_THAN | calculate the number of variables that are less than a certain target value
MORE_THAN | calculate the number of variables that are more than a certain target value
BETWEEN | calculate the number of values that are within a certain range
HISTOGRAM | calculate a discretized histogram of the distribution of values