Functions that are used to construct histograms Functions that are used to construct histograms

Constructing histograms is something you learned to do relatively early in life. You perform an experiment a number of times, count the number of times each result comes up and then draw a bar graph that describes how often each of the results came up. This only works when there are a finite number of possible results. If the result a number between 0 and 1 the bar chart is less easy to draw as there are as many possible results as there are numbers between zero and one - an infinite number. To resolve this problem we replace probability, \(P\) with probability density, \(\pi\), and write the probability of getting a number between \(a\) and \(b\) as:

\[ P = \int_{a}^b \textrm{d}x \pi(x) \]

To calculate probability densities from a set of results we use a process called kernel density estimation. Histograms are accumulated by adding up kernel functions, \(K\), with finite spatial extent, that integrate to one. These functions are centered on each of the \(n\)-dimensional data points, \(\mathbf{x}_i\). The overall effect of this is that each result we obtain in our experiments contributes to the probability density in a finite sized region of the space.

Expressing all this mathematically in kernel density estimation we write the probability density as:

\[ \pi(\mathbf{x}) = \sum_i K\left[ (\mathbf{x} - \mathbf{x}_i)^T \Sigma (\mathbf{x} - \mathbf{x}_i) \right] \]

where \(\Sigma\) is an \(n \times n\) matrix called the bandwidth that controls the spatial extent of the kernel. Whenever we accumulate a histogram (e.g. in HISTOGRAM or in METAD) we use this technique.

There is thus some flexibility in the particular function we use for \(K[\mathbf{r}]\) in the above. The following variants are available.

gaussian \(f(r) = \frac{1}{(2 \pi)^{n} \sqrt{|\Sigma^{-1}|}} \exp\left(-0.5 r^2 \right)\)
truncated-gaussian \(f(r) = \frac{1}{(2 \pi)^{n} \sqrt{|\Sigma^{-1}|} \left(\frac{\mathrm{erf}(-6.25/sqrt{2}) - \mathrm{erf}(-6.25/sqrt{2})}{2}\right)^n} \exp\left(-0.5 r^2 \right)\)
triangular \(f(r) = \frac{3}{V} ( 1 - | r | )H(1-|r|) \)
uniform \(f(r) = \frac{1}{V}H(1-|r|)\)

In the above \(H(y)\) is a function that is equal to one when \(y>0\) and zero when \(y \le 0\). \(n\) is the dimensionality of the vector \(\mathbf{x}\) and \(V\) is the volume of an ellipse in an \(n\) dimensional space which is given by:

\begin{eqnarray*} V &=& | \Sigma^{-1} | \frac{ \pi^{\frac{n}{2}} }{\left( \frac{n}{2} \right)! } \qquad \textrm{for even} \quad n \\ V &=& | \Sigma^{-1} | \frac{ 2^{\frac{n+1}{2}} \pi^{\frac{n-1}{2}} }{ n!! } \end{eqnarray*}

In METAD the normalization constants are ignored so that the value of the function at \(r=0\) is equal to one. In addition in METAD we must be able to differentiate the bias in order to get forces. This limits the kernels we can use in this method. Notice also that Gaussian kernels should have infinite support. When used with grids, however, they are assumed to only be non-zero over a finite range. The difference between the truncated-gaussian and regular gaussian is that the truncated gaussian is scaled so that its integral over the grid is equal to one when it is normalized. The integral of a regular gaussian when it is evaluated on a grid will be slightly less that one because of the truncation of a function that should have infinite support.

Glossary of keywords and components