Command line tool: benchmark
| Module | cltools |
|---|---|
| Description | run a calculation with a fixed trajectory to find bottlenecks in PLUMED |
| Input | command line args |
Details
benchmark is a lightweight reimplementation of driver that can be used to run benchmark calculations.
The main difference between driver and benchmark is that benchmark generates a trajectory in memory rather than reading a trajectory from a file. This approach is better for timing the overhead of the plumed library. If you do similar benchmarking with driver, the timings you get are dominated by the I/O operations that are required to read the trajectory.
Basic usage
To use benchmark, first create a sample plumed.dat file for testing. For example:
WHOLEMOLECULES ENTITY0=1-10000
p: POSITION ATOM=1
RESTRAINT ARG=p.x KAPPA=1 AT=0
You can then run this benchmark using the following command:
plumed benchmark
Notice that benchmark will read an input file called plumed.dat by default. You can specify a different name for your PLUMED input file
by using the --plumed flag.
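As a minimal sketch of the steps above, the script below writes the sample input to a file and passes it to benchmark with --plumed. The file name bench_input.dat and the values given to --natoms and --nsteps are assumptions chosen for illustration, and the run is skipped if plumed is not on your PATH.

```shell
# Write the sample input to a file (the file name is an arbitrary choice).
cat > bench_input.dat <<'EOF'
WHOLEMOLECULES ENTITY0=1-10000
p: POSITION ATOM=1
RESTRAINT ARG=p.x KAPPA=1 AT=0
EOF

# Run the benchmark only if plumed is available in this environment.
# The --natoms and --nsteps values are illustrative assumptions.
if command -v plumed >/dev/null 2>&1; then
  plumed benchmark --plumed bench_input.dat --natoms 10000 --nsteps 1000
else
  echo "plumed not found on PATH; skipping the run"
fi
```

The input refers to atoms 1-10000, so the number of atoms passed with --natoms should be at least that large.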
Running with a different PLUMED version
If you want to run a benchmark against a previous plumed version in a controlled setting you can do so by using the command:
plumed-runtime benchmark --kernel /path/to/lib/libplumedKernel.so
If you use this command the version of PLUMED that is in your environment calls the version of the library that is specified using the
--kernel flag. Running the benchmark in this way ensures that you are running in a controlled setting, where systematic errors
in the comparison are minimized.
Using plumed-runtime
You use the plumed-runtime executable here to avoid conflicts between different
plumed versions. You will find the plumed-runtime executable in your path if you are using the non-installed version of plumed,
and in $prefix/lib/plumed if you installed plumed in $prefix.
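The lookup described above can be sketched as a small shell helper. The function name find_plumed_runtime and the example prefix /usr/local are hypothetical; only the two locations (your path and $prefix/lib/plumed) come from the text.

```shell
# Hypothetical helper: print the location of plumed-runtime, looking first on
# PATH and then in <prefix>/lib/plumed.
find_plumed_runtime() {
  prefix=$1
  if command -v plumed-runtime >/dev/null 2>&1; then
    command -v plumed-runtime
  elif [ -x "$prefix/lib/plumed/plumed-runtime" ]; then
    echo "$prefix/lib/plumed/plumed-runtime"
  else
    echo "plumed-runtime not found" >&2
    return 1
  fi
}

# Example call: /usr/local is an assumed installation prefix.
runtime=$(find_plumed_runtime /usr/local) || echo "adjust the prefix for your system"
```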
Comparing multiple versions
The best way to compare two versions of plumed on the same input is to pass multiple colon-separated kernels as shown below:
plumed-runtime benchmark --kernel /path/to/lib/libplumedKernel.so:/path2/to/lib/libplumedKernel.so:this
Here, this refers to the kernel of the version with which you are running the benchmark. The comparison runs the three
instances simultaneously, alternating between them, so that systematic differences in the load of your machine will affect them
to the same extent.
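When the list of kernels is long or generated by a script, it may be convenient to assemble the colon-separated value programmatically. The helper below is a sketch; the name join_colon is hypothetical and the paths are the placeholders used above.

```shell
# Hypothetical helper: join its arguments with ':' to build the value that
# --kernel (or --plumed) expects. "$*" joins on the first character of IFS.
join_colon() {
  (IFS=:; printf '%s\n' "$*")
}

kernels=$(join_colon /path/to/lib/libplumedKernel.so /path2/to/lib/libplumedKernel.so this)

# Print the resulting command line instead of running it.
echo "plumed-runtime benchmark --kernel $kernels"
```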
In case the different versions require modified plumed.dat files, or if you simply want to compare two different plumed input files that compute the same thing, you can also use multiple plumed input files:
plumed-runtime benchmark --kernel /path/to/lib/libplumedKernel.so:this --plumed plumed1.dat:plumed2.dat
Similarly, you might want to run two different inputs using the same kernel by using an input like this:
plumed-runtime benchmark --plumed plumed1.dat:plumed2.dat
Profiling
If you want to attach a profiler to the process on the fly, you might find it convenient to use --nsteps -1.
This option ensures that the simulation runs forever unless interrupted with CTRL-C. When interrupted, the result of the timers
should be displayed anyway.
You can also set a maximum time for the calculation by using the --maxtime flag.
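A possible workflow is sketched below, assuming a Linux machine with the perf profiler installed. The choice of perf, the 120 second cap and the 30 second sampling window are assumptions for illustration; any profiler that can attach to a running process could be used instead.

```shell
# Illustrative values (assumptions): cap the run at 120 s, sample for 30 s.
maxtime=120
sample_seconds=30

if command -v plumed >/dev/null 2>&1 && command -v perf >/dev/null 2>&1; then
  # Run without a step limit; --maxtime bounds the run so no CTRL-C is needed.
  plumed benchmark --nsteps -1 --maxtime "$maxtime" &
  bench_pid=$!
  # Attach perf to the running process for a fixed sampling window.
  perf record -g -p "$bench_pid" -- sleep "$sample_seconds"
  # Wait for the benchmark to finish and print its timers.
  wait "$bench_pid"
else
  echo "plumed or perf not found; skipping the profiling run"
fi
```

Using --maxtime rather than an interrupt is a deliberate choice here, because background jobs started from a non-interactive script typically ignore the interrupt signal.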
If you run a profiler when testing multiple PLUMED versions, it can be difficult to determine which function belongs to
which version. We therefore recommend that you recompile the separate PLUMED instances with separate C++ namespaces (-DPLMD=PLUMED_version_1)
so that you will be able to distinguish them. In addition, compiling with CXXFLAGS="-g -O3" will make the profiling
report more complete and will likely highlight lines of code that are particularly computationally demanding.
MPI runs
You can also run a benchmark that emulates a domain decomposition if plumed has been compiled with MPI
and you run with mpirun and a command like the one shown below:
mpirun -np 4 plumed-runtime benchmark
If you load separate PLUMED instances as discussed above, they should all be compiled against the same MPI version. Notice that when using MPI, signals (CTRL-C) might not work.
Since some of the data transfer could happen asynchronously, you might want to use the --sleep option
to simulate a lag between the prepareCalc and performCalc actions. This part of the calculation will not contribute
to the output timings, but will obviously slow down your test.
Output
In the output you will see the usual reports about timings produced by the internal timers of the tested plumed instances.
In addition, this tool monitors the timing externally, using slightly different criteria:
- First, the initialization (construction of the input) will be shown with a separate timer, as well as the timing for the first step.
- Second, the timer corresponding to the calculation will be split into three parts, reporting execution of the first 20% (warm-up) and the next two blocks of 40% each.
- Finally, you might notice some discrepancies because some of the actions that are usually not expensive are not included in the internal timers. The external timer will thus provide a better estimate of the total elapsed time that includes everything.
The internal timers are still useful to monitor what happens at the different stages
of the calculation. If you want more detailed information you can also use a
DEBUG action with the DETAILED_TIMERS flag to determine how much time is spent in each action.
When you run multiple versions, a comparative analysis of the time spent within PLUMED in the various instances is performed. For each PLUMED instance, this analysis reports the ratio between the total time that instance ran for and the total time the first PLUMED instance ran for. In other words, the total time of the first PLUMED instance is used as the baseline for the comparison. Errors on these estimates are calculated using bootstrapping, and the warm-up phase is discarded in the analysis.
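The idea of a ratio with a bootstrap error can be illustrated with a short sketch. This is not the code that benchmark uses: the helper name bootstrap_ratio, the per-step input format, the paired resampling scheme and the way the 20% warm-up is cut are all assumptions made for illustration.

```shell
# Hypothetical helper: ratio of total times with a paired bootstrap error.
# $1: file with one line per step, "t_first t_other" (seconds)
# $2: number of bootstrap samples (default 200), $3: random seed (default 1)
bootstrap_ratio() {
  awk -v nboot="${2:-200}" -v seed="${3:-1}" '
    { a[NR] = $1; b[NR] = $2 }
    END {
      skip = int(0.2 * NR)          # drop the first 20% of steps as warm-up
      n = NR - skip
      if (n < 1) { print "not enough data"; exit 1 }
      for (i = 1; i <= n; i++) { sa += a[skip + i]; sb += b[skip + i] }
      srand(seed)
      for (k = 1; k <= nboot; k++) {
        ra = 0; rb = 0
        for (i = 1; i <= n; i++) {  # resample steps with replacement
          j = skip + 1 + int(rand() * n)
          ra += a[j]; rb += b[j]
        }
        r[k] = rb / ra
        mean += r[k]
      }
      mean /= nboot
      for (k = 1; k <= nboot; k++) var += (r[k] - mean) ^ 2
      # ratio of total times, followed by its bootstrap standard deviation
      printf "%.3f %.3f\n", sb / sa, sqrt(var / nboot)
    }' "$1"
}
```

Calling bootstrap_ratio timings.txt prints the ratio followed by its estimated error. Because the same resampled steps are used for both instances, fluctuations that affect both runs equally cancel in the ratio.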
Syntax
The following table describes the command line options that are available for this tool.
| Keyword | Description |
|---|---|
| --help/-h | print this help |
| --plumed | colon separated path(s) to the input file(s) |
| --kernel | colon separated path(s) to kernel(s) |
| --natoms | the number of atoms to use for the simulation |
| --nsteps | number of steps of MD to perform (-1 means forever) |
| --maxtime | maximum number of seconds (-1 means forever) |
| --sleep | number of seconds of sleep, mimicking MD calculation |
| --atom-distribution | the kind of possible atomic displacement at each step |
| --dump-trajectory | dump the trajectory to this file |
| --domain-decomposition | simulate domain decomposition, implies --shuffled |
| --shuffled | reshuffle atoms |
| --ixyz | the trajectory in xyz format |
| --igro | the trajectory in gro format |
| --idlp4 | the trajectory in DL_POLY_4 format |
| --ixtc | the trajectory in xtc format (xdrfile implementation) |
| --itrr | the trajectory in trr format (xdrfile implementation) |
| --mf_dcd | molfile: the trajectory in dcd format |
| --mf_crd | molfile: the trajectory in crd format |
| --mf_crdbox | molfile: the trajectory in crdbox format |
| --mf_gro | molfile: the trajectory in gro format |
| --mf_g96 | molfile: the trajectory in g96 format |
| --mf_trr | molfile: the trajectory in trr format |
| --mf_trj | molfile: the trajectory in trj format |
| --mf_xtc | molfile: the trajectory in xtc format |
| --mf_pdb | molfile: the trajectory in pdb format |
| --repeatX | number of times to align the read trajectory along the first box component, ignored with an atomic distribution |
| --repeatY | number of times to align the read trajectory along the second box component, ignored with an atomic distribution |
| --repeatZ | number of times to align the read trajectory along the third box component, ignored with an atomic distribution |