Rst-wham: Difference between revisions

3,230 bytes added ,  19 April 2016
no edit summary
No edit summary
No edit summary
 
(13 intermediate revisions by the same user not shown)
Line 13: Line 13:


'''Description'''
'''Description'''
These programs (wham and wham-2d) implement the Weighted Histogram Analysis Method of Kumar, et al ("Multidimensional free-energy calculations using the weighted histogram analysis method", J. Comput. Chem., 16:1339-1350, 1995). The code generally follows the notation used by Benoit Roux ("The calculation of the potential of mean force using computer simulations", Comput. Phys. Comm., 91:275-282, 1995). Consult these papers for the theoretical background and justification for the method.  
 
These programs (wham and wham-2d) implement the [http://membrane.urmc.rochester.edu/sites/default/files/wham/doc.html Weighted Histogram Analysis Method of Kumar, et al ] ("Multidimensional free-energy calculations using the weighted histogram analysis method", J. Comput. Chem., 16:1339-1350, 1995). The code generally follows the notation used by Benoit Roux ("The calculation of the potential of mean force using computer simulations", Comput. Phys. Comm., 91:275-282, 1995). Consult these papers for the theoretical background and justification for the method.  




Line 53: Line 54:
{| style="margin-left: 2em; width: 95%;"
{| style="margin-left: 2em; width: 95%;"
|- style="vertical-align: top; background-color: #e6e6e6;"
|- style="vertical-align: top; background-color: #e6e6e6;"
|<span style="color: blue;">metadata</span><br/><span style="margin-left: 30px;">Each line of the metadata file should be blank, begin with a "#" (marking a comment), or have the following format:
|<span style="color: blue;">metadata</span><br/><span style="margin-left: 0px;">Each line of the metadata file should be blank, begin with a "#" (marking a comment), or have the following format:


::<code>/path/to/timeseries/file    loc_win_min  spring  [correl time] [temperature]</code>
::<code>/path/to/timeseries/file    loc_win_min  spring  [correl time] [temperature]</code>
Line 59: Line 60:
This first field is the name of one of the time series files (more on this in a moment). The second field, loc_win_min, is the location of the minimum of the biasing potential for this simulation, a floating point number. The third field, spring, is the spring constant for the biasing potential used in this simulation, assuming the biasing potential is of the format
This first field is the name of one of the time series files (more on this in a moment). The second field, loc_win_min, is the location of the minimum of the biasing potential for this simulation, a floating point number. The third field, spring, is the spring constant for the biasing potential used in this simulation, assuming the biasing potential is of the format


Many simulation packages, including TINKER, AMBER, and CHARMM, do not include the [1/2] when they specify spring constants for their restraint terms. This is a common source of error (I'd love to change my code to match the other packages' behavior, but then experienced users who don't read the manual would get messed up). Also, the units for the spring constant must match those for the time series. So, if your time series is a distance recorded in Ångstroms, the spring constant must be in kcal/mol-Å2. AMBER users should take care when using angular restraints: the specification and output of angles is in degrees, but AMBER's spring constants use kcal/mol-rad2.
::<math> V= \frac{1}{2}k (x-x_0)^2  </math>
 
Many simulation packages, including TINKER, AMBER, and CHARMM, do not include the [1/2] when they specify spring constants for their restraint terms. This is a common source of error (I'd love to change my code to match the other packages' behavior, but then experienced users who don't read the manual would get messed up). Also, the units for the spring constant must match those for the time series. So, if your time series is a distance recorded in Ångstroms, the spring constant must be in kcal/mol-Å2. AMBER users should take care when using angular restraints: the specification and output of angles is in degrees, but AMBER's spring constants use kcal/mol-rad^2.


The fourth argument ("correl time") specifies the decorrelation time for your time series, in units of time steps. It is only used when generating fake data sets for Monte Carlo bootstrap error analysis, where it modulates the number of points per fake data set. This argument is optional, and is ignored if you don't do error analysis. If you're doing multiple temperatures but not bootstrapping, set it to any integer value as a placeholder, and it'll be ignored. See section for more discussion about how to do bootstrapping.
The fourth argument ("correl time") specifies the decorrelation time for your time series, in units of time steps. It is only used when generating fake data sets for Monte Carlo bootstrap error analysis, where it modulates the number of points per fake data set. This argument is optional, and is ignored if you don't do error analysis. If you're doing multiple temperatures but not bootstrapping, set it to any integer value as a placeholder, and it'll be ignored. See section for more discussion about how to do bootstrapping.
Line 67: Line 70:
The time series files must follow one of two formats, depending on whether the temperature was specified in the metadata file. If no temperature was specified, the file should contain two columns, where the first is the time (which isn't actually used), and the second is the position of the system along the reaction coordinate. Both numbers should be in floating point format. Lines beginning with "#" are ignored as comments. Additional columns of data are ignored.
The time series files must follow one of two formats, depending on whether the temperature was specified in the metadata file. If no temperature was specified, the file should contain two columns, where the first is the time (which isn't actually used), and the second is the position of the system along the reaction coordinate. Both numbers should be in floating point format. Lines beginning with "#" are ignored as comments. Additional columns of data are ignored.
If the simulation temperature is specified, there must be a third column of data, containing the system's potential energy at that time point. It should be a floating point value.  
If the simulation temperature is specified, there must be a third column of data, containing the system's potential energy at that time point. It should be a floating point value.  
</span>
|- style="vertical-align: top;"
|<span style="color: blue;">freefile</span><br/><span style="margin-left: 0px;">
The first line of the output file contains echoes command line. The next line or two contain comments describing the periodicity used and the number of simulation windows present. While the calculation is running, it will print out lines that look like the following:
#Iteration 10:  0.106019
#Iteration 20:  0.062269
#Iteration 30:  0.039890
#Iteration 40:  0.027003
This specifies the current iteration number, and the average change in the F values for the current iteration. This number is not used for deciding when the calculation has converged; rather, the maximum change, as opposed to the average, is used.
Every 100 iterations, the current version of the PMF is dumped into the output file. These lines look like
-178.000000    0.014212        4909.138943
-174.000000    0.062631        4525.390035
-170.000000    0.227076        3432.434076
-166.000000    0.494262        2190.487110
-162.000000    0.817734        1271.708620
The first column is the value of the reaction coordinate, the second is the value of the PMF, and the third is the unnormalized probability distribution.
Once the calculation has converged, wham will produce output resembling
# Dumping simulation biases, in the metadata file order
# Window  F (free energy units)
# 0    0.000004
# 1    -4.166136
# 2    -3.241052
# 3    -4.475215
# 4    -6.324340
# 5    -7.128731
These are the final F values from the wham calculation, and can be used for computing weighted averages for properties other than the free energy.
You may have noticed that all of the lines except the free energies are preceded by "#". This allows you to check convergence of your wham calculation by simply plotting the output file in gnuplot. If the free energy curves have stopped changing, your tolerance is small enough.
If you specified a nonzero number of Monte Carlo bootstrap error analysis trials, you will see lines that resemble
#MC trial 0: 990 iterations
#MC trial 1: 973 iterations
#MC trial 2: 970 iterations
#MC trial 3: 981 iterations
#MC trial 4: 984 iterations
at the end of the file.
The free energy data file is written when the calculation converges, and resembles:
#Coor          Free            +/-            Prob            +/-
-178.000000    0.014386        0.000098        0.106389        0.000017
-174.000000    0.068560        0.000151        0.097128        0.000025
-170.000000    0.250825        0.000350        0.071496        0.000042
-166.000000    0.523786        0.000294        0.045186        0.000022
The first column is the value of the reaction coordinate, the second is the free energy. The third is the statistical uncertainty of the free energy (which is only meaningful if you performed Monte Carlo bootstrapping). The fourth and fifth columns are the probability and it's associated statistical uncertainty. Again, the latter is only meaningful if bootstrapping is performed. See section for further discussion of error estimation.
</span>
</span>
|}
|}
144

edits