Rst-wham-2d: Difference between revisions

From PMFLib Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
 
(5 intermediate revisions by the same user not shown)
Line 9: Line 9:


<span style="color: maroon;">rst-wham-2d</span>
<span style="color: maroon;">rst-wham-2d</span>
<span style="color: purple;">Px</span><span style="color: blue;"><nowiki>[=0|pi|val]</nowiki></span>
 
<span style="color: purple;">hist_min_x hist_max_x num_bins_x</span>
::<span style="color: purple;">Px</span><span style="color: blue;"><nowiki>[=0|pi|val]</nowiki></span><span style="color: purple;">hist_min_x hist_max_x num_bins_x</span>
<span style="color: purple;">Py</span><span style="color: blue;"><nowiki>[=0|pi|val]</nowiki></span>
 
<span style="color: purple;">hist_min_y hist_max_y num_bins_y
::<span style="color: purple;">Py</span><span style="color: blue;"><nowiki>[=0|pi|val]</nowiki></span><span style="color: purple;">hist_min_y hist_max_y num_bins_y</span>
tol temperature numpad metadatafile freefile usemask</span>
 
::<span style="color: purple;">tol temperature numpad metadatafile freefile usemask</span>


'''Description'''
'''Description'''
Line 23: Line 24:


{| style="margin-left: 2em; width: 75%;"
{| style="margin-left: 2em; width: 75%;"
 
The command line arguments largely have the same meaning as they do for the one dimensional [[ rst-wham ]] program.
|- style="vertical-align: top;"
| <span style="color: purple;"><nowiki>P|Ppi|Pval</nowiki></span><br/>
<span style="margin-left: 30px;">The first (optional) argument specifies the periodicity of the reaction coordinate. For a nonperiodic reaction coordinate (a distance, for example), it should be left out. "P" means that the reaction coordinate has a periodicity of 360, appropriate for angles. "Ppi" specifies a periodicity of 2*pi, appropriate for angles measured in radians. "Pval" specifies periodicty of some arbitrary amount, val, which should be an integer or floating point number. For example, "P180.0" would be appropriate for an angle with twofold symmetry.</span>
|- style="vertical-align: top; background-color: #e6e6e6;"
| <span style="color: purple;">hist_min hist_max</span><br/>
<span style="margin-left: 30px;">Specify the boundaries of the histogram. As a rule, all data points outside the range (hist_min, hist_max) are silently ignored. The only exception is that if an entire trajectory is outside the range, the program halts with an error message. The solution is to remove that file from the metadata file. hist_min and hist_max should be floating point numbers. </span>
|- style="vertical-align: top;"
| <span style="color: purple;">num_bins</span><br/>
<span style="margin-left: 30px;">Specifies the number of bins in the histogram, and as a result the number of points in the final PMF. It should be an integer.</span>
|- style="vertical-align: top; background-color: #e6e6e6;"
| <span style="color: purple;">tol</span><br/>
<span style="margin-left: 30px;">tol is the convergence tolerance for the WHAM calculations. Specifically, the WHAM iteration is considered to be converged when no Fi value for any simulation window changes by more than tol on consecutive iterations. As the program runs, it prints the average change in the F values for the most recent iteration. Obviously, this number will be smaller than tol before the computation converges, because convergence is triggered by the largest change as opposed to the average.</span>
|- style="vertical-align: top;"
| <span style="color: purple;">temperature</span><br/>
<span style="margin-left: 30px;">temperature is a floating point number representing the temperature in Kelvin at which the weighted histogram calculation is performed. This does not have to be the temperature at which the simulations were performed (see below for discussion). </span>
|- style="vertical-align: top; background-color: #e6e6e6;"
| <span style="color: purple;">numpad</span><br/>
<span style="margin-left: 30px;">numpad specifies the number of "padding" values that should be printed for periodic PMFs. This number should be set to 0 for aperiodic reaction coordinates. It doesn't actually affect the calculation in any way. Rather, it just alters the final printout of the free energy, to make plotting of periodic reaction coordinates simpler. This is more important for wham-2d than wham.</span>
|- style="vertical-align: top;"
|- style="vertical-align: top;"
| <span style="color: purple;">metadatafile</span><br/>
| <span style="color: purple;">Px (Py)</span><br/>
<span style="margin-left: 30px;">metadatafile specifies the name of the metadata file. The format of this file is described below.</span>
<span style="margin-left: 30px;">The periodicity arguments are not optional.
"Px" by itself indicates that the first dimension of the reaction coordinate has a period of 360. "Px=0" turns off periodicity. "Px=pi" specifies a period of 2*pi, and "Px=val" allows you to choose an arbitrary value for the period.</span>
|- style="vertical-align: top; background-color: #e6e6e6;"
|- style="vertical-align: top; background-color: #e6e6e6;"
| <span style="color: purple;">freefile </span><br/>
| <span style="color: purple;">hist_min_x hist_max_x num_bins_x</span><br/>
<span style="margin-left: 30px;">freefile is the name used for the file containing the final PMF and probability distribution.</span>
<span style="margin-left: 30px;">behave exactly like hist_min, hist_max, and num_bins do in the 1 dimensional program. Py, hist_min_y, etc., behave the same as Px, hist_min_x, etc., except they control the second coordinate of the PMF.</span>
|- style="vertical-align: top;"
|- style="vertical-align: top;"
| <span style="color: purple;">num_MC_trials randSeed</span><br/>
| <span style="color: purple;">use_mask</span><br/>
<span style="margin-left: 30px;">num_MC_trials and randSeed (Optional) are both related to the performance of Monte Carlo bootstrap error analysis. If these values are not supplied, error analysis is not performed. num_MC_trials should be an integer specifying the number of fake data sets which should be generated. randSeed is an integer which controls the random number seed - the value you pick should be irrelevant, but I let the user set it primarily for debugging purposes.</span>
<span style="margin-left: 30px;">expects an integer value, and if its values is non-zero turns on the automasking feature, which causes bins for which there is no sample data to be excluded from the wham calculation.</span>
|}
|}


Line 57: Line 41:
{| style="margin-left: 2em; width: 95%;"
{| style="margin-left: 2em; width: 95%;"
|- style="vertical-align: top; background-color: #e6e6e6;"
|- style="vertical-align: top; background-color: #e6e6e6;"
|<span style="color: blue;">metadata</span><br/><span style="margin-left: 0px;">Each line of the metadata file should be blank, begin with a "#" (marking a comment), or have the following format:
|<span style="color: blue;">metadata</span><br/><span style="margin-left: 0px;">As with regular 1 dimensional wham, each line of the metadata file should either be blank, begin with a "#", or have the following format


::<code>/path/to/timeseries/file   loc_win_min  spring  [correl time] [temperature]</code>
::<code>/path/to/timeseries/file loc_win_x loc_win_y spring_x spring_y [correl time] [temp]</code>


This first field is the name of one of the time series files (more on this in a moment). The second field, loc_win_min, is the location of the minimum of the biasing potential for this simulation, a floating point number. The third field, spring, is the spring constant for the biasing potential used in this simulation, assuming the biasing potential is of the format
This first field is the name of one of the time series files. loc_win_x and loc_win_y are the locations of the minimum of the biasing terms in the first and second dimensions of the reaction coordinate. spring_x and spring_y are the spring constants used for the biasing potential in this simulation, assuming the biasing potential is of the format


::<math> V= \frac{1}{2}k (x-x_0)^2 </math>
::<math> V= \frac{1}{2}(k_x (x-x_0)^2 + k_y(y-y_0)^2 )</math>


Many simulation packages, including TINKER, AMBER, and CHARMM, do not include the [1/2] when they specify spring constants for their restraint terms. This is a common source of error (I'd love to change my code to match the other packages' behavior, but then experienced users who don't read the manual would get messed up). Also, the units for the spring constant must match those for the time series. So, if your time series is a distance recorded in Ångstroms, the spring constant must be in kcal/mol-Å2. AMBER users should take care when using angular restraints: the specification and output of angles is in degrees, but AMBER's spring constants use kcal/mol-rad^2.
The sixth argument ("correl time") specifies the decorrelation time for your time series, in units of time steps. It is only used when generating fake data sets for Monte Carlo bootstrap error analysis, where it modulates the number of points per fake data set. This argument is optional, and is ignored if you don't do error analysis. If you're doing multiple temperatures but not bootstrapping, set it to any integer value as a placeholder, and it'll be ignored. See section for more discussion about how to do bootstrapping.


The fourth argument ("correl time") specifies the decorrelation time for your time series, in units of time steps. It is only used when generating fake data sets for Monte Carlo bootstrap error analysis, where it modulates the number of points per fake data set. This argument is optional, and is ignored if you don't do error analysis. If you're doing multiple temperatures but not bootstrapping, set it to any integer value as a placeholder, and it'll be ignored. See section for more discussion about how to do bootstrapping.
Finally, the last field is the temperature this simulation was run at. If not supplied, the temperature specified on the command line is used. In the present version of the code, you must either leave the temperature unspecified for all simulations or specify it for all simulations.


Finally, the last (optional) field is the temperature for this simulation. If not supplied, the temperature specified on the command line is used. In the present version of the code, you must either leave the temperature unspecified for all simulations or specify it for all simulations.
The time series files must follow one of two formats, depending on whether the temperature was specified in the metadata file. If no temperature was specified, the file should contain three columns, where the first is the time (which isn't actually used), and the second and third are the position of the system along the x and y reaction coordinates, respectively. Both numbers should be in floating point format. Lines beginning with "#" are ignored as comments. Additional columns of data are ignored.


The time series files must follow one of two formats, depending on whether the temperature was specified in the metadata file. If no temperature was specified, the file should contain two columns, where the first is the time (which isn't actually used), and the second is the position of the system along the reaction coordinate. Both numbers should be in floating point format. Lines beginning with "#" are ignored as comments. Additional columns of data are ignored.
If the simulation temperature is specified, there must be a fourth column of data, containing the system's potential energy at that time point. It should be a floating point value. See the section on replica exchange for more details.  
If the simulation temperature is specified, there must be a third column of data, containing the system's potential energy at that time point. It should be a floating point value.  
</span>
</span>


|- style="vertical-align: top;"
|- style="vertical-align: top;"
|<span style="color: blue;">freefile</span><br/><span style="margin-left: 0px;">
|<span style="color: blue;">freefile</span><br/><span style="margin-left: 0px;">
The first line of the output file contains echoes command line. The next line or two contain comments describing the periodicity used and the number of simulation windows present. While the calculation is running, it will print out lines that look like the following:
The output largely resembles that for wham, except with more columns. The first line echoes the command line, followed by a specification of the periodicity, and the number of windows. The iteration lines have the same meaning. When the current value for the PMF is dumped, the format looks like
 
#Iteration 10:  0.106019
#Iteration 20:  0.062269
#Iteration 30:  0.039890
#Iteration 40:  0.027003
 
This specifies the current iteration number, and the average change in the F values for the current iteration. This number is not used for deciding when the calculation has converged; rather, the maximum change, as opposed to the average, is used.
Every 100 iterations, the current version of the PMF is dumped into the output file. These lines look like


  -178.000000     0.014212       4909.138943
  -172.500000     -172.500000    1.968750       15.394489
  -174.000000     0.062631       4525.390035
  -172.500000     -157.500000    2.574512       5.522757
  -170.000000     0.227076       3432.434076
  -172.500000    -142.500000     3.147538       2.094142
  -166.000000     0.494262        2190.487110
  -172.500000     -127.500000     3.505869       1.141952
-162.000000     0.817734       1271.708620


The first column is the value of the reaction coordinate, the second is the value of the PMF, and the third is the unnormalized probability distribution.
where the first two columns are the values of the first and second dimensions of the reaction coordinate, the third column is the PMF, and the last column is the unnormalized probability.
Once the calculation has converged, wham will produce output resembling
Once the calculation has converged, wham will produce output resembling


  # Dumping simulation biases, in the metadata file order  
  # Dumping simulation biases, in the metadata file order  
  # Window  F (free energy units)
  # Window  F (free energy units)
  # 0     0.000004
  #0       -0.000004
  # 1     -4.166136
  #1       -0.156869
  # 2     -3.241052
  #2       -0.534845
  # 3     -4.475215
  #3       -2.445469
# 4    -6.324340
# 5    -7.128731


These are the final F values from the wham calculation, and can be used for computing weighted averages for properties other than the free energy.
These are the final F values from the wham calculation, and can be used for computing weighted averages for properties other than the free energy.
You may have noticed that all of the lines except the free energies are preceded by "#". This allows you to check convergence of your wham calculation by simply plotting the output file in gnuplot. If the free energy curves have stopped changing, your tolerance is small enough.
If you specified a nonzero number of Monte Carlo bootstrap error analysis trials, you will see lines that resemble
If you specified a nonzero number of Monte Carlo bootstrap error analysis trials, you will see lines that resemble


Line 118: Line 89:
The free energy data file is written when the calculation converges, and resembles:
The free energy data file is written when the calculation converges, and resembles:


  #Coor          Free            +/-             Prob            +/-
  -232.500000    -232.500000     4.812986       0.003185       0.000001       0.000000
-178.000000     0.014386       0.000098       0.106389       0.000017
  -232.500000    -217.500000     4.830312       0.003741       0.000001       0.000000
  -174.000000     0.068560       0.000151       0.097128       0.000025
  -232.500000     -202.500000     4.898622       0.001009       0.000000       0.000000
  -170.000000     0.250825        0.000350        0.071496        0.000042
-166.000000     0.523786       0.000294       0.045186       0.000022


The first column is the value of the reaction coordinate, the second is the free energy. The third is the statistical uncertainty of the free energy (which is only meaningful if you performed Monte Carlo bootstrapping). The fourth and fifth columns are the probability and it's associated statistical uncertainty. Again, the latter is only meaningful if bootstrapping is performed. See section for further discussion of error estimation.  
The first two columns are the locations along the first and second dimensions of the reaction coordinate. The third is the free energy, while the fourth is the statistical uncertainty in the free energy. The fifth and sixth columns are the normalized probability and its statistical uncertainty. The two uncertainty columns will be zero if you did not use Monte Carlo bootstrapping.  
</span>
</span>
|}
|}

Latest revision as of 13:39, 19 April 2016

Navigation: Documentation / Methods / Restrained Dynamics / RST:Utilities / rst-wham-2d


RST Utilities

Name

rst-wham-2d

Synopsis

rst-wham-2d

Px[=0|pi|val]hist_min_x hist_max_x num_bins_x
Py[=0|pi|val]hist_min_y hist_max_y num_bins_y
tol temperature numpad metadatafile freefile usemask

Description

These programs (wham and wham-2d) implement the Weighted Histogram Analysis Method of Kumar, et al ("Multidimensional free-energy calculations using the weighted histogram analysis method", J. Comput. Chem., 16:1339-1350, 1995). The code generally follows the notation used by Benoit Roux ("The calculation of the potential of mean force using computer simulations", Comput. Phys. Comm., 91:275-282, 1995). Consult these papers for the theoretical background and justification for the method.


Arguments

The command line arguments largely have the same meaning as they do for the one dimensional rst-wham program.
Px (Py)

The periodicity arguments are not optional. "Px" by itself indicates that the first dimension of the reaction coordinate has a period of 360. "Px=0" turns off periodicity. "Px=pi" specifies a period of 2*pi, and "Px=val" allows you to choose an arbitrary value for the period.

hist_min_x hist_max_x num_bins_x

behave exactly like hist_min, hist_max, and num_bins do in the 1 dimensional program. Py, hist_min_y, etc., behave the same as Px, hist_min_x, etc., except they control the second coordinate of the PMF.

use_mask

expects an integer value, and if its values is non-zero turns on the automasking feature, which causes bins for which there is no sample data to be excluded from the wham calculation.

File formats

metadata
As with regular 1 dimensional wham, each line of the metadata file should either be blank, begin with a "#", or have the following format
/path/to/timeseries/file loc_win_x loc_win_y spring_x spring_y [correl time] [temp]

This first field is the name of one of the time series files. loc_win_x and loc_win_y are the locations of the minimum of the biasing terms in the first and second dimensions of the reaction coordinate. spring_x and spring_y are the spring constants used for the biasing potential in this simulation, assuming the biasing potential is of the format

The sixth argument ("correl time") specifies the decorrelation time for your time series, in units of time steps. It is only used when generating fake data sets for Monte Carlo bootstrap error analysis, where it modulates the number of points per fake data set. This argument is optional, and is ignored if you don't do error analysis. If you're doing multiple temperatures but not bootstrapping, set it to any integer value as a placeholder, and it'll be ignored. See section for more discussion about how to do bootstrapping.

Finally, the last field is the temperature this simulation was run at. If not supplied, the temperature specified on the command line is used. In the present version of the code, you must either leave the temperature unspecified for all simulations or specify it for all simulations.

The time series files must follow one of two formats, depending on whether the temperature was specified in the metadata file. If no temperature was specified, the file should contain three columns, where the first is the time (which isn't actually used), and the second and third are the position of the system along the x and y reaction coordinates, respectively. Both numbers should be in floating point format. Lines beginning with "#" are ignored as comments. Additional columns of data are ignored.

If the simulation temperature is specified, there must be a fourth column of data, containing the system's potential energy at that time point. It should be a floating point value. See the section on replica exchange for more details.

freefile

The output largely resembles that for wham, except with more columns. The first line echoes the command line, followed by a specification of the periodicity, and the number of windows. The iteration lines have the same meaning. When the current value for the PMF is dumped, the format looks like

-172.500000     -172.500000     1.968750        15.394489
-172.500000     -157.500000     2.574512        5.522757
-172.500000     -142.500000     3.147538        2.094142
-172.500000     -127.500000     3.505869        1.141952

where the first two columns are the values of the first and second dimensions of the reaction coordinate, the third column is the PMF, and the last column is the unnormalized probability. Once the calculation has converged, wham will produce output resembling

# Dumping simulation biases, in the metadata file order 
# Window  F (free energy units)
#0       -0.000004
#1       -0.156869
#2       -0.534845
#3       -2.445469 

These are the final F values from the wham calculation, and can be used for computing weighted averages for properties other than the free energy. If you specified a nonzero number of Monte Carlo bootstrap error analysis trials, you will see lines that resemble

#MC trial 0: 990 iterations
#MC trial 1: 973 iterations
#MC trial 2: 970 iterations
#MC trial 3: 981 iterations
#MC trial 4: 984 iterations

at the end of the file. The free energy data file is written when the calculation converges, and resembles:

-232.500000     -232.500000     4.812986        0.003185        0.000001        0.000000
-232.500000     -217.500000     4.830312        0.003741        0.000001        0.000000
-232.500000     -202.500000     4.898622        0.001009        0.000000        0.000000

The first two columns are the locations along the first and second dimensions of the reaction coordinate. The third is the free energy, while the fourth is the statistical uncertainty in the free energy. The fifth and sixth columns are the normalized probability and its statistical uncertainty. The two uncertainty columns will be zero if you did not use Monte Carlo bootstrapping.