Regression Test Manager
The test manager for PFLOTRAN is a python program that is responsible for reading a configuration file, identifying the tests declared in the file, running PFLOTRAN on the appropriate input files, and then comparing the results to a known gold standard output file.
Running the Test Manager
The test manager can be run in two ways: either as part of the build system using make, or manually.
There are two options for calling the test manager through make: make check and make test. The check target runs a small set of tests that verify that PFLOTRAN is built and running on a given system; a user would run this to verify that their installation of PFLOTRAN is working. The test target runs a fuller set of regression tests intended to identify when changes to the code cause significant changes to PFLOTRAN's results.
$ cd $PFLOTRAN_DIR/regression_tests
$ make check
or
$ cd $PFLOTRAN_DIR/regression_tests
$ make test
When finished, it is useful to remove all of the output files generated by running the regression tests with the command make clean-tests:
$ cd $PFLOTRAN_DIR/regression_tests
$ make clean-tests
Calling the test manager through make relies on make variables from PETSc to determine the correct version of python to use, whether PFLOTRAN was built with MPI, and optional configurations such as unstructured meshes. The version of python used to call the test manager can be changed from the command line by specifying PYTHON:
$ cd ${PFLOTRAN_DIR}/src/pflotran
$ make PYTHON=/opt/local/bin/python3.3 check
To call the test manager manually:
$ cd ${PFLOTRAN_DIR}/regression_tests
$ python regression_tests.py \
--executable ../src/pflotran/pflotran \
--config-file shortcourse/copper_leaching/cu_leaching.cfg \
--tests cu_leaching
Some important command line arguments when running manually are:
executable: the path to the PFLOTRAN executable
mpiexec: the name of the executable for launching parallel jobs (mpiexec, mpirun, aprun, etc.)
config-file: the path to the configuration file containing the tests you want to run
recursive-search: the path to a directory; the test manager searches the directory and all its sub-directories for configuration files
tests: a list of test names that should be run
suites: a list of test suites that should be run
update: indicates that the gold standard test file for a given test should be updated to the current output
new-tests: indicates that the test is new and the current output should be used as the gold standard test file
check-performance: includes the performance metrics (SOLUTION blocks) in regression checks
The full list of command line options and a brief description can be found by running with the --help flag:
$ python regression_tests.py --help
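For orientation, the options above map naturally onto Python's standard argparse module. The following is a minimal, hypothetical sketch of how such a command line could be declared; it is not PFLOTRAN's actual implementation, and the defaults and help strings are illustrative only:

```python
# Hypothetical sketch of the test manager's command line using argparse.
# Option names follow the flags documented above; everything else here
# (defaults, help text) is illustrative, not PFLOTRAN's actual code.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="regression test manager (sketch)")
    parser.add_argument("--executable", help="path to the PFLOTRAN executable")
    parser.add_argument("--mpiexec",
                        help="launcher for parallel jobs (mpiexec, mpirun, aprun, ...)")
    parser.add_argument("--config-file", help="configuration file containing tests")
    parser.add_argument("--recursive-search",
                        help="directory to search recursively for configuration files")
    parser.add_argument("--tests", nargs="+", default=[], help="test names to run")
    parser.add_argument("--suites", nargs="+", default=[], help="test suites to run")
    parser.add_argument("--update", action="store_true",
                        help="replace gold standard files with current output")
    parser.add_argument("--new-tests", action="store_true",
                        help="use current output as the initial gold standard file")
    parser.add_argument("--check-performance", action="store_true",
                        help="include SOLUTION blocks in regression checks")
    return parser

if __name__ == "__main__":
    # mirror the manual invocation shown earlier in this section
    args = build_parser().parse_args(
        ["--executable", "../src/pflotran/pflotran",
         "--config-file", "shortcourse/copper_leaching/cu_leaching.cfg",
         "--tests", "cu_leaching"])
    print(args.tests)  # ['cu_leaching']
```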
Test output
The test manager produces (fairly terse) screen output that includes a progress bar and the status of each test. A legend is provided to help decipher the screen output, and a more detailed explanation of failures and errors can be found in the test log file. Example screen output follows:
Test log file : pflotran-tests-2021-12-21_10-05-24.testlog
Running pflotran regression tests :
Legend
. - success
F - failed regression test (results are outside error tolerances)
M - failed regression test (results are FAR outside error tolerances)
G - general error
U - user error
V - simulator failure (e.g. failure to converge)
X - simulator crash
T - time out error
C - configuration file [.cfg] error
I - missing information (e.g. missing files)
B - pre-processing error (e.g. error in simulation setup scripts)
A - post-processing error (e.g. error in solution comparison)
S - test skipped
W - warning
? - unknown
.............................................................................
..............................M....................................FF..F.....
.............FFFFFF...FFFF...X..............................................F
FF........FFF.FFFFFFFFFFFFF..F...............................................
...U..........................................F............
------------------------------------------------------------------------------
Regression test summary:
Total run time: 135.333 [s]
Total tests : 365
Tests run : 365
Failed : 35
Errors : 2
Users should not be surprised if regression test results produce many F failures. Regression test tolerances are set very tight to catch minuscule changes to simulation results (i.e. default absolute and relative error tolerance: 1.e-12). The correct results stored in .regression.gold files are based on a specific OS and compiler (e.g. Ubuntu, GNU compiler, no optimization). A change in operating system or compiler optimization settings will generate very small differences in the solution. However, larger discrepancies (denoted by M) or errors are concerning and should be discussed with developers.
The test directories contain any files generated by PFLOTRAN during the run. Screen output for each test is contained in the file ${TEST_NAME}.stdout.
Configuration Files
The regression test manager reads tests specified in a series of configuration files in standard cfg (or Windows ini file) format. They consist of a series of sections with key-value pairs:
[section-name]
key = value
Section names should be all lower case, and spaces must be replaced by a hyphen or underscore. Comments are specified by a # character.
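Because the format is standard, such files can be parsed with stock tools. As a rough illustration (not PFLOTRAN's actual code), Python's built-in configparser module reads this format directly:

```python
# Illustrative sketch: parsing a test configuration file in the format
# described above with Python's standard configparser module. The file
# contents and the "reserved sections" logic are assumptions for the
# sake of the example, not PFLOTRAN's actual implementation.
import configparser

CFG_TEXT = """
[suites]
standard = test-1 test-2

[calcite-kinetics]
np = 2
timeout = 30.0
concentration = 1.0e-10 absolute
"""

config = configparser.ConfigParser()
config.read_string(CFG_TEXT)

# a test manager would treat every non-reserved section as a test
reserved = {"suites", "default-test-criteria"}
tests = [name for name in config.sections() if name not in reserved]
print(tests)                               # ['calcite-kinetics']
print(config["calcite-kinetics"]["np"])    # '2'
```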
A test is declared as a section in the configuration file. It is assumed that there will be a PFLOTRAN input file with the same name as the test section. The key-value pairs in a test section define how the test is run and the output is compared to the gold standard file.
[calcite-kinetics]
#look for an input file named `calcite-kinetics.in`
np = 2
timeout = 30.0
concentration = 1.0e-10 absolute
np = N, (optional), indicates a parallel test run with N processors. Default is serial. If mpiexec is not provided on the command line, then parallel tests are skipped.
timeout = N, (optional), indicates that the test should be allowed to run for N seconds before it is killed. Default is 60.0 seconds.
TYPE = TOLERANCE COMPARISON, indicates that data in the regression file of type TYPE should be compared using a tolerance of TOLERANCE. Known data types are listed below.
The data types and default tolerances are:
time = 5 percent
concentration = \(1\times 10^{-12}\) absolute
generic = \(1\times 10^{-12}\) absolute
discrete = 0 absolute
rate = \(1\times 10^{-12}\) absolute
volume_fraction = \(1\times 10^{-12}\) absolute
pressure = \(1\times 10^{-12}\) absolute
saturation = \(1\times 10^{-12}\) absolute
residual = \(1\times 10^{-12}\) absolute
The default tolerances are deliberately set very tight, and are expected to be overridden on a per-test or per configuration file basis. There are three known comparisons: “absolute”, for absolute differences (\(\delta=|c-g|\)), “relative” for relative differences (\(\delta={|c-g|}/{g}\)), and “percent” for specifying a percent difference (\(\delta=100\cdot{|c-g|}/{g}\)).
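The three comparisons can be summarized in a few lines of Python. This is a sketch of the formulas above, not the test manager's actual implementation (the function name is hypothetical):

```python
# Sketch of the three comparison modes described above (absolute,
# relative, percent). Illustrative only; not PFLOTRAN's actual code.
def exceeds_tolerance(current, gold, tolerance, comparison):
    """Return True when current differs from gold by more than tolerance."""
    if comparison == "absolute":
        delta = abs(current - gold)
    elif comparison == "relative":
        delta = abs(current - gold) / abs(gold)
    elif comparison == "percent":
        delta = 100.0 * abs(current - gold) / abs(gold)
    else:
        raise ValueError("unknown comparison: " + comparison)
    return delta > tolerance

# e.g. a concentration that moved by one part in ten thousand easily
# fails the default 1.0e-12 absolute check:
print(exceeds_tolerance(1.0001e-3, 1.0e-3, 1.0e-12, "absolute"))  # True
```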
In addition, there are two optional sections in configuration files. The section “default-test-criteria” specifies the default criteria to be used for all tests in the current file. Criteria specified in a test section override these values. A section named “suites” defines aliases for a group of tests.
[suites]
standard = test-1 test-2 test-3
standard_parallel = test-4 test-5 test-6
Common test suites are standard and standard_parallel, used by make test, and domain-specific test suites such as geochemistry, flow, transport, mesh, et cetera.
Creating New Regression Tests
We want running tests to become a habit for developers, so that make pflotran is always followed by make test. With that in mind, ideal test cases are small and fast (< 0.1 seconds) and operate on a small subsection of the code, so it is easier to diagnose where a problem has occurred. While it may (will) be necessary to create some platform-specific tests, we want as many tests as possible to be platform independent and widely used. There is a real danger of test output becoming stale if running it requires special access to a particular piece of hardware, operating system or compiler.
The steps for creating new regression tests are:
Create the PFLOTRAN input file, and get the simulation running correctly.
Tell PFLOTRAN to generate a regression file by adding a regression block to the input file, e.g.:
REGRESSION
  CELL_IDS
    1
    3978
  /
  CELLS_PER_PROCESS 4
  VARIABLES
    LIQUID_PRESSURE
    GAS_SATURATION
  /
END
Add the test to the configuration file
Refine the tolerances so that they will be tight enough to identify problems, but loose enough that they do not create a lot of false positives and discourage users and developers from running the tests.
Add the test to the appropriate test suite.
Add the configuration file, input file and “gold” file to revision control.
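Putting the configuration-file side of these steps together, a new test entry might look like the following sketch. The test name my-new-test and the tolerance value are hypothetical; the section name must match the input file name (my-new-test.in), and the test is added to an existing suite:

```cfg
[suites]
standard = test-1 test-2 my-new-test

# the section name must match the input file, my-new-test.in
[my-new-test]
timeout = 30.0
# loosened from the 1.0e-12 default after inspecting platform-to-platform noise
concentration = 1.0e-10 absolute
```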
Updating Test Results
The output from PFLOTRAN should be fairly stable, and we consider the current output to be “correct”. Changes to regression output should be rare, and primarily done for bug fixes. Updating the test results is simply a matter of replacing the gold standard file with a new file. This can be done with a simple rename in the file system:
$ mv test_1.regression test_1.regression.gold
Or using the regression test manager:
$ python regression_tests.py --executable ../src/pflotran/pflotran \
--config-file my_test.cfg --tests test_1 --update
Updating through the regression test manager ensures that the output is from your current executable rather than a stale file.
Please document why you updated gold standard files in your revision control commit message.