Detailed Description

Author: Mark Gates

Script to run testers with various matrix sizes.

See also the run_summarize.py script, which post-processes the output, sorting it into errors (segfaults, etc.), accuracy failures, and known failures. run_summarize.py can apply a larger tolerance without re-running the tests.

Small sizes are chosen around block sizes (e.g., 30...34 around 32) to detect bugs that occur at the block-size boundary and at the switchover from LAPACK to MAGMA code. Tall and wide sizes are chosen to exercise different aspect ratios, e.g., nearly square, 2:1, 10:1, 1:2, and 1:10. The -h or --help option prints a summary of the options.
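
The size-selection idea can be sketched as follows. This is a hypothetical illustration, not the actual run_tests.py code: the helper names `sizes_around` and `shapes` are invented, the radius of 2 reproduces the 30...34 example, and shapes are exactly square (rather than nearly square) for simplicity.

```python
def sizes_around(block_sizes, radius=2):
    """Return the sorted sizes within +/- radius of each block size (hypothetical helper)."""
    sizes = set()
    for nb in block_sizes:
        for n in range(nb - radius, nb + radius + 1):
            if n > 0:
                sizes.add(n)
    return sorted(sizes)


def shapes(n, ratios=(1, 2, 10)):
    """Return (m, n) pairs: square, tall (m = r*n), and wide (n = r*m) shapes."""
    pairs = []
    for r in ratios:
        pairs.append((r * n, n))      # tall, or square when r == 1
        if r != 1:
            pairs.append((n, r * n))  # wide
    return pairs


if __name__ == "__main__":
    print(sizes_around([32]))  # [30, 31, 32, 33, 34]
    print(shapes(100))
```

Sizes just below, at, and just above a block size are where off-by-one errors in blocking and in the LAPACK/MAGMA crossover tend to show up.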

Output to file vs. console

When output is redirected to a file, the script prints a short summary to stderr (the console) and writes all other output to the file. For example:

  ./run_tests.py --lu --precision s --small > lu.txt
  testing_sgesv_gpu -c                      ok
  testing_sgetrf_gpu -c2                    ok
  testing_sgetf2_gpu -c                     ok
  testing_sgetri_gpu -c                     ** 45 tests failed
  testing_sgetrf_mgpu -c2                   ok
  testing_sgesv -c                          ok
  testing_sgetrf -c2                        ok

  ****************************************************************************************************
  summary
  ****************************************************************************************************
    282 tests in 7 commands passed
     45 tests failed accuracy test
      0 errors detected (crashes, CUDA errors, etc.)
  routines with failures:
      testing_sgetri_gpu -c
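
One way to get this behavior is to check whether stdout is a terminal and, if it is not, mirror each one-line status to stderr. The sketch below assumes this approach. The function `report` and its 42-column layout are hypothetical and are not taken from run_tests.py.

```python
import io
import sys


def report(tester, status, out=None, err=None):
    """Print one "tester  status" line; mirror it to stderr if stdout is redirected.

    Hypothetical helper: when `out` is not a TTY (e.g., redirected to lu.txt),
    the same short line is also written to `err` so progress stays visible.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    line = f"{tester:<42}{status}"
    print(line, file=out)
    if not out.isatty():
        print(line, file=err)
    return line


if __name__ == "__main__":
    # io.StringIO is never a TTY, so this simulates "> lu.txt".
    log, console = io.StringIO(), io.StringIO()
    report("testing_sgetri_gpu -c", "** 45 tests failed", out=log, err=console)
    print(console.getvalue(), end="")
```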

When --interactive is used with output to the console (a TTY), the script pauses after each tester. At the pause, typing "M" re-makes and re-runs that tester, while pressing Enter continues to the next tester. For example (some output is elided with ... for brevity):

  ./run_tests.py --lu --precision s --small --interactive
  ****************************************************************************************************
  ./testing_sgesv_gpu -c -n 1:20:1 ...
  ****************************************************************************************************
      N  NRHS   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||B - AX|| / N*||A||*||X||
  ================================================================================
      1     1     ---   (  ---  )      0.00 (   0.00)   9.26e-08   ok
      2     1     ---   (  ---  )      0.00 (   0.00)   1.32e-08   ok
      3     1     ---   (  ---  )      0.00 (   0.00)   8.99e-09   ok
  ...
    ok
  [enter to continue; M to make and re-run]

  ****************************************************************************************************
  ./testing_sgetri_gpu -c -n 1:20:1 ...
  ****************************************************************************************************
  % MAGMA 1.4.0 svn compiled for CUDA capability >= 3.0
  % CUDA runtime 6000, driver 6000. MAGMA not compiled with OpenMP.
  % device 0: GeForce GT 750M, 925.5 MHz clock, 2047.6 MB memory, capability 3.0
  Usage: ./testing_sgetri_gpu [options] [-h|--help]

      N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||R||_F / (N*||A||_F)
  =================================================================
      1      0.00 (   0.00)      0.00 (   0.00)   6.87e+01   failed
      2      0.00 (   0.00)      0.00 (   0.00)   2.41e+00   failed
      3      0.01 (   0.00)      0.00 (   0.00)   1.12e+00   failed
  ...
    ** 45 tests failed
  [enter to continue; M to make and re-run]

  ...

  ****************************************************************************************************
  summary
  ****************************************************************************************************
    282 tests in 7 commands passed
     45 tests failed accuracy test
      0 errors detected (crashes, CUDA errors, etc.)
  routines with failures:
      testing_sgetri_gpu -c
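
The pause-and-rebuild loop can be sketched as follows. This is a minimal sketch under stated assumptions. `run` and `make` are hypothetical callbacks standing in for running a tester and rebuilding it. "M" is accepted in either case, which is an assumption. `ask` defaults to `input()` so that the loop can be driven from a test. The caller should enable the loop only when stdout is a TTY.

```python
def run_interactive(testers, run, make, ask=input):
    """Run each tester; after each run, Enter moves on and M re-makes and re-runs it.

    Returns a log of ("run" | "make", tester) actions in the order performed.
    """
    log = []
    for tester in testers:
        while True:
            log.append(("run", tester))
            run(tester)
            reply = ask("[enter to continue; M to make and re-run] ")
            if reply.strip().upper() == "M":
                log.append(("make", tester))
                make(tester)
                continue  # re-run the same tester after rebuilding it
            break  # Enter (or anything else): go to the next tester
    return log
```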

What tests are run

The --blas, --aux, --chol, --hesv, --lu, --qr, --syev, --sygv, --geev, --svd, and --batched options run particular sets of tests. By default, all sets are run except batched, because running the batched testers at large sizes (say, N=1000) is undesirable. --mgpu runs only the multi-GPU tests from the above sets. Each option can be negated, e.g., --no-blas, --no-aux.
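
The selection rules above can be modeled roughly as follows. This is a hypothetical sketch, not the script's actual logic. It assumes that naming any set runs only the named sets, and that the --no-* options always win. The --mgpu filter is omitted.

```python
ALL_SETS = ["blas", "aux", "chol", "hesv", "lu", "qr",
            "syev", "sygv", "geev", "svd", "batched"]


def select_sets(enabled=(), disabled=()):
    """Resolve which test sets to run (hypothetical logic).

    - If any sets are enabled explicitly (e.g., --lu), run only those.
    - Otherwise, run every set except "batched".
    - Sets named in `disabled` (the --no-* options) are always dropped.
    """
    if enabled:
        chosen = [s for s in ALL_SETS if s in enabled]
    else:
        chosen = [s for s in ALL_SETS if s != "batched"]
    return [s for s in chosen if s not in disabled]
```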

The --start option skips all testers before the given one, then runs that tester and all subsequent ones. This is helpful for resuming an interrupted set of tests. For example:

  ./run_tests.py --start testing_spotrf > output.log

If specific testers are named on the command line, only those are run. For example:

  ./run_tests.py testing_spotrf testing_sgetrf

The -p/--precision option controls which precisions are tested; the default is "sdcz" (all four precisions). For example, to run single and double precision:

  ./run_tests.py -p sd
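
Expanding a precision string into tester names might look like the sketch below. The naming scheme ("testing_" + precision letter + routine) is an assumption based on names such as testing_sgetrf in this document. Real names can differ between real and complex precisions (e.g., sy vs. he routines), so the helper `expand_precisions` is only a simplification.

```python
def expand_precisions(routine, precisions="sdcz"):
    """Return per-precision tester names, e.g. "getrf", "sd" -> testing_sgetrf, testing_dgetrf."""
    bad = set(precisions) - set("sdcz")
    if bad:
        raise ValueError(f"unknown precision(s): {''.join(sorted(bad))}")
    return [f"testing_{p}{routine}" for p in precisions]
```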

The -s/--small, -m/--medium, and -l/--large options control which sizes are tested; the default is all three sets. -s/--small runs small tests (N < 300), -m/--medium runs medium tests (N < 1000), and -l/--large runs large tests (N > 1000). For example, to run small and medium tests:

  ./run_tests.py -s -m
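
Filtering a size list by class can be sketched with the thresholds quoted above. The function names are hypothetical, and the sketch assumes two things the text leaves open: the classes are disjoint (medium means 300 <= N < 1000), and N = 1000 counts as large.

```python
def size_class(n):
    """Classify a matrix dimension as "small", "medium", or "large" (assumed boundaries)."""
    if n < 300:
        return "small"
    if n < 1000:
        return "medium"
    return "large"


def filter_sizes(sizes, classes):
    """Keep sizes whose class is selected, e.g. {"small", "medium"} for -s -m."""
    return [n for n in sizes if size_class(n) in classes]
```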

Specific tests can be chosen using --itype, --version, -U/--upper, -L/--lower, -J/--jobz, -D/--diag, and --fraction. For instance:

  ./run_tests.py testing_ssygvdx_2stage -L -JN --itype 1 -s --no-mgpu

What is checked

The --memcheck option runs each tester under cuda-memcheck. This is very helpful for finding memory bugs (reading and writing outside allocated memory), but it is slow.
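
Running a tester under cuda-memcheck amounts to prefixing the tester's command line with the tool name. In the sketch below, `build_command` and `run_tester` are hypothetical helpers, and ./testing_sgetrf is only an example path.

```python
import shlex
import subprocess


def build_command(tester_argv, memcheck=False):
    """Return the argv for one tester, prefixed with cuda-memcheck if requested."""
    argv = list(tester_argv)  # copy so the caller's list is not modified
    if memcheck:
        argv.insert(0, "cuda-memcheck")
    return argv


def run_tester(tester_argv, memcheck=False):
    """Run a tester; return its exit status and combined stdout/stderr text."""
    argv = build_command(tester_argv, memcheck)
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    print(shlex.join(build_command(["./testing_sgetrf", "-c"], memcheck=True)))
```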

The --tol option sets the tolerance used to verify accuracy. The default is 30, which may be too tight for some testers; setting it somewhat higher (e.g., 50 or 100) filters out spurious accuracy failures. See also the run_summarize.py script, which parses the testers' output and can apply a higher tolerance after the fact, without re-running the tests.
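
The tolerance test compares ratio = error/epsilon against tol, and post-processing adds a second threshold tol2. The sketch below uses the epsilon values that run_summarize.py prints: single 5.96e-08 = 2**-24 and double 1.11e-16 = 2**-53. It assumes complex precisions share the epsilon of their real counterparts. The name `classify` is hypothetical.

```python
# Epsilon values matching the run_summarize.py output:
# single 5.96e-08 = 2**-24, double 1.11e-16 = 2**-53.
EPS = {"s": 2.0**-24, "c": 2.0**-24, "d": 2.0**-53, "z": 2.0**-53}


def classify(error, precision, tol=30, tol2=100):
    """Return (ratio, status) with ratio = error / eps.

    ratio < tol          -> "ok"
    tol <= ratio < tol2  -> "suspect" (passes with the larger tolerance)
    ratio >= tol2        -> "failed"
    """
    ratio = error / EPS[precision]
    if ratio < tol:
        return ratio, "ok"
    if ratio < tol2:
        return ratio, "suspect"
    return ratio, "failed"


if __name__ == "__main__":
    ratio, status = classify(1.86e-06, "s")
    print(f"{ratio:.1f} {status}")  # 31.2 suspect
```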

Running with the default tolerance, tol=30:

  ./run_tests.py -s -m testing_sgemv > run-gemv.txt
  testing_sgemv -c                          ** 7 tests failed
  testing_sgemv -T -c                       ok
  testing_sgemv -C -c                       ok

  ****************************************************************************************************
  summary
  ****************************************************************************************************
    302 tests in 3 commands passed
      7 tests failed accuracy test
      0 errors detected (crashes, CUDA errors, etc.)
  routines with failures:
      testing_sgemv -c

Post-processing with tolerance tol2=100: numbers in {braces} are the ratio error/epsilon, which should be < tol. Here, the ratio {31.2 to 37.4} is just slightly larger than the default tol=30, so these tests are reported as suspicious rather than failed.

  ./run_summarize.py --tol2 100 run-gemv.txt
  single epsilon 5.96e-08,  tol2 100,  tol2*eps 5.96e-06,  30*eps 1.79e-06,  100*eps 5.96e-06,  1000*eps 5.96e-05
  double epsilon 1.11e-16,  tol2 100,  tol2*eps 1.11e-14,  30*eps 3.33e-15,  100*eps 1.11e-14,  1000*eps 1.11e-13
  ########################################################################################################################
  okay tests:                                          3 commands,    302 tests


  ########################################################################################################################
  errors (segfault, etc.):                             0 commands,      0 tests


  ########################################################################################################################
  failed tests (error > tol2*eps):                     0 commands,      0 tests


  ########################################################################################################################
  suspicious tests (tol2*eps > error > tol*eps):       1 commands,      7 tests
  ./testing_sgemv
     63 10000      0.19 (   6.73)       1.65 (   0.76)      8.58 (   0.15)   1.86e-06 {   31.2}    1.11e-06 {   18.6}   suspect
     64 10000      0.19 (   6.73)       1.68 (   0.76)     14.36 (   0.09)   2.17e-06 {   36.4}    1.14e-06 {   19.1}   suspect
     65 10000      0.19 (   6.72)       1.43 (   0.91)      8.73 (   0.15)   2.23e-06 {   37.4}    1.09e-06 {   18.3}   suspect
     31 10000      0.09 (   6.70)       1.25 (   0.49)      6.33 (   0.10)   1.93e-06 {   32.4}    8.65e-07 {   14.5}   suspect
     32 10000      0.10 (   6.68)       1.35 (   0.47)     11.00 (   0.06)   2.15e-06 {   36.1}    9.14e-07 {   15.3}   suspect
     33 10000      0.10 (   6.72)       1.24 (   0.53)      9.85 (   0.07)   2.19e-06 {   36.7}    1.07e-06 {   18.0}   suspect
     10 10000      0.03 (   6.58)       0.52 (   0.39)      5.71 (   0.04)   2.23e-06 {   37.4}    1.11e-06 {   18.6}   suspect



  ########################################################################################################################
  known failures:                                      0 commands,      0 tests


  ########################################################################################################################
  ignored errors (e.g., malloc failed):                0 commands,      0 tests


  ########################################################################################################################
  other (lines that did not get matched):              0 commands,      0 tests
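
The {ratio} annotation can be reproduced by scanning a tester's output line for numbers in scientific notation. This is a hypothetical sketch, not run_summarize.py itself. It assumes that error values are the only scientific-notation numbers on a line, whereas timings and Gflop/s are fixed-point. The names `annotate` and `max_ratio` are invented.

```python
import re

# Same epsilon values as printed above: single 2**-24, double 2**-53.
EPS = {"s": 2.0**-24, "c": 2.0**-24, "d": 2.0**-53, "z": 2.0**-53}

# Matches error values such as 1.86e-06; fixed-point columns like 0.19 never match.
SCI = re.compile(r"\d\.\d+e[+-]\d+")


def annotate(line, precision):
    """Append {ratio} after each error value, with ratio = error / eps."""
    eps = EPS[precision]

    def add_ratio(m):
        ratio = float(m.group(0)) / eps
        return f"{m.group(0)} {{{ratio:7.1f}}}"

    return SCI.sub(add_ratio, line)


def max_ratio(line, precision):
    """Largest error/eps on the line, or None if the line has no error values."""
    ratios = [float(x) / EPS[precision] for x in SCI.findall(line)]
    return max(ratios) if ratios else None
```

A line whose max_ratio falls between tol and tol2 would then be reported as suspicious. A line whose max_ratio is at or above tol2 would be reported as failed.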

The --dev option sets which GPU device to use.

By default, a wide range of sizes and shapes (square, tall, wide) is tested, as applicable; the -n option overrides these.
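
The commands shown earlier pass ranges such as "-n 1:20:1". The sketch below shows a hypothetical parser for that start:stop:step form. It assumes that stop is inclusive, that step defaults to 1, and that a bare number means a single size. These are assumptions for illustration, not documented tester behavior.

```python
def parse_range(spec):
    """Expand "start:stop:step", "start:stop", or "n" into a list of sizes (stop inclusive)."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) > 3:
        raise ValueError(f"bad range: {spec!r}")
    start, stop = parts[0], parts[1]
    step = parts[2] if len(parts) == 3 else 1
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, stop + 1, step))
```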

For multi-GPU codes, --ngpu specifies the number of GPUs (default 2). Most testers accept --ngpu -1 to test the multi-GPU code on a single GPU. (Using --ngpu 1 usually invokes the single-GPU code.)
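
The --ngpu conventions map to a code path and a GPU count. The sketch below models that mapping with a hypothetical `gpu_plan` helper. Because --ngpu 1 only "usually" selects the single-GPU code, individual testers may behave differently.

```python
def gpu_plan(ngpu=2):
    """Map an --ngpu value to (code path, number of GPUs) following the rules above."""
    if ngpu == -1:
        return "multi-gpu", 1      # multi-GPU code exercised on a single GPU
    if ngpu == 1:
        return "single-gpu", 1     # usually dispatches to the single-GPU code
    if ngpu >= 2:
        return "multi-gpu", ngpu
    raise ValueError(f"unsupported --ngpu value: {ngpu}")
```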