This directory contains several examples showing how to
write fault-tolerant MPI applications with FT-MPI,
in both C and Fortran. If you have any questions,
please contact

   harness@cs.utk.edu or ftmpi@cs.utk.edu

   or view the FT-MPI project webpage at 
   http://icl.cs.utk.edu/ftmpi

1. Example pi-ft, fpi-ft:
   uses the files solver.c, solver.h, slave.c
   respectively fsolver.f, solver.inc, fslave.f

   This is a simple master/slave application that
   calculates pi in a parallel, fault-tolerant manner.
   The master distributes the work to the slaves, collects
   the local results and calculates the global result.
   ATTENTION: at least one slave process has to be alive!
              This is not a requirement of FT-MPI, but it
              keeps this example simple and readable.
 

   The master has to keep track of the state of each process.
   Each slave process can be in one of the following states:
   - AVAILABLE: this process does not have any work
                currently assigned to it
   - WORKING: this process has received its piece of the
              work
   - FINISHED: this process has already received the
               "everything is done" message and will call
               MPI_Finalize
   - FAULT: this process has died; its work is marked as
            lost and will be redistributed
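
   The bookkeeping described above could be sketched in plain C as
   follows. This is not the code from solver.c; the struct layout,
   the NO_WORK marker and the helper names assign_chunk and
   mark_fault are assumptions made for illustration, and no MPI
   calls are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Slave states as listed above; the numeric values are arbitrary. */
typedef enum { AVAILABLE, WORKING, FINISHED, FAULT } proc_state;

#define NO_WORK (-1)   /* hypothetical marker: no chunk assigned */

/* One bookkeeping entry per slave (hypothetical layout). */
typedef struct {
    proc_state state;
    int        chunk;  /* index of the assigned work chunk, or NO_WORK */
} slave_info;

/* Give `chunk` to the first AVAILABLE slave.
 * Returns the index of that slave, or -1 if none is available. */
int assign_chunk(slave_info *tab, size_t n, int chunk)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].state == AVAILABLE) {
            tab[i].state = WORKING;
            tab[i].chunk = chunk;
            return (int)i;
        }
    }
    return -1;
}

/* Mark slave `rank` as dead. Returns the chunk that was lost and
 * has to be redistributed, or NO_WORK if the slave had no work. */
int mark_fault(slave_info *tab, size_t n, size_t rank)
{
    assert(rank < n);
    int lost = (tab[rank].state == WORKING) ? tab[rank].chunk : NO_WORK;
    tab[rank].state = FAULT;
    tab[rank].chunk = NO_WORK;
    return lost;
}
```

   In this sketch the master would call mark_fault for every slave
   reported dead and pass each returned chunk (if not NO_WORK) to
   assign_chunk again.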

   The master and the slave processes both check the return
   value of each MPI call. On error, each process calls its
   error handler (master: recover_master, slaves: recover_slave)
   and then retries the operation that failed.
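
   The check/recover/retry pattern can be illustrated without MPI
   by the sketch below. The MPI call is replaced by a function
   pointer, OP_SUCCESS stands in for MPI_SUCCESS, and the retry
   limit is an addition of this sketch; call_with_recovery, demo_op
   and demo_recover are hypothetical names, not routines from the
   example sources.

```c
#include <assert.h>

#define OP_SUCCESS 0   /* stands in for MPI_SUCCESS in this sketch */

typedef int  (*op_fn)(void *ctx);       /* the communication step to attempt */
typedef void (*recover_fn)(void *ctx);  /* e.g. recover_master / recover_slave */

/* Run `op` until it succeeds, calling `recover` after every failure.
 * Returns the number of recoveries that were needed, or -1 if the
 * operation still failed after max_retries retries. */
int call_with_recovery(op_fn op, recover_fn recover, void *ctx, int max_retries)
{
    assert(op != 0 && recover != 0);
    for (int failures = 0; failures <= max_retries; failures++) {
        if (op(ctx) == OP_SUCCESS)
            return failures;
        recover(ctx);
    }
    return -1;
}

/* Demo stand-ins: an operation that fails a fixed number of times
 * and a recovery routine that only counts how often it was called. */
typedef struct { int failures_left; int recoveries; } demo_ctx;

int demo_op(void *p)
{
    demo_ctx *c = p;
    if (c->failures_left > 0) {
        c->failures_left--;
        return 1;              /* any value other than OP_SUCCESS */
    }
    return OP_SUCCESS;
}

void demo_recover(void *p)
{
    demo_ctx *c = p;
    c->recoveries++;
}
```

   In an MPI program, op would wrap the failing send or receive.
   Return codes can only be checked this way if the communicator's
   error handler returns errors instead of aborting.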


   If the master process has died, the respawned master restarts
   the work from the beginning.

   This example also includes a routine that shows how the application
   and each process can detect, in a portable way, whether they
   have been respawned after an error, how many processes
   have been respawned, etc. (see routine checkwhodied). An alternative
   (maybe simpler, but non-portable) would be to check
   the return value of MPI_Init on each process. If the return code
   of MPI_Init is MPI_INIT_RESTARTED_NODE, then this process was
   not part of the initial set of processes of this application.


2. Example pi-ft-errh, fpi-ft-errh:
   uses the files solvererrh.c, solvererrh.h, slaveerrh.c
   respectively fsolvererrh.f, solver.inc,  fslaveerrh.f

   Basically the same functionality as the previous examples; however,
   this version does not check the return value of each
   MPI call, but uses MPI error handlers instead.
  
   To be able to use error handlers, the slave had to be extended
   to keep track of its own state (AVAILABLE, SLDONE). The master
   also had to introduce two new states (RECEIVED, HALT). Additionally,
   rules had to be introduced for changing the state of a process
   (e.g. certain states can only be reached from certain other
   states).
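
   One way to express such rules is a small transition table, as
   in the sketch below. The state names follow the text (with an
   ST_ prefix added here), but the allowed transitions, the assumed
   meaning of RECEIVED and HALT, and the helper set_state are
   assumptions made for illustration. The actual rules are the
   ones implemented in solvererrh.c and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Master-side states named in the text; ST_COUNT is a helper value. */
typedef enum {
    ST_AVAILABLE, ST_WORKING, ST_RECEIVED, ST_FINISHED, ST_FAULT, ST_HALT,
    ST_COUNT
} mstate;

/* allowed[from][to] is true if the transition is permitted.
 * HYPOTHETICAL rule set, not copied from the example sources. */
static const bool allowed[ST_COUNT][ST_COUNT] = {
    [ST_AVAILABLE] = { [ST_WORKING] = true, [ST_FINISHED] = true,
                       [ST_FAULT] = true },
    [ST_WORKING]   = { [ST_RECEIVED] = true, [ST_FAULT] = true },
    [ST_RECEIVED]  = { [ST_AVAILABLE] = true, [ST_FAULT] = true },
    [ST_FINISHED]  = { [ST_HALT] = true },
    [ST_FAULT]     = { [ST_AVAILABLE] = true },
    [ST_HALT]      = { 0 },   /* no way out of HALT in this sketch */
};

/* Change *cur to `next` only if the table permits it.
 * Returns true if the state was changed, false otherwise. */
bool set_state(mstate *cur, mstate next)
{
    assert(*cur < ST_COUNT && next < ST_COUNT);
    if (!allowed[*cur][next])
        return false;
    *cur = next;
    return true;
}
```

   Routing every state change through one function like set_state
   keeps an error handler, which may run at any point, from putting
   a process into a state that the rest of the code does not expect.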


3. Example pi-ft-gr:
   uses the files solvergr.c, solvergr.h, slavegr.c

   Graphical version of pi-ft that uses MPE to visualize the behaviour.
   NOT WORKING YET.

