Class Runner
The Runner program is targeted at three use cases:
- Sequential jobs on a cluster parallel computer. Each job is a
  sequential (single-threaded) program. The Runner program is running on a
  cluster parallel computer with N nodes and one CPU per node. Run the
  Runner program as follows:
    java -Dpj.nn=N edu.rit.pj.job.Runner . . .
  The Runner program runs with one process per node and one thread per process.
- Sequential jobs on a hybrid parallel computer. Each job is a
  sequential (single-threaded) program. The Runner program is running on a
  hybrid SMP cluster parallel computer with N nodes and C total
  CPUs. (For example, on a hybrid parallel computer with 10 nodes and 4 CPUs
  per node, C = 40.) Run the Runner program as follows:
    java -Dpj.nn=N -Dpj.np=C edu.rit.pj.job.Runner . . .
  The Runner program runs with multiple processes per node and one thread per process.
- SMP parallel jobs on a hybrid parallel computer. Each job is an SMP
  parallel (multi-threaded) program. The Runner program is running on a hybrid
  SMP cluster parallel computer with N nodes and multiple CPUs per node.
  Run the Runner program as follows:
    java -Dpj.nn=N edu.rit.pj.job.Runner . . .
  The Runner program runs with one process per node and multiple threads per process, typically as many threads as there are CPUs on the node.
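The choice of flags across the three use cases can be summarized in a short sketch. The class RunnerCommandSketch, its enum, and its method names are hypothetical and are not part of the Parallel Java library; only the pj.nn and pj.np properties and the edu.rit.pj.job.Runner class name are taken from the text above. The sketch assumes every node has the same number of CPUs, so that C is the node count times the CPUs per node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Parallel Java library.
class RunnerCommandSketch {

    enum UseCase { SEQUENTIAL_ON_CLUSTER, SEQUENTIAL_ON_HYBRID, SMP_ON_HYBRID }

    // Total CPU count C, assuming the same number of CPUs on every node.
    static int totalCpus(int nodes, int cpusPerNode) {
        return nodes * cpusPerNode;
    }

    // Builds the java command line for one use case. The last argument stands
    // for the generator expression or checkpoint file name given to the Runner.
    static List<String> command(UseCase useCase, int nodes, int cpusPerNode,
                                String generatorOrFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Dpj.nn=" + nodes);
        // Only sequential jobs on a hybrid machine set pj.np (one process per CPU).
        if (useCase == UseCase.SEQUENTIAL_ON_HYBRID) {
            cmd.add("-Dpj.np=" + totalCpus(nodes, cpusPerNode));
        }
        cmd.add("edu.rit.pj.job.Runner");
        cmd.add(generatorOrFile);
        return cmd;
    }

    public static void main(String[] args) {
        // prints: java -Dpj.nn=10 -Dpj.np=40 edu.rit.pj.job.Runner <generator-or-file>
        System.out.println(String.join(" ",
            command(UseCase.SEQUENTIAL_ON_HYBRID, 10, 4, "<generator-or-file>")));
    }
}
```

The first and third use cases produce the same command line; they differ only in whether each job is itself single-threaded or multi-threaded.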
In all three cases, the processes form a worker team. The Runner program uses the job generator specified on the command line to create jobs and sends each job to a worker team process to be executed.
When the Runner program starts, it prints the job generator constructor expression on the standard output. Whenever a job starts or finishes, the Runner program prints a log message on the standard output consisting of the job's number and description.
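The dispatch-and-log pattern described above can be illustrated with a single-JVM analogy. This sketch swaps in a java.util.concurrent thread pool for the worker team; the real Runner uses separate processes on cluster nodes, and nothing here reflects its internal API. The WorkerTeamSketch and Job names are hypothetical, and the wording of the log lines is illustrative, not the Runner's actual message format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical single-JVM analogy; a thread pool stands in for the worker team.
class WorkerTeamSketch {

    // Hypothetical job: a number, a description, and the work to perform.
    static final class Job {
        final int number;
        final String description;
        final Runnable body;

        Job(int number, String description, Runnable body) {
            this.number = number;
            this.description = description;
            this.body = body;
        }
    }

    // Hands each job to a worker and records a message when it starts and finishes.
    static List<String> runAll(List<Job> jobs, int teamSize) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService team = Executors.newFixedThreadPool(teamSize);
        for (Job job : jobs) {
            team.execute(() -> {
                log.add("started " + job.number + " " + job.description);
                job.body.run();
                log.add("finished " + job.number + " " + job.description);
            });
        }
        team.shutdown();
        try {
            // Wait for the workers to drain the queue (illustrative time limit).
            team.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            jobs.add(new Job(i, "demo job " + i, () -> { }));
        }
        // Message order varies from run to run because jobs execute concurrently.
        runAll(jobs, 2).forEach(System.out::println);
    }
}
```

Because jobs run concurrently, start and finish messages from different jobs interleave; only the order within a single job (start before finish) is fixed.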
Checkpointing. It is recommended to redirect the Runner program's standard output into a checkpoint file. If a failure occurs before the Runner program finishes running all the jobs, the checkpoint file contains a record of the job generator that was used as well as which jobs did and did not finish. To resume the Runner program where it left off, specify the checkpoint file name on the command line instead of a job generator constructor expression. The Runner program reads the checkpoint file to determine the job generator and the jobs that finished. The Runner program then generates and runs the jobs that did not finish.
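The resume step amounts to a set difference: every job the generator produces, minus the jobs recorded as finished. The sketch below shows only that logic. ResumeSketch and jobsToRun are hypothetical names, and the sketch assumes the finished job numbers have already been extracted from the checkpoint file; the file format itself is not described here, so no parsing is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the resume logic; not the Runner's own code.
class ResumeSketch {

    // Returns the job numbers that still need to run, in generator order.
    // A job that started but never finished is absent from the finished set,
    // so it is run again.
    static List<Integer> jobsToRun(Collection<Integer> generated, Set<Integer> finished) {
        List<Integer> remaining = new ArrayList<>();
        for (Integer number : generated) {
            if (!finished.contains(number)) {
                remaining.add(number);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<Integer> generated = Arrays.asList(1, 2, 3, 4, 5);
        Set<Integer> finished = new HashSet<>(Arrays.asList(1, 3, 4));
        // prints: [2, 5]
        System.out.println(jobsToRun(generated, finished));
    }
}
```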
Usage: java edu.rit.pj.job.Runner { generator | file }
generator = Job generator constructor expression
file = Checkpoint file name
Version: 22-Oct-2010
Author: Alan Kaminsky
Method Summary