Your Software in JugMaster

There are a number of expectations about how your software should operate. For example, it should exit with status zero in order for the job to be marked as complete.

These sorts of details are described in the following sections.

Packaging Software

You can package your software as a tar archive (or zip, tgz, or tar.gz). If you have access to a shared filesystem from all worker nodes, you may instead simply have your software package(s) pre-installed there. Your software package may optionally contain a subdirectory named "package". The contents of package that have special meaning to JugMaster are:


package
   install
   setup.sh
   commands

The install program is called once after unpacking the files into their own directory. The first command-line argument is the location of the unpacked files. The install script should make any changes to the installed files so that they will work from their present location.
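As an illustration of the kind of relocation work an install program might do, here is a minimal sketch. The file etc/my_program.conf.in and the @PREFIX@ placeholder are invented for this example; they are not something JugMaster defines. The only thing taken from the description above is that the first argument is the location of the unpacked files.

```shell
#!/bin/sh
# Sketch of a package/install script (hypothetical file names).

# relocate rewrites a config template so that it points at the directory
# where the package was actually unpacked.
# Note: the sed expression assumes the path contains no '|' or '&' characters.
relocate() {
    prefix=$1
    sed "s|@PREFIX@|$prefix|g" \
        "$prefix/etc/my_program.conf.in" > "$prefix/etc/my_program.conf"
}

# The location of the unpacked files arrives as the first argument.
if [ -n "${1:-}" ]; then
    relocate "$1"
fi
```

If nothing in your package depends on its install location, you do not need an install program at all.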

The script setup.sh is sourced at runtime to load any environment variables (such as PATH and LD_LIBRARY_PATH) needed in order to run the software. The first argument to this script is the location of the installed package, just like the argument to install. The working directory, when the script is called, will be the job's runtime directory.

The commands file lists the commands that JugWorker should run for a job that uses this software. This frees the user from having to specify run_command, or any other command type defined in the package/commands file. The file is a list of key=value pairs. Example:

run_command = my_run_command
stage_out_command = my_stage_out_command

Any $1 in the command will be replaced by the software installation path.
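To make the $1 substitution concrete, the following sketch writes a commands file whose entries refer to programs inside the package. The my_software directory name and the bin/ location of the two commands are assumptions made for this example.

```shell
# Directory holding the package contents (hypothetical name).
pkg_root=${PKG_ROOT:-my_software}
mkdir -p "$pkg_root/package"

# The quoted 'EOF' keeps $1 literal, so it is written to the file as-is
# and substituted later rather than expanded by this shell.
cat > "$pkg_root/package/commands" <<'EOF'
run_command = $1/bin/my_run_command
stage_out_command = $1/bin/my_stage_out_command
EOF

cat "$pkg_root/package/commands"
```

With a file like this, the commands do not need to be on PATH, since each one is named relative to the installation path.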

Note that the entire package directory is optional. However, without at least setup.sh or commands, it would be difficult to make use of a software package that is not pre-installed, since it is placed at a location that is not known in advance.

Example

A typical software package has the following directory layout:


package
  setup.sh
bin
  my_program
lib
  my_lib.so

The setup script would need to set up PATH and LD_LIBRARY_PATH like this:


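# $1 is the location of the installed package.
PATH="$1/bin:$PATH"
export PATH
# Append the old value only if it is non-empty; a trailing colon would add
# the current directory to the library search path.
LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH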

See also the section on File Attributes, since some of these affect how software is managed.
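The example layout can be assembled and archived along the following lines. This is only a sketch: the my_software names are invented, my_program is a stand-in script, and placing package, bin and lib at the top level of the archive is an assumption about how the tarball should be laid out. The last step imitates what happens at runtime by sourcing setup.sh with the package location as $1.

```shell
# Build the example layout in a scratch directory (hypothetical names).
pkg_root=${PKG_ROOT:-my_software}
mkdir -p "$pkg_root/package" "$pkg_root/bin" "$pkg_root/lib"

cat > "$pkg_root/package/setup.sh" <<'EOF'
PATH="$1/bin:$PATH"
export PATH
LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
EOF

# A placeholder standing in for the real program.
cat > "$pkg_root/bin/my_program" <<'EOF'
#!/bin/sh
echo "hello from my_program"
EOF
chmod +x "$pkg_root/bin/my_program"

# Archive the contents so that package/, bin/ and lib/ are at the top level.
tar -czf my_software.tar.gz -C "$pkg_root" package bin lib

# Local check: source setup.sh with the package location as $1,
# then run the program through PATH. Done in a subshell so that the
# caller's environment is left alone.
(
    set -- "$(cd "$pkg_root" && pwd)"
    . "$1/package/setup.sh"
    my_program
)
```

Running the check locally before submitting jobs is a cheap way to catch a setup.sh that only works from the directory where the package was built.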

Working Directory and Environment

Before your job starts, a fresh working directory is created for it. This will be the current working directory when the job is started. Any input files specified in the job description will have been copied into this directory as well.

The environment for the job is set up from the job description and also by sourcing all setup.sh scripts for the job's software packages. The environment of JugWorker is also inherited by the job.

Executable

The command that is run to start your job is expected to be stored in the environment variable JUG_RUN. By default, this has a value of jug_run.

You may also perform some operation at stage-in or stage-out time. The commands to execute should be specified in JUG_STAGE_IN and JUG_STAGE_OUT. The advantage of doing file movement here rather than in the execution step is that the file movement can be done alongside the running of another job.
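Since setup.sh scripts are sourced into the job's environment, one plausible place to set these variables is a package's setup.sh. The sketch below assumes that values set there are honoured and that each variable holds the path of a single program; the program names are invented, and you should check both assumptions against your installation.

```shell
# Hypothetical lines for package/setup.sh. $1 is the installed package
# location; the fallback path exists only so this sketch runs on its own.
pkg_dir=${1:-/path/to/installed/package}

JUG_RUN="$pkg_dir/bin/my_program"
JUG_STAGE_IN="$pkg_dir/bin/my_stage_in"
JUG_STAGE_OUT="$pkg_dir/bin/my_stage_out"
export JUG_RUN JUG_STAGE_IN JUG_STAGE_OUT
```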

Output

Output files to be stored by JugMaster should be placed in a directory named output in your job's working directory. Subdirectories of output are permitted as well.

If your job creates output files in this manner, you should also run some storage workers to handle the files.
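For example, a job might lay out its results as follows. The file names are made up; only the output directory name comes from the description above.

```shell
# Run from the job's working directory.
mkdir -p output/log

echo "example result" > output/result.dat      # made-up file names
echo "job finished"   > output/log/my_job.log  # subdirectories are allowed

# List what would be handed over for storage.
find output -type f | sort
```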

Exit Status

Any non-zero exit status is treated as an unexpected transient failure. The worker that ran the job will try to exit gracefully, and the job will be rescheduled to run again.
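Because the exit status decides whether the job is marked complete or rescheduled, a wrapper script should pass on the status of the real work rather than that of whatever command happens to run last. A minimal sketch, with the true command standing in for your actual program:

```shell
# run_step runs a command, reports a failure on stderr, and passes on
# the command's exit status.
run_step() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "step failed with status $status: $*" >&2
    fi
    return "$status"
}

# run_job stops at the first failing step and returns its status.
run_job() {
    run_step true || return $?    # 'true' stands in for your real program
    run_step true || return $?    # ... and for a post-processing step
    return 0
}

run_job
job_status=$?
echo "job exit status: $job_status"
# A real job script would end with: exit "$job_status"
```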

Monitoring Results

When your job finishes, it can report some data that is monitored by JugMaster. It does this by writing key=value pairs into the file jug_monitor.vars.

Any variables you write will be saved in the database when the job finishes and can be viewed via JugCGI. Some special variables have additional meaning. Currently, there is just one special variable:

work_done: This could be a count of events processed or iterations done, whatever makes sense for your job. The total work done will be summarized at the batch level, so you can view it with JugCGI or jug_batch_status --work_done.
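A job script could write the file like this at the end of its run. The sketch assumes the file belongs in the job's working directory and uses the same "key = value" spacing as the other examples in this document; input_file is an arbitrary variable invented for illustration.

```shell
# Hypothetical count produced by the job.
events_processed=1500

{
    echo "work_done = $events_processed"
    echo "input_file = run42.dat"    # an arbitrary, non-special variable
} > jug_monitor.vars

cat jug_monitor.vars
```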

If your job does its own output storage, you can record information about the output files in jug_stage_out.log. The format of this file is key=value pairs, one per line. The variables that you may use are listed below.

source_name = logical file name
storage_name = logical file name actually stored
storage_hfn = hostname:/path/to/stored/file
storage_url = url of stored file
storage_size = size of file
storage_md5 = md5 checksum of file

The record for each file must begin with source_name. All other attributes are optional and may come in any order. The storage_name attribute is only needed in the obscure case where you wish to indicate that the logical name of a file is being changed in the process of storing it. Logical names may include some path information (e.g. "data/output.ntpl" or "log/cmkin.log").
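A job that stores its own output could append one record per file with a small helper such as the one sketched below. The helper name is invented, the size is assumed to be in bytes, and md5sum (from GNU coreutils) is assumed to be available on the worker.

```shell
# log_stored_file LOGICAL_NAME STORED_PATH
# Appends one record to jug_stage_out.log, starting with source_name.
log_stored_file() {
    logical_name=$1    # e.g. data/output.ntpl
    stored_path=$2     # absolute path where the file was stored

    size=$(wc -c < "$stored_path" | tr -d ' ')          # bytes (assumed unit)
    md5=$(md5sum "$stored_path" | cut -d' ' -f1)

    {
        echo "source_name = $logical_name"
        echo "storage_hfn = $(uname -n):$stored_path"
        echo "storage_size = $size"
        echo "storage_md5 = $md5"
    } >> jug_stage_out.log
}
```

After copying a file to its destination, the job would call, for instance, log_stored_file data/output.ntpl /some/storage/path/output.ntpl, where the storage path is whatever your site uses.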

If you write your own storage handler to store the files that execution jobs place in their output directory, it should write a file in the same format as described above, except that the file is named jug_storage.log. The file size and md5 checksum are automatically recorded (and verified) for you when the files are retrieved from the execution worker.