US CMS MOP Sandbox Mode

In the past, US CMS MOP required a network filesystem that was directly accessible to all jobs running at a particular site. In sandbox mode, each job instead runs in its own local scratch directory, the "sandbox". When the job starts, the CMS software is installed in the sandbox and the input files are transferred into it. When the job finishes, the output files are copied back to the machine from which the job was submitted.

Currently, sandbox mode requires the Condor job manager, because it relies on Condor's file transfer mechanism.
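The job lifecycle described above can be simulated on a single machine with Python's standard library. This is only an illustration under stated assumptions: in real sandbox mode, Condor's file transfer mechanism moves the files between the submit host and the execute machine, whereas here everything happens locally. The function run_in_sandbox and its parameters are hypothetical; only the tarball names cms_dar.tgz and input_files.tar come from the configuration in this document.

```python
import os
import shutil
import subprocess
import tarfile
import tempfile


def _extract(archive, dest):
    """Unpack a tar archive (gzipped or not) into dest."""
    with tarfile.open(archive) as tar:
        # Newer Pythons provide extraction filters; use the safe "data" filter when available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def run_in_sandbox(dar_tarball, input_tarball, command, output_names, return_dir):
    """Hypothetical local simulation of one sandboxed job.

    1. Create a private scratch directory (the sandbox).
    2. Install the CMS software and the input files into it.
    3. Run the job with the sandbox as its working directory.
    4. Copy the named output files back to return_dir.
    """
    sandbox = tempfile.mkdtemp(prefix="sandbox-")
    try:
        _extract(dar_tarball, sandbox)    # e.g. cms_dar.tgz
        _extract(input_tarball, sandbox)  # e.g. input_files.tar
        subprocess.run(command, cwd=sandbox, check=True)
        copied = []
        for name in output_names:
            dest = os.path.join(return_dir, name)
            shutil.copy2(os.path.join(sandbox, name), dest)
            copied.append(dest)
        return copied
    finally:
        # The sandbox is scratch space: remove it whether or not the job succeeded.
        shutil.rmtree(sandbox, ignore_errors=True)
```

Because the job only ever touches its own working directory, nothing in it depends on a shared filesystem; that is the property that lets jobs flock to pools that cannot see the submitting site's disks.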

The main advantage of the sandbox is greater freedom in where the jobs run. At the University of Wisconsin, we allow jobs to opportunistically "flock" into other large Condor pools, which greatly expands the number of available CPUs. We cannot require that machines in other pools be able to directly access our own shared disks.

Sandbox mode also removes an all-too-common point of failure: the network filesystem itself. Jobs can continue running uninterrupted even during network outages. It is also more efficient for jobs to use local scratch space than for every job on a large grid to write continuously to a small number of disk servers.

How to Enable Sandbox Mode

The specific settings required on the MOP Master are subject to change. Currently, you need to do the following:

  1. Insert into site.vars: NO_SHARED_FS='Y'
  2. Change MOP_REMOTE_DAR_ROOT to point to cms_dar.tgz (i.e., install this file on the head node and set this variable to its location).
  3. In site.site: append /_insert_dir_ to the setting for remote_initialdir.
  4. Also insert the following line into site.site:
    globusrsl = (condor_submit=(transfer_files ALWAYS)(transfer_input_files input_files.tar,cms_dar.tgz))
    
  5. You must be using Condor version 6.8 or later on all machines in the pool. Previous Condor versions had a bug that caused the job to die after installing the CMS software in the sandbox.
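Steps 1, 3 and 4 are plain text edits, so they can be scripted. The following Python sketch assumes that site.vars holds one shell-style KEY='value' assignment per line and that site.site holds one "key = value" setting per line; both assumptions are inferred from the lines quoted above, not from MOP documentation. The function names are hypothetical, and steps 2 and 5 are not covered.

```python
INSERT_DIR = "/_insert_dir_"
DEFAULT_INPUTS = ("input_files.tar", "cms_dar.tgz")


def add_no_shared_fs(site_vars_text):
    """Step 1: make site.vars contain NO_SHARED_FS='Y', replacing any older value."""
    lines = [line for line in site_vars_text.splitlines()
             if not line.strip().startswith("NO_SHARED_FS=")]
    lines.append("NO_SHARED_FS='Y'")
    return "\n".join(lines) + "\n"


def sandbox_globusrsl(input_files=DEFAULT_INPUTS):
    """Step 4: build the globusrsl value asking Condor to transfer the given files."""
    return ("(condor_submit=(transfer_files ALWAYS)"
            "(transfer_input_files %s))" % ",".join(input_files))


def enable_sandbox_site(site_site_text, input_files=DEFAULT_INPUTS):
    """Steps 3 and 4: rewrite remote_initialdir and set globusrsl in site.site text."""
    out = []
    saw_rsl = False
    for line in site_site_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key == "remote_initialdir":
            value = value.strip()
            # Append /_insert_dir_ only once, so re-running is harmless.
            if not value.endswith(INSERT_DIR):
                value = value.rstrip("/") + INSERT_DIR
            line = "remote_initialdir = " + value
        elif sep and key == "globusrsl":
            # Replace an existing globusrsl setting instead of adding a duplicate.
            line = "globusrsl = " + sandbox_globusrsl(input_files)
            saw_rsl = True
        out.append(line)
    if not saw_rsl:
        out.append("globusrsl = " + sandbox_globusrsl(input_files))
    return "\n".join(out) + "\n"
```

Both edit functions are idempotent: applying them to a file that has already been converted returns it unchanged, which makes it safe to re-run the script after partial manual edits.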

The tarball cms_dar.tgz can be created from the CMS DAR distribution using the following steps:

  1. Get the DAR file.
  2. Unpack all of the packages (but do not run convert_env).
  3. Tar and gzip the unpacked tree to produce cms_dar.tgz.
  4. You may also need to create a symlink named 6.158 pointing to cmsim/cms125.
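The packing in steps 3 and 4 can be sketched with Python's tarfile module. This assumes steps 1 and 2 were done by hand and that the unpacked DAR tree sits in a single directory. It further assumes that the 6.158 symlink belongs at the top of that tree and that the tarball should unpack directly into the sandbox without a wrapping directory; neither placement is stated above, so check both against your DAR layout. The function make_cms_dar_tarball is a hypothetical name.

```python
import os
import tarfile


def make_cms_dar_tarball(unpacked_root, output="cms_dar.tgz", add_symlink=False):
    """Pack an already-unpacked DAR tree into a gzipped tarball.

    The output path should lie outside unpacked_root, otherwise the
    tarball would try to include itself.
    """
    if add_symlink:
        # Assumed location: top level of the unpacked tree, relative target.
        link = os.path.join(unpacked_root, "6.158")
        if not os.path.lexists(link):
            os.symlink("cmsim/cms125", link)
    with tarfile.open(output, "w:gz") as tar:
        # Store each top-level entry without a leading directory, so the
        # archive unpacks straight into the current (sandbox) directory.
        for name in sorted(os.listdir(unpacked_root)):
            tar.add(os.path.join(unpacked_root, name), arcname=name)
    return output
```

Symlinks such as 6.158 are stored as links rather than followed, because tarfile does not dereference symlinks by default.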