Globus and AFS Recipe

Dan Bradley, University of Wisconsin
October, 2003

This is not a full guide on the subject of Globus and AFS. It is simply a short recipe used to get a Grid3 site working with AFS as the sole shared filesystem.

The Problem

Access to AFS can be authorized via an AFS token or by client machine IP address. By default, jobs that come in through the Globus gatekeeper do not have an AFS token, and even if they did, most batch systems do not propagate tokens from the head node to the worker nodes.

The easy solution is therefore to simply create an AFS group containing all of the machines in your cluster. You then grant this group full access to the home directories of grid users and any other directories they will use. This means that any process running on any of the machines in this group will have access to the grid directories. Obviously, this is not ideal, but it's the first step in this recipe, so bail out now if you don't like it!

Here's an example of how you might set up the AFS group:

pts createuser 128.104.28.0
pts creategroup grid3 system:administrators
pts adduser 128.104.28.0 grid3
fs setacl -dir /afs/uw-grid.wisc.edu/grid3 -acl grid3 rlidkw

In the above example, all machines with an IP address beginning with 128.104.28 are added as members of a group named "grid3". Note that when you create a new group with host-IP members, the AFS file servers do not immediately know about it. You either need to wait for a while (hours?) or restart the appropriate file server(s).
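
If you do not want to wait, you can restart the fileserver instance so that it picks up the new host entries. Something like the following should work (the server name below is made up; substitute one of your own file servers, and note that restarting the fs instance briefly interrupts file service):

# run as root on the file server itself; "afs1.uw-grid.wisc.edu" is a made-up name
bos restart -server afs1.uw-grid.wisc.edu -instance fs -localauth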

The home directories of the grid users should also be given the same access settings. All directories where grid jobs need to write should be given 0777 permissions, or you may run into subtle problems with processes that look at the ownership of directories and incorrectly conclude that they are not writable. For the same reason, do not use sticky bits unless you have full creation and forwarding of AFS tokens in the batch system.
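
For example, opening up one grid user's home directory might look something like this (the user name and home directory path are examples only; adjust them for your cell):

# "uscms01" and the home directory path below are examples only
fs setacl -dir /afs/uw-grid.wisc.edu/home/uscms01 -acl grid3 rlidkw
chmod 0777 /afs/uw-grid.wisc.edu/home/uscms01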

Note that the GASS cache and other job files located in ~user/.globus should probably not be placed in AFS, though it is possible to keep them there. If the user's home directory is in AFS, you can symlink .globus to a different (possibly local) filesystem and then enable Condor's file-transfer mechanisms to shuffle files back and forth between the execution machine and the GASS cache. This takes care of standard input and output, but it does not solve the general problem of all the file I/O the job may need to do, which is why you may still want to use AFS to provide access to other files.
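
A sketch of relocating .globus out of AFS, assuming a made-up local scratch area on the gatekeeper and that ~uscms01/.globus does not already exist:

# /scratch is assumed to be a local (non-AFS) filesystem on the gatekeeper
mkdir -p /scratch/globus/uscms01/.globus
chown uscms01 /scratch/globus/uscms01/.globus
ln -s /scratch/globus/uscms01/.globus /afs/uw-grid.wisc.edu/home/uscms01/.globus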

Now jobs spawned by the gatekeeper will have access to the files they need to read and write. However, the ownership of these files will show up as the anonymous user, since access to AFS is being authorized by host IP alone. Some Globus components, such as globus-url-copy and GRAM file staging, will refuse to operate because the user's grid proxy, which is stored in ~/.globus/.gass_cache, will not appear to be owned by the correct user id.

The Solution

There is a program called gssklog. It takes a grid proxy, authenticates with a gssklogd server, and obtains AFS tokens. You can tell the globus-gatekeeper to run it before executing the jobmanager. The result is that files such as the grid proxy file will be created with the appropriate ownership.

In addition, fork jobs (which always run directly on the head node) will have real credentials, so if you know that certain directory trees will be modified via fork jobs alone, you can remove the host group's write access to those areas. If your batch system supports propagation of AFS tokens, you could drop the host-based ACL entries altogether.
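
For example, to drop the host group down to read-only access on a tree that is only written via fork jobs (the path below is made up):

# leave only read and lookup rights for the host group on this tree
fs setacl -dir /afs/uw-grid.wisc.edu/grid3/app -acl grid3 rl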

Installing gssklog

You may need to modify the following specific instructions to suit your environment. You may even be able to find a ready-made binary RPM for your system.

Obtain the latest source. I used gssklog-0.10.

Untar, configure, and make:

./configure \
--with-ccopts=-DDEFAULT_PORT=751 \
--with-gss-lib-dir=/afs/hep.wisc.edu/cms/sw/vdt_1.1.11/globus/lib \
--with-static=gsi22 \
--with-flavor=gcc32dbg \
--enable-server \
--with-afs=/tmp/openafs-devel/usr
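
Putting the whole build together (the tarball name is assumed from the version mentioned above):

tar xzf gssklog-0.10.tar.gz   # tarball name assumed from the version above
cd gssklog-0.10
# ... run ./configure with the options shown above ...
make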

The gssklog source contains a patch for the gatekeeper so that gssklog can be installed in globus/libexec and called automatically. However, this is not required; a different method of invoking gssklog is described below.

You need to install gssklogd on one or more of your AFS servers. Since it uses GSI to authenticate requests, you will need to set up /etc/grid-security on the server. You need grid-security/certificates, grid-security/hostkey.pem, and grid-security/hostcert.pem. In addition to the hostcert, you need a "gssklog" service cert. If you use the DOE CA, follow the instructions for Grid3: iVDGL RA.
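
A rough sketch of staging those files onto the AFS server ("gatekeeper" below is a made-up hostname for a machine that already has the trusted CA certificates; the host and gssklog service certificates must be issued for the AFS server itself):

# "gatekeeper" is a made-up name for a host with a working grid-security setup
mkdir -p /etc/grid-security
scp -r gatekeeper:/etc/grid-security/certificates /etc/grid-security/
# hostcert.pem, hostkey.pem, and the gssklog service cert must be issued for
# this server and copied into /etc/grid-security by hand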

In /etc/grid-security, you also need to create afsgrid-mapfile. This maps grid subject names to AFS user names. Unless you are doing something strange, you can just copy the grid-mapfile from your gatekeeper and use it unmodified as the afsgrid-mapfile.
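
For example (again using the made-up "gatekeeper" hostname):

# reuse the gatekeeper's grid-mapfile as the afsgrid-mapfile on the AFS server
scp gatekeeper:/etc/grid-security/grid-mapfile /etc/grid-security/afsgrid-mapfile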

Start gssklogd as root. You can test it with something like this:

unlog
grid-proxy-init
gssklog \
  -principal uscms01 \
  -cell hep.wisc.edu \
  -server rosemary.hep.wisc.edu \
  -lifetime 48
tokens

If all is well, then you can go ahead and configure the gatekeeper to use gssklog.

On the gatekeeper machine, create /etc/grid-security/globus-kmapfile. The contents should be of this form:

"/O=doesciencegrid.org/OU=People/CN=Dan Bradley 268953" /vdt/globus/libexec/gssklog -principal uscms01 -cell hep.wisc.edu -server rosemary.hep.wisc.edu -lifetime 99 -setpag

You can produce that from the grid-mapfile with a command such as this:

GLOBUS_LOCATION=/vdt/globus
AFS_CELL=hep.wisc.edu
AFS_SERVER="-server rosemary.hep.wisc.edu"
LIFETIME=99

sed 's|\(".*"\) \(.*\)$|\1 '"$GLOBUS_LOCATION/libexec/gssklog -principal \2 -cell $AFS_CELL $AFS_SERVER -lifetime $LIFETIME -setpag |g" \
< /etc/grid-security/grid-mapfile \
> /etc/grid-security/globus-kmapfile

Now, make sure gssklog exists wherever the kmapfile is expecting it and add the following line to globus/etc/globus-gatekeeper.conf:

-k -globuskmap /etc/grid-security/globus-kmapfile

Now you should be able to verify that fork jobs have tokens and that files are created in ~/.globus/.gass_cache with the correct user id. For example:

globus-job-run garlic.hep.wisc.edu/jobmanager-fork `which tokens`

Additional References

Globus and AFS, Francesco Prelz, 2001