Dan Bradley, University of Wisconsin
This is not a full guide on the subject of Globus and AFS. It is simply a short recipe used to get a Grid3 site working with AFS as the sole shared filesystem.
Access to AFS can be authorized either by an AFS token or by client machine IP. By default, jobs that come in through the Globus gatekeeper do not have an AFS token, and even if they did, most batch systems do not propagate tokens from the head node to the worker nodes.
The easy solution is therefore to create an AFS group containing all of the machines in your cluster and grant this group full access to the home directories of grid users and to any other directories they will use. Any process running on any machine in this group then has access to the grid directories. Obviously, this is not ideal, but it's the first step in this recipe, so bail out now if you don't like it!
Here's an example of how you might set up the AFS group:
pts createuser 129.104.28.0
pts creategroup grid3 system:administrators
pts adduser 129.104.28.0 grid3
fs setacl -dir /afs/uw-grid.wisc.edu/grid3 -acl grid3 rlidkw
In the above example, all machines with an IP address beginning with 129.104.28 are added as members of a group named "grid3", and that group is given rlidkw rights (everything except administer) on /afs/uw-grid.wisc.edu/grid3. Note that when you create a new group with host IP members, the AFS file servers do not immediately know about it. You either need to wait a while (possibly hours) or restart the appropriate file server(s).
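If your cluster spans more than one subnet, the same commands have to be repeated for each subnet. Here is a small sketch of a dry-run helper that prints them for review instead of running them. The function name afs_host_group_cmds is made up for illustration, and it assumes, as in the example above, that a 0 in the last octet of a pts host entry acts as a wildcard for that subnet.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints the pts/fs commands instead of running
# them, so they can be reviewed first.
# Usage: afs_host_group_cmds GROUP OWNER DIR PREFIX...
# Each PREFIX is a three-octet subnet prefix such as 129.104.28.
afs_host_group_cmds() {
    local group="$1" owner="$2" dir="$3"
    shift 3
    local prefix
    echo "pts creategroup $group $owner"
    for prefix in "$@"; do
        # Host entry with a trailing 0 octet, assumed to match the whole subnet
        echo "pts createuser $prefix.0"
        echo "pts adduser $prefix.0 $group"
    done
    # rlidkw = all AFS rights except "a" (administer)
    echo "fs setacl -dir $dir -acl $group rlidkw"
}
```

For example, afs_host_group_cmds grid3 system:administrators /afs/uw-grid.wisc.edu/grid3 129.104.28 prints the same four commands shown above, with creategroup first. Redirect the output to a file, look it over, and run it with administrative AFS tokens. Re-running it for a subnet that already has a host entry will make pts createuser complain, which is harmless.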
The home directories of the grid users should also be given the same access settings. All directories where grid jobs need to write should be given 0777 permissions; otherwise you may run into subtle problems with processes that look at the ownership of a directory and incorrectly conclude that it is not writable. For the same reason, do not use sticky bits unless you have full creation and forwarding of AFS tokens in the batch system.
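A minimal sketch of that step, assuming you pass in the list of directories yourself (open_grid_dirs is a hypothetical helper, not part of any Globus or AFS tool). It uses symbolic chmod modes so that the sticky bit is explicitly cleared while rwx is granted to everyone. On AFS, access to directories is governed by the ACL; these mode bits only matter to the programs that inspect them, as described above.

```shell
#!/bin/bash
# Hypothetical helper: make each directory given on the command line
# world-writable (0777) and make sure its sticky bit is off.
open_grid_dirs() {
    local d
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "skipping $d: not a directory" >&2
            continue
        fi
        # a+rwx: read/write/execute for user, group and other
        # a-t:   clear the sticky (restricted deletion) bit
        chmod a+rwx,a-t "$d" || return 1
    done
}
```

For example: open_grid_dirs ~uscms01 /afs/uw-grid.wisc.edu/grid3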
Note that the GASS cache and other job files, which are located in ~user/.globus, should probably not be placed in AFS, though it is possible to keep them there. If the user home directory is in AFS, you can symlink .globus to a different (possibly local) filesystem and then enable Condor's file-transfer mechanism to shuffle files back and forth between the execution machine and the GASS cache. This takes care of standard input and output, but it does not solve the general problem of all the file I/O that the job may need to do, which is why you may still want to use AFS to provide access to other files.
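Here is a rough sketch of the symlink part, run once per grid account. relocate_globus_dir and the local base directory (for example /scratch/globus) are hypothetical; run it as someone who can write to both the AFS home directory and the local disk. The Condor file-transfer side is configured separately and is not shown.

```shell
#!/bin/bash
# Hypothetical helper: move a user's .globus out of AFS and leave a symlink.
# Usage: relocate_globus_dir HOME_DIR LOCAL_BASE USER
relocate_globus_dir() {
    local home="$1" local_base="$2" user="$3"
    local target="$local_base/$user/.globus"

    if [ -L "$home/.globus" ]; then
        echo "$home/.globus is already a symlink; nothing to do" >&2
        return 0
    fi
    if [ -e "$target" ]; then
        echo "$target already exists; refusing to overwrite it" >&2
        return 1
    fi
    mkdir -p "$local_base/$user" || return 1
    if [ -d "$home/.globus" ]; then
        # Keep whatever is already there by moving it to the local disk
        mv "$home/.globus" "$target" || return 1
    else
        mkdir -p "$target" || return 1
    fi
    # When run as root, hand the new local tree to the grid user
    if [ "$(id -u)" -eq 0 ]; then
        chown -R "$user" "$local_base/$user" || return 1
    fi
    ln -s "$target" "$home/.globus"
}
```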
Now jobs that get spawned by the gatekeeper will have access to the files they need to read and write. However, these files will show up as owned by the anonymous user, since access to AFS is being authenticated by host IP alone. Some Globus components, such as globus-url-copy and GRAM file staging, will refuse to operate, because the user's grid proxy, which is stored in ~/.globus/.gass_cache, will not appear to be owned by the correct user ID.
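To see whether you are hitting this problem, you can look for files under the GASS cache that are not owned by the user running the check. A minimal sketch, assuming GNU find; check_owned_by_me is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: fail (and list the culprits) if any file under DIR
# is owned by someone other than the current user.
check_owned_by_me() {
    local dir="$1" bad
    # find -user accepts a numeric uid as well as a user name
    bad="$(find "$dir" ! -user "$(id -u)" -print)"
    if [ -n "$bad" ]; then
        echo "not owned by $(id -un):" >&2
        printf '%s\n' "$bad" >&2
        return 1
    fi
}
```

Running check_owned_by_me ~/.globus/.gass_cache as the grid user, before and after setting up gssklog, should show the difference.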
A program called gssklog takes a grid proxy, authenticates with a gssklogd server, and obtains AFS tokens. You can tell globus-gatekeeper to run it before executing the jobmanager, so that files such as the grid proxy file are created with the appropriate ownership.
In addition, fork jobs (which always run directly on the head node) will have real credentials, so if you know that certain directory trees will be modified via fork jobs alone, you can turn off write access on the host ACL to those areas. If your batch system supports propagation of AFS tokens, then you would be able to turn off the host ACL altogether.
You may need to modify the following specific instructions to suit your environment. You may even be able to find a ready-made binary RPM for your system.
Obtain the latest source. I used gssklog-0.10.
Untar, configure, and make:
./configure \
  --with-ccopts=-DDEFAULT_PORT=751 \
  --with-gss-lib-dir=/afs/hep.wisc.edu/cms/sw/vdt_1.1.11/globus/lib \
  --with-static=gsi22 \
  --with-flavor=gcc32dbg \
  --enable-server \
  --with-afs=/tmp/openafs-devel/usr
The source for gssklog contains a patch for the gatekeeper so that gssklog can be installed in globus/libexec and called automatically. However, this is not required; a different method of invoking gssklog is described below.
You need to install gssklogd on one or more of your AFS servers. Since it uses GSI to authenticate requests, you will need to set up /etc/grid-security on the server. You need grid-security/hostcert.pem, and in addition to the host cert, you need a "gssklog" service cert. If you use the DOE CA, follow the Grid3 instructions for the iVDGL RA.

In /etc/grid-security, you also need to create afsgrid-mapfile. This maps grid subject names to AFS user names. Unless you are doing something strange, you can just copy the grid-mapfile from your gatekeeper and use it unmodified as the afsgrid-mapfile.
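Because the afsgrid-mapfile is normally just a copy, a small sanity check before installing it can catch a truncated or hand-edited file. This is a sketch only: install_afsgrid_mapfile is a hypothetical name, copying the file from the gatekeeper to the AFS server is left to you, and the accepted line format ("subject DN" followed by an account name, plus # comments and blank lines) is an assumption based on the usual grid-mapfile layout.

```shell
#!/bin/bash
# Hypothetical helper: validate a grid-mapfile and install it as the
# afsgrid-mapfile. Usage: install_afsgrid_mapfile SRC DEST
install_afsgrid_mapfile() {
    local src="$1" dest="$2" line n=0
    # "quoted subject DN" <whitespace> account[,account...]
    local re='^"[^"]+"[[:space:]]+[^[:space:]]+[[:space:]]*$'
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        case "$line" in ''|'#'*) continue ;; esac   # blank or comment
        if ! [[ $line =~ $re ]]; then
            echo "$src:$n: unexpected format: $line" >&2
            return 1
        fi
    done < "$src"
    install -m 0644 "$src" "$dest"
}
```

For example: install_afsgrid_mapfile /tmp/grid-mapfile /etc/grid-security/afsgrid-mapfile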
Start gssklogd as root. You can test it with something like this:
unlog
grid-proxy-init
gssklog \
  -principal uscms01 \
  -cell hep.wisc.edu \
  -server rosemary.hep.wisc.edu \
  -lifetime 48
tokens
If all is well, then you can go ahead and configure the gatekeeper to use gssklog.
On the gatekeeper machine, create /etc/grid-security/globus-kmapfile. The contents should be of this form:
"/O=doesciencegrid.org/OU=People/CN=Dan Bradley 268953" /vdt/globus/libexec/gssklog -principal uscms01 -cell hep.wisc.edu -server rosemary.hep.wisc.edu -lifetime 99 -setpag
You can produce that from the grid-mapfile with a command such as this:
GLOBUS_LOCATION=/vdt/globus
AFS_CELL=hep.wisc.edu
AFS_SERVER="-server rosemary.hep.wisc.edu"
LIFETIME=99
sed 's|\(".*"\) \(.*\)$|\1 '"$GLOBUS_LOCATION/libexec/gssklog -principal \2 -cell $AFS_CELL $AFS_SERVER -lifetime $LIFETIME -setpag|g" \
  < /etc/grid-security/grid-mapfile \
  > /etc/grid-security/globus-kmapfile
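If you prefer something easier to read than the sed one-liner, a bash loop can do the same job. In this sketch (make_kmapfile is a hypothetical name), comments and malformed lines are skipped, and, as an assumption about what you want, only the first account is used when a DN maps to a comma-separated list of accounts; the sed version would pass the whole list to -principal. Unlike the sed version, it always emits -server.

```shell
#!/bin/bash
# Hypothetical helper: read grid-mapfile lines on stdin and write
# globus-kmapfile lines on stdout.
# Usage: make_kmapfile GSSKLOG_PATH CELL SERVER LIFETIME
make_kmapfile() {
    local gssklog="$1" cell="$2" server="$3" lifetime="$4"
    # group 1: quoted subject DN; group 2: first account name
    local re='^("[^"]+")[[:space:]]+([^[:space:],]+)'
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        [[ $line =~ $re ]] || continue   # skip comments, blanks, malformed lines
        printf '%s %s -principal %s -cell %s -server %s -lifetime %s -setpag\n' \
            "${BASH_REMATCH[1]}" "$gssklog" "${BASH_REMATCH[2]}" \
            "$cell" "$server" "$lifetime"
    done
}
```

Usage, producing the same entries as the sed command for ordinary grid-mapfile lines:

make_kmapfile /vdt/globus/libexec/gssklog hep.wisc.edu rosemary.hep.wisc.edu 99 < /etc/grid-security/grid-mapfile > /etc/grid-security/globus-kmapfile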
Now make sure gssklog exists wherever the kmapfile expects it, and add the following line to the gatekeeper's configuration:
-k -globuskmap /etc/grid-security/globus-kmapfile
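To make sure gssklog exists wherever the kmapfile expects it, you can check every program path named in the kmapfile. A sketch, assuming each entry has the program path as the first word after the quoted subject name, as in the example entry above; check_kmapfile_programs is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: verify that every program referenced in a kmapfile
# exists and is executable. Usage: check_kmapfile_programs KMAPFILE
check_kmapfile_programs() {
    local kmap="$1" line prog status=0
    # quoted DN, whitespace, then the program path
    local re='^"[^"]+"[[:space:]]+([^[:space:]]+)'
    while IFS= read -r line || [ -n "$line" ]; do
        [[ $line =~ $re ]] || continue
        prog="${BASH_REMATCH[1]}"
        if [ ! -x "$prog" ]; then
            echo "missing or not executable: $prog" >&2
            status=1
        fi
    done < "$kmap"
    return "$status"
}
```

For example: check_kmapfile_programs /etc/grid-security/globus-kmapfile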
Now you should be able to verify that fork jobs have tokens and that files are created in ~/.globus/.gass_cache with the correct user id.
globus-job-run garlic.hep.wisc.edu/jobmanager-fork `which tokens`
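If you want a scriptable yes/no answer instead of reading the tokens output by eye, something like the following can be applied to that output. It is a sketch that assumes the usual OpenAFS tokens format (lines such as the one in the comment) and treats tokens marked Expired as missing; has_afs_token is a hypothetical name, so check the format against your own AFS client first.

```shell
#!/bin/bash
# Hypothetical helper: succeed if `tokens` output on stdin shows an
# unexpired token for CELL. Assumed line format:
#   User's (AFS ID 12345) tokens for afs@hep.wisc.edu [Expires Apr  2 10:00]
has_afs_token() {
    local cell="$1"
    # keep lines for this cell, then require at least one not marked Expired
    grep -F "tokens for afs@$cell " | grep -vqF "Expired"
}
```

For example:

globus-job-run garlic.hep.wisc.edu/jobmanager-fork `which tokens` | has_afs_token hep.wisc.edu && echo "fork job has a token"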