Most Grid'5000 jobs start with the same tasks: deploy the nodes, copy the SSH keys, and check that the nodes are working properly. These tasks are often performed manually, which causes two problems:
- it wastes time: doing this by hand takes about 10 minutes at the beginning of each job;
- it makes it hard to start jobs non-interactively (using OAR submissions).
Katapult is a simple script that automatically deploys nodes, checks that they work properly, and then runs a specified script on them.
It can be used in two ways:
- To prepare one's nodes automatically at the beginning of the job:
oarsub -t deploy -l nodes=20,walltime=3 -r '2006-12-02 09:00:00' './katapult --deploy-env sid-x64-base-1.0 --copy-ssh-key --sleep'
Katapult deploys the nodes and copies the SSH key. Then, it sleeps. The user can read Katapult's output (OAR*.stdout) to find out which temporary file lists the "correct" nodes, and use this information to run their own script or interactive commands.
- To fully automate an experiment that uses deployment:
oarsub -t deploy -l nodes=1,walltime=1 './katapult --min-nodes 2 --deploy-env sid-x64-base-1.0 --copy-ssh-key /home/grenoble/lnussbaum/scripttorunwithgoodnodes param1 param2'
The script is run by Katapult, with the environment variables $GOOD_NODES and $BAD_NODES defined and pointing to files containing the lists of nodes. After running this script, the job ends.
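As an illustration, here is a minimal sketch of a script that Katapult could run. It assumes that $GOOD_NODES and $BAD_NODES each name a plain-text file with one hostname per line, and that the deployed nodes accept SSH logins as root; both are assumptions, not documented behaviour. The function names count_nodes and run_on_nodes are hypothetical helpers introduced for this example.

```shell
#!/bin/sh
# Hypothetical experiment script of the kind Katapult could run.
# Assumption: $GOOD_NODES and $BAD_NODES point to files with one hostname per line.

# Print the number of non-empty lines in a node list (0 if the file is missing).
count_nodes() {
    if [ -f "$1" ]; then
        grep -c . "$1"
    else
        echo 0
    fi
}

# Run a command on every node listed in a file.
# Assumption: the deployed environment accepts SSH logins as root.
run_on_nodes() {
    list=$1
    shift
    while read -r node; do
        # Skip blank lines.
        [ -n "$node" ] || continue
        # -n keeps ssh from consuming the rest of the node list on stdin.
        ssh -n "root@$node" "$@"
    done < "$list"
}

# Only do real work when Katapult has provided the node list.
if [ -n "${GOOD_NODES:-}" ]; then
    echo "usable nodes: $(count_nodes "$GOOD_NODES")"
    echo "failed nodes: $(count_nodes "${BAD_NODES:-}")"
    run_on_nodes "$GOOD_NODES" uname -a
fi
```

The param1 and param2 arguments from the command line above would be available as "$1" and "$2" in such a script.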
Currently, oargrid does not support passing parameters to the command it executes (FIXME: this may be fixed in oargrid2). This makes running katapult from oargrid difficult. The best solution is probably to call katapult from another script that takes no parameters.
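Such a wrapper could look like the sketch below. The katapult options are the ones used in the examples on this page; the default experiment path ($HOME/experiment.sh) and the KATAPULT, DEPLOY_ENV and EXPERIMENT variables are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical parameterless wrapper around katapult.
# All settings are fixed here (or taken from the environment), so the
# launcher never has to pass command-line arguments.

KATAPULT=${KATAPULT:-katapult}               # assumed to be in $PATH
DEPLOY_ENV=${DEPLOY_ENV:-sid-x64-base-1.0}   # environment used in the examples
EXPERIMENT=${EXPERIMENT:-$HOME/experiment.sh} # placeholder path

# Call katapult with the fixed set of options.
run_katapult() {
    "$KATAPULT" --deploy-env "$DEPLOY_ENV" --copy-ssh-key "$EXPERIMENT"
}

# Launch only if katapult can be found; otherwise report the problem.
if command -v "$KATAPULT" >/dev/null 2>&1; then
    run_katapult
else
    echo "katapult not found: $KATAPULT" >&2
fi
```

The wrapper itself can then be given to the launcher as a plain command with no arguments.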
The canonical version of katapult is in the grid5000-code SVN repository, and is available in $PATH on every Grid'5000 frontend.