Monday 19 September 2011

Three Little Words

There are these three little words. For some people, these words bring feelings of fulfilment and contentment. For others, they bring nothing but frustration.

Those three little words are:

  Proof of Concept

For that part of the e-Research community interested in how research will be done in future, a proof of concept is evidence that it is possible to do something new and interesting, using something new and interesting. It might change the way research is done next decade. It is more than enough for a published paper and a presentation at All Hands.

And it is of bog-all use to those for whom e-Research is simply a means to an end. They just want something that works now and works reliably.

There is always a gap between the potentially useful and the actually useful. When you can build something that bridges that gap, you can enable research that would not otherwise be done.

Which brings me to the slightly embarrassing news that our project to deploy the ARC middleware in front of the local High Performance Computing service has been a complete success... as a proof of concept.

We have shown that it is possible to deploy ARC services in front of what we should now be calling Oracle Grid Engine.

We have also shown, with some inventive use of ssh copies in prolog and epilog scripts, that this can be made to work even where there is no file space shared between the grid 'front end' and the HPC cluster.

We also know that you can support parallel tasks using ARC's Runtime Environment mechanism (there are examples at the bottom of the slightly out-of-date Nordugrid documentation) and make use of the LCAS/LCMAPS authentication system used by other grid software.

Which is nice....

Whether it is going to be useful is a completely different question.  We do not yet know if the local communities who are best placed to use it --- the rather incongruous pairing of Solar Physics and Social Science --- will want to do so.

Epilogue: Prologs and Epilogs


A quick technical note on faking a shared directory via Grid Engine prolog and epilog scripts.

The scripts run just before the start and just after the end of every job.

ARC-the-middleware obligingly changes directory to the 'shared' scratch directory before submitting the job. This means that the prolog and epilog scripts are presented with the path to this directory in the $SGE_O_WORKDIR environment variable.
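
Because the scripts run for every job, not only the ones that arrived via the grid, it may help to check $SGE_O_WORKDIR before doing anything. The following is only a sketch: GRID_SESSION_ROOT and is_grid_job are hypothetical names, and the default path is an assumption standing in for wherever the grid front end creates its per-job scratch directories.

```shell
# GRID_SESSION_ROOT is a hypothetical setting: the directory under which
# the grid front end is assumed to create its per-job scratch directories.
GRID_SESSION_ROOT="${GRID_SESSION_ROOT:-/scratch/grid}"

# Succeeds only when the job was submitted from beneath GRID_SESSION_ROOT,
# judged by the $SGE_O_WORKDIR that Grid Engine hands to the script.
is_grid_job() {
    case "${SGE_O_WORKDIR:-}" in
        "$GRID_SESSION_ROOT"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A prolog or epilog could then start with something like `is_grid_job || exit 0`, so that ordinary local jobs pass through untouched.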

The recipe is along the lines of...

  • Create an ssh key pair for each user, to be used solely for transfers from the HPC back end to the grid front end.
  • Copy the private key to a safe place on the HPC back end, readable only by the user. We will call this $GRID_KEYS.
  • Use the public key to create a per-user authorized_keys file on the grid front end, somewhere like
       /etc/ssh/authorized_keys.d/$USER
    and change /etc/ssh/sshd_config (again on the grid front end) to set
        AuthorizedKeysFile  /etc/ssh/authorized_keys.d/%u
  • Add code to the prolog and epilog to use scp (or rdist), with -i $GRID_KEYS/$USER to select the key, to pull files from $SGE_O_WORKDIR at the beginning of the job and push them back at the end.
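
The last step of the recipe might look something like the helper below. It is a sketch under stated assumptions, not the scripts we actually deployed: stage, GRID_FRONT_END and DRY_RUN are hypothetical names, the hostname and the default key directory are placeholders, the scripts are assumed to run as the job owner, and $SGE_O_WORKDIR is assumed to be the same path on both machines and to contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper shared by the prolog and the epilog.
GRID_FRONT_END="${GRID_FRONT_END:-grid-front-end.example.org}"  # placeholder hostname
GRID_KEYS="${GRID_KEYS:-/var/lib/grid-keys}"                    # placeholder key directory

# stage in  : pull the job directory from the grid front end (prolog)
# stage out : push the job directory back to the grid front end (epilog)
# With DRY_RUN set, the scp command is printed instead of being run.
stage() {
    direction="$1"
    user="${USER:-$(id -un)}"
    key="$GRID_KEYS/$user"
    remote="$user@$GRID_FRONT_END"
    parent="$(dirname "$SGE_O_WORKDIR")"

    # Copy the directory into its parent, so it keeps its name on arrival.
    case "$direction" in
        in)  set -- scp -q -r -p -i "$key" "$remote:$SGE_O_WORKDIR" "$parent/" ;;
        out) set -- scp -q -r -p -i "$key" "$SGE_O_WORKDIR" "$remote:$parent/" ;;
        *)   echo "usage: stage in|out" >&2; return 2 ;;
    esac

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
        return 0
    fi

    # The parent directory must exist locally before files can be pulled in.
    if [ "$direction" = in ]; then
        mkdir -p "$parent" || return 1
    fi
    "$@"
}
```

The prolog would then call `stage in` and the epilog `stage out`. Setting DRY_RUN=1 prints the scp command, which is a cheap way to check the paths before letting it loose on real jobs.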



