Developing PATRIC Code at Argonne

This is a quick guide for PATRIC developers working on the internal systems at Argonne National Laboratory.

Access

One must have a Unix login account at Argonne to be able to develop in the PATRIC environment. Contact one of the PATRIC staff to get this if you believe you should be working there.

To get to the internal systems where the PATRIC environment runs one must first log in to the bastion host login.mcs.anl.gov. From there the internal machines are visible. A useful machine to use is holly.mcs.anl.gov.

Environment

The PATRIC code deployments are in two parts. A runtime library incorporates all the basic software required to run the PATRIC-specific code (language interpreters and associated runtime libraries; bioinformatic analysis tools; libraries and development tools that do not come with the base OS; etc).

The deployment contains all the PATRIC-specific code that typically changes much more often and is part of the PATRIC release cycle.

On the Argonne machines, the runtime library is typically located on local disk on the machines that run PATRIC components in the path /disks/patric-common/runtime.

On the Argonne systems that have PATRIC deployments installed, the standard deployment location is /disks/p3/deployment. The software deployment infrastructure creates user environment configuration scripts to set up the environment for the execution of the PATRIC scripts and the associated runtime routines. These may be sourced in your environment via the command:

source /disks/p3/deployment/user-env.sh

For any application that touches the workspace you will need to be logged into PATRIC. Use the p3-login command to log in:

$ p3-login olson
Password: *********
Logged in with username olson@patricbrc.org

This will authenticate you with the PATRIC authentication service and create a file in your home directory .patric_token containing the authentication token.

Checking Application Output

If you know the identifier for a given PATRIC job, you can find the standard output and error files, its exit status if complete, and the host it ran on by viewing the application service output logs. These are found on the system beech.mcs.anl.gov in the directory /disks/p3/task_status/<job-id>. For example, if I have a job 388b0885-3357-402c-aad3-01e4717c640c I can find the details thus:

$ cd /disks/p3/task_status/388b0885-3357-402c-aad3-01e4717c640c
$ ls -l
total 28
-rw-rw-r--. 1 p3 p3    2 Aug 17 04:02 exitcode
-rw-rw-r--. 1 p3 p3   19 Aug 16 23:52 hostname
-rw-rw-r--. 1 p3 p3    5 Aug 16 23:52 pid
-rw-rw-r--. 1 p3 p3 6258 Aug 17 04:02 stderr
-rw-rw-r--. 1 p3 p3    0 Aug 17 04:02 stderr.EOF
-rw-rw-r--. 1 p3 p3 6366 Aug 16 23:53 stdout
-rw-rw-r--. 1 p3 p3    0 Aug 17 04:02 stdout.EOF
$ cat exitcode
0
$ cat hostname
redwood.mcs.anl.gov

Here I can see the job exited sucessfully (exit code 0) and ran on host redwood.mcs.anl.gov. The files stdout and stderr contain logs of the standard output and error streams from the application. These will be updated as the application runs, so one can check the status of a long-running application by viewing them.

Testing Existing Applications

One of the common uses for using raw access to the Argonne systems is the testing of service backends without going through the job submission infrastructure. To do this one must understand a little about how the application service works.

Each application is defined by an application specification document. This document specifies the inputs that the service expectes. For example, the phylogenetic tree application assumes the following configuration:

{
  "script": "App-PhylogeneticTree",
  "label": "Compute phylogenetic tree",
  "id": "PhylogeneticTree",
  "description": "Computes a phylogenetic tree given a set of in-group and out-group genomes",
  "parameters": [
    {
      "required": 1,
      "desc": "Path to which the output will be written. ",
      "type": "folder",
      "default": null,
      "label": "Output Folder",
      "id": "output_path"
    },
    {
      "required": 1,
      "desc": "Basename for the generated output files.",
      "type": "wsid",
      "default": null,
      "label": "File Basename",
      "id": "output_file"
    },
    {
      "id": "in_genome_ids",
      "label": "In-group genomes",
      "allow_multiple": true,
      "required": 1,
      "type": "list",
      "default": []
    },
    {
      "id": "out_genome_ids",
      "label": "Out-group genomes",
      "allow_multiple": true,
      "required": 1,
      "type": "list",
      "default": []
    },
    {
      "id": "full_tree_method",
      "required": 0,
      "default": "ml",
      "label": "Full tree method",
      "desc": "Full tree method",
      "type": "string"
    },
    {
      "id": "refinement",
      "required": 0,
      "default": "yes",
      "label": "Automated progressive refinement",
      "desc": "Automated progressive refinement",
      "type": "string"
    }
  ]
}

The application specifications may be found in the app_service repository on GitHub.

Each application service is implemented by a program named App-ApplicationName. Thus the phylogenetic tree application is called App-PhylogeneticTree. Sources for the applications are also found in the app_service repository on GitHub.

All of the application scripts accept the same parameters, described by its usage statement:

$ App-PhylogeneticTree -h
 Usage: /disks/p3/deployment/plbin/App-PhylogeneticTree.pl app-service-url app-definition.json param-values.json [stdout-file stderr-file]

The app-definition.json parameter is the application specification document mentioned above. The param-values.json parameter is another JSON file that defines the actual values of the parameters as defined in the specification document.

An example of a parameters file for the phylogenetic tree application is the following:

$ cat tree.in`
{
   "in_genome_ids": [
       "66976.18",
       "1262772.3",
       "1262773.3"
   ],
   "out_genome_ids": [
       "66976.17"
   ],
   "output_path": "/olson@patricbrc.org/test",
   "output_file": "tree-15",
   "full_tree_method": "ft",
   "refinement": "no"
}

Here, we request a phylogentic tree with three in-group genomes and one out-group genome, with the output to be written to the folder /olson@patricbrc.org/test in the PATRIC workspace with the output name to be tree-15. The full tree method request is FastTree, and no refinement is requested.

We may run this application as follows. We give the application script a bogus first parameter; in production execution that is a URL that will result in the standard output and error streams to be fed in realtime to the application service where it is logged and available for display in the PATRIC website.

$ App-PhylogeneticTree xx /disks/p3/deployment/services/app_service/app_specs/PhylogeneticTree.json tree.in
Process tree $VAR1 = {
          'parameters' => [
                            {
                              'id' => 'output_path',
                              'type' => 'folder',
                              'desc' => 'Path to which the output will
                          be written. ',
                              'default' => undef,
                              'required' => 1,
                              'label' => 'Output Folder'
                            },
[....]

We see the execution beginning here. There is a fairly large amount of debugging output from both the application service infrastructure as well as the tools invoked by the application service infrastructure to accomplish the computation desired.