PATRIC March 2015 Release Features Genome Assembly and Annotation Services and over 6,000 New Genomes

Published on 2015-03-27 00:00:00

The March release marks the first stage of integrating new and updated analysis services into PATRIC. It adds a new Genome Assembly Service, Genome Annotation Service, and Differential Expression Service, along with an enhanced Workspace that supports these services.

NOTE: You must have a PATRIC user account and be logged in to use the services and workspace.

Genome Assembly Service:

The new Genome Assembly Service performs automated genome assembly using current computational tools. One or more assemblers can be invoked so that their results can be compared, and the service attempts to select the best assembly, i.e., the one with the fewest contigs and the longest average contig length. Several assembly workflows, or “recipes,” are available; these have been tuned and tested for particular data types or analysis goals, such as throughput or rigor. The service’s flexible design also makes it possible to quickly design and emulate other popular assembly protocols. The PATRIC Genome Assembly Service can be accessed at https://www.patricbrc.org/app/Assembly.
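To make the selection rule concrete, the sketch below ranks candidate assemblies by contig count and then by average contig length. It assumes the two criteria are applied in that order (fewest contigs first, longest average length as the tie-breaker); the actual service may weigh these or other metrics differently. The `Assembly` class, `pick_best` function, and assembler names are hypothetical illustrations, not part of PATRIC.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Assembly:
    """One candidate assembly, reduced to the lengths of its contigs (hypothetical model)."""
    name: str
    contig_lengths: list[int]

    @property
    def num_contigs(self) -> int:
        return len(self.contig_lengths)

    @property
    def avg_contig_length(self) -> float:
        return mean(self.contig_lengths)


def pick_best(assemblies: list[Assembly]) -> Assembly:
    """Prefer the fewest contigs; break ties by the longest average contig length.

    Assumption: the two criteria are applied lexicographically.
    """
    return min(assemblies, key=lambda a: (a.num_contigs, -a.avg_contig_length))


# Hypothetical results from three assemblers run on the same reads.
candidates = [
    Assembly("assembler_a", [50_000, 40_000, 10_000]),
    Assembly("assembler_b", [60_000, 40_000]),
    Assembly("assembler_c", [70_000, 40_000]),
]
print(pick_best(candidates).name)  # assembler_c
```

Here `assembler_b` and `assembler_c` tie on contig count, so the longer average contig length decides in favor of `assembler_c`.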

Genome Annotation Service:

The new Genome Annotation Service uses the RAST tool kit (RASTtk), a modular version of RAST that lets researchers build custom annotation pipelines. The original RAST (Rapid Annotation using Subsystem Technology) annotation engine was built to annotate bacterial and archaeal genomes. It provides a standard software pipeline for identifying genomic features (e.g., protein-encoding genes and RNAs) and annotating their functions. RASTtk extends RAST by offering a choice of tools for identifying and annotating genomic features, as well as the ability to add custom features to an annotation job. RASTtk also supports batch submission of genomes, with annotation protocols that can be customized for each batch. The batch submission interface will be made available in PATRIC in a future release. The PATRIC Genome Annotation Service can be accessed at https://www.patricbrc.org/app/Assembly.
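The idea of a modular pipeline, where each feature-calling tool is an interchangeable step and users can inject their own features, can be sketched as a list of functions applied in order. This is a generic illustration under stated assumptions, not the RASTtk interface: the dict-based genome model, the stage constructors (`feature_stage`, `custom_feature_stage`), and the toy start-codon finder are all hypothetical stand-ins for real annotation tools.

```python
from typing import Callable

# Hypothetical data model: a genome is a dict with a sequence and a feature list.
Genome = dict
Stage = Callable[[Genome], Genome]
Finder = Callable[[str], list[tuple[int, int]]]


def feature_stage(feature_type: str, finder: Finder) -> Stage:
    """Wrap any feature-calling tool as an interchangeable pipeline stage."""
    def stage(genome: Genome) -> Genome:
        for start, end in finder(genome["sequence"]):
            genome["features"].append({"type": feature_type, "start": start, "end": end})
        return genome
    return stage


def custom_feature_stage(custom: list[dict]) -> Stage:
    """Stage that adds user-supplied features to the annotation job."""
    def stage(genome: Genome) -> Genome:
        genome["features"].extend(custom)
        return genome
    return stage


def run_pipeline(genome: Genome, stages: list[Stage]) -> Genome:
    """Apply each stage in order; swapping or reordering stages changes the protocol."""
    for stage in stages:
        genome = stage(genome)
    return genome


def toy_start_codons(seq: str) -> list[tuple[int, int]]:
    """Toy stand-in for a real gene caller: reports every ATG position."""
    return [(i, i + 3) for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


genome = {"sequence": "CCATGAAATGA", "features": []}
pipeline = [
    feature_stage("toy_start_codon", toy_start_codons),
    custom_feature_stage([{"type": "custom_region", "start": 0, "end": 2}]),
]
result = run_pipeline(genome, pipeline)
print([(f["type"], f["start"]) for f in result["features"]])
# [('toy_start_codon', 2), ('toy_start_codon', 7), ('custom_region', 0)]
```

Because every stage shares the same signature, replacing the toy finder with a different tool, or applying the same stage list to each genome in a batch, requires no changes to `run_pipeline`.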

Differential Expression Service:

As part of the workspace upgrade, we have moved the functionality for uploading and transforming expression data into a service named “Differential Expression Service.” This service allows users to upload differential expression data to their private workspace and compare it with other expression data available in PATRIC. It supports gene expression, protein expression, and phenotype array data in the form of log ratios generated by comparing samples, conditions, or time points. The PATRIC Differential Expression Service can be accessed at https://www.patricbrc.org/app/Expression.
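Because the service expects data already expressed as log ratios, it may help to see how such values are typically derived from two samples. The sketch below assumes base-2 logarithms and positive expression values, which is a common convention but is not stated in this announcement; the function name and gene identifiers are hypothetical.

```python
import math


def log2_ratios(test: dict[str, float], control: dict[str, float]) -> dict[str, float]:
    """Per-gene log2(test / control) for genes measured with positive values in both samples."""
    return {
        gene: math.log2(test[gene] / control[gene])
        for gene in test.keys() & control.keys()
        if test[gene] > 0 and control[gene] > 0
    }


# Hypothetical expression levels for a control and a treated condition.
control = {"geneA": 100.0, "geneB": 50.0, "geneC": 20.0}
treated = {"geneA": 400.0, "geneB": 25.0, "geneC": 20.0}
ratios = log2_ratios(treated, control)
print(sorted(ratios.items()))
# [('geneA', 2.0), ('geneB', -1.0), ('geneC', 0.0)]
```

A value of 2.0 means a fourfold increase, -1.0 a twofold decrease, and 0.0 no change, which makes up- and down-regulation symmetric around zero.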

Enhanced Workspace:

We have implemented an enhanced Workspace in PATRIC to support the new genome assembly and annotation services. The Workspace now allows users to upload data files (e.g., sequence reads and assembled genomes), run analysis services on them, store the resulting output files, and download the results. It also provides basic tools for organizing and managing files. The Workspace can be accessed at https://www.patricbrc.org/login. An overview is available in the Workspace User Guide.

Antimicrobial Resistance and other Clinical Metadata:

Included in this release are 3,085 Streptococcus genomes collected from the Wellcome Trust Sanger Institute. These genomes were part of a study from the Maela refugee camp in Thailand that analyzed the specific SNP patterns associated with β-lactam resistance (PMC4125147, PMC3970364). This data set is particularly valuable because the genomes are split roughly 50/50 between β-lactam-resistant and β-lactam-susceptible isolates, and it is the largest data set we have for studying different machine learning algorithms for AMR classification.

The release also includes curated AMR metadata for 595 genomes from the TB-ARC MRC SA Initiative, Broad Institute (broadinstitute.org). For this metadata, we assigned ‘Susceptible,’ ‘Resistant,’ or ‘Intermediate’ values to the Antimicrobial Resistance field based on AMR panel data or phenotype information, as indicated in the Antimicrobial Resistance Evidence field. The AMR panel data and additional metadata are available for download from the PATRIC FTP site.
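One common way to turn AMR panel data into ‘Susceptible,’ ‘Intermediate,’ or ‘Resistant’ calls is to compare a measured minimum inhibitory concentration (MIC) against drug-specific breakpoints. The sketch below assumes that approach; the announcement does not say which rules were used for this curation. The breakpoint values, drug name, and record field names are hypothetical and are not clinical guidelines.

```python
def classify_mic(mic: float, susceptible_max: float, resistant_min: float) -> str:
    """Map a minimum inhibitory concentration (MIC) to an S/I/R call.

    susceptible_max and resistant_min are drug-specific breakpoints; the
    values used below are hypothetical, not clinical guidelines.
    """
    if susceptible_max >= resistant_min:
        raise ValueError("susceptible_max must be below resistant_min")
    if mic <= susceptible_max:
        return "Susceptible"
    if mic >= resistant_min:
        return "Resistant"
    return "Intermediate"


def amr_record(antibiotic: str, mic: float, breakpoints: tuple[float, float]) -> dict:
    """Build a hypothetical metadata row pairing the call with its evidence source."""
    return {
        "antibiotic": antibiotic,
        "antimicrobial_resistance": classify_mic(mic, *breakpoints),
        "evidence": "AMR Panel",
    }


# Hypothetical breakpoints (mg/L): susceptible <= 1, resistant >= 4.
for mic in (0.5, 2.0, 8.0):
    print(amr_record("drug_x", mic, (1.0, 4.0))["antimicrobial_resistance"])
# prints Susceptible, Intermediate, Resistant (one per line)
```

Recording the evidence source alongside each call mirrors the role of the Antimicrobial Resistance Evidence field: a call derived from panel measurements can be distinguished from one taken from reported phenotype information.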

Changes to the FTP site:

We have completed the reorganization and update of the PATRIC FTP site, which now provides updated downloadable files for all genomes and genome annotations.

New Genomes and Annotations:

In this release, a total of 6,178 new bacterial genomes have been added to PATRIC. Of these, 3,039 genomes were collected from NCBI GenBank, 3,085 Streptococcus genomes were collected from the Wellcome Trust Sanger Institute, 110 Mycobacterium tuberculosis genomes were gathered from the NCBI SRA database, and an additional 12 genomes, which are not publicly available elsewhere, were provided by PATRIC users and collaborators. This release also includes 38 additional archaeal genomes.