PATRIC December 2015 Release Features over 20,000 New Genomes and PacBio Assembly Support

Published on 2015-12-22 00:00:00

New Genomes and Annotation System Enhancements

In this release, we have added 23,052 new genomes from NCBI GenBank, bringing the total number of genomes in PATRIC to over 54,000. The full list of available bacterial genomes can be accessed from the Genomes Tab for all bacteria, and from the Genomes Data Landing Page.

As part of this Data Release, we have implemented an enhanced annotation system and process that allows us to annotate and release new genomes continuously, in near real time. With this enhancement, we will be able to keep pace with other public genome sources such as NCBI and quickly incorporate new genomes from other sources.

Other recent enhancements to the annotation system include support for the following key features:

Public/private flag for any genome submission
Efficient submission of large batches of genomes
Submission of GenBank files
Parsing of minimal metadata (genome name, taxon ID, genetic code) directly from GenBank files
Parsing features from the original GenBank files and preserving their identifiers as ID synonyms
Carrying forward additional "misc" features annotated in the GenBank file, such as repeats, binding sites, and miscellaneous RNAs, into PATRIC annotations
Assigning functions based on the latest build of k-mers from Core-SEED
Assigning the new PLfam and PGfam protein families
Computation of multilocus sequence types (MLSTs)
Automated parsing of genome metadata from NCBI BioProject and BioSample records
Automated parsing of antibiogram metadata from NCBI BioSample records
Loading antibiogram metadata into a new, dedicated core in the PATRIC Solr database
Automated incorporation of strain name into genome name, if not already present
Automated incrementing of genome counts for corresponding taxonomy nodes (for public genomes)
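Two of the items above, parsing minimal metadata from GenBank files and adding the strain to the genome name when it is missing, can be illustrated with a small sketch. It assumes plain GenBank flat-file text and reads the ORGANISM line plus the /db_xref="taxon:...", /strain, and /transl_table qualifiers. The function names parse_min_metadata and add_strain, the fallback genetic code of 11, and the "<name> strain <strain>" naming format are hypothetical choices for illustration, not PATRIC's actual annotation pipeline.

```python
import re
from typing import Optional

# Hypothetical sketch, not PATRIC's annotation pipeline: extract minimal
# metadata from GenBank flat-file text with regular expressions.


def parse_min_metadata(gb_text: str) -> dict:
    """Return genome name, taxon ID, genetic code and strain, if present."""
    organism = re.search(r"^[ \t]*ORGANISM[ \t]+(.+)$", gb_text, re.MULTILINE)
    taxon = re.search(r'/db_xref="taxon:(\d+)"', gb_text)
    code = re.search(r"/transl_table=(\d+)", gb_text)
    strain = re.search(r'/strain="([^"]+)"', gb_text)
    return {
        "genome_name": organism.group(1).strip() if organism else None,
        "taxon_id": int(taxon.group(1)) if taxon else None,
        # Assumption: fall back to code 11 (bacterial/archaeal) when no
        # /transl_table qualifier is present.
        "genetic_code": int(code.group(1)) if code else 11,
        "strain": strain.group(1) if strain else None,
    }


def add_strain(genome_name: str, strain: Optional[str]) -> str:
    """Append the strain to the genome name unless it is already there.

    The "<name> strain <strain>" format is an illustrative assumption.
    """
    if not strain or strain in genome_name:
        return genome_name
    return f"{genome_name} strain {strain}"


# Toy record for illustration only (not a real GenBank entry).
TOY_RECORD = """\
LOCUS       TOY0001                  120 bp    DNA     linear   BCT
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria.
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="Escherichia coli"
                     /strain="TOY-1"
                     /db_xref="taxon:562"
     CDS             1..120
                     /transl_table=11
"""

if __name__ == "__main__":
    meta = parse_min_metadata(TOY_RECORD)
    print(meta["taxon_id"], meta["genetic_code"])  # 562 11
    print(add_strain(meta["genome_name"], meta["strain"]))  # Escherichia coli strain TOY-1
```

Regular expressions are used here only to keep the sketch dependency-free; a production pipeline would more sensibly rely on a full GenBank parser that handles multi-record files and wrapped qualifier values.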

Assembly Service Enhancements

Also in this release, we have enabled experimental support for assembling genomes sequenced with PacBio and Oxford Nanopore technologies. PATRIC users can now take advantage of the long reads generated on these platforms to assemble longer contigs and, in some cases, complete genomes. Accepted inputs are raw PacBio bax.h5 read files, filtered PacBio FASTQ files, and Oxford Nanopore FASTQ files. The PATRIC Genome Assembly Service can be accessed at https://www.patricbrc.org/app/Assembly. NOTE: You must have a PATRIC user account and be logged in to use the services and workspace.
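As a rough illustration of the input formats listed above, the sketch below checks file names against the accepted types and walks a FASTQ file record by record to report read lengths before upload. The helper names, the suffix list (including the ".fq" alias), and the strict four-line FASTQ record layout are hypothetical assumptions; the PATRIC service performs its own validation.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical pre-upload check, not part of the PATRIC Assembly Service.
# Suffixes follow the input types named above; ".fq" is an assumed alias.
ACCEPTED_SUFFIXES = (".bax.h5", ".fastq", ".fq")


def is_accepted_input(filename: str) -> bool:
    """True if the file name ends in one of the accepted suffixes."""
    return filename.lower().endswith(ACCEPTED_SUFFIXES)


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (read_id, length) for each strict four-line FASTQ record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        # Pull the sequence, "+" separator and quality lines; None marks EOF.
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record at {header!r}")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        read_id = (header[1:].split() or [""])[0]
        yield read_id, len(seq)


if __name__ == "__main__":
    print(is_accepted_input("movie.1.bax.h5"))  # True
    demo = ["@read1 len=8\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
    print(list(fastq_read_lengths(demo)))  # [('read1', 8)]
```

Because fastq_read_lengths accepts any iterable of lines, it can be fed an open file handle directly, so even large long-read files are scanned without loading them into memory.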