PATRIC May 2014 Release: New Easily Accessible and Manually Curated Specialty Genes Data

Published on 2014-05-23


*Incorporation of Curated Specialty Genes Data:*

Specialty genes are genes of particular interest to infectious disease researchers. The specialty gene classes currently incorporated in PATRIC are:

  • Antibiotic Resistance: genes associated with antibiotic resistance, which bacteria develop through gene mutation or by acquiring antibiotic resistance genes.

  • Drug Targets: proteins targeted by known, approved, or experimental small-molecule drugs.

  • Human Homologs: bacterial proteins that share high sequence similarity with human proteins.

  • Virulence Factors: gene products that enable bacteria to establish themselves on or within a host organism and enhance their potential to cause disease.

For each specialty gene class, PATRIC:

  • Collects reference gene sets from popular, community-recognized external databases, many of which are manually curated from the literature.

  • Builds its own database of reference gene sets, manually curated from the literature as needed, to provide more accurate and comprehensive information for NIAID select pathogens.

  • As part of genome annotation, uses BLASTP to map reference genes to their homologs based on high sequence similarity, thus providing consistent annotation of specialty genes across all bacterial genomes.

  • Makes specialty genes more accessible and usable through specialized analysis and search tools that let researchers quickly select potential targets for developing drugs, vaccines, and therapeutics.
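The BLASTP mapping step can be pictured as a filter over hits from PATRIC proteins (queries) to reference genes (subjects). The minimal Python sketch below keeps, for each PATRIC gene, the best hit that passes a set of similarity cutoffs. The cutoff values, the tie-breaking rule (highest identity, then lowest e-value), and the names `BlastpHit` and `map_to_reference` are hypothetical placeholders for illustration, not PATRIC's actual parameters; the real mapping criteria are covered in the Specialty Genes FAQs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BlastpHit:
    """One BLASTP hit of a PATRIC protein (query) against a reference gene (subject)."""
    query_id: str          # PATRIC gene identifier (hypothetical format)
    subject_id: str        # reference gene ID in the source database
    pct_identity: float
    pct_query_cov: float
    pct_subject_cov: float
    evalue: float


# Placeholder cutoffs for illustration only; NOT PATRIC's published thresholds.
MIN_IDENTITY = 80.0
MIN_COVERAGE = 80.0
MAX_EVALUE = 1e-10


def passes_cutoffs(hit: BlastpHit) -> bool:
    """True if the hit meets every (assumed) similarity cutoff."""
    return (
        hit.pct_identity >= MIN_IDENTITY
        and hit.pct_query_cov >= MIN_COVERAGE
        and hit.pct_subject_cov >= MIN_COVERAGE
        and hit.evalue <= MAX_EVALUE
    )


def map_to_reference(hits: Iterable[BlastpHit]) -> Dict[str, BlastpHit]:
    """Keep the best passing hit per query gene: highest identity, then lowest e-value."""
    best: Dict[str, BlastpHit] = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        current = best.get(hit.query_id)
        if current is None or (hit.pct_identity, -hit.evalue) > (
            current.pct_identity,
            -current.evalue,
        ):
            best[hit.query_id] = hit
    return best


hits = [
    BlastpHit("gene_1", "ref_A", 95.0, 98.0, 97.0, 1e-50),
    BlastpHit("gene_1", "ref_B", 99.0, 99.0, 99.0, 1e-60),
    BlastpHit("gene_2", "ref_C", 40.0, 90.0, 90.0, 1e-5),  # fails the identity cutoff
]
mapped = map_to_reference(hits)
```

Here `gene_1` is annotated with `ref_B`, its closest passing reference, while `gene_2` receives no specialty gene annotation because its only hit is too dissimilar.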

Current Number and Source of Specialty Genes Incorporated in PATRIC

| Specialty Gene Class  | Source    | Genes   |
|-----------------------|-----------|---------|
| Antibiotic Resistance | ARDB      | 91068   |
| Antibiotic Resistance | CARD      | 244359  |
| Drug Target           | DrugBank  | 1200659 |
| Drug Target           | TTD       | 275374  |
| Human Homolog         | Human     | 631343  |
| Virulence Factor      | PATRIC_VF | 894122  |
| Virulence Factor      | VFDB      | 737069  |
| Virulence Factor      | Victors   | 1298446 |
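Per-class totals can be computed directly from these rows. One caveat, which is an assumption rather than something stated in this release: a PATRIC gene that matches entries in more than one source database would presumably be counted once per source, so the sums below describe gene-to-source matches rather than unique genes.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (class, source, genes) rows transcribed from the table of specialty gene sources.
ROWS = [
    ("Antibiotic Resistance", "ARDB", 91068),
    ("Antibiotic Resistance", "CARD", 244359),
    ("Drug Target", "DrugBank", 1200659),
    ("Drug Target", "TTD", 275374),
    ("Human Homolog", "Human", 631343),
    ("Virulence Factor", "PATRIC_VF", 894122),
    ("Virulence Factor", "VFDB", 737069),
    ("Virulence Factor", "Victors", 1298446),
]


def totals_by_class(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum gene counts per specialty gene class across all source databases."""
    totals: Counter = Counter()
    for gene_class, _source, genes in rows:
        totals[gene_class] += genes
    return dict(totals)


totals = totals_by_class(ROWS)
for gene_class, count in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene_class:<22} {count:>9}")
print(f"{'All classes':<22} {sum(totals.values()):>9}")
```

With these counts, the Virulence Factor class has the most matches (2929637 across three sources), and all classes together sum to 5372440.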

More details about our data sources and mapping processes are available in the Specialty Genes FAQs.


*New Taxon and Genome-Specific Specialty Gene Lists*

Accessed via the Specialty Genes tab on any taxon or genome overview page, these taxon- and genome-specific tables provide:

  • Information about PATRIC genes, such as Genome Name, PATRIC and RefSeq Locus Tags, Gene Names, and Products.

  • Information about the matching specialty gene in the reference database, such as Property, Source Database Name, Source ID, Classification, and PubMed references. Source IDs link to the corresponding pages on the source database websites, where more information is available, and PubMed links lead to the corresponding references in PubMed.

  • Summaries of sequence similarity from the BLASTP hit, such as Percent Query Coverage, Percent Subject Coverage, and Percent Identity.
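The three similarity summaries are conventionally derived from a BLASTP alignment: coverage is the aligned span divided by the full sequence length, and identity is the number of identical positions divided by the alignment length (as BLAST reports `pident`). The sketch below uses those common definitions; it assumes PATRIC follows the same conventions, which this release does not spell out, and the alignment values are made up.

```python
def percent_coverage(start: int, end: int, seq_length: int) -> float:
    """Percent of a sequence spanned by its aligned region (1-based, inclusive coordinates)."""
    return 100.0 * (abs(end - start) + 1) / seq_length


def percent_identity(identical_positions: int, alignment_length: int) -> float:
    """Percent of alignment columns holding identical residues (BLAST-style pident)."""
    return 100.0 * identical_positions / alignment_length


# Hypothetical alignment: a 300-aa PATRIC protein against a 290-aa reference protein.
query_cov = percent_coverage(1, 285, 300)     # query residues 1-285 are aligned
subject_cov = percent_coverage(6, 290, 290)   # subject residues 6-290 are aligned
identity = percent_identity(250, 285)         # 250 identical columns out of 285

print(f"Query coverage {query_cov:.1f}%, subject coverage {subject_cov:.1f}%, identity {identity:.1f}%")
```

Reporting both coverages matters: a short reference protein can be fully covered (high subject coverage) by a fragment of a much longer PATRIC protein (low query coverage), which identity alone would not reveal.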

*Note*: Genes designated as “Literature” in the Evidence column have been experimentally verified, while those designated as “BLASTP” were identified by sequence homology. Learn more in the Specialty Genes FAQs.

Gene lists offer in-depth filtering by Property, Source, Evidence, and BLAST Hits parameters. A Specialty Genes List covering all bacteria in PATRIC is also available.
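To illustrate how these filters combine, here is a hypothetical in-memory version in which every criterion is optional and all supplied criteria must match. The `SpecialtyGene` record, its field names, and treating genes with no identity value as failing an identity threshold are assumptions for this sketch, not PATRIC's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SpecialtyGene:
    patric_id: str
    prop: str                      # Property, e.g. "Virulence Factor"
    source: str                    # source database, e.g. "VFDB"
    evidence: str                  # "Literature" or "BLASTP"
    pct_identity: Optional[float]  # None when no BLASTP summary is available (assumed)


def filter_genes(
    genes: Iterable[SpecialtyGene],
    prop: Optional[str] = None,
    source: Optional[str] = None,
    evidence: Optional[str] = None,
    min_identity: Optional[float] = None,
) -> List[SpecialtyGene]:
    """Return genes matching every criterion that is not None."""
    selected = []
    for g in genes:
        if prop is not None and g.prop != prop:
            continue
        if source is not None and g.source != source:
            continue
        if evidence is not None and g.evidence != evidence:
            continue
        # Assumption: a gene without an identity value cannot satisfy a minimum identity.
        if min_identity is not None and (g.pct_identity is None or g.pct_identity < min_identity):
            continue
        selected.append(g)
    return selected


genes = [
    SpecialtyGene("gene_1", "Virulence Factor", "VFDB", "BLASTP", 92.5),
    SpecialtyGene("gene_2", "Virulence Factor", "PATRIC_VF", "Literature", None),
    SpecialtyGene("gene_3", "Antibiotic Resistance", "CARD", "BLASTP", 71.0),
]
vf_blastp = filter_genes(genes, prop="Virulence Factor", evidence="BLASTP")
high_identity = filter_genes(genes, min_identity=80.0)
```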


*New Antibiotic Resistance and Specialty Genes Data Summary Pages:*

The Specialty Genes Data Summary and Antibiotic Resistance Data Summary pages present summaries of selected genomes, related tools and tutorials, and diagrams showing how we curate, map, and integrate specialty genes.

In the future, we plan to add associated metadata, such as antibiotic susceptibility testing results, to the Antibiotic Resistance Data Summary where available.


*New Antibiotic Resistance and Specialty Genes Search Tools:*

Use Specialty Genes Search to find all classes of specialty genes for organisms of interest by taxonomy, property class, and keyword.

Use Antibiotic Resistance Search to search specifically for antibiotic resistance genes by taxonomy, source, and keyword.

*Note*: Search results are displayed in a Specialty Genes List, as described above.
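A search that combines taxonomy, property class, and keyword can be sketched in a similar way. In this hypothetical version a record falls within a taxon if that taxon appears anywhere in its lineage, and the keyword is a case-insensitive substring match on gene name or product. The record layout and these matching rules are assumptions; the actual Specialty Genes Search may behave differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class GeneRecord:
    patric_id: str
    lineage: Tuple[str, ...]  # taxon names from the top rank down (hypothetical layout)
    prop: str                 # specialty gene property class
    gene_name: str
    product: str


def search_specialty_genes(
    records: Iterable[GeneRecord],
    taxon: str,
    prop: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[GeneRecord]:
    """Return records within `taxon` that match the optional property class and keyword."""
    needle = keyword.lower() if keyword else None
    results = []
    for r in records:
        if taxon not in r.lineage:
            continue
        if prop is not None and r.prop != prop:
            continue
        if needle is not None and needle not in r.gene_name.lower() and needle not in r.product.lower():
            continue
        results.append(r)
    return results


records = [
    GeneRecord("gene_1", ("Bacteria", "Listeria"), "Virulence Factor", "vfA", "example toxin"),
    GeneRecord("gene_2", ("Bacteria", "Listeria"), "Antibiotic Resistance", "arA", "example efflux pump"),
    GeneRecord("gene_3", ("Bacteria", "Salmonella"), "Virulence Factor", "vfB", "example toxin"),
]
listeria_toxins = search_specialty_genes(records, "Listeria", keyword="TOXIN")
```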


*Extensive Manual Curation of Virulence Factors by the PATRIC Team:*

PATRIC_VF is a manually curated virulence factor database containing genes shown to play a role in virulence in selected organisms. Each PATRIC_VF gene is linked to one or more journal articles that establish its role in virulence through experimental evidence, and the relevant assertion sentences from each article are included.
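The structure described here (a gene linked to one or more articles, each contributing at least one assertion sentence) maps naturally onto a small record type. The sketch below enforces those two rules when a record is created. The class and field names, the category value, the placeholder PubMed ID, and the example assertion sentence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArticleEvidence:
    pmid: str                    # PubMed ID of a supporting article
    assertions: Tuple[str, ...]  # sentence(s) from the article asserting the virulence role


@dataclass(frozen=True)
class VirulenceFactorRecord:
    patric_id: str
    gene_name: str
    vf_category: str             # curator-assigned "Virulence Factor Category"
    articles: Tuple[ArticleEvidence, ...]

    def __post_init__(self) -> None:
        # Each PATRIC_VF gene must cite at least one article, and each article
        # must contribute at least one assertion sentence.
        if not self.articles:
            raise ValueError(f"{self.patric_id}: no supporting article")
        for article in self.articles:
            if not article.assertions:
                raise ValueError(f"{self.patric_id}: article {article.pmid} has no assertion")

    @property
    def pmids(self) -> Tuple[str, ...]:
        """PubMed IDs of all supporting articles, in curation order."""
        return tuple(a.pmid for a in self.articles)


record = VirulenceFactorRecord(
    patric_id="gene_1",
    gene_name="vfA",
    vf_category="example_category",
    articles=(
        ArticleEvidence("pmid_placeholder", ("Deletion of vfA attenuated virulence in a mouse model.",)),
    ),
)
```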

Current Number of Manually Curated PATRIC_VF Genes by Genus

| Genus         | Genes |
|---------------|-------|
| Mycobacterium | 700   |
| Salmonella    | 751   |
| Escherichia   | 278   |
| Listeria      | 263   |
| Shigella      | 127   |
| Bartonella    | 34    |
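Summing this table gives 2153 manually curated genes across the six genera listed, with Salmonella and Mycobacterium together accounting for about two thirds of them. The short script below reproduces that arithmetic from the table values and prints each genus's share.

```python
# Gene counts per genus, transcribed from the PATRIC_VF table.
PATRIC_VF_BY_GENUS = {
    "Mycobacterium": 700,
    "Salmonella": 751,
    "Escherichia": 278,
    "Listeria": 263,
    "Shigella": 127,
    "Bartonella": 34,
}

total = sum(PATRIC_VF_BY_GENUS.values())
for genus, genes in sorted(PATRIC_VF_BY_GENUS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genus:<14} {genes:>4}  ({100 * genes / total:.1f}%)")
print(f"{'Total':<14} {total:>4}")
```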

Access and filter the entire PATRIC_VF database directly.

How Our Curators Identify the PATRIC_VF Genes:

As a first pass, curators search PubMed for the genus name combined with the term “virulence” to identify genes associated with the organism’s virulence. Each paper is then examined; if it provides direct evidence of a gene’s importance in virulence, curators record the PubMed ID, the gene, genome, and host names, and the sentences that describe the gene’s role in virulence. Curators also assign each gene a “Virulence Factor Category” from an internally developed nomenclature.
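The release does not say which interface curators use for the first-pass PubMed search. One way to run an equivalent query programmatically is NCBI's E-utilities ESearch endpoint; the sketch below only builds the request URL for `<genus> AND virulence` and does not fetch it. The query string approximates the search described above, and the `retmax` value is an arbitrary choice for illustration.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def first_pass_query_url(genus: str, retmax: int = 500) -> str:
    """Build an ESearch URL that looks up '<genus> AND virulence' in PubMed."""
    params = {
        "db": "pubmed",
        "term": f"{genus} AND virulence",
        "retmax": retmax,    # maximum number of PubMed IDs to return (arbitrary here)
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = first_pass_query_url("Bartonella")
print(url)
```

The URL could then be fetched with `urllib.request.urlopen`, keeping to NCBI's E-utilities usage guidelines; each returned PubMed ID would still need the manual review described above.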

Once this information is collected, curators search the PATRIC database for the genome described in the article. If that genome is found, they then search it for the gene described in the paper; if both match, a direct link is established between the published source and the gene. If the gene cannot be found, it is not assigned to PATRIC_VF. If the genome is not found, curators search for the same gene in a different genome; if the gene is identified there, an indirect link is established between the gene and the published article. Indirect links can be recognized because the source genome and the PATRIC genome have different strain names.
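These genome and gene matching rules form a small decision procedure. The sketch below encodes it over a toy index that maps PATRIC genome names (including strain) to gene names. Matching genes by exact name is a simplification of what is a manual curation step, and returning "not assigned" when the gene is absent from every other genome is an assumption, since the text does not cover that case. All genome and gene names are placeholders.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class LinkType(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    NOT_ASSIGNED = "not assigned"


def link_gene(
    gene: str, article_genome: str, patric_genomes: Dict[str, Set[str]]
) -> Tuple[LinkType, Optional[str]]:
    """Decide how a curated gene is linked to PATRIC; returns (link type, PATRIC genome)."""
    if article_genome in patric_genomes:
        # Same genome found: link directly if the gene is there, otherwise do not assign.
        if gene in patric_genomes[article_genome]:
            return LinkType.DIRECT, article_genome
        return LinkType.NOT_ASSIGNED, None
    # Genome not in PATRIC: look for the same gene in another genome (sorted for determinism).
    for genome in sorted(patric_genomes):
        if gene in patric_genomes[genome]:
            return LinkType.INDIRECT, genome
    # Assumption: the source does not say what happens when no genome has the gene.
    return LinkType.NOT_ASSIGNED, None


# Toy index: genome names (with strain) mapped to hypothetical gene names.
genomes = {
    "Genus species strain A": {"vfA", "vfB"},
    "Genus species strain B": {"vfA", "vfC"},
}
```

An indirect link returns a PATRIC genome whose strain name differs from the one in the article, which mirrors how indirect links are recognized in PATRIC_VF.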