Database¶
Data Types¶
See Data Section for list and description of data types supported.
Database Schema¶
The primary data store for PATRIC is a distributed SolrCloud database. The SolrCloud configuration allows “collections” to be defined using xml schema files. These collection definitions, combined with the data indexes are deployed on one or more SolrCloud hosts. Multiple machines can utilize the same set of collections and/or indexes to address scalability, high availability and higher throughput and query performance by allowing more concurrent queries.
Below is a list of the collections and the main fields/attributes available in each collection. Refer to https://github.com/PATRIC3/patric_solr_cloud for the most updated configurations and schemas.
Solr Cores/Collections:
antibiotics
enzyme_class_ref
feature_sequence
gene_ontology_ref
genome
genome_amr
genome_feature
genome_sequence
id_ref
misc_niaid_sgc
model_complex_role
model_compound
model_reaction
model_template_biomass
model_template_reaction
pathway
pathway_ref
ppi
protein_family_ref
sp_gene
sp_gene_evidence
sp_gene_ref
structured_assertion
subsystem
subsystem_ref
taxonomy
transcriptomics_experiment
transcriptomics_gene
transcriptomics_sample
Data Dictionaries¶
PATRIC uses several data dictionaries to support controlled vocabularies for certain biological entities and annotations, which provide consistent naming across data from heterogenous sources for more efficient search and query.
The following Collections store such data dictionaries and related information.
enzyme_class_ref: Information about EC numbers, enzyme names, and their hirarchical classification.
gene_ontology_ref: Information about Gene Ontology terms, their description, and their hirarchical classification.
id_ref: External database identified references, obtained from UniProt ID Mapping service.
model_complex_role: Relationships between molecular complexes and their functinal roles. Part of the Biochemistry Database from ModelSEED.
model_compound: Information about compounds associated with metabolic pathways. Part of the Biochemistry Database from ModelSEED.
model_reaction: Information about reactions involved in metabolic pathways. Part of the Biochemistry Database from ModelSEED.
model_template_biomass: Model template biomass. Part of the Biochemistry database from ModelSEED and used for metabolic modeling and FBA.
model_template_reaction: Model template reactions. Part of the Biochemistry database from ModelSEED and used for metabolic modeling and FBA.
pathway_ref: Relationship between EC numbers and metabolic pathways and their location on the pathway maps from KEGG.
protein_family_ref: Information about the PATRIC Global and Local Protein Families and their functauional roles.
sp_gene_ref: Specialty gene refernece datasets, collected and curated from external sources as described in the Data Section.
subsystem_ref: Information about the Subsystems, their classification, and corresponding functinal roles.
taxonomy: Information about taxoinomnic classification from NCBI Taxonomy database.