Proteome Comparison Service

Overview

The Proteome Comparison Service performs protein sequence-based genome comparison using bidirectional BLASTP. This service allows users to select up to eight genomes, either public or private, and compare them to a user-selected or supplied reference genome. The proteome comparison result is displayed as an interactive circular genome view and is downloadable as a print-quality image or tabular comparison results.

Using the Proteome Comparison Service

The Proteome Comparison submenu option under the Services main menu (Protein Tools category) opens the Proteome Comparison input form (shown below). Note: You must be logged into PATRIC to use this service.

Proteome Comparison Service Menu

Options

Proteome Comparison Service Input Form

Parameters

Advanced parameters:

Minimum % coverage: Minimum percent sequence coverage of query and subject in blast. Use up or down arrows to change the value. The default value is 30%.

BLAST E value: Maximum BLAST E value. The default value is 1e-5.

Minimum % Identity: Minimum percent sequence identity of query and subject in BLAST. Use up or down arrows to change the value. The default value is 10%.

Output Folder

The workspace folder where results will be placed.

Output Name

Name used to uniquely identify results.

Reference Genome Selection

Select a reference genome from the genome list or a FASTA file or a feature group. Only one reference is allowed.

Select a genome

Type or select a genome name from the genome list.

Or a FASTA file

Select or upload an external genome file in protein FASTA format.

Or a feature group

Select a feature group from the workspace to show comparison of specific proteins instead of all proteins in a genome.

Comparison Genomes Selection

Select up to total of 9 genomes from the genome list or FASTA files or a feature groups and use the plus buttons to place the genomes to the table .

Select genome

Type or select a genome name from the genome list.

And/or select FASTA file

Select or upload an external genome file in protein FASTA format.

And/or select feature group

Select a feature group from the workspace.

Buttons

Reset: Resets the input form to default values

Submit: Launches the proteome comparison job. Upon completion, the result is displayed as an interactive circular genome view and is downloadable as a print-quality image or tabular comparison results.

Output Results

Proteome Comparison Service Results Page

The Proteome Comparison Service generates several files that are deposited in the Private Workspace in the designated Output Folder. These include

  • circos.svg - a Scalable Vector Graphics (SVG) image showing the proteome comparison in a cicular view.

  • circos_final.html - a webpage displaying the SVG file of the proteome comparison result.

  • comp_genome_X.txt - the list of features in the comparison genome. The service generates a comp_genome_X.txt file for each comparison genome (where X is the genome number).

  • genome_comparison.xls - an Excel file of the comparison results containing the best BLAST hits for each compared genome to the reference genome. The columns in the table are as follows (for each genome):

    • _contig - accession number for contig in reference genome

    • _gene - order number for gene in the genome

    • _aa_length - size in amino acids

    • _patric_id - PATRIC locus tag

    • _locus_tag - RefSeq locus tag

    • _gene_name - gene name

    • _plfam_id - PATRIC local protein family

    • _pgfam_id - PATRIC global protein family

    • _function - functional annotation

    • _start - start location of the gene on the contig

    • _end - end location of the gene on the contig

    • _strand - strand that gene is located on

    • _hit - type of BLAST hit: bi-directional, uni-directional, or missing (for comparison genomes only)

  • genome_comparison.json - JSON format of the comparison results

  • genome_comparison.txt - text file format of the comparison results

  • karyotype.txt - the karyotype file of the Circos output that defines genome contig id, size and color.

  • large.tiles.txt - the large tile file of the Circos output that provides the tile track information

  • legend.txt - the color legend for the percent protein sequence identity of the bidirectional and unidirectional best hits.

  • ref_genome.txt - the list of features in the reference genome.

Action buttons

After selecting one of the output files by clicking it, a set of options becomes available in the vertical green Action Bar on the right side of the table, listed below. To find out more about the different options for using these output files, see the Variation Analysis Service Tutorial.

  • Hide/Show: Toggles (hides) the right-hand side Details Pane.

  • Guide: Link to the corresponding User Guide

  • Download: Downloads the selected item.

  • View: Displays the content of the file, typically as plain text or rendered html, depending on filetype. Only available for html, txt, and tsv file types.

  • Delete: Deletes the file.

  • Rename: Allows renaming of the file.

  • Copy: Copies the selected items to the clipboard.

  • Move: Allows moving of the file to another folder.

  • Edit Type: Allows changing of the type of the file in terms of how PATRIC interprets the content and uses it in other services or parts of the website. Allowable types include unspecified, contigs, nwk, reads, differential expression input data, and differential expression input metadata.

More details are available in the Action Buttons user guide.

Proteome Comparison Viewer

Proteome Comparison Service View Icon

Clicking on the View icon at the upper right portion of the job result page will display an interactive circular viewer of of the comparison results, with color-coding for protein percent identity relative to the best hit on the reference genome. Mousing over a feature (gene) will display its PATRIC locus tag, and clicking on it will display the PATRIC feature page for that gene.

Proteome Comparison Interactive Viewer

References

  1. Overbeek, R., et al., The SEED and the Rapid Annota on of microbial genomes using Subsystems Technology (RAST). Nucleic acids research, 2014. 42(D1): p. D206‑D214.