Taxonomic Classification Service

Overview

The Taxonomic Classification Service accepts reads or contigs from sequencing of a metagenomic sample and uses Kraken 2 to assign the reads to taxonomic bins, providing an initial profile of the possible constituent organisms present in the sample.

See also

Using the Taxonomic Classification Service

The Taxonomic Classification submenu option under the Services main menu (Metagenomics category) opens the Taxonomic Classification input form (shown below). Note: You must be logged into PATRIC to use this service.

Taxonomic Classification Menu

Options

Taxonomic Classification Input Form

Start With

The service can accept either read files or assembled contigs. If the “Read File” option is selected, the form will provide controls to allow input of read files or SRA accession numbers. If the “Assembled Contigs” option is selected, the form will change to allow input of a contig file.

Input File

Depending on the option chosen above (Read File or Assembled Contigs), the Input File section will request read files or assembled contigs, respectively.

Paired read library

Read File 1 & 2: Many paired read libraries are given as file pairs, with each file containing half of each read pair. Paired read files are expected to be sorted such that each read in a pair occurs in the same Nth position as its mate in their respective files. These files are specified as READ FILE 1 and READ FILE 2. For a given file pair, the selection of which file is READ 1 and which is READ 2 does not matter.

Single read library

Read File: The fastq file containing the reads.

SRA run accession

Allows direct upload of read files from the NCBI Sequence Read Archive to the PATRIC Assembly Service. Entering the SRR accession number and clicking the arrow will add the file to the selected libraries box for use in the assembly.

Selected libraries

Read files placed here will contribute to a single assembly.

Parameters

Algorithm

Database

Reference taxonomic database used by the algorithm.

  • All genomes - Standard Kraken 2 database containing distinct 31-mers, based on completed microbial genomes from NCBI.

  • RDP (SSU rRNA) - The Ribosomal Database Project (RDP), a naïve Bayesian-based classification for bacterial 16S rRNA sequences.

  • SILVA (SSU rRNA) - Comprehensive database of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services.

Output Folder

The workspace folder where results will be placed.

Output Name

Name used to uniquely identify results.

Output Results

Taxonomic Classification Output Files

The Taxonomic Classification Service generates several files that are deposited in the Private Workspace in the designated Output Folder. To reivThese include

  • TaxonomicReport.html - A web-browser-friendly report that summarizes the results of the service (see description and image below)

  • chart.html - Link to Krona-based interactive chart showing the taxonomic classification distribution (see image below)

  • classified_1.fastq.gz - reads that were classified by Kraken 2 (only if Save Classified Sequences option is chosen)

  • classified_2.fastq.gz - reads that were classified by Kraken 2 (only if Save Classified Sequences option is chosen)

  • full_report.txt - Full Kraken 2 report; includes zero counts (see Kraken 2 Output Formats)

  • output.txt.gz - Per-read Kraken 2 output file

  • report.txt - Kraken 2 report; suppresses zero counts

  • unclassified_1.fastq.gz - reads that were not classified by Kraken 2 (only if Save Unclassified Sequences option is chosen)

  • unclassified_2.fastq.gz - reads that were not classified by Kraken 2 (only if Save Unclassified Sequences option is chosen)

Taxonomic Report Output

Kraken 2 Taxonomic Classification Report

This page is a web-friendly report that summarizes the output of Kraken 2. It provides a link to the input data, an interactive chart view (see description below), and a summary table of the top hits. The columns in the table are as follows:

  • Pct Coverage - Percentage of fragments covered by the clade rooted at this taxon

  • Frags in Clade - Number of fragments covered by the clade rooted at this taxon

  • Frags in Taxon - Number of fragments assigned directly to this taxon

  • Rank - A rank code, indicating (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. Taxa that are not at any of these 10 ranks have a rank code that is formed by using the rank code of the closest ancestor rank with a number indicating the distance from that rank. E.g., “G2” is a rank code indicating a taxon is between genus and species and the grandparent taxon is at the genus rank.

  • NCBI Taxon ID - NCBI taxonomic ID number

  • Scientific Name - Indented scientific name. Clicking on one of these names will display the corresponding taxon page in the PATRIC website.

Taxonomic Chart

Krona-based interactive Taxonomic Classification Chart

This interactive chart provides a visual representation of the reads mapping to each taxon. Clicking on a taxon within the pie chart will provide a summary of the reads mapping to that specific selection on the upper right corner.

Action buttons

After selecting one of the output files by clicking it, a set of options becomes available in the vertical green Action Bar on the right side of the table. These include

  • Hide/Show: Toggles (hides) the right-hand side Details Pane.

  • Guide Link to the corresponding User Guide

  • Download: Downloads the selected item.

  • View Displays the content of the file, typically as plain text or rendered html, depending on filetype.

  • Delete Deletes the file.

  • Rename Allows renaming of the file.

  • Copy: Copies the selected items to the clipboard.

  • Move Allows moving of the file to another folder.

More details are available in the Action Buttons user guide.

References

  • Ondov BD, Bergman NH, and Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30; 12(1):385.

  • Maidak, Bonnie L., et al. “The RDP (ribosomal database project).” Nucleic acids research 25.1 (1997): 109-110.

  • Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.

  • Yilmaz P, Parfrey LW, Yarza P, Gerken J, Pruesse E, Quast C, Schweer T, Peplies J, Ludwig W, Glöckner FO. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 2014; 42(Database issue):643–8.