Taxonomy Assignment
Once ASVs are identified, each sequence is compared against a reference database of known 16S rRNA sequences to predict a taxonomic label. This step turns a table of anonymous sequences into one where each row carries a taxonomic name. With 16S rRNA sequencing data, the classifier can generally resolve most sequences reliably to about the genus level.
Taxonomic classification
Taxonomy is a hierarchical system:
Domain → Phylum → Class → Order → Family → Genus → Species
For example: Bacteria → Firmicutes → Bacilli → Lactobacillales → Lactobacillaceae → Lactobacillus → L. acidophilus
QIIME2 classifies sequences using a trained Naïve Bayes classifier against a reference database, usually SILVA or Greengenes2. The classifier compares each ASV to reference sequences and returns the deepest taxonomic level at which it can make a confident assignment, along with a confidence score for that assignment.
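To make "deepest taxonomic level" concrete, here is a minimal shell sketch that reads one lineage string and reports the deepest rank that carries a name. It assumes the rank-prefixed, semicolon-delimited format (`d__`, `p__`, ..., `s__`) used by SILVA-based classifiers; the helper name `deepest_rank` is hypothetical and not part of QIIME2.

```shell
# Report the deepest named rank in a semicolon-delimited lineage string.
# Assumption: ranks are prefixed d__, p__, c__, o__, f__, g__, s__.
deepest_rank() {
  printf '%s\n' "$1" | awk -F';' '
    BEGIN {
      name["d"] = "Domain"; name["p"] = "Phylum"; name["c"] = "Class"
      name["o"] = "Order";  name["f"] = "Family"; name["g"] = "Genus"
      name["s"] = "Species"
    }
    {
      rank = "Unassigned"; label = ""
      for (i = 1; i <= NF; i++) {
        field = $i
        gsub(/^[ \t]+|[ \t]+$/, "", field)   # trim whitespace around the field
        # Count a field only if it has a known prefix and a non-empty name
        if (match(field, /^[dpcofgs]__./)) {
          rank = name[substr(field, 1, 1)]
          label = substr(field, 4)
        }
      }
      print rank "\t" label
    }'
}

# Example lineage taken from the hierarchy above, stopping at genus
deepest_rank "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus"
```

A lineage with an empty trailing rank such as `g__` is reported at the last rank that actually has a name, which matches how partially classified ASVs usually appear in a taxonomy table.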
Key caveats:
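- Species-level calls are often unreliable, because the short 16S region that is sequenced rarely differs enough between closely related species. Genus is typically the deepest reliable level for most taxa.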
- “Unclassified” is a real result. ASVs that don’t match any reference sequence may represent genuinely novel or understudied taxa.
- Database choice matters. SILVA, Greengenes2, and NCBI vary in their coverage and update frequency. Use the database that is most appropriate for your sample type and most current.
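Because genus is usually the deepest level worth reporting, one practical habit is to truncate lineage strings at genus before summarizing. The sketch below assumes the same rank-prefixed lineage format (`g__...; s__...`); the function name `truncate_to_genus` is made up for illustration.

```shell
# Drop the species field from a rank-prefixed lineage string,
# leaving genus as the deepest reported level.
# Lineages that carry no species field pass through unchanged.
truncate_to_genus() {
  printf '%s\n' "$1" | sed 's/;[[:space:]]*s__.*$//'
}

truncate_to_genus "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__Lactobacillus_acidophilus"
```

Within QIIME2 itself, `qiime taxa collapse` with `--p-level 6` aggregates a feature table at genus level for seven-rank taxonomies; the string-level version here is only meant for quick inspection of an exported table.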
Reference databases
The reference database you choose affects which taxa you can name and at what resolution.
| Database | Notes |
|---|---|
| SILVA | Most comprehensive; updated regularly; used for bacteria, archaea, and some eukaryotes |
| Greengenes2 | Updated 2022; improved integration with Earth Microbiome Project data |
| NCBI RefSeq | Broad coverage; useful for linking to NCBI records |
| UNITE | Fungi-specific; required if your primers target ITS instead of 16S |
R
For more information about taxonomy assignment with DADA2 in R, see the DADA2 documentation.
# Assign taxonomy (against SILVA)
taxa <- assignTaxonomy(seqtab_nochim, "silva_nr_v138_train_set.fa.gz")
QIIME2
For more information about taxonomy classification in QIIME2, see the QIIME2 documentation. Pre-trained classifiers for common primer sets and databases are available on the QIIME2 data resources page.
# Classify representative sequences using a pre-trained Naive Bayes classifier
qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
# Visualize taxonomy as an interactive stacked bar chart
qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization taxa-bar-plots.qzv
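After classification, it can help to check how many ASVs actually reached genus level. The sketch below assumes the taxonomy artifact has been exported to a tab-separated file with the columns Feature ID, Taxon, and Confidence; the function name `genus_resolved`, the 0.9 cutoff, and the sample rows are all illustrative rather than taken from a real run.

```shell
# Export step (requires QIIME2; left as a comment so the sketch runs without it):
#   qiime tools export --input-path taxonomy.qza --output-path exported-taxonomy
# Assumption: the exported taxonomy.tsv has columns Feature ID, Taxon, Confidence.

# Count features that reach genus level at or above a confidence cutoff.
# Usage: genus_resolved <taxonomy.tsv> [min_confidence]
genus_resolved() {
  awk -F'\t' -v min="${2:-0.9}" '
    NR == 1 { next }                       # skip the header row
    {
      total++
      # A genus counts only when g__ is followed by an actual name
      if ($2 ~ /g__[^;]/ && $3 + 0 >= min + 0) resolved++
    }
    END { printf "%d/%d\n", resolved, total }
  ' "$1"
}

# Made-up example table (feature IDs and scores are illustrative only)
sample="$(mktemp)"
printf 'Feature ID\tTaxon\tConfidence\n' > "$sample"
printf 'asv1\td__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus\t0.98\n' >> "$sample"
printf 'asv2\td__Bacteria; p__Proteobacteria\t0.91\n' >> "$sample"
printf 'asv3\tUnassigned\t0.45\n' >> "$sample"
genus_resolved "$sample"
rm -f "$sample"
```

The output has the form resolved/total. A low ratio may point to a reference database that covers the sample type poorly, or to a classifier trained on a different primer region than the one sequenced.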