Diversity Analysis

Diversity analysis lets us characterize which taxa are present within a sample and compare community composition between samples. This section covers the two main branches of diversity analysis: alpha diversity and beta diversity.

Alpha diversity: within-sample diversity

Note: The landlocked city

Alpha diversity is like looking at the diversity of last names within our city. We might be interested in how many unique last names there are and how many people share each one. We can also compare the alpha diversity of one city with that of another to see whether they have a similar diversity of last names, though this cannot tell us whether the same family groups actually live in both cities.

Alpha diversity describes the diversity of a single sample: how many different taxa are present (richness) and how evenly the reads are distributed among them (evenness). Different alpha diversity metrics capture different aspects of the data.
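
To make the difference between richness and evenness concrete, here is a minimal base R sketch. The two samples and their counts are hypothetical, and the helper functions are written out for illustration rather than taken from phyloseq or vegan. The Simpson helper uses the 1 - sum(p^2) form, which is also what vegan reports for its "simpson" index.

```r
# Two hypothetical samples with the same richness (4 taxa) but different evenness
even_sample   <- c(taxonA = 25, taxonB = 25, taxonC = 25, taxonD = 25)
skewed_sample <- c(taxonA = 97, taxonB = 1,  taxonC = 1,  taxonD = 1)

# Observed features: number of taxa with at least one read
observed <- function(counts) sum(counts > 0)

# Shannon index: H = -sum(p * ln(p)) over taxa with p > 0
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Simpson index in the 1 - sum(p^2) form
simpson <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

summary_table <- data.frame(
  observed = c(observed(even_sample), observed(skewed_sample)),
  shannon  = c(shannon(even_sample),  shannon(skewed_sample)),
  simpson  = c(simpson(even_sample),  simpson(skewed_sample)),
  row.names = c("even_sample", "skewed_sample")
)
print(round(summary_table, 3))
```

Both samples have the same number of observed features, but the Shannon and Simpson values drop sharply for the sample dominated by a single taxon.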

| Metric | What it captures | Notes |
|---|---|---|
| Observed features | Raw count of unique ASVs/OTUs | Sensitive to sequencing depth; should be compared only at equal rarefaction depth |
| Shannon index | Both richness and evenness | Penalizes communities dominated by one taxon |
| Simpson | Evenness-weighted diversity | Less sensitive to rare taxa than Shannon |
| Faith’s PD | Phylogenetic diversity: total branch length of the community’s phylogenetic tree | Accounts for how evolutionarily distinct the taxa are |
| Chao1 | Estimated true richness, accounting for undetected rare taxa | Useful when comparing samples at different depths |
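
Two notes in the table lend themselves to a small worked example: observed features depend on sequencing depth, and Chao1 estimates richness from rare taxa. The sketch below uses only base R. The counts are hypothetical, `rarefy_counts` and `chao1` are illustrative helpers rather than library functions, and the Chao1 helper assumes the bias-corrected form of the estimator.

```r
set.seed(42)  # make the random subsampling reproducible

# Hypothetical read counts for two samples sequenced to different depths
deep_sample    <- c(400, 300, 150, 80, 40, 15, 8, 4, 2, 1)  # 1000 reads
shallow_sample <- c(90, 60, 30, 12, 5, 1, 1, 1, 0, 0)       # 200 reads

# Rarefy: draw `depth` reads without replacement and re-count them per taxon
rarefy_counts <- function(counts, depth) {
  stopifnot(depth <= sum(counts))
  reads <- rep(seq_along(counts), times = counts)  # one entry per read
  drawn <- sample(reads, size = depth, replace = FALSE)
  tabulate(drawn, nbins = length(counts))
}

# Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
# where F1 and F2 are the numbers of singleton and doubleton taxa
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Compare both samples at the depth of the shallower one
depth <- min(sum(deep_sample), sum(shallow_sample))
deep_rarefied <- rarefy_counts(deep_sample, depth)

cat("Observed, deep (raw):     ", sum(deep_sample > 0), "\n")
cat("Observed, deep (rarefied):", sum(deep_rarefied > 0), "\n")
cat("Observed, shallow:        ", sum(shallow_sample > 0), "\n")
cat("Chao1, shallow:           ", chao1(shallow_sample), "\n")
```

Rarefying can only keep or lose taxa, so the rarefied richness of the deep sample is never higher than its raw richness. Chao1 moves in the other direction: it adds an estimate of unseen taxa based on how many taxa were seen only once or twice.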

Beta diversity: between-sample diversity

Note: Comparing the landlocked city and the beachfront city

Beta diversity gives us a single number that tells us how similar two cities are in terms of who lives in them. Some beta diversity metrics consider only which family names are present in each city; others also consider how many people have each family name. Some even consider whether the family names share an origin, grouping German last names together, Japanese last names together, and so on.

We get one score for each pairwise comparison of cities. Because these scores are distances, scores closer to zero indicate that the people who make up the two cities are more similar, and scores closer to one indicate that they are very different.

Beta diversity describes how different community composition is between samples. It requires a distance (or dissimilarity) metric, which yields a number between 0 (identical communities) and some maximum (completely different communities). For Bray-Curtis and Jaccard, that maximum is 1.

| Metric | What it measures | Considers |
|---|---|---|
| Bray-Curtis | Shared taxa and their abundances | Abundances; ignores phylogeny |
| Jaccard | Shared presence/absence | Presence/absence only |
| UniFrac (unweighted) | Phylogenetic distance, presence/absence | Phylogeny; ignores abundances |
| UniFrac (weighted) | Phylogenetic distance with abundance weighting | Phylogeny + abundances |
| Aitchison | Euclidean distance in CLR space | Compositionally aware |

The choice of distance metric can change the results, because different metrics emphasize different aspects of community structure. It is common to run several metrics on the same data, since each captures a slightly different aspect of how the communities differ.
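
As a rough illustration of how two metrics can disagree on the same pair of samples, the base R sketch below computes Bray-Curtis and Jaccard by hand. The samples and counts are hypothetical and the helper functions are illustrative only. In practice, `vegan::vegdist()` provides both metrics (with `binary = TRUE` for presence/absence Jaccard).

```r
# Hypothetical counts for two samples across the same five taxa
sample_a <- c(taxon1 = 50, taxon2 = 30, taxon3 = 20, taxon4 = 0, taxon5 = 0)
sample_b <- c(taxon1 = 5,  taxon2 = 3,  taxon3 = 2,  taxon4 = 0, taxon5 = 90)

# Bray-Curtis dissimilarity: sum(|a - b|) / sum(a + b)
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Jaccard distance on presence/absence:
# 1 - (taxa present in both) / (taxa present in either)
jaccard <- function(a, b) {
  present_a <- a > 0
  present_b <- b > 0
  1 - sum(present_a & present_b) / sum(present_a | present_b)
}

bc <- bray_curtis(sample_a, sample_b)
jc <- jaccard(sample_a, sample_b)

cat("Bray-Curtis:", bc, "\n")
cat("Jaccard:    ", jc, "\n")
```

The two samples share three of the four taxa present, so Jaccard treats them as fairly similar. Bray-Curtis treats them as very different, because most reads in the second sample belong to a taxon that is absent from the first.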

Ordination: visualizing beta diversity

With a distance matrix in hand, ordination techniques reduce the high-dimensional distance information to 2 or 3 dimensions for visualization. This matters because the number of pairwise beta diversity values grows quickly as samples are added: n samples produce n(n-1)/2 values, so 100 samples already give 4,950. At that size it is hard, if not impossible, to read the matrix directly and see the overall patterns.

Principal Coordinates Analysis (PCoA) is the most common approach. It produces a scatter plot in which samples that are more similar in community composition cluster together. The axes represent the major sources of variation in the distance matrix: the first axis explains the most variance, the second the next most, and so on.
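
The idea can be sketched without any extra packages, because base R's `cmdscale()` performs classical multidimensional scaling, which is equivalent to PCoA. The distance matrix and sample names below are hypothetical, chosen so that two control samples and two treated samples form separate groups.

```r
# Hypothetical pairwise distances among four samples
sample_ids <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
dist_matrix <- matrix(
  c(0.0, 0.2, 0.8, 0.8,
    0.2, 0.0, 0.8, 0.8,
    0.8, 0.8, 0.0, 0.3,
    0.8, 0.8, 0.3, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(sample_ids, sample_ids)
)

# Classical MDS (PCoA); eig = TRUE also returns the eigenvalues
pcoa <- cmdscale(as.dist(dist_matrix), k = 2, eig = TRUE)

# Share of variation per axis, computed over the positive eigenvalues only
positive_eig <- pcoa$eig[pcoa$eig > 1e-10]
variance_explained <- positive_eig / sum(positive_eig)

print(round(pcoa$points, 3))              # sample coordinates on axes 1 and 2
print(round(variance_explained[1:2], 3))  # proportion explained by each axis
```

On the first axis the control samples fall on one side and the treated samples on the other, which is the kind of separation a PCoA plot makes visible. Real dissimilarities such as Bray-Curtis can yield some negative eigenvalues, which is why the proportions here are computed over the positive ones only.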

R

For more information about using phyloseq and vegan in R, look here: phyloseq documentation

library(vegan)
library(phyloseq)

# Compute Bray-Curtis distance matrix
dist_bc <- distance(physeq, method = "bray")

# Ordinate (PCoA)
ord <- ordinate(physeq, method = "PCoA", distance = dist_bc)

# Plot, colored by metadata variable
plot_ordination(physeq, ord, color = "treatment_group") +
  theme_bw()

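# Alpha diversity: compute Shannon, observed features, and Chao1 per sample
alpha_div <- estimate_richness(physeq, measures = c("Shannon", "Observed", "Chao1"))

# Attach the sample metadata as a plain data frame (rows share the sample order)
alpha_meta <- cbind(alpha_div, data.frame(sample_data(physeq)))

# Compare the Shannon index across groups
kruskal.test(Shannon ~ treatment_group, data = alpha_meta)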

QIIME2

For more information about diversity analysis in QIIME2, look here: documentation

# Run core diversity metrics (alpha + beta) at a chosen rarefaction depth
# 10000 is an example value; samples with fewer reads than the depth are dropped
# Requires a rooted phylogenetic tree for Faith's PD and UniFrac metrics
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 10000 \
  --m-metadata-file metadata.tsv \
  --output-dir core-metrics-results

# Visualize PCoA interactively with Emperor
qiime emperor plot \
  --i-pcoa core-metrics-results/bray_curtis_pcoa_results.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization bray-curtis-emperor.qzv