Diversity Analysis

Diversity analysis allows us to look at the unique players within a sample and compare them between samples. This section covers the two main branches of diversity analysis (alpha and beta).

Alpha diversity: within-sample diversity

The landlocked city

Alpha diversity is like looking at the diversity of last names within one city. We might ask how many unique last names there are and how many people carry each name relative to the others. We can also compare the alpha diversity of one city with that of another to see whether they have similar levels of last-name diversity, though this cannot tell us whether the same family groups actually live in both cities.

Alpha diversity describes the diversity of a single sample: how many different taxa are present (richness) and how evenly the reads are distributed among them (evenness). Different alpha diversity metrics capture different aspects of the data.

Metric	What it captures	Notes
Observed features	Raw count of unique ASVs/OTUs	Sensitive to sequencing depth; should be compared only at equal rarefaction depth
Shannon index	Both richness and evenness	Penalizes communities dominated by one taxon
Simpson	Evenness-weighted diversity	Less sensitive to rare taxa than Shannon
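Faith’s PD	Phylogenetic diversity: total branch length of the phylogenetic tree spanning the taxa in the sample	Accounts for how evolutionarily distinct the taxa are; requires a phylogenetic tree
Chao1	Estimated true richness, accounting for undetected rare taxa	Intended to offset undersampling; relies on singleton and doubleton counts, so it is unreliable when singletons have been removed (e.g. by denoising)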
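To make the metrics in the table concrete, here is a minimal Python sketch that computes several of them from a single sample's count vector. It is an illustration only, not the phyloseq or QIIME2 implementation: the function name `alpha_metrics` and the example counts are hypothetical, Shannon is assumed to use the natural log, Simpson is assumed to be the 1 - D form, and Chao1 uses the bias-corrected formula. Faith's PD is omitted because it needs a phylogenetic tree.

```python
import math

def alpha_metrics(counts):
    """Return common alpha diversity metrics for one sample's count vector."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    props = [c / total for c in present]

    observed = len(present)                         # richness: taxa with >= 1 read
    shannon = -sum(p * math.log(p) for p in props)  # natural-log Shannon index
    simpson = 1.0 - sum(p * p for p in props)       # Simpson in its 1 - D form
    # Pielou's evenness: Shannon divided by its maximum for this richness
    pielou = shannon / math.log(observed) if observed > 1 else 0.0

    # Bias-corrected Chao1: driven by singletons (f1) and doubletons (f2)
    f1 = sum(1 for c in present if c == 1)
    f2 = sum(1 for c in present if c == 2)
    chao1 = observed + f1 * (f1 - 1) / (2 * (f2 + 1))

    return {"observed": observed, "shannon": shannon, "simpson": simpson,
            "pielou": pielou, "chao1": chao1}

# Two hypothetical samples with the same richness but different evenness
even = alpha_metrics([25, 25, 25, 25])
skewed = alpha_metrics([97, 1, 1, 1])

print(even["observed"], skewed["observed"])            # 4 4
print(round(even["shannon"], 3), round(skewed["shannon"], 3))
print(round(even["simpson"], 3), round(skewed["simpson"], 3))
print(even["chao1"], skewed["chao1"])                  # 4.0 7.0
```

Both samples have four observed features, but the dominated sample scores much lower on Shannon and Simpson, and its three singletons push the Chao1 estimate above the observed count.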

Beta diversity: between-sample diversity

Comparing the landlocked city and the beach front city

Beta diversity gives us one number that summarizes how similar two cities are in terms of who lives there. Some metrics consider only which family names are present in each city; others also consider how many people carry each name. Some even consider whether the family names share an origin, grouping German last names together, Japanese last names together, and so on.

We get one score for each pair of cities. Because beta diversity metrics are distances, scores closer to zero indicate that the people who make up the two cities are more similar, and scores closer to one indicate that they are very different.

Beta diversity describes how different community composition is between samples. It requires a distance (or dissimilarity) metric: a number that ranges from 0 (identical communities) to some maximum (completely different communities). For Bray-Curtis and Jaccard that maximum is 1, while Aitchison distance has no fixed upper bound.

Metric	What it measures	Considers
Bray-Curtis	Shared taxa and their abundances	Abundances; ignores phylogeny
Jaccard	Shared presence/absence	Presence/absence only
UniFrac (unweighted)	Phylogenetic distance, presence/absence	Phylogeny; ignores abundances
UniFrac (weighted)	Phylogenetic distance with abundance weighting	Phylogeny + abundances
Aitchison	Euclidean distance in CLR space	Compositionally aware

The choice of distance metric can change the results, because different metrics emphasize different aspects of community structure. It is common to run several metrics on the same data and compare them, since each captures a slightly different view of how the communities differ.
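A small Python sketch can show how two metrics disagree on the same data. The functions `bray_curtis` and `jaccard` and the three count vectors below are hypothetical illustrations, not calls into vegan or QIIME2. The sketch assumes both samples share the same taxon order, are non-empty, and have equal total counts (as they would after rarefaction).

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))

def jaccard(a, b):
    """Jaccard distance computed on presence/absence only."""
    in_a = {i for i, x in enumerate(a) if x > 0}
    in_b = {i for i, y in enumerate(b) if y > 0}
    return 1.0 - len(in_a & in_b) / len(in_a | in_b)

site_1 = [50, 30, 20, 0]
site_2 = [5, 5, 90, 0]    # same taxa as site_1, very different abundances
site_3 = [50, 30, 0, 20]  # similar abundances, but one taxon is swapped

print(round(bray_curtis(site_1, site_2), 3), jaccard(site_1, site_2))  # 0.7 0.0
print(round(bray_curtis(site_1, site_3), 3), jaccard(site_1, site_3))  # 0.2 0.5
```

Jaccard treats site_1 and site_2 as identical because they contain the same taxa, while Bray-Curtis reports a large difference driven by abundance. For site_1 and site_3 the ranking flips, which is why the two metrics can lead to different-looking ordinations.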

Ordination: visualizing beta diversity

With a distance matrix in hand, ordination techniques reduce the high-dimensional distance information to 2 or 3 dimensions for visualization. This is important because the number of pairwise beta diversity values grows quickly as samples are added (n samples give n(n-1)/2 pairs), which makes it hard, if not impossible, to read the matrix directly and understand what is going on.
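To make that growth concrete, the following Python sketch counts the pairwise comparisons behind a distance matrix. The helper `n_pairs` and the sample IDs are hypothetical names used only for illustration.

```python
from itertools import combinations

def n_pairs(n_samples):
    """Number of unique sample pairs in a symmetric distance matrix."""
    return n_samples * (n_samples - 1) // 2

sample_ids = ["S1", "S2", "S3", "S4"]
pairs = list(combinations(sample_ids, 2))

print(len(pairs))     # 6
print(n_pairs(20))    # 190
print(n_pairs(100))   # 4950
```

Even a modest study of 100 samples produces 4950 distances, far more than can be interpreted by eye.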

Principal Coordinates Analysis (PCoA) is the most common approach. It produces a scatter plot in which samples with similar community composition fall close together. The axes represent the major sources of variation in the distance matrix, ordered by how much they explain: the first axis explains the most variation, the second axis the next most, and so on.
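The idea behind PCoA can be sketched in plain Python: square the distances, double-center them, and take the leading eigenvectors scaled by the square roots of their eigenvalues. This is a teaching sketch under stated assumptions, not how phyloseq or QIIME2 compute it. The function `pcoa` is hypothetical, it finds eigenvectors by power iteration with a fixed start vector rather than a full eigendecomposition, it assumes the leading eigenvalues are positive, and it reports each eigenvalue as a fraction of the matrix trace with no correction for negative eigenvalues.

```python
import math

def pcoa(dist, n_axes=2, n_iter=500):
    """Classical PCoA of a symmetric distance matrix given as a list of lists.

    Returns (coords, explained): per-sample coordinates and, for each axis,
    its eigenvalue as a fraction of the trace of the centered matrix.
    """
    n = len(dist)
    # Gower double-centering of the squared distances: B = -1/2 * J * D^2 * J
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    trace = sum(b[i][i] for i in range(n))

    def matvec(m, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

    axes, explained = [], []
    for _axis in range(n_axes):
        v = [float(i + 1) for i in range(n)]   # fixed start vector (assumption)
        for _step in range(n_iter):            # power iteration
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        eigval = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        if eigval < 1e-9:
            eigval, v = 0.0, [0.0] * n         # no remaining positive variation
        axes.append([math.sqrt(eigval) * x for x in v])
        explained.append(eigval / trace if trace > 0 else 0.0)
        # Deflate so the next pass finds the next-largest eigenvalue
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]

    coords = [tuple(axis[i] for axis in axes) for i in range(n)]
    return coords, explained

# Three hypothetical samples on a line: A and C are 1 unit from B, 2 units apart
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
coords, explained = pcoa(dist)
for name, point in zip("ABC", coords):
    print(name, [round(x, 3) for x in point])
print([round(e, 3) for e in explained])
```

In this example the first axis recovers the line (A and C at opposite ends, B in the middle) and accounts for all of the variation, so the second axis is flat. The sign of an axis is arbitrary, so a mirrored plot carries the same information.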

Code and tool examples

For more information about using phyloseq and vegan in R, see the phyloseq documentation.

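library(vegan)
library(phyloseq)
library(ggplot2)  # needed for theme_bw(); phyloseq does not attach ggplot2

# `physeq` is assumed to be an existing phyloseq object whose sample data
# include a `treatment_group` column

# Compute Bray-Curtis distance matrix
dist_bc <- phyloseq::distance(physeq, method = "bray")

# Ordinate (PCoA)
ord <- ordinate(physeq, method = "PCoA", distance = dist_bc)

# Plot, colored by metadata variable
plot_ordination(physeq, ord, color = "treatment_group") +
  theme_bw()

# Alpha diversity: compute Shannon, observed features, and Chao1 per sample
alpha_div <- estimate_richness(physeq, measures = c("Shannon", "Observed", "Chao1"))

# Attach the grouping variable (rows follow the sample order of physeq)
alpha_div$treatment_group <- sample_data(physeq)$treatment_group

# Compare Shannon index across groups
kruskal.test(Shannon ~ treatment_group, data = alpha_div)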

QIIME2

For more information about diversity analysis in QIIME2, see the QIIME2 documentation.

# Run core diversity metrics (alpha + beta) at a chosen rarefaction depth
# Requires a rooted phylogenetic tree for Faith's PD and UniFrac metrics
# 10000 is an example depth; samples with fewer reads than this are dropped
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 10000 \
  --m-metadata-file metadata.tsv \
  --output-dir core-metrics-results

# Visualize PCoA interactively with Emperor
qiime emperor plot \
  --i-pcoa core-metrics-results/bray_curtis_pcoa_results.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization bray-curtis-emperor.qzv