Diversity Analysis
Diversity analysis allows us to look at the unique players within a sample and compare them between samples. This section covers the two main branches of diversity analysis: alpha diversity (within a sample) and beta diversity (between samples).
Alpha diversity: within-sample diversity
Alpha diversity is like looking at the diversity of last names within our city. We might want to know how many unique last names there are and how evenly people are spread across those names. We can also compare the alpha diversity of one city with that of another to see whether they have a similar diversity of last names, though this cannot tell us whether the two cities contain the same actual families.
Alpha diversity describes the diversity within a single sample: how many different taxa are present (richness) and how evenly the reads are distributed among them (evenness). Different alpha diversity metrics capture different aspects of the data.
| Metric | What it captures | Notes |
|---|---|---|
| Observed features | Raw count of unique ASVs/OTUs | Sensitive to sequencing depth; should be compared only at equal rarefaction depth |
| Shannon index | Both richness and evenness | Penalizes communities dominated by one taxon |
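| Simpson | Diversity weighted toward dominant taxa; often reported as 1 - D, the probability that two randomly drawn reads belong to different taxa | Less sensitive to rare taxa than Shannon |
| Faith’s PD | Phylogenetic diversity: total branch length of the phylogenetic tree spanning the taxa in the sample | Accounts for how evolutionarily distinct the taxa are; requires a rooted phylogenetic tree |
| Chao1 | Estimated true richness, accounting for undetected rare taxa | Intended to reduce the effect of unequal sequencing depth; relies on singleton and doubleton counts, so it is unreliable when denoising (e.g., DADA2) removes singletons |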
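To make the table concrete, here is a minimal base-R sketch that computes four of these metrics for one sample. The count vector `counts` and the function `alpha_metrics()` are hypothetical names for illustration only. Faith's PD is left out because it needs a phylogenetic tree, and the Chao1 line uses one common bias-corrected formula; packages may apply slightly different corrections, so values can differ a little from phyloseq's `estimate_richness()`.

```r
# Hypothetical read counts per ASV for a single sample (100 reads in total)
counts <- c(ASV1 = 50, ASV2 = 30, ASV3 = 10, ASV4 = 5,
            ASV5 = 2, ASV6 = 1, ASV7 = 1, ASV8 = 1)

# Hypothetical helper: returns Observed, Shannon, Simpson (1 - D) and Chao1
alpha_metrics <- function(counts) {
  counts <- counts[counts > 0]        # drop taxa absent from this sample
  p <- counts / sum(counts)           # relative abundances
  observed <- length(counts)          # richness: number of taxa present
  shannon <- -sum(p * log(p))         # Shannon index H' (natural log)
  simpson <- 1 - sum(p^2)             # Gini-Simpson index, 1 - D
  f1 <- sum(counts == 1)              # number of singletons
  f2 <- sum(counts == 2)              # number of doubletons
  # Bias-corrected Chao1; stays finite when there are no doubletons
  chao1 <- observed + f1 * (f1 - 1) / (2 * (f2 + 1))
  c(Observed = observed, Shannon = shannon, Simpson = simpson, Chao1 = chao1)
}

round(alpha_metrics(counts), 3)
```

Calling `alpha_metrics()` on a perfectly even sample such as `c(10, 10, 10, 10)` gives a Shannon index of `log(4)`, the maximum for four taxa, which is a quick way to see that Shannon rewards evenness as well as richness.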
Beta diversity: between-sample diversity
Beta diversity gives us one number that summarizes how similar two cities are in who lives there. Some beta diversity metrics consider only which family names are present in each city; others also consider how many people have each family name. Some even consider whether family names share an origin, grouping German last names together, Japanese last names together, and so on.
We get one score for each pair of cities. With the distance metrics below, a score of zero means the two cities are made up of the same families in the same proportions, and larger scores mean the cities are increasingly different. For metrics such as Bray-Curtis and Jaccard, a score of one means the two cities share no family names at all.
Beta diversity describes how different community composition is between samples. It is calculated with a distance (or dissimilarity) metric, which assigns each pair of samples a value between 0 (identical communities) and some maximum (completely different communities).
| Metric | What it measures | Considers |
|---|---|---|
| Bray-Curtis | Shared taxa and their abundances | Abundances; ignores phylogeny |
| Jaccard | Shared presence/absence | Presence/absence only |
| UniFrac (unweighted) | Phylogenetic distance, presence/absence | Phylogeny; ignores abundances |
| UniFrac (weighted) | Phylogenetic distance with abundance weighting | Phylogeny + abundances |
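| Aitchison | Euclidean distance after a centered log-ratio (CLR) transformation | Compositionally aware; zeros must be replaced (e.g., with a pseudocount) before taking logs |
The choice of distance metric can change the results, because each metric emphasizes a different aspect of community structure. For example, two samples that contain exactly the same taxa at very different abundances are identical under Jaccard but different under Bray-Curtis. For this reason, it is common to run several metrics on the same data.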
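The formulas behind three of these metrics are short enough to write out directly. The sketch below is a minimal base-R illustration on a hypothetical three-sample table; the names `otu`, `bray_curtis()`, `jaccard()`, `clr()`, `aitchison()` and `pairwise()` are all hypothetical, and the pseudocount of 1 used before the CLR transformation is an assumption, not a fixed rule. UniFrac is omitted because it needs a phylogenetic tree. For real data, `vegan::vegdist()` (for example `method = "bray"`, or `method = "jaccard", binary = TRUE` for presence/absence) is the tested route.

```r
# Hypothetical count table: rows are samples, columns are taxa.
# Samples A and B contain the same taxa at different abundances.
otu <- rbind(
  A = c(10, 20, 30, 0),
  B = c(30, 20, 10, 0),
  C = c(0, 0, 5, 40)
)
colnames(otu) <- paste0("ASV", 1:4)

# Bray-Curtis dissimilarity: sum |x - y| / sum (x + y); uses abundances
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard distance on presence/absence: 1 - shared taxa / all taxa present
jaccard <- function(x, y) {
  a <- x > 0
  b <- y > 0
  1 - sum(a & b) / sum(a | b)
}

# CLR transform; the pseudocount (assumed here to be 1) avoids log(0)
clr <- function(x, pseudocount = 1) {
  lx <- log(x + pseudocount)
  lx - mean(lx)
}

# Aitchison distance: Euclidean distance between CLR-transformed samples
aitchison <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

# Build a pairwise distance matrix from any of the functions above
pairwise <- function(mat, fun) {
  n <- nrow(mat)
  d <- matrix(0, n, n, dimnames = list(rownames(mat), rownames(mat)))
  for (i in seq_len(n)) for (j in seq_len(n)) d[i, j] <- fun(mat[i, ], mat[j, ])
  as.dist(d)
}

pairwise(otu, bray_curtis)
pairwise(otu, jaccard)
pairwise(otu, aitchison)
```

Samples A and B come out at a Jaccard distance of 0 but a Bray-Curtis dissimilarity of 1/3, which shows directly how a presence/absence metric and an abundance-based metric can disagree on the same pair of samples.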
Ordination: visualizing beta diversity
With a distance matrix in hand, ordination techniques reduce the high-dimensional distance information to 2 or 3 dimensions for visualization. This matters because the number of pairwise beta diversity values grows quickly as samples are added, making it hard, if not impossible, to read the matrix directly and understand what is going on under the hood.
Principal Coordinates Analysis (PCoA) is the most common approach. It produces a scatter plot in which samples with more similar community composition cluster together. The axes represent the major sources of variation in the distance matrix: the first axis explains the most variation, the second axis the next most, and so on.
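For n samples, a distance matrix holds one value for each pair of samples, so the number of values grows roughly with the square of n:

$$
\text{pairwise distances} = \binom{n}{2} = \frac{n(n-1)}{2}
$$

For example, 10 samples give 45 distances, 100 samples give 4,950, and 500 samples give 124,750.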
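PCoA is classical multidimensional scaling applied to a distance matrix, and base R provides it as `cmdscale()`. The sketch below runs it on a hypothetical five-sample table (`otu`, `bc`, `pcoa` and `pct` are illustrative names) and reports the percent of variation per axis. Because Bray-Curtis is not a Euclidean distance, PCoA can return negative eigenvalues; this sketch computes percentages from the positive eigenvalues only, which is one common convention rather than the only one, so other tools may report slightly different percentages.

```r
# Hypothetical count table (rows = samples, columns = taxa)
otu <- rbind(
  S1 = c(40, 30, 20, 10, 0),
  S2 = c(35, 35, 15, 10, 5),
  S3 = c(5, 10, 20, 30, 35),
  S4 = c(0, 5, 25, 35, 35),
  S5 = c(20, 20, 20, 20, 20)
)

# Bray-Curtis distance matrix in base R
n <- nrow(otu)
bc <- matrix(0, n, n, dimnames = list(rownames(otu), rownames(otu)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- sum(abs(otu[i, ] - otu[j, ])) / sum(otu[i, ] + otu[j, ])
  }
}

# Classical PCoA on the first two axes; eig = TRUE also returns the eigenvalues
pcoa <- cmdscale(as.dist(bc), k = 2, eig = TRUE)

# Percent variation per axis, using positive eigenvalues only (assumed convention)
eig <- pcoa$eig
pct <- 100 * eig[eig > 0] / sum(eig[eig > 0])
round(pct[1:2], 1)

# Scatter plot: samples with similar composition land close together
plot(pcoa$points, pch = 19,
     xlab = sprintf("PCoA1 (%.1f%%)", pct[1]),
     ylab = sprintf("PCoA2 (%.1f%%)", pct[2]))
text(pcoa$points, labels = rownames(otu), pos = 3)
```

In this toy table S1 and S2 (dominated by the first taxa) should sit near each other and away from S3 and S4 (dominated by the last taxa), with S5 in between.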
R
For more information about using phyloseq and vegan in R, look here: phyloseq documentation
library(vegan)
library(phyloseq)
library(ggplot2)  # provides theme_bw(); phyloseq uses ggplot2 but does not attach it
# Compute Bray-Curtis distance matrix (physeq is an existing phyloseq object)
dist_bc <- distance(physeq, method = "bray")
# Ordinate (PCoA)
ord <- ordinate(physeq, method = "PCoA", distance = dist_bc)
# Plot, colored by metadata variable
plot_ordination(physeq, ord, color = "treatment_group") +
  theme_bw()
# Alpha diversity: compute and compare Shannon index across groups
alpha_div <- estimate_richness(physeq, measures = c("Shannon", "Observed", "Chao1"))
# Combine alpha diversity with sample metadata as a plain data frame
alpha_meta <- data.frame(alpha_div, as(sample_data(physeq), "data.frame"))
kruskal.test(Shannon ~ treatment_group, data = alpha_meta)
QIIME2
For more information about diversity analysis in QIIME2, look here: documentation
# Run core diversity metrics (alpha + beta) at a chosen rarefaction depth
# Requires a rooted phylogenetic tree for Faith's PD and UniFrac metrics
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 10000 \
  --m-metadata-file metadata.tsv \
  --output-dir core-metrics-results
# Visualize PCoA interactively with Emperor
qiime emperor plot \
  --i-pcoa core-metrics-results/bray_curtis_pcoa_results.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization bray-curtis-emperor.qzv
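The `--p-sampling-depth` value in the core-metrics command rarefies every sample to the same number of reads before diversity is computed, and samples with fewer reads than that depth are left out. The base-R sketch below shows what rarefying a single sample means; `counts` and `rarefy_sample()` are hypothetical names, and this is an illustration of the idea, not QIIME2's implementation. For real analyses, `vegan::rrarefy()` and phyloseq's `rarefy_even_depth()` are tested alternatives.

```r
# Make the random subsample reproducible
set.seed(42)

# Hypothetical counts for one sample: 8 taxa, 15,000 reads in total
counts <- c(ASV1 = 6000, ASV2 = 4000, ASV3 = 2500, ASV4 = 1500,
            ASV5 = 600, ASV6 = 300, ASV7 = 90, ASV8 = 10)

# Draw `depth` reads without replacement from the sample's pool of reads
rarefy_sample <- function(counts, depth) {
  if (sum(counts) < depth) return(NULL)           # too shallow: sample is dropped
  reads <- rep(seq_along(counts), times = counts)  # one entry per read, labeled by taxon index
  picked <- sample(reads, size = depth, replace = FALSE)
  out <- tabulate(picked, nbins = length(counts))  # count picked reads per taxon
  names(out) <- names(counts)
  out
}

rarefied <- rarefy_sample(counts, depth = 10000)
sum(rarefied)        # always equals the chosen depth
sum(rarefied > 0)    # observed features after rarefying; rare taxa can drop out
```

Because the subsample is random, rare taxa such as `ASV8` may or may not survive a given draw, which is one reason observed features should only be compared between samples rarefied to the same depth.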