Q&A 14 How do you compare alpha diversity across 3 or more groups?
14.1 Explanation
When comparing microbial richness across more than two groups, you can use: - ANOVA: if data are normally distributed - Kruskal-Wallis test: non-parametric alternative
This Q&A tests whether OTU richness differs by body location (e.g., gut, skin).
14.2 Python Code
import pandas as pd
from scipy.stats import f_oneway, kruskal
# Load data
otu_df = pd.read_csv("data/otu_table_filtered.tsv", sep="\t", index_col=0)
meta_df = pd.read_csv("data/sample_metadata.tsv", sep="\t")
# Compute richness
richness = pd.DataFrame({
"sample_id": otu_df.columns,
"richness": (otu_df > 0).sum(axis=0).values
})
data = pd.merge(richness, meta_df, on="sample_id")
# Split richness by location
groups = [group["richness"].values for name, group in data.groupby("location")]
# ANOVA
f_stat, f_pval = f_oneway(*groups)
# Kruskal-Wallis
kw_stat, kw_pval = kruskal(*groups)
print(f"ANOVA p-value: {f_pval:.4f}")
print(f"Kruskal-Wallis p-value: {kw_pval:.4f}")
14.3 R Code
library(tidyverse)
otu_df <- read.delim("data/otu_table_filtered.tsv", row.names = 1)
meta_df <- read.delim("data/sample_metadata.tsv")
# Compute richness
richness <- colSums(otu_df > 0)
richness_df <- data.frame(sample_id = names(richness), richness = richness)
merged <- left_join(richness_df, meta_df, by = "sample_id")
# ANOVA
anova_res <- aov(richness ~ location, data = merged)
# Kruskal-Wallis
kruskal_res <- kruskal.test(richness ~ location, data = merged)
summary(anova_res)
Df Sum Sq Mean Sq F value Pr(>F)
location 1 2.5 2.5 0.676 0.435
Residuals 8 29.6 3.7
Kruskal-Wallis rank sum test
data: richness by location
Kruskal-Wallis chi-squared = 0.71553, df = 1, p-value = 0.3976