Q&A 14 How do you compare alpha diversity across 3 or more groups?

14.1 Explanation

When comparing microbial richness across more than two groups, you can use: - ANOVA: if data are normally distributed - Kruskal-Wallis test: non-parametric alternative

This Q&A tests whether OTU richness differs by body location (e.g., gut, skin).

14.2 Python Code

import pandas as pd
from scipy.stats import f_oneway, kruskal

# Load data
otu_df = pd.read_csv("data/otu_table_filtered.tsv", sep="\t", index_col=0)
meta_df = pd.read_csv("data/sample_metadata.tsv", sep="\t")

# Compute richness
richness = pd.DataFrame({
    "sample_id": otu_df.columns,
    "richness": (otu_df > 0).sum(axis=0).values
})
data = pd.merge(richness, meta_df, on="sample_id")

# Split richness by location
groups = [group["richness"].values for name, group in data.groupby("location")]

# ANOVA
f_stat, f_pval = f_oneway(*groups)

# Kruskal-Wallis
kw_stat, kw_pval = kruskal(*groups)

print(f"ANOVA p-value: {f_pval:.4f}")
print(f"Kruskal-Wallis p-value: {kw_pval:.4f}")

14.3 R Code

library(tidyverse)

otu_df <- read.delim("data/otu_table_filtered.tsv", row.names = 1)
meta_df <- read.delim("data/sample_metadata.tsv")

# Compute richness
richness <- colSums(otu_df > 0)
richness_df <- data.frame(sample_id = names(richness), richness = richness)
merged <- left_join(richness_df, meta_df, by = "sample_id")

# ANOVA
anova_res <- aov(richness ~ location, data = merged)

# Kruskal-Wallis
kruskal_res <- kruskal.test(richness ~ location, data = merged)

summary(anova_res)
            Df Sum Sq Mean Sq F value Pr(>F)
location     1    2.5     2.5   0.676  0.435
Residuals    8   29.6     3.7               
kruskal_res

    Kruskal-Wallis rank sum test

data:  richness by location
Kruskal-Wallis chi-squared = 0.71553, df = 1, p-value = 0.3976