Q&A 12 How do you statistically compare OTU richness between groups?

12.1 Explanation

Once you’ve calculated alpha diversity (e.g., observed OTUs per sample), it’s common to test whether groups differ significantly. This can help you determine if an experimental condition affects microbial richness.

Typical tests include: - T-test (for two groups, normally distributed data) - Wilcoxon rank-sum test (non-parametric) - ANOVA or Kruskal-Wallis (for 3+ groups)

Here we demonstrate group comparison for richness using appropriate statistical tests.

12.2 Python Code

import pandas as pd
from scipy.stats import ttest_ind, mannwhitneyu

# Load richness and metadata
otu_df = pd.read_csv("data/otu_table_filtered.tsv", sep="\t", index_col=0)
meta_df = pd.read_csv("data/sample_metadata.tsv", sep="\t")

# Compute richness
richness = pd.DataFrame({
    "sample_id": otu_df.columns,
    "richness": (otu_df > 0).sum(axis=0).values
})
data = pd.merge(richness, meta_df, on="sample_id")

# Split by group
control = data[data["group"] == "Control"]["richness"]
treatment = data[data["group"] == "Treatment"]["richness"]

# T-test (assumes normality)
t_stat, t_pval = ttest_ind(control, treatment)

# Wilcoxon (non-parametric)
w_stat, w_pval = mannwhitneyu(control, treatment)

print(f"T-test p-value: {t_pval:.4f}")
print(f"Wilcoxon test p-value: {w_pval:.4f}")

12.3 R Code

library(tidyverse)

otu_df <- read.delim("data/otu_table_filtered.tsv", row.names = 1)
meta_df <- read.delim("data/sample_metadata.tsv")

# Calculate richness
richness <- colSums(otu_df > 0)
richness_df <- data.frame(sample_id = names(richness), richness = richness)
merged <- left_join(richness_df, meta_df, by = "sample_id")

# T-test
t_test <- t.test(richness ~ group, data = merged)

# Wilcoxon test
wilcox_test <- wilcox.test(richness ~ group, data = merged)

t_test$p.value
[1] 0.2666018
wilcox_test$p.value
[1] 0.2447901