Q&A 13 How do you test for correlation between alpha diversity and age?
13.1 Explanation
In many studies, you may want to examine whether microbial diversity is associated with continuous metadata like age, BMI, or pH.
Correlation tests assess whether two variables have a linear or monotonic relationship:
- Pearson correlation: measures linear association; its p-value assumes the data are approximately (bivariate) normal
- Spearman correlation: measures monotonic association using ranks; non-parametric and less sensitive to outliers
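To see how the two coefficients differ, here is a small sketch on hypothetical toy data where `y = x**3`. The relationship is perfectly monotonic but not linear. The sketch also checks that Spearman's rho is simply Pearson's r computed on the ranks (obtained with `scipy.stats.rankdata`).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

# Hypothetical toy data: y always increases with x, but along a curve
x = np.arange(1, 11, dtype=float)
y = x ** 3

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

# Spearman's rho is Pearson's r applied to the ranks of each variable
rank_r, _ = pearsonr(rankdata(x), rankdata(y))

print(f"Pearson r:        {pearson_r:.3f}")
print(f"Spearman rho:     {spearman_rho:.3f}")
print(f"Pearson on ranks: {rank_r:.3f}")
```

Spearman's rho comes out as exactly 1 while Pearson's r is about 0.93, because a cubic curve is monotonic but not a straight line.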
This Q&A tests the correlation between observed richness (the number of OTUs with a nonzero count in each sample) and age, using both methods.
13.2 Python Code
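import pandas as pd
from scipy.stats import pearsonr, spearmanr
# Load OTU table (OTUs as rows, samples as columns) and sample metadata
otu_df = pd.read_csv("data/otu_table_filtered.tsv", sep="\t", index_col=0)
# Read sample_id as text so it matches the OTU table's column names
meta_df = pd.read_csv("data/sample_metadata.tsv", sep="\t", dtype={"sample_id": str})
# Compute observed richness: number of OTUs with a nonzero count per sample
richness = pd.DataFrame({
    "sample_id": otu_df.columns,
    "richness": (otu_df > 0).sum(axis=0).values
})
# Keep samples present in both tables and drop samples with missing age
data = pd.merge(richness, meta_df, on="sample_id").dropna(subset=["age"])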
# Pearson correlation
pearson_corr, pearson_pval = pearsonr(data["richness"], data["age"])
# Spearman correlation
spearman_corr, spearman_pval = spearmanr(data["richness"], data["age"])
print(f"Pearson r: {pearson_corr:.3f}, p = {pearson_pval:.4f}")
print(f"Spearman rho: {spearman_corr:.3f}, p = {spearman_pval:.4f}")
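The script above needs the tutorial's data files. The same steps can be tried on a small in-memory table; the sketch below uses hypothetical sample IDs, counts, and ages invented purely for illustration (they are not the tutorial's data). It uses an inner join and drops missing ages, so samples lacking either an OTU profile or an age are excluded before testing.

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Hypothetical OTU table: OTUs as rows, samples as columns (invented counts)
otu_df = pd.DataFrame(
    {
        "S1": [10, 0, 3, 0, 1],
        "S2": [5, 2, 0, 0, 0],
        "S3": [8, 1, 4, 2, 0],
        "S4": [0, 0, 7, 1, 3],
        "S5": [2, 6, 1, 3, 4],
    },
    index=["OTU1", "OTU2", "OTU3", "OTU4", "OTU5"],
)
# Hypothetical metadata: S5 has no recorded age, S6 has no OTU profile
meta_df = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4", "S5", "S6"],
    "age": [25, 40, 33, 58, None, 61],
})

# Observed richness = number of OTUs with a nonzero count in each sample
richness = (otu_df > 0).sum(axis=0).rename("richness")
richness = richness.rename_axis("sample_id").reset_index()

# Inner join keeps samples found in both tables; then drop missing ages
data = richness.merge(meta_df, on="sample_id", how="inner").dropna(subset=["age"])

r, p_r = pearsonr(data["richness"], data["age"])
rho, p_rho = spearmanr(data["richness"], data["age"])
print(data)
print(f"n = {len(data)}, Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

With only four usable samples the coefficients themselves mean little; the point is the data handling, which leaves S1 to S4 with richness values 3, 2, 4, and 3.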
13.3 R Code
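library(tidyverse)
# Load OTU table and metadata; check.names = FALSE keeps sample IDs (column names) unchanged so they match the metadata
otu_df <- read.delim("data/otu_table_filtered.tsv", row.names = 1, check.names = FALSE)
meta_df <- read.delim("data/sample_metadata.tsv")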
# Compute richness
richness <- colSums(otu_df > 0)
richness_df <- data.frame(sample_id = names(richness), richness = richness)
merged <- left_join(richness_df, meta_df, by = "sample_id")
# Pearson correlation
cor.test(merged$richness, merged$age, method = "pearson")
Pearson's product-moment correlation
data: merged$richness and merged$age
t = -0.099992, df = 8, p-value = 0.9228
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.6504867 0.6078167
sample estimates:
cor
-0.03533044
# Spearman correlation
cor.test(merged$richness, merged$age, method = "spearman")
Spearman's rank correlation rho
data: merged$richness and merged$age
S = 179.17, p-value = 0.8135
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.08589604
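Reading the R output: the Pearson test reports df = 8, so n = 10 samples. Both coefficients are close to zero (Pearson r ≈ -0.035, Spearman rho ≈ -0.086) with large p-values, so these data give no evidence of a correlation between richness and age. With only 10 samples, however, the Pearson 95% confidence interval runs from about -0.65 to 0.61, so a moderate correlation in either direction cannot be ruled out. As a cross-check of where these numbers come from, the sketch below recomputes them from the reported estimates and n, assuming the standard formulas that R's `cor.test()` uses: the t statistic t = r·sqrt((n - 2)/(1 - r²)), a Fisher z-transform confidence interval, and S = (n³ - n)(1 - rho)/6 for Spearman. For the Spearman p-value it assumes the t approximation that `cor.test()` falls back on when ranks are tied; the non-integer S is consistent with tied richness values, but that is an inference, not something the output states.

```python
import math
from scipy import stats

# Values reported by cor.test() in the R output above
n = 10              # inferred from df = n - 2 = 8
r = -0.03533044     # Pearson estimate
rho = -0.08589604   # Spearman estimate

def t_test_for_correlation(coef, n):
    """t statistic and two-sided p-value for H0: correlation = 0."""
    df = n - 2
    t_stat = coef * math.sqrt(df / (1 - coef ** 2))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value

# Pearson: t test plus a 95% CI from the Fisher z-transform
t_pearson, p_pearson = t_test_for_correlation(r, n)
z = math.atanh(r)
half_width = stats.norm.ppf(0.975) / math.sqrt(n - 3)
ci_low, ci_high = math.tanh(z - half_width), math.tanh(z + half_width)

# Spearman: S statistic, and the t approximation used when ranks are tied
S = (n ** 3 - n) * (1 - rho) / 6
t_spearman, p_spearman = t_test_for_correlation(rho, n)

print(f"Pearson:  t = {t_pearson:.6f}, p = {p_pearson:.4f}, "
      f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
print(f"Spearman: S = {S:.2f}, p = {p_spearman:.4f}")
```

Up to rounding, the printed values should line up with the R output (t = -0.099992, p = 0.9228, CI -0.6505 to 0.6078; S = 179.17, p = 0.8135).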