Q&A 13 How do you test for correlation between alpha diversity and age?

13.1 Explanation

In many studies, you may want to examine whether microbial diversity is associated with continuous metadata like age, BMI, or pH.

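Correlation tests measure the strength and direction of an association between two variables:

- Pearson correlation: measures linear association; its p-value assumes the two variables are approximately bivariate normal.
- Spearman correlation: Pearson correlation computed on the ranks of each variable, so it captures any monotonic association and does not assume normality (non-parametric).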
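To make the difference between the two coefficients concrete, here is a small sketch on hypothetical toy data in which y grows exponentially with x. It assumes NumPy and SciPy are installed and uses `scipy.stats.rankdata` to show that Spearman's rho is the same as Pearson's r computed on ranks.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

# Hypothetical toy data: y increases monotonically but not linearly with x
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

r, _ = pearsonr(x, y)      # linear association on the raw values
rho, _ = spearmanr(x, y)   # monotonic association via ranks

# Spearman's rho equals Pearson's r applied to the ranks of each variable
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(f"Pearson r          = {r:.3f}")
print(f"Spearman rho       = {rho:.3f}")
print(f"Pearson r on ranks = {r_on_ranks:.3f}")
```

Because the relationship is perfectly monotonic, Spearman's rho is exactly 1, while Pearson's r falls well below 1 because the relationship is not linear.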

This Q&A tests whether observed richness (the number of OTUs with a nonzero count in a sample) is correlated with age.
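Observed richness is only one alpha-diversity metric, and the same correlation tests apply to others. As a hedged sketch on a hypothetical four-OTU table (not the data used in this Q&A), the code below computes richness the same way as the main example and adds the Shannon index, H = -sum(p_i ln p_i), where p_i is the relative abundance of OTU i within a sample.

```python
import numpy as np
import pandas as pd

# Hypothetical toy OTU table: OTUs as rows, samples as columns
otu = pd.DataFrame(
    {"S1": [10, 10, 0, 0], "S2": [5, 0, 0, 0], "S3": [1, 1, 1, 1]},
    index=["OTU1", "OTU2", "OTU3", "OTU4"],
)

# Observed richness: count of OTUs with a nonzero count in each sample
richness = (otu > 0).sum(axis=0)

# Relative abundances: divide each column by its sample total
proportions = otu / otu.sum(axis=0)

# Shannon index per sample, summing only over OTUs that are present
shannon = proportions.apply(
    lambda p: -(p[p > 0] * np.log(p[p > 0])).sum(), axis=0
)

print(richness)
print(shannon.round(4))
```

S1 has two equally abundant OTUs, so H = ln 2 (about 0.693); S3 has four, so H = ln 4 (about 1.386); S2 contains a single OTU, so H = 0. Either `richness` or `shannon` can then be merged with metadata and passed to `pearsonr` or `spearmanr`.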

13.2 Python Code

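import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Load OTU table (OTUs as rows, samples as columns) and sample metadata;
# read sample_id as text so it matches the OTU table's column names
otu_df = pd.read_csv("data/otu_table_filtered.tsv", sep="\t", index_col=0)
meta_df = pd.read_csv("data/sample_metadata.tsv", sep="\t", dtype={"sample_id": str})

# Compute observed richness: number of OTUs with a nonzero count per sample
richness = pd.DataFrame({
    "sample_id": otu_df.columns.astype(str),
    "richness": (otu_df > 0).sum(axis=0).values
})

# Keep samples present in both tables, then drop samples with missing age
# (pearsonr and spearmanr do not skip NaN values by default)
data = pd.merge(richness, meta_df, on="sample_id", how="inner")
data = data.dropna(subset=["age"])

# Pearson correlation (linear association)
pearson_corr, pearson_pval = pearsonr(data["richness"], data["age"])

# Spearman correlation (rank-based, monotonic association)
spearman_corr, spearman_pval = spearmanr(data["richness"], data["age"])

print(f"n = {len(data)} samples")
print(f"Pearson r: {pearson_corr:.3f}, p = {pearson_pval:.4f}")
print(f"Spearman rho: {spearman_corr:.3f}, p = {spearman_pval:.4f}")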
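R's `cor.test` reports a 95% confidence interval for Pearson's r, computed with the Fisher z-transformation (z = atanh(r), standard error 1/sqrt(n - 3)). The sketch below reproduces that interval in Python; `pearson_ci` is a hypothetical helper name, and the check uses the r and n = 10 from the `cor.test` output in the R section.

```python
import numpy as np
from scipy.stats import norm


def pearson_ci(r, n, conf=0.95):
    """Approximate confidence interval for Pearson's r (Fisher z method).

    Assumes approximately bivariate-normal data and n > 3.
    """
    z = np.arctanh(r)                       # transform r to the z scale
    se = 1.0 / np.sqrt(n - 3)               # standard error of z
    z_crit = norm.ppf(1 - (1 - conf) / 2)   # e.g. about 1.96 for 95%
    lower, upper = z - z_crit * se, z + z_crit * se
    return float(np.tanh(lower)), float(np.tanh(upper))  # back to r scale


# Estimate and sample size taken from the R cor.test output
lo, hi = pearson_ci(-0.03533044, 10)
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
```

This should print approximately [-0.6505, 0.6078], matching the interval from `cor.test`. With real data, pass `pearson_corr` and `len(data)` from the code above. Depending on your SciPy version, the result returned by `pearsonr` may also provide a `confidence_interval()` method.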

13.3 R Code

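library(tidyverse)

# check.names = FALSE keeps sample IDs such as "S-01" or "01A" from being
# rewritten to "S.01" or "X01A", which would break the join with the metadata
otu_df <- read.delim("data/otu_table_filtered.tsv", row.names = 1, check.names = FALSE)
meta_df <- read.delim("data/sample_metadata.tsv")

# Compute observed richness: number of OTUs with a nonzero count per sample
richness <- colSums(otu_df > 0)
richness_df <- data.frame(sample_id = names(richness), richness = richness)
merged <- left_join(richness_df, meta_df, by = "sample_id")

# Pearson correlation (cor.test drops samples with a missing value)
cor.test(merged$richness, merged$age, method = "pearson")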

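##     Pearson's product-moment correlation
## 
## data:  merged$richness and merged$age
## t = -0.099992, df = 8, p-value = 0.9228
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6504867  0.6078167
## sample estimates:
##         cor 
## -0.03533044 

# Spearman correlation
cor.test(merged$richness, merged$age, method = "spearman")

##     Spearman's rank correlation rho
## 
## data:  merged$richness and merged$age
## S = 179.17, p-value = 0.8135
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## -0.08589604

With 10 samples (df = 8), neither test detects an association between richness and age: Pearson's r = -0.035 (p = 0.92) and Spearman's rho = -0.086 (p = 0.81). The 95% confidence interval for r (-0.65 to 0.61) is wide, so these data are also compatible with a moderate correlation in either direction; ruling that out would require more samples.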
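The non-integer S statistic in the Spearman output suggests tied values, in which case R cannot use its exact test and falls back to a large-sample approximation. With only 10 samples, a permutation test is a reasonable cross-check. The sketch below is an alternative, not the method used above: `spearman_permutation_test` is a hypothetical helper that shuffles one variable to break the pairing and counts how often the permuted |rho| is at least as large as the observed one. The toy ages and richness values are made up for illustration.

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_permutation_test(x, y, n_perm=9999, seed=0):
    """Two-sided permutation p-value for Spearman's rho.

    Shuffling y destroys any x-y association, giving a null distribution
    of rho under independence.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed, _ = spearmanr(x, y)
    count = 0
    for _ in range(n_perm):
        rho_perm, _ = spearmanr(x, rng.permutation(y))
        # Small tolerance so ties with the observed |rho| are not lost to rounding
        if abs(rho_perm) >= abs(observed) - 1e-12:
            count += 1
    # +1 treats the observed pairing as one of the permutations
    p_value = (count + 1) / (n_perm + 1)
    return float(observed), p_value


# Hypothetical toy values for illustration only
ages = np.array([23, 31, 35, 42, 47, 51, 58, 63, 66, 72])
rich = np.array([120, 135, 118, 140, 150, 142, 160, 155, 170, 165])
rho, p = spearman_permutation_test(rich, ages)
print(f"rho = {rho:.3f}, permutation p = {p:.4f}")
```

Because the observed pairing counts as one permutation, the p-value can never be exactly 0; its smallest possible value is 1 / (n_perm + 1).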