Q&A 5 How do you explore and summarize a microbiome OTU table?
5.1 Explanation
After generating an OTU (or feature) table from raw sequencing data, it's essential to inspect and summarize it before moving into alpha or beta diversity analysis.
The OTU table is typically a matrix of samples × features (ASVs/OTUs), where each cell contains the abundance count of a feature in a sample. The code in this section assumes the transposed layout, with OTUs as rows and samples as columns.
Key summary steps include:
- Calculating sample richness (how many OTUs each sample contains)
- Measuring OTU prevalence (in how many samples each OTU occurs)
- Assessing abundance distribution (e.g., sparse vs dominant OTUs)
- Identifying sparse or noisy features that may need filtering
5.2 Python Code
import pandas as pd
# Load OTU table (OTUs as rows, samples as columns)
otu_df = pd.read_csv("data/otu_table.tsv", sep="\t", index_col=0)
# Number of OTUs per sample (richness)
sample_richness = (otu_df > 0).sum(axis=0)
# Number of samples per OTU (prevalence)
otu_prevalence = (otu_df > 0).sum(axis=1)
# Distribution of total counts per OTU
otu_abundance_summary = otu_df.sum(axis=1).describe()
print("Sample Richness:", sample_richness.head())
print("OTU Prevalence:", otu_prevalence.head())
print("Abundance Summary:", otu_abundance_summary)
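The last summary step, identifying sparse features that may need filtering, is not covered by the code above. The following is a minimal sketch of one way to do it, assuming the same orientation (OTUs as rows, samples as columns). The toy table, the helper name `flag_sparse_otus`, and both thresholds are hypothetical and chosen only for illustration; suitable cutoffs depend on the dataset.

```python
import pandas as pd

# Toy OTU table with made-up counts (OTUs as rows, samples as columns)
toy_otu_df = pd.DataFrame(
    {
        "Sample_1": [120, 0, 3, 0],
        "Sample_2": [95, 1, 0, 0],
        "Sample_3": [110, 0, 5, 0],
        "Sample_4": [130, 0, 4, 2],
    },
    index=["OTU_1", "OTU_2", "OTU_3", "OTU_4"],
)


def flag_sparse_otus(table, min_prevalence=0.5, min_total_count=5):
    """Return a boolean Series that is True for OTUs failing either threshold.

    min_prevalence: minimum fraction of samples in which the OTU must occur.
    min_total_count: minimum summed count across all samples.
    Both defaults are illustrative assumptions, not recommended values.
    """
    prevalence_fraction = (table > 0).mean(axis=1)
    total_count = table.sum(axis=1)
    return (prevalence_fraction < min_prevalence) | (total_count < min_total_count)


# Flag sparse OTUs and keep only the rest
sparse_mask = flag_sparse_otus(toy_otu_df)
filtered_df = toy_otu_df.loc[~sparse_mask]

# Fraction of zero cells in the whole table (overall sparsity)
overall_sparsity = (toy_otu_df == 0).to_numpy().mean()

print("Flagged as sparse:", list(toy_otu_df.index[sparse_mask]))
print("Retained OTUs:", list(filtered_df.index))
print("Overall sparsity:", overall_sparsity)
```

In this toy table, OTU_2 and OTU_4 each occur in only one of four samples and have very low totals, so both are flagged. Applying the same function to `otu_df` would give a mask that can be inspected before any rows are actually dropped.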
5.3 R Code
otu_df <- read.delim("data/otu_table.tsv", row.names = 1)
# Sample richness: number of OTUs per sample
colSums(otu_df > 0)
Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6 Sample_7 Sample_8
44 44 43 40 40 42 45 43
Sample_9 Sample_10
45 41
# OTU prevalence: number of samples per OTU
rowSums(otu_df > 0)
OTU_1 OTU_2 OTU_3 OTU_4 OTU_5 OTU_6 OTU_7 OTU_8 OTU_9 OTU_10 OTU_11
10 7 9 9 8 8 7 8 9 8 10
OTU_12 OTU_13 OTU_14 OTU_15 OTU_16 OTU_17 OTU_18 OTU_19 OTU_20 OTU_21 OTU_22
9 10 9 9 7 9 10 9 10 7 8
OTU_23 OTU_24 OTU_25 OTU_26 OTU_27 OTU_28 OTU_29 OTU_30 OTU_31 OTU_32 OTU_33
8 9 8 9 9 10 8 8 7 9 9
OTU_34 OTU_35 OTU_36 OTU_37 OTU_38 OTU_39 OTU_40 OTU_41 OTU_42 OTU_43 OTU_44
8 10 10 9 10 7 8 8 9 9 8
OTU_45 OTU_46 OTU_47 OTU_48 OTU_49 OTU_50
8 8 8 7 7 9
# Distribution of total counts per OTU
summary(rowSums(otu_df))
Min. 1st Qu. Median Mean 3rd Qu. Max.
36.0 41.0 48.0 47.4 52.0 64.0