Q&A 6 How do you filter out low-abundance or low-prevalence OTUs?
6.1 Explanation
OTU tables are often sparse: many OTUs occur in only a few samples or at very low counts. These low-abundance and low-prevalence OTUs can introduce noise, inflate richness-based diversity metrics such as the number of observed OTUs, and increase the number of mostly-zero features that downstream analyses must handle.
Filtering such OTUs is a standard exploratory data analysis (EDA) step before diversity analysis or visualization. Typical criteria include:
- Prevalence: remove OTUs that appear in fewer than X samples
- Abundance: remove OTUs whose total count across all samples falls below a threshold
This step reduces dimensionality (the number of OTU features) and makes tables and plots easier to interpret.
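The two criteria above can be made concrete with a minimal sketch on a small invented table (rows are OTUs, columns are samples). The function name `filter_otus` and the toy counts are hypothetical; the default thresholds (3 samples, 10 reads) simply mirror the ones used in the Python and R code of this section.

```python
import pandas as pd

# Toy OTU table (rows = OTUs, columns = samples); counts are invented for illustration.
otu_df = pd.DataFrame(
    {
        "S1": [120, 0, 2, 0, 5],
        "S2": [98, 1, 0, 0, 4],
        "S3": [143, 0, 3, 0, 6],
        "S4": [110, 0, 1, 25, 0],
    },
    index=["OTU_A", "OTU_B", "OTU_C", "OTU_D", "OTU_E"],
)


def filter_otus(df: pd.DataFrame, min_samples: int = 3, min_total: int = 10) -> pd.DataFrame:
    """Hypothetical helper: keep OTUs (rows) that pass both the prevalence and abundance criteria."""
    prevalence = (df > 0).sum(axis=1)  # number of samples in which each OTU is observed
    total = df.sum(axis=1)             # total read count of each OTU across all samples
    return df[(prevalence >= min_samples) & (total >= min_total)]


filtered = filter_otus(otu_df)
print(filtered.index.tolist())
```

Only OTUs that pass both tests survive: OTU_C is found in three samples but has just 6 reads in total, while OTU_D has 25 reads but occurs in a single sample, so both are removed.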
6.2 Python Code
import pandas as pd
# Load OTU table (rows = OTUs, columns = samples, first column = OTU IDs)
otu_df = pd.read_csv("data/otu_table.tsv", sep="\t", index_col=0)
# Filter: keep OTUs present in at least 3 samples
otu_filtered = otu_df[(otu_df > 0).sum(axis=1) >= 3]
# Further filter: keep OTUs with total count ≥ 10
otu_filtered = otu_filtered[otu_filtered.sum(axis=1) >= 10]
# Save filtered table
otu_filtered.to_csv("data/otu_table_filtered.tsv", sep="\t")
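Because each criterion is computed from an OTU's own row, applying the prevalence and abundance filters one after the other (as above) keeps the same OTUs as applying them together in a single mask. The sketch below, using invented toy counts and a hypothetical helper `summarize_filter`, checks this on a small table and also reports how many OTUs fail each criterion, which can be worth recording before settling on thresholds.

```python
import pandas as pd


def summarize_filter(df: pd.DataFrame, min_samples: int = 3, min_total: int = 10) -> dict:
    """Hypothetical helper: count OTUs failing each criterion, evaluated on the unfiltered table."""
    passes_prevalence = (df > 0).sum(axis=1) >= min_samples
    passes_abundance = df.sum(axis=1) >= min_total
    keep = passes_prevalence & passes_abundance
    return {
        "n_otus_before": len(df),
        "fail_prevalence": int((~passes_prevalence).sum()),
        "fail_abundance": int((~passes_abundance).sum()),
        "n_otus_after": int(keep.sum()),
    }


# Toy counts (rows = OTUs, columns = samples), invented for illustration.
otu_df = pd.DataFrame(
    {
        "S1": [50, 0, 9, 0, 3],
        "S2": [40, 2, 0, 0, 2],
        "S3": [60, 0, 8, 3, 2],
        "S4": [0, 0, 7, 0, 0],
    },
    index=["OTU_1", "OTU_2", "OTU_3", "OTU_4", "OTU_5"],
)

summary = summarize_filter(otu_df)
print(summary)

# Sequential filtering (prevalence first, then abundance) versus one joint mask.
sequential = otu_df[(otu_df > 0).sum(axis=1) >= 3]
sequential = sequential[sequential.sum(axis=1) >= 10]
joint = otu_df[((otu_df > 0).sum(axis=1) >= 3) & (otu_df.sum(axis=1) >= 10)]
print(sequential.equals(joint))
```

The two failure counts can overlap (OTU_2 and OTU_4 fail both tests), so they need not add up to the number of OTUs removed.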
6.3 R Code
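# Load OTU table; check.names = FALSE keeps sample IDs exactly as written in the header
otu_df <- read.delim("data/otu_table.tsv", row.names = 1, check.names = FALSE)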
# Filter OTUs with prevalence ≥ 3 samples
keep_rows <- rowSums(otu_df > 0) >= 3
otu_df <- otu_df[keep_rows, , drop = FALSE]  # drop = FALSE keeps a data frame even with one sample
# Further filter by total abundance ≥ 10
keep_abundant <- rowSums(otu_df) >= 10
otu_df_filtered <- otu_df[keep_abundant, , drop = FALSE]
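# Write filtered table; col.names = NA adds an empty header cell above the row names
# so the header lines up with the data columns, and quote = FALSE leaves IDs unquoted
write.table(otu_df_filtered, file = "data/otu_table_filtered.tsv", sep = "\t",
            quote = FALSE, col.names = NA)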
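Fixed cut-offs such as "3 samples" and "10 reads" depend on how many samples were sequenced and how deeply. One common alternative, shown here as a Python sketch rather than as the method used above, expresses prevalence as a fraction of samples and abundance as the mean per-sample relative abundance. The helper `filter_relative`, its default thresholds (10% of samples, 0.1% mean relative abundance), and the toy counts are assumptions chosen for illustration, not recommendations.

```python
import pandas as pd


def filter_relative(
    df: pd.DataFrame, min_prevalence_frac: float = 0.10, min_mean_rel_abund: float = 0.001
) -> pd.DataFrame:
    """Hypothetical helper: keep OTUs by fractional prevalence and mean relative abundance.

    The default thresholds are illustrative assumptions, not values from the source.
    """
    n_samples = df.shape[1]
    prevalence_frac = (df > 0).sum(axis=1) / n_samples  # fraction of samples containing the OTU
    rel_abund = df / df.sum(axis=0)  # per-sample proportions; each column sums to 1
    # A sample with zero total reads would yield NaN proportions, which .mean() skips by default.
    mean_rel = rel_abund.mean(axis=1)
    return df[(prevalence_frac >= min_prevalence_frac) & (mean_rel >= min_mean_rel_abund)]


# Toy counts with deliberately unequal sequencing depth (1,000 to 10,000 reads per sample).
otu_df = pd.DataFrame(
    {
        "S1": [900, 95, 5, 0],
        "S2": [9000, 990, 0, 10],
        "S3": [450, 45, 5, 0],
        "S4": [1800, 199, 1, 0],
    },
    index=["OTU_A", "OTU_B", "OTU_C", "OTU_D"],
)

kept = filter_relative(otu_df)
print(kept.index.tolist())
```

OTU_D is present in one of four samples, which passes the 10% prevalence test, but its 10 reads make up only 0.1% of the deepest sample, so its mean relative abundance is too low and it is dropped.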