Q&A 6 How do you filter out low-abundance or low-prevalence OTUs?

6.1 Explanation

OTU tables are often sparse, with many OTUs occurring in only a few samples or at very low abundances. These low-abundance and low-prevalence OTUs can introduce noise, inflate diversity metrics, and complicate downstream analysis.

Filtering such OTUs is a critical EDA step before diversity analysis or visualization. Common criteria include:

- Prevalence: Removing OTUs that appear in fewer than X samples
- Abundance: Removing OTUs with total counts below a threshold

This step helps reduce dimensionality and improves interpretability.
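
To make the two criteria concrete, here is a minimal sketch on a made-up table. The OTU names, sample names, counts, and the cutoffs of 3 samples and 10 reads are all illustrative assumptions, and rows are assumed to be OTUs with samples as columns.

```python
import pandas as pd

# Hypothetical toy OTU table: rows are OTUs, columns are samples
toy = pd.DataFrame(
    {
        "S1": [120, 0, 1, 4],
        "S2": [95, 0, 1, 3],
        "S3": [130, 50, 1, 5],
        "S4": [110, 0, 1, 0],
    },
    index=["OTU_1", "OTU_2", "OTU_3", "OTU_4"],
)

# Prevalence: number of samples in which each OTU has a non-zero count
prevalence = (toy > 0).sum(axis=1)

# Abundance: total count of each OTU across all samples
total_count = toy.sum(axis=1)

min_samples = 3  # assumed prevalence cutoff
min_count = 10   # assumed abundance cutoff

# An OTU is kept only if it passes both criteria
keep = (prevalence >= min_samples) & (total_count >= min_count)

summary = pd.DataFrame(
    {"prevalence": prevalence, "total_count": total_count, "keep": keep}
)
print(summary)
```

In this toy table, OTU_2 is abundant but occurs in a single sample, so it fails the prevalence criterion. OTU_3 occurs in every sample but only as singletons, so it fails the abundance criterion. This is why the two filters are usually applied together.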

6.2 Python Code

import pandas as pd

# Load OTU table (rows = OTUs, columns = samples)
otu_df = pd.read_csv("data/otu_table.tsv", sep="\t", index_col=0)

# Filter: keep OTUs present in at least 3 samples
otu_filtered = otu_df[(otu_df > 0).sum(axis=1) >= 3]

# Further filter: keep OTUs with total count ≥ 10
otu_filtered = otu_filtered[otu_filtered.sum(axis=1) >= 10]

# Save filtered table
otu_filtered.to_csv("data/otu_table_filtered.tsv", sep="\t")
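
After filtering, it can be useful to check how much of the data survived. The sketch below wraps the same two criteria in a function and reports the number of OTUs and the fraction of reads retained. The names `filter_otus` and `retention_report`, the default cutoffs, and the demo table are hypothetical and only for illustration.

```python
import pandas as pd


def filter_otus(table, min_samples=3, min_count=10):
    """Keep OTUs (rows) that pass both the prevalence and abundance cutoffs."""
    prevalent = (table > 0).sum(axis=1) >= min_samples
    abundant = table.sum(axis=1) >= min_count
    return table[prevalent & abundant]


def retention_report(before, after):
    """Summarize how many OTUs and what fraction of reads were kept."""
    return {
        "otus_before": before.shape[0],
        "otus_after": after.shape[0],
        "reads_retained": after.to_numpy().sum() / before.to_numpy().sum(),
    }


# Hypothetical demo table: rows are OTUs, columns are samples
demo = pd.DataFrame(
    {"S1": [50, 0, 1], "S2": [40, 0, 1], "S3": [10, 20, 1]},
    index=["OTU_a", "OTU_b", "OTU_c"],
)

filtered = filter_otus(demo)
report = retention_report(demo, filtered)
print(report)
```

If the fraction of reads retained drops sharply, the cutoffs may be too strict for the dataset and are worth revisiting before running diversity analysis.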

6.3 R Code

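# Load OTU table (rows = OTUs, columns = samples)
# check.names = FALSE keeps sample IDs exactly as written in the file
otu_df <- read.delim("data/otu_table.tsv", row.names = 1, check.names = FALSE)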

# Filter OTUs with prevalence ≥ 3 samples
keep_rows <- rowSums(otu_df > 0) >= 3
otu_df <- otu_df[keep_rows, ]

# Further filter by total abundance ≥ 10
keep_abundant <- rowSums(otu_df) >= 10
otu_df_filtered <- otu_df[keep_abundant, ]

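# Write filtered table
# col.names = NA adds an empty header cell above the row names so columns stay aligned;
# quote = FALSE writes IDs without surrounding quotes
write.table(otu_df_filtered, file = "data/otu_table_filtered.tsv", sep = "\t",
            quote = FALSE, col.names = NA)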