Q&A 24 How do you use mikropml in R for microbiome machine learning?

24.1 Explanation

mikropml is an R package from Pat Schloss's lab for building machine learning models, with a focus on microbiome data. It is designed for:

  • End-to-end modeling workflows
  • Built-in cross-validation and hyperparameter tuning
  • Transparency in model reporting and evaluation

It simplifies the process of building, tuning, and interpreting microbiome ML models.

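This Q&A introduces a basic pipeline using mikropml and shows how to prepare the OTU table and sample metadata in the shape the package expects: one row per sample, one outcome column, and every remaining column a feature.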
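As a minimal sketch of that data shape, the toy example below uses base R only. The OTU counts, sample IDs, and group labels are made up for illustration; only the column names `sample_id` and `group` mirror the ones used in this Q&A.

```r
# Toy OTU table: OTUs in rows, samples in columns (hypothetical values)
otu <- data.frame(
  S1 = c(10, 0, 5),
  S2 = c(3, 7, 0),
  S3 = c(0, 2, 9),
  S4 = c(6, 1, 4),
  row.names = c("Otu1", "Otu2", "Otu3")
)

# Toy metadata: one row per sample, with the outcome in `group`
meta <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  group = c("case", "control", "case", "control")
)

# Transpose so samples are rows, then attach the sample IDs
otu_df <- as.data.frame(t(otu))
otu_df$sample_id <- rownames(otu_df)

# Join on sample_id, then keep only the outcome and the OTU features
merged <- merge(otu_df, meta, by = "sample_id")
model_data <- merged[, c("group", rownames(otu))]

str(model_data)
```

The resulting `model_data` has one row per sample, the outcome in `group`, and only numeric OTU columns besides it, which is the layout passed to the modeling step.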

24.2 R Code

# 📦 Ensure mikropml is installed
if (!requireNamespace("mikropml", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
  remotes::install_github("SchlossLab/mikropml")
}

library(mikropml)
library(tidyverse)

# Load OTU table and metadata
otu <- read.delim("data/otu_table_filtered.tsv", row.names = 1)
meta <- read.delim("data/sample_metadata.tsv")

# Transpose OTU so samples are rows
otu_t <- t(otu)
otu_df <- as.data.frame(otu_t)
otu_df$sample_id <- rownames(otu_t)

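# Merge with metadata, then keep only the outcome and the OTU features
# (run_ml() treats every non-outcome column as a feature, so sample_id
# and any other metadata columns must be dropped)
data <- inner_join(otu_df, meta, by = "sample_id") %>%
  select(group, all_of(colnames(otu_t)))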

# Run mikropml using run_ml()
set.seed(42)
fit <- run_ml(
  dataset = data,
  outcome_colname = "group",
  method = "rf",        # One of glmnet, rf, rpart2, svmRadial, xgbTree
  seed = 42
)

# View model summary
summary(fit)
                   Length Class      Mode     
trained_model      21     train      list     
test_data          55     data.frame list     
performance        17     tbl_df     list     
feature_importance  1     -none-     character
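# Feature importance
# run_ml() returns no `importance_plot` element, so fit$importance_plot is NULL.
# Permutation importance is skipped unless find_feature_importance = TRUE
# is passed to run_ml(); once computed, it is stored here:
fit$feature_importance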

24.3 Notes

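  • run_ml() accepts a custom hyperparameter grid (hyperparameters) and cross-validation settings (kfold, cv_times). Setting seed makes the train/test split and results reproducible, and the returned list can be saved, for example with saveRDS(), for later reporting.
  • run_ml() infers the task from the outcome column: numeric outcomes are treated as regression, and non-numeric outcomes as binary or multiclass classification.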
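
The tuning options mentioned in these notes could be exercised roughly as follows. This is a sketch under stated assumptions, not a recommended configuration: it uses `otu_mini_bin`, the small example dataset bundled with mikropml (outcome column `dx`), instead of the files from this Q&A, and the `mtry` grid and the reduced `cv_times` are illustrative values chosen only to keep the run short.

```r
library(mikropml)

# Assumption: otu_mini_bin is the example dataset shipped with mikropml;
# its outcome column is "dx" and the other columns are OTU features.
prep <- preprocess_data(dataset = otu_mini_bin, outcome_colname = "dx")

fit_tuned <- run_ml(
  dataset = prep$dat_transformed,
  method = "rf",
  outcome_colname = "dx",
  hyperparameters = list(mtry = c(1, 2, 4)),  # illustrative grid
  kfold = 5,                                  # folds per CV repeat
  cv_times = 10,                              # fewer repeats than the default, for speed
  training_frac = 0.8,                        # fraction of samples used for training
  find_feature_importance = TRUE,             # compute permutation importance
  seed = 42
)

# Test-set and cross-validation performance
fit_tuned$performance

# Permutation importance, one row per feature
head(fit_tuned$feature_importance)
```

With `find_feature_importance = TRUE`, the `feature_importance` element is a data frame rather than a placeholder string, at the cost of a longer run time.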