System Overview
ID: MICROB-001
Type: System Overview
Audience: Students, researchers, analysts, and practitioners
Theme: Understanding the complete microbiome workflow
Introduction
Every microbiome project begins with a biological question and ends with a biological conclusion. Between these two points lies a connected sequence of design, experimental, computational, statistical, and interpretive decisions.
Together, these stages form the Microbiome Analysis System (MAS).
This system is not only a software workflow. It is a chain of evidence that connects biological context, metadata, sequencing data, analytical methods, statistical results, and interpretation.
Understanding how these components connect is essential for producing microbiome analyses that are reliable, reproducible, interpretable, and defensible.
The Microbiome Analysis System
The Microbiome Analysis System follows the full path from biological question to reproducible report.
flowchart TB
A[Biological Question] --> B[Study Design and Metadata]
B --> C[Sample Collection and Sequencing]
C --> D[Data Acquisition]
D --> E[Quality Control]
E --> F[Feature Generation]
F --> G[Taxonomic Profiling]
F --> H[Functional Profiling]
G --> I[Diversity Analysis]
H --> I
I --> J[Differential Analysis]
J --> K[Biological Interpretation]
K --> L[Reproducible Reporting]
Each stage influences the next. A weakness early in the system can limit what can be concluded later, even if downstream analyses are technically correct.
For this reason, MAS emphasizes the connection between analytical steps rather than treating each step as an isolated task.
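The dependency between stages can be sketched in code. The example below is a minimal illustration, not part of MAS itself: it copies the edges from the flowchart above into a dictionary (the names `FEEDS`, `downstream_of`, and `execution_order` are hypothetical) and asks which later stages inherit a weakness introduced at a given stage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges copied from the flowchart above: each stage maps to the stages it feeds.
FEEDS = {
    "Biological Question": ["Study Design and Metadata"],
    "Study Design and Metadata": ["Sample Collection and Sequencing"],
    "Sample Collection and Sequencing": ["Data Acquisition"],
    "Data Acquisition": ["Quality Control"],
    "Quality Control": ["Feature Generation"],
    "Feature Generation": ["Taxonomic Profiling", "Functional Profiling"],
    "Taxonomic Profiling": ["Diversity Analysis"],
    "Functional Profiling": ["Diversity Analysis"],
    "Diversity Analysis": ["Differential Analysis"],
    "Differential Analysis": ["Biological Interpretation"],
    "Biological Interpretation": ["Reproducible Reporting"],
    "Reproducible Reporting": [],
}


def downstream_of(stage):
    """Return every stage that directly or indirectly depends on `stage`."""
    affected, stack = set(), list(FEEDS[stage])
    while stack:
        current = stack.pop()
        if current not in affected:
            affected.add(current)
            stack.extend(FEEDS[current])
    return affected


def execution_order():
    """Return one valid order in which the stages can be run."""
    # TopologicalSorter expects node -> predecessors, so invert FEEDS.
    predecessors = {stage: set() for stage in FEEDS}
    for stage, targets in FEEDS.items():
        for target in targets:
            predecessors[target].add(stage)
    return list(TopologicalSorter(predecessors).static_order())


# A problem at Quality Control reaches every later stage of the analysis.
print(sorted(downstream_of("Quality Control")))
```

Under this sketch, a weakness at Quality Control touches seven later stages, which is the sense in which technically correct downstream analyses can still be limited by earlier decisions.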
System Logic
The Microbiome Analysis System can be understood as four connected layers:
flowchart LR
A[Design Layer] --> B[Data Layer]
B --> C[Analysis Layer]
C --> D[Interpretation Layer]
The design layer defines the biological question, study population, sampling strategy, metadata requirements, and sequencing approach.
The data layer includes data acquisition, organization, validation, and quality control.
The analysis layer includes feature generation, taxonomic profiling, functional profiling, diversity analysis, and differential analysis.
The interpretation layer connects statistical findings to biological meaning and produces a reproducible report.
From Question to Data
Every successful microbiome study begins with a clearly defined biological question.
The biological question influences:
study design
sample selection
metadata collection
sequencing strategy
statistical comparisons
interpretation of results
A clear question helps determine whether the analysis is descriptive, comparative, exploratory, diagnostic, ecological, clinical, agricultural, environmental, or mechanistic.
Without a clear question, microbiome analysis can easily produce many plots without producing a defensible conclusion.
From Samples to Sequencing Data
Microbiome data are shaped long before computational analysis begins.
Sample collection, storage, DNA extraction, library preparation, sequencing platform, target region, and sequencing depth all influence the data that enter the analysis workflow.
For example:
sample type affects microbial biomass and community composition
storage conditions can introduce technical variation
DNA extraction methods can bias organism recovery
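16S rRNA gene amplicon sequencing and shotgun metagenomics answer different questions: amplicon data describe taxonomic composition from a single marker gene, whereas shotgun data sample all DNA in the sample and also support functional profiling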
sequencing depth affects sensitivity and feature detection
These upstream choices must be considered when interpreting downstream results.
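One practical way to keep these upstream choices available at interpretation time is to check that they were recorded for every sample. The following is a minimal sketch under stated assumptions: the field names in `REQUIRED_FIELDS` and the example records are hypothetical, and a real project would define its own required set.

```python
# Hypothetical field names; a real project would define its own required set.
REQUIRED_FIELDS = (
    "sample_type",
    "storage_condition",
    "extraction_method",
    "sequencing_strategy",
    "sequencing_depth",
)


def missing_technical_metadata(samples):
    """Map each sample ID to the required fields that are absent or empty."""
    problems = {}
    for sample_id, record in samples.items():
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            problems[sample_id] = missing
    return problems


# Invented example records, for illustration only.
samples = {
    "S1": {
        "sample_type": "stool",
        "storage_condition": "-80C",
        "extraction_method": "bead-beating kit",
        "sequencing_strategy": "16S",
        "sequencing_depth": 50000,
    },
    "S2": {
        "sample_type": "stool",
        "storage_condition": "",
        "extraction_method": "bead-beating kit",
        "sequencing_strategy": "16S",
    },
}

print(missing_technical_metadata(samples))
```

Here sample S2 would be flagged because its storage condition is empty and its sequencing depth was never recorded.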
From Raw Data to Features
Raw sequencing reads are not yet biological results.
They must first pass through quality control and feature generation. Depending on the sequencing strategy and workflow, features may include:
ASVs
OTUs
taxonomic profiles
gene family profiles
pathway profiles
functional potential summaries
Feature generation transforms sequencing reads into structured tables that can be analyzed statistically.
The quality of these feature tables determines the reliability of downstream diversity analysis, differential analysis, and interpretation.
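To make "structured table" concrete, the sketch below builds a small sample-by-feature table from per-sample counts and derives two basic checks from it. It is an illustration only: the function names, the invented counts, and the `min_reads` threshold are assumptions, and production workflows typically rely on dedicated tools for this step.

```python
def build_feature_table(sample_counts):
    """Turn {sample: {feature: count}} into (features, {sample: [counts]})."""
    features = sorted({f for counts in sample_counts.values() for f in counts})
    table = {
        sample: [counts.get(f, 0) for f in features]
        for sample, counts in sample_counts.items()
    }
    return features, table


def library_sizes(table):
    """Total read count per sample."""
    return {sample: sum(row) for sample, row in table.items()}


def relative_abundance(table):
    """Convert counts to within-sample proportions (zeros for empty samples)."""
    result = {}
    for sample, row in table.items():
        total = sum(row)
        result[sample] = [c / total if total else 0.0 for c in row]
    return result


def low_depth_samples(table, min_reads):
    """Samples whose total read count falls below a chosen threshold."""
    sizes = library_sizes(table)
    return sorted(s for s, size in sizes.items() if size < min_reads)


# Invented counts, for illustration only.
raw_counts = {
    "S1": {"ASV_1": 60, "ASV_2": 40},
    "S2": {"ASV_2": 30, "ASV_3": 10},
}
features, table = build_feature_table(raw_counts)
print(features)
print(table)
print(low_depth_samples(table, min_reads=50))
```

Features absent from a sample are filled with zero so that every row has the same columns, which is what allows samples to be compared statistically.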
From Features to Biological Patterns
Once microbiome features are generated, researchers begin looking for biological patterns.
Common analytical outputs include:
taxonomic composition summaries
alpha diversity metrics
beta diversity ordinations
group-wise community comparisons
differentially abundant taxa
functional pathway summaries
associations between microbiome features and metadata variables
These outputs help describe how microbial communities differ across samples, groups, conditions, time points, or environments.
However, patterns are not automatically conclusions. They must be evaluated in relation to the study design, metadata, statistical assumptions, biological plausibility, and known limitations.
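Two of the outputs listed above can be written out directly. The sketch below computes the Shannon index, a common alpha diversity metric, as H = -sum(p_i * ln p_i) over feature proportions, and the Bray-Curtis dissimilarity, a common beta diversity measure, as sum(|x_i - y_i|) / sum(x_i + y_i). It assumes non-negative counts over the same ordered set of features and leaves aside normalization choices, which a real analysis would need to address.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over features with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with aligned features."""
    if len(a) != len(b):
        raise ValueError("samples must share the same feature order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Four equally abundant features give H = ln(4).
print(shannon_index([10, 10, 10, 10]))
# Samples with no shared features are maximally dissimilar.
print(bray_curtis([10, 0], [0, 10]))
```

Both numbers describe a pattern in the table; neither says anything about why the pattern exists, which is why the evaluation step described above still applies.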
From Results to Biological Insight
The goal of MAS is not simply to generate outputs. The goal is to produce biological insight.
A defensible microbiome conclusion should answer:
What pattern was observed?
Which data support the pattern?
Which method produced the result?
What comparison was tested?
What assumptions were made?
What biological explanation is plausible?
What limitations affect interpretation?
What should be reported cautiously?
Statistical significance alone does not guarantee biological relevance. A result becomes meaningful only when it is interpreted within the biological and technical context of the study.
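The eight questions above can also be kept as a structured record, so that an unanswered question is visible before a result is reported. This is a minimal sketch: the class name `ConclusionRecord`, its field names, and the example values are hypothetical and only mirror the questions in the list.

```python
from dataclasses import dataclass, fields


@dataclass
class ConclusionRecord:
    """One field per question a defensible conclusion should answer."""

    pattern: str = ""
    supporting_data: str = ""
    method: str = ""
    comparison: str = ""
    assumptions: str = ""
    biological_explanation: str = ""
    limitations: str = ""
    cautions: str = ""

    def unanswered(self):
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Invented example values, for illustration only.
record = ConclusionRecord(
    pattern="Lower alpha diversity in group A than in group B",
    method="Shannon index compared between groups",
)
print(record.unanswered())
```

A record with only a pattern and a method still leaves six questions open, which is a quick signal that the result is not yet ready to be called a conclusion.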
Relationship to CDI-SDD and CDI-DAS
MAS connects naturally with two upstream CDI systems.
flowchart TB
A[CDI Systematic Dataset Discovery] --> B[Study Selection]
B --> C[CDI Data Acquisition System]
C --> D[Sequencing Data and Metadata]
D --> E[Microbiome Analysis System]
E --> F[Biological Interpretation and Reporting]
The CDI Systematic Dataset Discovery System (CDI-SDD) supports structured identification and selection of public studies.
The CDI Data Acquisition System (CDI-DAS) supports reproducible acquisition, validation, and organization of public sequencing data.
The Microbiome Analysis System then supports downstream microbiome analysis, interpretation, and reporting.
This relationship prevents MAS from becoming isolated from the upstream decisions that determine data quality and biological relevance.
Reproducibility Throughout the System
Reproducibility should be built into every stage of the workflow.
This includes:
documenting study design decisions
preserving metadata tables
recording data sources and accession numbers
using scripted data acquisition where possible
documenting quality control thresholds
saving intermediate outputs
using version-controlled code
reporting software versions and parameters
producing reproducible reports
A reproducible microbiome workflow should allow another analyst to understand what was done, why it was done, and how the conclusions were reached.
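Several of the practices listed above (recording data sources, parameters, and software versions) can be captured automatically at each step. The sketch below is one possible approach using only the Python standard library; the function names and the record layout are assumptions, not a prescribed MAS format.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file without loading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(step, parameters, input_paths, accessions=()):
    """Describe one workflow step: what ran, on which inputs, with which settings."""
    return {
        "step": step,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "parameters": dict(parameters),
        "accessions": list(accessions),
        "inputs": {str(path): sha256_of(path) for path in input_paths},
    }


def write_record(record, path):
    """Save the record as JSON so it can be version-controlled with the code."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

Writing one such record next to each intermediate output gives another analyst the inputs, thresholds, and environment needed to trace how a result was produced.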
How the Chapters Connect
The remaining chapters follow the progression of the Microbiome Analysis System.
flowchart TB
A[Part I: Foundation] --> B[Part II: Data Acquisition and QC]
B --> C[Part III: Feature Generation and Profiling]
C --> D[Part IV: Statistical Analysis]
D --> E[Part V: Interpretation and Reporting]
E --> F[Part VI: Workforce Readiness]
Each chapter focuses on one component of the system while maintaining its connection to the larger workflow.
The chapters are organized as follows:
Study Design and Metadata establishes the biological and analytical foundation.
Sample Collection and Sequencing explains how upstream laboratory decisions affect data.
Data Acquisition connects MAS to reproducible public data retrieval.
Quality Control evaluates whether the data are suitable for analysis.
Feature Generation converts reads into analyzable microbiome features.
Taxonomic Profiling describes community composition.
Functional Profiling explores functional potential.
Diversity Analysis evaluates within-sample and between-sample variation.
Differential Analysis identifies features associated with groups or conditions.
Biological Interpretation translates outputs into evidence-based insight.
Reproducible Reporting documents the workflow and results.
Workforce Readiness connects technical skills to professional microbiome analysis practice.
Core Principle
The central principle of MAS is:
Microbiome results are only as defensible as the study design, metadata, sample handling, sequencing strategy, data quality, analytical methods, and biological interpretation that support them.
This principle applies throughout the guide.
What Comes Next
The next chapter covers Study Design and Metadata, the foundation upon which every successful microbiome project is built.