Datasets

This guide uses the QIIME2 Moving Pictures demo dataset.

The data are downloaded as QIIME2 artifacts and converted into a single phyloseq object using reproducible scripts:

Those scripts:

From this point onward, we work with one structured object.

Load the canonical object

library(phyloseq)

ps <- readRDS("data/moving-pictures-ps.rds")

A microbiome dataset is not a single table.

It is a coordinated structure containing:

- an abundance matrix
- sample-level metadata
- taxonomic annotation

All components must align exactly.
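To make the alignment requirement concrete, here is a minimal base-R sketch on a made-up two-taxon, three-sample dataset. The names `counts`, `meta`, and `tax` are illustrative and are not objects from this guide. It checks that the three components share the same taxon and sample identifiers, in the same order.

```r
# Toy components (hypothetical data, for illustration only)
counts <- matrix(c(0, 5, 12, 3, 0, 7), nrow = 2,
                 dimnames = list(c("asv1", "asv2"), c("s1", "s2", "s3")))
meta <- data.frame(body.site = c("gut", "gut", "tongue"),
                   row.names = c("s1", "s2", "s3"))
tax <- matrix(c("Firmicutes", "Proteobacteria"), ncol = 1,
              dimnames = list(c("asv1", "asv2"), "Phylum"))

# Columns of the count matrix must match the metadata rows;
# rows of the count matrix must match the taxonomy rows.
samples_aligned <- identical(colnames(counts), rownames(meta))
taxa_aligned    <- identical(rownames(counts), rownames(tax))
print(c(samples = samples_aligned, taxa = taxa_aligned))
```

`identical()` is used instead of `==` so that a difference in length or order fails cleanly rather than recycling values.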

Inspect the OTU table

otu <- methods::as(phyloseq::otu_table(ps), "matrix")

if (!phyloseq::taxa_are_rows(ps)) {
  otu <- t(otu)
}

otu[1:6, 1:6]
                                 L1S105 L1S140 L1S208 L1S257 L1S281 L1S57
33e2cadd9d0b2b4ebeb6261766032e4a      0      0      0      0      0     0
5656d8b980bfee07e29e8fc119850901      0      0      0      0      0     0
7d893311a14a858907d4c8ca21d32dc4      0      0      0      0      0     0
ecf9eb9fa3970ff27a221e626395c75d      0      0      0      0      0     0
acfe4c003905a7074aeaf385b78ad9e0      0      0    104     50     47     0
80b20e907aa4fcf2309796bc303d151d      0     48      0     11      0     0

This is a small slice of the count matrix.

  • Rows represent taxa (ASVs)
  • Columns represent samples
  • Values are raw read counts

Most entries are zero.

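Sparsity is not merely a technical artifact. Much of it reflects biology: most taxa are absent from most samples, although limited sequencing depth also contributes zeros.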
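The proportion of zeros can be quantified directly. The sketch below uses a small made-up matrix (`toy`, a hypothetical name) so that it runs on its own; with the real data, the same expressions could be applied to the `otu` matrix extracted earlier.

```r
# Hypothetical 4-taxon x 3-sample count matrix
toy <- matrix(c(0,  0, 104,
                0, 48,   0,
                0,  0,   0,
                7,  0,   0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("asv", 1:4), c("s1", "s2", "s3")))

# Fraction of all entries that are zero
sparsity <- mean(toy == 0)

# Number of samples in which each taxon is detected (prevalence)
prevalence <- rowSums(toy > 0)

print(sparsity)
print(prevalence)
```

Reporting prevalence alongside overall sparsity shows whether the zeros are spread evenly or concentrated in a few rarely detected taxa.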

Inspect sample metadata

head(phyloseq::sample_data(ps))
       barcode.sequence body.site year month day   subject
L1S105     AGTGCGATGCGT       gut 2009     3  17 subject-1
L1S140     ATGGCAGCTCTA       gut 2008    10  28 subject-2
L1S208     CTGAGATACGCG       gut 2009     1  20 subject-2
L1S257     CCGACTGAGATG       gut 2009     3  17 subject-2
L1S281     CCTCTCGTGATC       gut 2009     4  14 subject-2
L1S57      ACACACTATGGC       gut 2009     1  20 subject-1
       reported.antibiotic.usage days.since.experiment.start
L1S105                        No                         140
L1S140                       Yes                           0
L1S208                        No                          84
L1S257                        No                         140
L1S281                        No                         168
L1S57                         No                          84

Metadata provides the biological and experimental context.

Without metadata, abundance values cannot be interpreted. Every visualization groups, colors, or orders samples by some metadata variable.
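As a small illustration of how metadata gives counts their context, the sketch below summarizes per-sample read depth by body site. The data are made up, and `toy_counts` and `toy_meta` are hypothetical names; only the column name `body.site` mirrors the metadata shown above.

```r
# Hypothetical counts (taxa in rows, samples in columns) and metadata
toy_counts <- matrix(c(10,  0, 3,
                        5, 20, 0),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("asv1", "asv2"), c("s1", "s2", "s3")))
toy_meta <- data.frame(body.site = c("gut", "gut", "tongue"),
                       row.names = c("s1", "s2", "s3"))

# Total reads per sample
depth <- colSums(toy_counts)

# Look up metadata by sample name rather than by position
site <- toy_meta[names(depth), "body.site"]

# Mean read depth per body site
depth_by_site <- tapply(depth, site, mean)
print(depth_by_site)
```

Indexing the metadata by sample name keeps the result correct even if the two tables are stored in different orders.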

Inspect taxonomy

head(phyloseq::tax_table(ps))
Taxonomy Table:     [6 taxa by 7 taxonomic ranks]:
                                 Kingdom    Phylum          
33e2cadd9d0b2b4ebeb6261766032e4a "Bacteria" "Firmicutes"    
5656d8b980bfee07e29e8fc119850901 "Bacteria" "Firmicutes"    
7d893311a14a858907d4c8ca21d32dc4 "Bacteria" "Firmicutes"    
ecf9eb9fa3970ff27a221e626395c75d "Bacteria" "Proteobacteria"
acfe4c003905a7074aeaf385b78ad9e0 "Bacteria" "Firmicutes"    
80b20e907aa4fcf2309796bc303d151d "Bacteria" "Firmicutes"    
                                 Class                 Order          
33e2cadd9d0b2b4ebeb6261766032e4a "Clostridia"          "Clostridiales"
5656d8b980bfee07e29e8fc119850901 "Clostridia"          "Clostridiales"
7d893311a14a858907d4c8ca21d32dc4 "Clostridia"          "Clostridiales"
ecf9eb9fa3970ff27a221e626395c75d "Alphaproteobacteria" "Rickettsiales"
acfe4c003905a7074aeaf385b78ad9e0 "Clostridia"          "Clostridiales"
80b20e907aa4fcf2309796bc303d151d "Clostridia"          "Clostridiales"
                                 Family            Genus         Species     
33e2cadd9d0b2b4ebeb6261766032e4a "Peptococcaceae"  "Peptococcus" NA          
5656d8b980bfee07e29e8fc119850901 "Peptococcaceae"  "Peptococcus" NA          
7d893311a14a858907d4c8ca21d32dc4 "Peptococcaceae"  "Peptococcus" NA          
ecf9eb9fa3970ff27a221e626395c75d "mitochondria"    "Raphanus"    "sativus"   
acfe4c003905a7074aeaf385b78ad9e0 "Ruminococcaceae" "Gemmiger"    "formicilis"
80b20e907aa4fcf2309796bc303d151d "Ruminococcaceae" NA            NA          

Taxonomy links sequence variants to biological hierarchy.

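Aggregation (for example, genus-level barplots) depends entirely on this mapping. If taxonomy is inconsistent, such as one genus labeled two different ways or missing ranks handled differently, counts are split or merged incorrectly and interpretation becomes unstable.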
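The dependence of aggregation on taxonomy can be sketched in base R with `rowsum()`, which sums rows that share a group label. The data and the names `asv_counts` and `genus` are hypothetical; in phyloseq, the analogous operation is `tax_glom()`. Unassigned genera (`NA`) require an explicit decision, and here they are pooled under a single label as one possible choice.

```r
# Hypothetical ASV counts and their genus assignments
asv_counts <- matrix(c(104, 50,
                        48, 11,
                         3,  0),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("asv1", "asv2", "asv3"), c("s1", "s2")))
genus <- c("Gemmiger", NA, "Gemmiger")

# Give unassigned ASVs an explicit label instead of leaving NA as a group
genus_label <- ifelse(is.na(genus), "Unassigned", genus)

# Sum the counts of all ASVs assigned to the same genus
genus_counts <- rowsum(asv_counts, group = genus_label)
print(genus_counts)
```

Total counts per sample are unchanged by the aggregation, which makes a convenient sanity check after any collapsing step.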

Structural integrity

all(phyloseq::sample_names(ps) == 
    rownames(phyloseq::sample_data(ps)))
[1] TRUE

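A phyloseq object built with the phyloseq() constructor keeps only the samples and taxa shared by all of its components, so alignment is enforced when the object is created.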

Many downstream errors originate from mismatched sample identifiers.
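One way to diagnose such mismatches before building an object is to compare the identifier sets directly. This is a minimal sketch with made-up identifiers (`count_ids` and `meta_ids` are hypothetical names); a trailing space and a case difference stand in for typical culprits.

```r
# Hypothetical sample identifiers from the count table and the metadata file
count_ids <- c("L1S105", "L1S140", "L1S208")
meta_ids  <- c("L1S105", "l1s140", "L1S208 ")

# Identifiers present in one component but not the other
missing_from_meta   <- setdiff(count_ids, meta_ids)
missing_from_counts <- setdiff(meta_ids, count_ids)
print(missing_from_meta)
print(missing_from_counts)

# Normalizing whitespace and case resolves the mismatch in this toy case
meta_ids_clean <- toupper(trimws(meta_ids))
print(setequal(count_ids, meta_ids_clean))
```

Whether uppercasing is appropriate depends on the identifier scheme; the point is to inspect the differences before any samples are silently dropped.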