Compare Relative Abundances Among Treatments

John Quensen

2026-05-09

Introduction

Several of the functions in QsRutils have to do with making tables of differential abundances of taxa among treatments. All but one of these functions begin with comp_ for comparisons. These should be executed in a certain order as given in the next table. The arguments for each function can be found in the documentation. The first two take experiment-level phyloseq objects with at least slots otu_table, sample_data , and tax_table.

Function Purpose
comp_prepare_phyloseq Makes a copy of the phyloseq object with the OTU table transformed to percentages. For both objects, it then agglomerates taxa to a given rank, modifies the taxonomy tables to include only the given rank, and filters out OTUs below a given percentage of the total counts. The function returns a list of the two phyloseq objects.
comp_prepare_otu_table Applies a transformation to the OTU percentage table from comp_prepare_phyloseq and makes a vector of the treatment groups. The “grps” argument is the name of a factor in the sample_data table. The function returns a list of the OTU matrix as percentages, the OTU matrix as transformed data, and a vector of treatment groups.
comp_means_sd For each taxon, calculates the mean and standard deviation across all samples and returns the results as a data frame.
comp_make_f_tests For each taxon, performs one-way ANOVA for differences in relative abundance among treatments and returns the results as a data frame. The argument grps is a vector of treatment groups.
comp_comparisons For each taxon and each treatment, calculates means and standard deviations of relative abundances. Performs all pairwise t-tests and assigns letters indicating treatment differences. Returns the result as a data frame. The argument grps is a vector of treatment groups.
comp_assemble Assembles the results of the last three functions into a summary data frame.

A seventh function, make_comparisons, wraps all six of the other functions and returns a list of the the comparison table as a data frame, the OTU matrix as percentages, the OTU matrix as transformed data, and a vector of treatment groups. It is handy for comparing relative abundances of taxa for a given rank, but sometimes you may need to prepare the phyloseq object or even OTU tables in a different way. I give examples of three cases below.

Case 1 - Compare Realtive Abundances of Phyla

The is the simplest case, and is easily executed with the wrapper function. First, load the example data set. It is a phyloseq object with otu_table, sample_data, and tax_table.

suppressPackageStartupMessages(library(phyloseq))
suppressPackageStartupMessages(library(QsRutils))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape2))
data("its.root")
its.root
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1496 taxa and 23 samples ]
## sample_data() Sample Data:       [ 23 samples by 8 sample variables ]
## tax_table()   Taxonomy Table:    [ 1496 taxa by 7 taxonomic ranks ]

Then use the function make_comparisons. The comparison table itself is the first item in the list returned.

comp.phyla <- make_comparisons(its.root, taxrank = "Phylum", grps = "Label", transformation = "sqrt_arc_sine",
    pc.filter = 0.01, p.adjust.method = "BH", pool.sd = TRUE)
comp <- comp.phyla$comparison.table
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp
##                mean    sd        F           2HR           3HR           CHR
## Ascomycota    77.00 21.22    3.72*  88.5+/-6.29a 90.13+/-4.16a  86.94+/-6.1a
## Glomeromycota 18.42 23.14  9.01*** 7.05+/-3.65ab  1.01+/-0.29a 2.79+/-1.77ab
## Basidiomycota  4.57  4.94 11.07*** 4.43+/-3.14ab  8.85+/-4.06a 10.27+/-5.91a
##                          2LR             3LR             CLR
## Ascomycota    50.34+/-29.99b 63.34+/-19.91ab  84.64+/-13.4ab
## Glomeromycota 49.13+/-29.87c  35.21+/-19.15c 14.29+/-12.21bc
## Basidiomycota   0.51+/-0.31c   1.43+/-1.16bc    1.07+/-1.31c

You can check that the assumption of equal variances is met with the check_var function. It takes as arguments the OTU matrix of transformed data and the vector of treatment groups. It performs a Fligner-Killeen test for homogeneity of variances for each taxon and prints the results to the console.

check_var(comp.phyla$taxa.pc.transformed, comp.phyla$groups)
## Fligner-Killeen Test of Homogeneity of Variances
## 
##           taxon statistic df   p.value
## 1 Basidiomycota  3.190297  5 0.6706743
## 2    Ascomycota  3.573743  5 0.6122599
## 3 Glomeromycota  3.565002  5 0.6135756

If the results indicate heterogeneity of variances, you can repeat make_comparisons with sd.pool = FALSE and/or try a different transformation. Three transformation functions are included in QsRutils: arc_sine, log_arc_sine, and sqrt_arc_sine, the last of which seems to work most often. Other standard functions, e.g. log, sqrt, etc. may be used, as could a custom function supplied by the user. transfomation = "none" is also accepted.

Case 2 - Compare Relative Abundances of Classes within a Phylum

In the example above, Ascomycota represents 77% of all of the counts. We might want to beak it down into classes. At first you might think we could simply subset its.root to the phylum Ascomycota and use the resulting phyloseq object:

asco <- subset_taxa(its.root, Phylum == "Ascomycota")
asco <- prune_taxa(taxa_sums(asco) > 0, asco)
comp.asco <- make_comparisons(asco, taxrank = "Class", grps = "Label", transformation = "sqrt_arc_sine",
    pc.filter = 0.01, p.adjust.method = "BH", pool.sd = TRUE)
comp.asco$comparison.table
##                     mean    sd      F            2HR              3HR
## Dothideomycetes    37.64 20.56   0.93 41.34+/-19.87a     23.53+/-9.5a
## Eurotiomycetes     12.92 14.73   0.84 17.52+/-10.64a   16.08+/-22.62a
## Orbiliomycetes      5.45 11.14 4.54** 10.86+/-10.34a    14.28+/-23.2a
## Sordariomycetes    12.95  8.78  3.66*  20.16+/-7.94a   19.3+/-10.01ab
## unclass_Ascomycota 28.96 24.77  3.99*   9.68+/-3.64a 26.45+/-25.71abc
##                                CHR             2LR             3LR
## Dothideomycetes     43.95+/-15.37a   50.5+/-25.03a  28.62+/-23.76a
## Eurotiomycetes      20.07+/-20.09a  12.87+/-15.91a    4.86+/-5.06a
## Orbiliomycetes        5.25+/-5.07a    0.06+/-0.08b    0.7+/-0.73ab
## Sordariomycetes    13.91+/-4.18abc     5.82+/-7.4c 11.69+/-7.83abc
## unclass_Ascomycota   13.21+/-5.8ab 23.7+/-19.39abc  53.77+/-20.45c
##                                CLR
## Dothideomycetes     37.99+/-27.95a
## Eurotiomycetes        3.83+/-2.62a
## Orbiliomycetes       0.23+/-0.03ab
## Sordariomycetes      4.78+/-3.38bc
## unclass_Ascomycota 52.96+/-33.09bc

But these proportions are based on the number of Ascomycota sequences, not the total sequences in its.root. (They do not add up to exactly 100% of the Ascomycota sequences because we filtered out a few small classes.) If we want them to add up to something close to 77%, the mean percentage of Ascomycota in the first example above, we have to take a different approach. We have to prepare the phyoseq object “manually,” and use the the other comp_ functions to generate the comparison table. The first part of the code below is copied from the comp_prepare_phyloseq function but modified to subset to Ascomycota and then agglomerate to class and remove ranks other than class. Because the transformation to percentages is performed before these steps, in the end result the percentages are based on the total number of sequences in the original phyloseq object. I made the following code somewhat generic by setting values for expt, taxrank, and pc.filter. Thus, for similar tasks, you can cut and paste the code, editing only the values in red as you wish.

expt <- its.root
expt.pc <- transform_sample_counts(expt, function(x) 100 * (x/sum(x)))

# Subset to higher rank
expt.taxon <- subset_taxa(expt, Phylum == "Ascomycota")
expt.taxon.pc <- subset_taxa(expt.pc, Phylum == "Ascomycota")

# Agglomerate to desired rank.
taxrank <- "Class"
expt.taxon <- tax_glom(expt.taxon, taxrank)
expt.taxon.pc <- tax_glom(expt.taxon.pc, taxrank)

# Remove ranks other than taxrank
tax_table(expt.taxon) <- tax_table(expt.taxon)[, taxrank]
tax_table(expt.taxon.pc) <- tax_table(expt.taxon.pc)[, taxrank]

# Filter out taxa that are < 0.1% of the total sequences in expt.
pc.filter <- 0.001
n <- sum(taxa_sums(expt)) * pc.filter
expt.taxon <- prune_taxa(taxa_sums(expt.taxon) >= n, expt.taxon)
expt.taxon.pc <- prune_taxa(taxa_names(expt.taxon), expt.taxon.pc)

# Make comparisons
temp2 <- comp_prepare_otu_table(expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine")
temp3 <- comp_means_sd(temp2$otu.pc)
temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans,
    grps = temp2$groups, p.adjust.method = "BH", pool.sd = TRUE)
comp <- comp_assemble(temp3, temp4, temp5)
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp
##                     mean    sd F_value            2HR             3HR
## Dothideomycetes    29.24 19.45    0.82 37.42+/-20.21a   21.21+/-8.89a
## unclass_Ascomycota 21.80 21.10   3.15*    8.46+/-2.9a   24.36+/-24.4a
## Sordariomycetes    10.34  8.01  8.2***  17.66+/-6.16a   17.65+/-9.58a
## Eurotiomycetes      9.75 12.43    1.63  15.09+/-8.85a  14.37+/-20.26a
## Orbiliomycetes      4.65  9.48  5.48**   9.47+/-8.85a 12.19+/-19.53ab
## Saccharomycetes     0.62  2.23    0.68   0.16+/-0.21a     0.27+/-0.2a
## Pezizomycetes       0.55  2.42     1.5   0.12+/-0.14a    0.01+/-0.01a
##                               CHR            2LR            3LR            CLR
## Dothideomycetes    37.88+/-12.24a 30.31+/-31.35a 17.38+/-16.33a 31.87+/-24.98a
## unclass_Ascomycota  11.56+/-5.53a  11.29+/-8.92a 35.75+/-24.46a  45.24+/-30.3a
## Sordariomycetes    12.04+/-3.67ab   1.95+/-1.75c  7.26+/-5.03bd  3.85+/-2.36cd
## Eurotiomycetes     17.68+/-17.61a   4.04+/-4.12a   2.38+/-2.21a   3.32+/-2.64a
## Orbiliomycetes      4.51+/-4.43ab   0.02+/-0.03c  0.41+/-0.46bc  0.19+/-0.04bc
## Saccharomycetes       0.25+/-0.2a   2.72+/-5.38a   0.11+/-0.22a   0.05+/-0.04a
## Pezizomycetes         2.96+/-5.8a         0+/-0a   0.04+/-0.04a   0.06+/-0.06a

These percentages are based on the total counts in its.root and add up to approximately 77%, the proportion of Ascomycota sequences in the first case.

Case 3 - Working with a Pre-existing OTU Table

This is a rather trite example because the differences are so obvious, but suppose you had made a figure something like this:

data("plot_df")
ggplot(data = plot_df, aes(x = Treatment, y = Percent, fill = Family)) + stat_summary(fun.y = "mean",
    geom = "col", position = position_stack())
## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun` argument instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

You went to some trouble getting it the way you wanted, and now a reviewer wants statistics on which families are different among treatments. plot_df contains percentages of families by treatment based on a fuller data set, but is in “long” format, suitable for ggplot. You can recover the OTU table and make a table comparing differential abundances using the procedures below. The function dcast in the package reshaape2 can convert “long” format into “wide” format, but if there are replications as we should have, it insists on aggregating the data in some way, e.g. by length (counts) or means. We can prevent this by adding another variable, such as a sequence number, with unique values for every row. Then we can use dcast to get the data in the wide format we want for our OTU table.

plot_df has the format:

head(plot_df)
##   Treatment        Family    Percent
## 1       CLR Racocetraceae 4.51388889
## 2       CHR Racocetraceae 1.82291667
## 3       3LR Racocetraceae 8.50694444
## 4       2LR Racocetraceae 1.56250000
## 5       CLR Racocetraceae 1.90972222
## 6       3HR Racocetraceae 0.02893519

To recover our data in wide format, we do:

plot_df$seq <- with(plot_df, ave(Percent, Treatment, Family, FUN = seq_along))
my.data <- dcast(seq + Treatment ~ Family, data = plot_df, value.var = "Percent")
head(my.data)
##   seq Treatment Racocetraceae Gigasporaceae Glomeraceae unclass_Glomeromycota
## 1   1       2HR    0.00000000     0.6655093  3.47222222            1.15740741
## 2   1       3HR    0.02893519     0.4340278  0.34722222            0.34722222
## 3   1       CHR    1.82291667     2.3148148  0.31828704            0.34722222
## 4   1       2LR    1.56250000     2.0543981  0.08680556            0.00000000
## 5   1       3LR    8.50694444    19.4733796  0.11574074            0.02893519
## 6   1       CLR    4.51388889     1.0995370  0.05787037            0.26041667
##   Scutellosporaceae     Others
## 1         0.3182870 0.00000000
## 2         0.0000000 0.00000000
## 3         0.2314815 0.00000000
## 4         1.4467593 0.08680556
## 5         2.9513889 0.05787037
## 6         5.6423611 0.26041667

Here the ave function applies seq_along over all level combinations of Percent, Treatment, and Family. The result is that all values of seq + Treatment are unique, and thus dcast returns what we want without aggregating the data. We next save the Treatment column as a vector of our treatment groups and then remove the first two columns of my.data to get a matrix of the OTU table as percentages.

my.grps <- my.data$Treatment
my.pc <- my.data[, -c(1, 2)]

Next we transform the percents matrix using the sqrt_arc_sine function in this package.

my.pc.trans <- apply(my.pc, 2, sqrt_arc_sine)

Both my.pc and my.pc.trans have taxa in columns. For the remaining steps we need taxa to be in rows, so we have to transpose.

my.pc <- t(my.pc)
my.pc.trans <- t(my.pc.trans)

With these two matrices and the vector of groups, we are ready to make our comparison table.

temp3 <- comp_means_sd(my.pc)
temp4 <- comp_make_f_tests(my.pc.trans, grps = my.grps, var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = my.pc, otu.pc.trans = my.pc.trans, grps = my.grps,
    p.adjust.method = "BH", pool.sd = TRUE)
comp <- comp_assemble(temp3, temp4, temp5)
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp
##                       mean    sd  F_value           2HR          3HR
## Gigasporaceae         7.41 12.69  9.88***  1.38+/-2.08a  0.14+/-0.2a
## Scutellosporaceae     4.77  6.73 12.56*** 1.19+/-1.14ab 0.01+/-0.02c
## Racocetraceae         4.18  5.51   6.94** 1.47+/-2.24ab  0.12+/-0.1a
## Glomeraceae           1.18  1.53     1.76  2.53+/-1.24a 0.34+/-0.12a
## unclass_Glomeromycota 0.71  1.06      0.2   0.46+/-0.5a  0.4+/-0.21a
## Others                0.17  0.27 10.89*** 0.03+/-0.02ab       0+/-0c
##                                 CHR            2LR           3LR           CLR
## Gigasporaceae          0.59+/-1.15a 28.22+/-17.98b 11.01+/-6.6bc 1.72+/-1.43ac
## Scutellosporaceae       0.2+/-0.1ac  11.65+/-7.25d  8.69+/-8.42d 7.55+/-7.84bd
## Racocetraceae         0.94+/-0.93ab   6.63+/-3.84c 11.96+/-8.09c 3.89+/-1.75bc
## Glomeraceae            0.27+/-0.07a    0.95+/-0.9a   2.2+/-2.79a  0.66+/-1.06a
## unclass_Glomeromycota  0.77+/-0.94a    1.4+/-2.08a  0.85+/-1.25a   0.3+/-0.32a
## Others                0.01+/-0.03ac   0.28+/-0.18d  0.51+/-0.48d 0.18+/-0.13bd