Several of the functions in QsRutils have to do with
making tables of differential abundances of taxa among treatments. All
but one of these functions begin with comp_ for
comparisons. These should be executed in a certain order as
given in the next table. The arguments for each function can be found in
the documentation. The first two take experiment-level
phyloseq objects with at least slots otu_table, sample_data
, and tax_table.
| Function | Purpose |
|---|---|
| comp_prepare_phyloseq | Makes a copy of the phyloseq object with the OTU table transformed
to percentages. For both objects, it then agglomerates taxa to a given
rank, modifies the taxonomy tables to include only the given rank, and
filters out OTUs below a given percentage of the total counts. The
function returns a list of the two phyloseq objects. |
| comp_prepare_otu_table | Applies a transformation to the OTU percentage table from
comp_prepare_phyloseq and makes a vector of the treatment
groups. The “grps” argument is the name of a factor in
the sample_data table. The function returns a list of the OTU matrix as
percentages, the OTU matrix as transformed data, and a vector of
treatment groups. |
| comp_means_sd | For each taxon, calculates the mean and standard deviation across all samples and returns the results as a data frame. |
| comp_make_f_tests | For each taxon, performs one-way ANOVA for differences in relative
abundance among treatments and returns the results as a data frame. The
argument grps is a vector of treatment
groups. |
| comp_comparisons | For each taxon and each treatment, calculates means and standard
deviations of relative abundances. Performs all pairwise t-tests and
assigns letters indicating treatment differences. Returns the result as
a data frame. The argument grps is a
vector of treatment groups. |
| comp_assemble | Assembles the results of the last three functions into a summary data frame. |
A seventh function, make_comparisons, wraps all six of
the other functions and returns a list of the the comparison table as a
data frame, the OTU matrix as percentages, the OTU matrix as transformed
data, and a vector of treatment groups. It is handy for comparing
relative abundances of taxa for a given rank, but sometimes you may need
to prepare the phyloseq object or even OTU tables in a different way. I
give examples of three cases below.
The is the simplest case, and is easily executed with the wrapper
function. First, load the example data set. It is a
phyloseq object with otu_table, sample_data, and
tax_table.
suppressPackageStartupMessages(library(phyloseq))
suppressPackageStartupMessages(library(QsRutils))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape2))
data("its.root")
its.root## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 1496 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 8 sample variables ]
## tax_table() Taxonomy Table: [ 1496 taxa by 7 taxonomic ranks ]
Then use the function make_comparisons. The comparison
table itself is the first item in the list returned.
comp.phyla <- make_comparisons(its.root, taxrank = "Phylum", grps = "Label", transformation = "sqrt_arc_sine",
pc.filter = 0.01, p.adjust.method = "BH", pool.sd = TRUE)
comp <- comp.phyla$comparison.table
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp## mean sd F 2HR 3HR CHR
## Ascomycota 77.00 21.22 3.72* 88.5+/-6.29a 90.13+/-4.16a 86.94+/-6.1a
## Glomeromycota 18.42 23.14 9.01*** 7.05+/-3.65ab 1.01+/-0.29a 2.79+/-1.77ab
## Basidiomycota 4.57 4.94 11.07*** 4.43+/-3.14ab 8.85+/-4.06a 10.27+/-5.91a
## 2LR 3LR CLR
## Ascomycota 50.34+/-29.99b 63.34+/-19.91ab 84.64+/-13.4ab
## Glomeromycota 49.13+/-29.87c 35.21+/-19.15c 14.29+/-12.21bc
## Basidiomycota 0.51+/-0.31c 1.43+/-1.16bc 1.07+/-1.31c
You can check that the assumption of equal variances is met with the
check_var function. It takes as arguments the OTU matrix of
transformed data and the vector of treatment groups. It performs a
Fligner-Killeen test for homogeneity of variances for each taxon and
prints the results to the console.
## Fligner-Killeen Test of Homogeneity of Variances
##
## taxon statistic df p.value
## 1 Basidiomycota 3.190297 5 0.6706743
## 2 Ascomycota 3.573743 5 0.6122599
## 3 Glomeromycota 3.565002 5 0.6135756
If the results indicate heterogeneity of variances, you can repeat
make_comparisons with sd.pool = FALSE and/or
try a different transformation. Three transformation functions are
included in QsRutils: arc_sine,
log_arc_sine, and sqrt_arc_sine, the last of
which seems to work most often. Other standard functions,
e.g. log, sqrt, etc. may be used, as could a
custom function supplied by the user.
transfomation = "none" is also accepted.
In the example above, Ascomycota represents 77% of all of
the counts. We might want to beak it down into classes. At first you
might think we could simply subset its.root to the phylum
Ascomycota and use the resulting phyloseq object:
asco <- subset_taxa(its.root, Phylum == "Ascomycota")
asco <- prune_taxa(taxa_sums(asco) > 0, asco)
comp.asco <- make_comparisons(asco, taxrank = "Class", grps = "Label", transformation = "sqrt_arc_sine",
pc.filter = 0.01, p.adjust.method = "BH", pool.sd = TRUE)
comp.asco$comparison.table## mean sd F 2HR 3HR
## Dothideomycetes 37.64 20.56 0.93 41.34+/-19.87a 23.53+/-9.5a
## Eurotiomycetes 12.92 14.73 0.84 17.52+/-10.64a 16.08+/-22.62a
## Orbiliomycetes 5.45 11.14 4.54** 10.86+/-10.34a 14.28+/-23.2a
## Sordariomycetes 12.95 8.78 3.66* 20.16+/-7.94a 19.3+/-10.01ab
## unclass_Ascomycota 28.96 24.77 3.99* 9.68+/-3.64a 26.45+/-25.71abc
## CHR 2LR 3LR
## Dothideomycetes 43.95+/-15.37a 50.5+/-25.03a 28.62+/-23.76a
## Eurotiomycetes 20.07+/-20.09a 12.87+/-15.91a 4.86+/-5.06a
## Orbiliomycetes 5.25+/-5.07a 0.06+/-0.08b 0.7+/-0.73ab
## Sordariomycetes 13.91+/-4.18abc 5.82+/-7.4c 11.69+/-7.83abc
## unclass_Ascomycota 13.21+/-5.8ab 23.7+/-19.39abc 53.77+/-20.45c
## CLR
## Dothideomycetes 37.99+/-27.95a
## Eurotiomycetes 3.83+/-2.62a
## Orbiliomycetes 0.23+/-0.03ab
## Sordariomycetes 4.78+/-3.38bc
## unclass_Ascomycota 52.96+/-33.09bc
But these proportions are based on the number of Ascomycota
sequences, not the total sequences in its.root. (They do
not add up to exactly 100% of the Ascomycota sequences because
we filtered out a few small classes.) If we want them to add up to
something close to 77%, the mean percentage of Ascomycota in
the first example above, we have to take a different approach. We have
to prepare the phyoseq object “manually,” and use the the other
comp_ functions to generate the comparison table. The first
part of the code below is copied from the
comp_prepare_phyloseq function but modified to subset to
Ascomycota and then agglomerate to class and remove ranks other
than class. Because the transformation to percentages is performed
before these steps, in the end result the percentages are based on the
total number of sequences in the original phyloseq object.
I made the following code somewhat generic by setting values for
expt, taxrank, and pc.filter.
Thus, for similar tasks, you can cut and paste the code, editing only
the values in red as you wish.
expt <- its.root
expt.pc <- transform_sample_counts(expt, function(x) 100 * (x/sum(x)))
# Subset to higher rank
expt.taxon <- subset_taxa(expt, Phylum == "Ascomycota")
expt.taxon.pc <- subset_taxa(expt.pc, Phylum == "Ascomycota")
# Agglomerate to desired rank.
taxrank <- "Class"
expt.taxon <- tax_glom(expt.taxon, taxrank)
expt.taxon.pc <- tax_glom(expt.taxon.pc, taxrank)
# Remove ranks other than taxrank
tax_table(expt.taxon) <- tax_table(expt.taxon)[, taxrank]
tax_table(expt.taxon.pc) <- tax_table(expt.taxon.pc)[, taxrank]
# Filter out taxa that are < 0.1% of the total sequences in expt.
pc.filter <- 0.001
n <- sum(taxa_sums(expt)) * pc.filter
expt.taxon <- prune_taxa(taxa_sums(expt.taxon) >= n, expt.taxon)
expt.taxon.pc <- prune_taxa(taxa_names(expt.taxon), expt.taxon.pc)
# Make comparisons
temp2 <- comp_prepare_otu_table(expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine")
temp3 <- comp_means_sd(temp2$otu.pc)
temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans,
grps = temp2$groups, p.adjust.method = "BH", pool.sd = TRUE)
comp <- comp_assemble(temp3, temp4, temp5)
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp## mean sd F_value 2HR 3HR
## Dothideomycetes 29.24 19.45 0.82 37.42+/-20.21a 21.21+/-8.89a
## unclass_Ascomycota 21.80 21.10 3.15* 8.46+/-2.9a 24.36+/-24.4a
## Sordariomycetes 10.34 8.01 8.2*** 17.66+/-6.16a 17.65+/-9.58a
## Eurotiomycetes 9.75 12.43 1.63 15.09+/-8.85a 14.37+/-20.26a
## Orbiliomycetes 4.65 9.48 5.48** 9.47+/-8.85a 12.19+/-19.53ab
## Saccharomycetes 0.62 2.23 0.68 0.16+/-0.21a 0.27+/-0.2a
## Pezizomycetes 0.55 2.42 1.5 0.12+/-0.14a 0.01+/-0.01a
## CHR 2LR 3LR CLR
## Dothideomycetes 37.88+/-12.24a 30.31+/-31.35a 17.38+/-16.33a 31.87+/-24.98a
## unclass_Ascomycota 11.56+/-5.53a 11.29+/-8.92a 35.75+/-24.46a 45.24+/-30.3a
## Sordariomycetes 12.04+/-3.67ab 1.95+/-1.75c 7.26+/-5.03bd 3.85+/-2.36cd
## Eurotiomycetes 17.68+/-17.61a 4.04+/-4.12a 2.38+/-2.21a 3.32+/-2.64a
## Orbiliomycetes 4.51+/-4.43ab 0.02+/-0.03c 0.41+/-0.46bc 0.19+/-0.04bc
## Saccharomycetes 0.25+/-0.2a 2.72+/-5.38a 0.11+/-0.22a 0.05+/-0.04a
## Pezizomycetes 2.96+/-5.8a 0+/-0a 0.04+/-0.04a 0.06+/-0.06a
These percentages are based on the total counts in
its.root and add up to approximately 77%, the proportion of
Ascomycota sequences in the first case.
This is a rather trite example because the differences are so obvious, but suppose you had made a figure something like this:
data("plot_df")
ggplot(data = plot_df, aes(x = Treatment, y = Percent, fill = Family)) + stat_summary(fun.y = "mean",
geom = "col", position = position_stack())## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun` argument instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
You went to some trouble getting it the way you wanted, and now a
reviewer wants statistics on which families are different among
treatments. plot_df contains percentages of families by
treatment based on a fuller data set, but is in “long” format, suitable
for ggplot. You can recover the OTU table and make a table
comparing differential abundances using the procedures below. The
function dcast in the package reshaape2 can
convert “long” format into “wide” format, but if there are replications
as we should have, it insists on aggregating the data in some way,
e.g. by length (counts) or means. We can prevent this by adding another
variable, such as a sequence number, with unique values for every row.
Then we can use dcast to get the data in the wide format we
want for our OTU table.
plot_df has the format:
## Treatment Family Percent
## 1 CLR Racocetraceae 4.51388889
## 2 CHR Racocetraceae 1.82291667
## 3 3LR Racocetraceae 8.50694444
## 4 2LR Racocetraceae 1.56250000
## 5 CLR Racocetraceae 1.90972222
## 6 3HR Racocetraceae 0.02893519
To recover our data in wide format, we do:
plot_df$seq <- with(plot_df, ave(Percent, Treatment, Family, FUN = seq_along))
my.data <- dcast(seq + Treatment ~ Family, data = plot_df, value.var = "Percent")
head(my.data)## seq Treatment Racocetraceae Gigasporaceae Glomeraceae unclass_Glomeromycota
## 1 1 2HR 0.00000000 0.6655093 3.47222222 1.15740741
## 2 1 3HR 0.02893519 0.4340278 0.34722222 0.34722222
## 3 1 CHR 1.82291667 2.3148148 0.31828704 0.34722222
## 4 1 2LR 1.56250000 2.0543981 0.08680556 0.00000000
## 5 1 3LR 8.50694444 19.4733796 0.11574074 0.02893519
## 6 1 CLR 4.51388889 1.0995370 0.05787037 0.26041667
## Scutellosporaceae Others
## 1 0.3182870 0.00000000
## 2 0.0000000 0.00000000
## 3 0.2314815 0.00000000
## 4 1.4467593 0.08680556
## 5 2.9513889 0.05787037
## 6 5.6423611 0.26041667
Here the ave function applies seq_along
over all level combinations of Percent,
Treatment, and Family. The result is that all
values of seq + Treatment are unique, and thus
dcast returns what we want without aggregating the data. We
next save the Treatment column as a vector of our treatment
groups and then remove the first two columns of my.data to
get a matrix of the OTU table as percentages.
Next we transform the percents matrix using the
sqrt_arc_sine function in this package.
Both my.pc and my.pc.trans have taxa in
columns. For the remaining steps we need taxa to be in rows, so we have
to transpose.
With these two matrices and the vector of groups, we are ready to make our comparison table.
temp3 <- comp_means_sd(my.pc)
temp4 <- comp_make_f_tests(my.pc.trans, grps = my.grps, var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = my.pc, otu.pc.trans = my.pc.trans, grps = my.grps,
p.adjust.method = "BH", pool.sd = TRUE)
comp <- comp_assemble(temp3, temp4, temp5)
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp## mean sd F_value 2HR 3HR
## Gigasporaceae 7.41 12.69 9.88*** 1.38+/-2.08a 0.14+/-0.2a
## Scutellosporaceae 4.77 6.73 12.56*** 1.19+/-1.14ab 0.01+/-0.02c
## Racocetraceae 4.18 5.51 6.94** 1.47+/-2.24ab 0.12+/-0.1a
## Glomeraceae 1.18 1.53 1.76 2.53+/-1.24a 0.34+/-0.12a
## unclass_Glomeromycota 0.71 1.06 0.2 0.46+/-0.5a 0.4+/-0.21a
## Others 0.17 0.27 10.89*** 0.03+/-0.02ab 0+/-0c
## CHR 2LR 3LR CLR
## Gigasporaceae 0.59+/-1.15a 28.22+/-17.98b 11.01+/-6.6bc 1.72+/-1.43ac
## Scutellosporaceae 0.2+/-0.1ac 11.65+/-7.25d 8.69+/-8.42d 7.55+/-7.84bd
## Racocetraceae 0.94+/-0.93ab 6.63+/-3.84c 11.96+/-8.09c 3.89+/-1.75bc
## Glomeraceae 0.27+/-0.07a 0.95+/-0.9a 2.2+/-2.79a 0.66+/-1.06a
## unclass_Glomeromycota 0.77+/-0.94a 1.4+/-2.08a 0.85+/-1.25a 0.3+/-0.32a
## Others 0.01+/-0.03ac 0.28+/-0.18d 0.51+/-0.48d 0.18+/-0.13bd