5. Alpha Diversity Estimates

Reproducible workflow to assess alpha diversity across temperature treatments using Hill numbers.

Click here for setup information.
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
set.seed(119)
#library(conflicted)
library(phyloseq); packageVersion("phyloseq")
library(Biostrings); packageVersion("Biostrings")
pacman::p_load(tidyverse, hilldiv, 
               microbiome, phytools, phangorn, 
               pairwiseAdonis, naniar, 
               labdsv, patchwork, agricolae, ggpubr,
               reactable, downloadthis, captioner,
               install = FALSE, update = FALSE)
source("hack_code/div_test_plot_jjs.R")
source("hack_code/div_test_jjs.R")

options(scipen=999)
knitr::opts_current$get(c(
  "cache",
  "cache.path",
  "cache.rebuild",
  "dependson",
  "autodep"
))

Synopsis

This workflow contains diversity assessments for the 2018 high temperature data sets. In order to run the workflow, you either need to first run the DADA2 Workflow and the Data Preparation workflow or download the files linked below. See the Data Availability page for complete details.

Workflow Input

Files needed to run this workflow can be downloaded from figshare.

16s rRNA

To account for presence of rare sequence variants caused by sequencing errors or other technical artifacts, we use Hill numbers (Alberdi and Gilbert 2019a). Hill numbers allow the weight put on rare versus abundant sequence variants to be scaled while providing intuitive comparisons of diversity levels using “effective number of ASVs” as a measuring unit. This approach allows for balancing the over representation of rare ASVs that might be inflated due to sequencing errors.

We will then use Shapiro-Wilk tests to test for normalcy and then, depending on the results, either use parametric ANOVA or non-parametric Kruskal-Wallis to compare alpha diversity among treatments.

Calculate Hill Numbers

To calculate Hill numbers, we use the R package hilldiv (Alberdi and Gilbert 2019b). We calculate three metrics that put more or less weight on common species:

  1. Observed richness, where q-value = 0.
  2. Shannon exponential, which weighs ASVs by their frequency, where q-value = 1.
  3. Simpson multiplicative inverse, which over weighs abundant ASVs, where q-value = 2.

We perform each analysis against the Full (unfiltered) data set as well as the Arbitrary, PERfect and PIME, filtered data sets using the function hill_div.

The command is as follows:

hill_div(count = x, qvalue = i, tree = ultrametric_tree)

where x is the sample by ASV table, i is the q-value corresponding to the metric of interest and tree is an ultrametric formatted phylogenetic tree if you choose to look at lineage, rather than ASV, diversity.

We first transform all the data to relative abundance values, and compute new trees.

Code
ssu18_alpha_ds <- c("ssu18_ps_work", "ssu18_ps_filt", "ssu18_ps_perfect", "ssu18_ps_pime")
for (i in ssu18_alpha_ds) {
  tmp_ps <- transform_sample_counts(get(i), function(otu) otu/sum(otu))
  tmp_ps@phy_tree <- NULL
  tmp_tree <- rtree(ntaxa(tmp_ps), rooted = TRUE,
                      tip.label = taxa_names(tmp_ps))
  tmp_ps_norm <- merge_phyloseq(tmp_ps, sample_data, tmp_tree)
  tmp_asv <- data.frame(t(otu_table(tmp_ps_norm)))
  tmp_ps_name <- purrr::map_chr(i, ~ paste0(., "_norm"))
  assign(tmp_ps_name, tmp_ps_norm)
  tmp_asv_name <- purrr::map_chr(i, ~ paste0(., "_tu"))
  assign(tmp_asv_name, tmp_asv)
  rm(list = ls(pattern = "tmp_"))
}

Next, we run the analysis for all three metrics on the data sets (without a tree).

Code
qvalue <- c(0,1,2)
for (i in qvalue) {
  for (j in ssu18_alpha_ds) {
     tmp_asv <- get(purrr::map_chr(j, ~ paste0(., "_tu")))
     tmp_df <- data.frame(hill_div(tmp_asv, qvalue = i))
     tmp_df <- tmp_df %>% dplyr::rename("tmp_name" = 1) %>%
                              tibble::rownames_to_column("SamName")
     tmp_name <- purrr::map_chr(j, ~ paste0(., "_h", i))
     print(tmp_name)
     assign(tmp_name, tmp_df)
     rm(list = ls(pattern = "tmp_"))
  }
}
objects(pattern = "_h")

And make summary tables to add back into each ps object.

Code
for (i in ssu18_alpha_ds) {
     tmp_obs <- get(purrr::map_chr(i, ~ paste0(., "_h0")))
     tmp_sha <- get(purrr::map_chr(i, ~ paste0(., "_h1")))
     tmp_sim <- get(purrr::map_chr(i, ~ paste0(., "_h2")))
     tmp_hill <- dplyr::left_join(tmp_obs, tmp_sha, by = "SamName") %>%
       dplyr::left_join(., tmp_sim, by = "SamName") %>%
       dplyr::rename("Observed" = 2, "Shannon_exp" = 3, "InvSimpson" = 4)
     tmp_name <- purrr::map_chr(i, ~ paste0(., "_hill"))
     assign(tmp_name, tmp_hill)
     rm(list = ls(pattern = "tmp_"))
}
objects(pattern = "_hill")
objects()

And then create the new objects with the diversity data.

Code
for (i in ssu18_alpha_ds) {
     tmp_ps <- get(i)
     tmp_tree <- rtree(ntaxa(tmp_ps), rooted = TRUE,
                      tip.label = taxa_names(tmp_ps))
     tmp_ps <- merge_phyloseq(tmp_ps, sample_data, tmp_tree)
     tmp_hill <- get(purrr::map_chr(i, ~ paste0(., "_hill")))
     tmp_hill_samp <- dplyr::left_join(data.frame(sample_data(tmp_ps)),
                                         tmp_hill, by = "SamName")
     tmp_hill_samp$ID <- tmp_hill_samp$SamName
     tmp_hill_samp <- tmp_hill_samp %>%  tibble::column_to_rownames("ID")
     tmp_ps2 <- merge_phyloseq(otu_table(tmp_ps),
                                sample_data(tmp_hill_samp),
                                tax_table(tmp_ps),
                                phy_tree(tmp_ps))
     assign(i, tmp_ps2)
     tmp_path <- file.path("files/alpha/rdata/")
     saveRDS(tmp_ps2, paste(tmp_path, i, ".rds", sep = ""))
     rm(list = ls(pattern = "tmp_"))
}
objects()

Hill Numbers Summary

Now we summarize the data for each sample against all three metrics. The table contains the results of ASV diversity estimates from the full data set and the three filtered data sets.

The suffix _fi indicates metrics for the Arbitrary data set. The suffix _pe indicates metrics for the PERfect data set and the suffix _pet indicates the lineage diversity for the PERfect data set.

The suffix _pi indicates metrics for the PIME data set and the suffix _pit indicates the lineage diversity for the PIME data set.

Hill Summary

(16S rRNA) Table 1 | Hill numbers for samples in the unfiltered & filtered data sets.

Normality Tests

Before running significance tests, we need to know if data is normally distributed, which will tell use whether to use a parametric or non-parametric test. To test if the data are normally distributed, we use the Shapiro-Wilk Normality test and the Bartlett Test of Homogeneity of Variances.

If the p-values are both not significant (p > 0.05) from the tests, we accept the null hypothesis (that the results are normally distributed) and test for significance between samples using an ANOVA. If the p-values are both significant (p < 0.05), we reject the null hypothesis (that the results are normally distributed) and test for significance between samples using Kruskal-Wallis (non-parametric equivalent of ANOVA).

The commands are as follows:

shapiro.test(x), where x is a numeric vector of alpha diversity values from the sample data table.

bartlett.test(Value ~ Group, data = df) Where Value is the metric of interest, Group in the treatment to compare, and df is the data frame.

First the Shapiro-Wilk Normality test.

Code
ssu_tab_alpha_div
ssu18_div_tab <- ssu_tab_alpha_div
ssu18_shap_tests_asv <- c()
for (i in colnames(ssu18_div_tab[,7:ncol(ssu18_div_tab)])) {
   tmp_name <- purrr::map_chr(i, ~ paste0("ssu18_shap_asv_", .))
   ssu18_shap_tests_asv <- append(ssu18_shap_tests_asv, tmp_name)
   tmp_test <- eval(shapiro.test(ssu18_div_tab[[i]]))
   tmp_test$data.name <- tmp_name
   assign(tmp_name, tmp_test)
   rm(list = ls(pattern = "tmp_"))
}

objects(pattern = "shap_")

And then the Bartlett Test of Homogeneity of Variances.

Code
ssu18_div_tab_asv <- ssu_tab_alpha_div
ssu18_bart_tests_asv <- c()
for (i in colnames(ssu18_div_tab_asv[,7:ncol(ssu18_div_tab_asv)])) {
   tmp_name <- purrr::map_chr(i, ~ paste0("ssu18_bart_asv_", .))
   ssu18_bart_tests_asv <- append(ssu18_bart_tests_asv, tmp_name)
   tmp_test <- eval(bartlett.test(ssu18_div_tab_asv[[i]] ~ TEMP, data = ssu18_div_tab_asv))
   tmp_test$data.name <- tmp_name
   assign(tmp_name, tmp_test)
   rm(list = ls(pattern = "tmp_"))
}
objects(pattern = "_bart_")

Here we see which Shapiro-Wilk Normality and Bartlett tests were significant and which were not. So wherever the value of both p-values in > 0.05 we can use an ANOVA, otherwise we use Kruskal-Wallis.

Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3 rows [1, 5, 9].

(16S rRNA) Table 2 | Summary of Normality Tests for unfiltered & filtered data sets. P-values in red indicate significance (p-value < 0.05).

Significance Tests

To begin, we need to create a hierarchy variable; a two-column matrix specifying the relationship between samples (first column) and groups (second column).

Code
ssu18_hill_hier <- ssu_tab_alpha_div
ssu18_hill_hier <- ssu18_hill_hier %>% dplyr::select("Sample_ID", "TEMP") %>%
  tibble::remove_rownames()
ssu18_hill_hier <- ssu18_hill_hier[order(ssu18_hill_hier$TEMP), ]
ssu18_hill_hier$TEMP = paste(ssu18_hill_hier$TEMP, 'C', sep='')
ssu18_hill_hier <- ssu18_hill_hier %>% tibble::remove_rownames()
saveRDS(ssu18_hill_hier, "files/alpha/rdata/ssu18_hill_hier.rds")

Again, we start by testing significance of ASV diversity for the data sets against each of the three metrics using the div_test function.

The command is as follows:

div_test(countable = x, qvalue = i, hierarchy = hier, 
         tree = ultrametric_tree, posthoc = TRUE)

where x is ASV by sample table, i is the q-value corresponding to the metric of interest, hier is the hierarchy matrix, tree is an ultrametric formatted phylogenetic tree (if you choose to explore lineage diversity), and posthoc indicates whether to run post hoc pairwise analyses.

Code
qvalue <- c(0,1,2)
for (i in ssu18_alpha_ds) {
     for (j in qvalue) {
         tmp_get <- get(purrr::map_chr(i, ~ paste0(., "_tu")))
         tmp_test <- div_test(tmp_get, qvalue = j,
                      hierarchy = ssu18_hill_hier,
                      posthoc = TRUE)
         tmp_name <- purrr::map_chr(i, ~ paste0(., "_q", j, "_adt"))
         print(tmp_name)
         assign(tmp_name, tmp_test)
         rm(list = ls(pattern = "tmp_"))
 }
}

Summary

(16S rRNA) Table 3 | Summary of significance tests for unfiltered & filtered data sets. P-values in red for post hoc analysis indicate significance (p-value < 0.05).

PostHoc Analyses

First let’s check the results of each posthoc analysis.

[1] "FULL (Observed)"
[1] "Tukey post-hoc test"
        diff       lwr       upr      p.adj
3C-0C -219.6  -956.946 517.74597 0.71329233
8C-0C -721.0 -1458.346  16.34597 0.05546473
8C-3C -501.4 -1238.746 235.94597 0.20658985
[1] "FILT (Observed)"
[1] "Tukey post-hoc test"
        diff       lwr      upr      p.adj
3C-0C   -6.0 -221.6963 209.6963 0.99696904
8C-0C -283.8 -499.4963 -68.1037 0.01109212
8C-3C -277.8 -493.4963 -62.1037 0.01267274
[1] "PIME (Observed)"
[1] "Tukey post-hoc test"
        diff       lwr       upr          p.adj
3C-0C  -46.6 -140.1306   46.9306 0.406733563435
8C-0C -325.6 -419.1306 -232.0694 0.000002206313
8C-3C -279.0 -372.5306 -185.4694 0.000011005557
[1] "PERfect (Observed)"
[1] "Tukey post-hoc test"
        diff       lwr       upr      p.adj
3C-0C    2.4 -191.8227 196.62267 0.99940103
8C-0C -174.2 -368.4227  20.02267 0.08059254
8C-3C -176.6 -370.8227  17.62267 0.07616681
[1] "FULL (Shannon exponential)"
[1] "Tukey post-hoc test"
           diff       lwr        upr         p.adj
3C-0C -246.4375 -442.9939  -49.88101 0.01492642034
8C-0C -511.4071 -707.9635 -314.85066 0.00004295672
8C-3C -264.9696 -461.5261  -68.41321 0.00950353162
[1] "FILT (Shannon exponential)"
[1] "Tukey post-hoc test"
           diff       lwr        upr          p.adj
3C-0C  -96.4627 -177.3128  -15.61258 0.019965252769
8C-0C -283.1154 -363.9656 -202.26533 0.000002072881
8C-3C -186.6527 -267.5029 -105.80262 0.000133670064
[1] "PIME (Shannon exponential)"
[1] "Dunn test with Benjamini-Hochberg correction"
             Z     P.unadj       P.adj
0C-3C 1.767767 0.077099872 0.077099872
0C-8C 3.535534 0.000406952 0.001220856
3C-8C 1.767767 0.077099872 0.115649808
[1] "PERfect (Shannon exponential)"
[1] "Tukey post-hoc test"
            diff       lwr        upr          p.adj
3C-0C  -82.87871 -155.6134  -10.14407 0.025808833615
8C-0C -232.15045 -304.8851 -159.41581 0.000005488596
8C-3C -149.27174 -222.0064  -76.53710 0.000385348747
[1] "FULL (Inverse Simpson)"
[1] "Tukey post-hoc test"
           diff       lwr        upr         p.adj
3C-0C -157.7331 -251.7760  -63.69027 0.00202540187
8C-0C -249.3938 -343.4367 -155.35097 0.00003566326
8C-3C  -91.6607 -185.7036    2.38217 0.05628905499
[1] "FILT (Inverse Simpson)"
[1] "Tukey post-hoc test"
            diff       lwr        upr           p.adj
3C-0C  -84.38551 -126.1609  -42.61016 0.0004422940804
8C-0C -158.66732 -200.4427 -116.89197 0.0000008678593
8C-3C  -74.28181 -116.0572  -32.50646 0.0012804120273
[1] "PIME (Inverse Simpson)"
[1] "Tukey post-hoc test"
            diff        lwr       upr          p.adj
3C-0C  -61.82853  -93.26452 -30.39255 0.000556162049
8C-0C -122.59975 -154.03574 -91.16377 0.000000651479
8C-3C  -60.77122  -92.20720 -29.33524 0.000643758728
[1] "PERfect (Inverse Simpson)"
[1] "Tukey post-hoc test"
            diff       lwr        upr          p.adj
3C-0C  -80.17581 -122.1927  -38.15894 0.000718157610
8C-0C -146.57155 -188.5884 -104.55467 0.000002158578
8C-3C  -66.39574 -108.4126  -24.37887 0.003172202683

Now we can plot the results from the posthoc analyses for each metric and data set using the function div_test_plot_jjs. I modified the original function (div_test_plot) to control a little of the formatting.

The command is as follows:

div_test_plot(divtest = x, chart = "type", colour = col.pal, 
              posthoc = TRUE, threshold = value))

where x is the results from the div_test function, "type" is chart type (box, jitter, or violin), colour is is a color palette, posthoc indicates whether to run posthoc pairwise analyses, and value is the maximum p-value to show in pairwise posthoc results. WARNING if none of the posthoc results are below the specified threshold, the function will throw an error. Therefore, until this is fixed, all posthoc values are shown.

Code
source("hack_code/div_test_plot_jjs.R")
rm(list = ls(pattern = "_adt_plot"))
for (i in objects(pattern = "_adt")) {
     tmp_name <- purrr::map_chr(i, ~ paste0(., "_plot"))
     tmp_get <- get(i)
     tmp_df <- div_test_plot_jjs(tmp_get, chart = "box",
                                 colour = swel_col, posthoc = TRUE)
     tmp_df <- ggpar(tmp_df, legend = "none")
     print(tmp_name)
     assign(tmp_name, tmp_df)
     rm(list = ls(pattern = "tmp_"))
}

Alpha Diversity Plots

Posthoc adjusted p-values given for each pairwise comparison.

(16S rRNA) Figure 1 | Alpha diversity plots. Top row = Observed; middle row = Shannon exponential; bottom row = Inverse Simpson. From left to right, Full (unfiltered) data, Arbitrary filtered, PERfect filtered, PIME filtered.

ITS

To account for presence of rare sequence variants caused by sequencing errors or other technical artifacts, we use Hill numbers (Alberdi and Gilbert 2019a). Hill numbers allow the weight put on rare versus abundant sequence variants to be scaled while providing intuitive comparisons of diversity levels using “effective number of ASVs” as a measuring unit. This approach allows for balancing the over representation of rare ASVs that might be inflated due to sequencing errors.

We will then use Shapiro-Wilk tests to test for normalcy and then, depending on the results, either use parametric ANOVA or non-parametric Kruskal-Wallis to compare alpha diversity among treatments.

Calculate Hill Numbers

To calculate Hill numbers, we use the R package hilldiv (Alberdi and Gilbert 2019b). We calculate three metrics that put more or less weight on common species:

  1. Observed richness, where q-value = 0.
  2. Shannon exponential, which weighs ASVs by their frequency, where q-value = 1.
  3. Simpson multiplicative inverse, which over weighs abundant ASVs, where q-value = 2.

We perform each analysis against the Full (unfiltered) data set as well as the Arbitrary, PERfect and PIME, filtered data sets using the function hill_div.

The command is as follows:

hill_div(count = x, qvalue = i, tree = ultrametric_tree)

where x is the sample by ASV table, i is the q-value corresponding to the metric of interest and tree is an ultrametric formatted phylogenetic tree, however this really doesn’t apply for ITS data anyway.

We first transform all the data to relative abundance values, and compute new trees.

Code
its18_alpha_ds <- c("its18_ps_work", "its18_ps_filt", "its18_ps_perfect", "its18_ps_pime")
for (i in its18_alpha_ds) {
  tmp_ps <- transform_sample_counts(get(i), function(otu) otu/sum(otu))
  tmp_ps@phy_tree <- NULL
  tmp_tree <- rtree(ntaxa(tmp_ps), rooted = TRUE,
                      tip.label = taxa_names(tmp_ps))
  tmp_ps_norm <- merge_phyloseq(tmp_ps, sample_data, tmp_tree)
  tmp_asv <- data.frame(t(otu_table(tmp_ps_norm)))
  tmp_ps_name <- purrr::map_chr(i, ~ paste0(., "_norm"))
  assign(tmp_ps_name, tmp_ps_norm)
  tmp_asv_name <- purrr::map_chr(i, ~ paste0(., "_tu"))
  assign(tmp_asv_name, tmp_asv)
  rm(list = ls(pattern = "tmp_"))
}

Next, we run the analysis for all three metrics on the data sets (without a tree).

Code
qvalue <- c(0,1,2)
for (i in qvalue) {
  for (j in its18_alpha_ds) {
     tmp_asv <- get(purrr::map_chr(j, ~ paste0(., "_tu")))
     tmp_df <- data.frame(hill_div(tmp_asv, qvalue = i))
     tmp_df <- tmp_df %>% dplyr::rename("tmp_name" = 1) %>%
                              tibble::rownames_to_column("SamName")
     tmp_name <- purrr::map_chr(j, ~ paste0(., "_h", i))
     print(tmp_name)
     assign(tmp_name, tmp_df)
     rm(list = ls(pattern = "tmp_"))
  }
}
objects(pattern = "_h")

And make summary tables to add back into each ps object.

Code
for (i in its18_alpha_ds) {
     tmp_obs <- get(purrr::map_chr(i, ~ paste0(., "_h0")))
     tmp_sha <- get(purrr::map_chr(i, ~ paste0(., "_h1")))
     tmp_sim <- get(purrr::map_chr(i, ~ paste0(., "_h2")))
     tmp_hill <- dplyr::left_join(tmp_obs, tmp_sha, by = "SamName") %>%
       dplyr::left_join(., tmp_sim, by = "SamName") %>%
       dplyr::rename("Observed" = 2, "Shannon_exp" = 3, "InvSimpson" = 4)
     tmp_name <- purrr::map_chr(i, ~ paste0(., "_hill"))
     assign(tmp_name, tmp_hill)
     rm(list = ls(pattern = "tmp_"))
}
objects(pattern = "_hill")
objects()

And then create the new objects with the diversity data.

Code
for (i in its18_alpha_ds) {
     tmp_ps <- get(i)
     tmp_tree <- rtree(ntaxa(tmp_ps), rooted = TRUE,
                      tip.label = taxa_names(tmp_ps))
     tmp_ps <- merge_phyloseq(tmp_ps, sample_data, tmp_tree)
     tmp_hill <- get(purrr::map_chr(i, ~ paste0(., "_hill")))
     tmp_hill_samp <- dplyr::left_join(data.frame(sample_data(tmp_ps)),
                                         tmp_hill, by = "SamName")
     tmp_hill_samp$ID <- tmp_hill_samp$SamName
     tmp_hill_samp <- tmp_hill_samp %>%  tibble::column_to_rownames("ID")
     tmp_ps2 <- merge_phyloseq(otu_table(tmp_ps),
                                sample_data(tmp_hill_samp),
                                tax_table(tmp_ps),
                                phy_tree(tmp_ps))
     assign(i, tmp_ps2)
     tmp_path <- file.path("files/alpha/rdata/")
     saveRDS(tmp_ps2, paste(tmp_path, i, ".rds", sep = ""))
     rm(list = ls(pattern = "tmp_"))
}
objects()

Hill Numbers Summary

Now we summarize the data for each sample against all three metrics. The table contains the results of ASV diversity estimates from the full data set and the three filtered data sets.

The suffix _fi indicates metrics for the Arbitrary data set. The suffix _pe indicates metrics for the PERfect data set and the suffix _pet indicates the lineage diversity for the PERfect data set.

The suffix _pi indicates metrics for the PIME data set and the suffix _pit indicates the lineage diversity for the PIME data set.

Hill Summary

(ITS) Table 1 | Hill numbers for samples in the unfiltered & filtered data sets.

Normality Tests

Before running significance tests, we need to know if data is normally distributed, which will tell use whether to use a parametric or non-parametric test. To test if the data are normally distributed, we use the Shapiro-Wilk Normality test and the Bartlett Test of Homogeneity of Variances.

If the p-values are both not significant (p > 0.05) from the tests, we accept the null hypothesis (that the results are normally distributed) and test for significance between samples using an ANOVA. If the p-values are both significant (p < 0.05), we reject the null hypothesis (that the results are normally distributed) and test for significance between samples using Kruskal-Wallis (non-parametric equivalent of ANOVA).

The commands are as follows:

shapiro.test(x), where x is a numeric vector of alpha diversity values from the sample data table.

bartlett.test(Value ~ Group, data = df) Where Value is the metric of interest, Group in the treatment to compare, and df is the data frame.

First the Shapiro-Wilk Normality test.

Code
its_tab_alpha_div
its18_div_tab <- its_tab_alpha_div
its18_shap_tests_asv <- c()
for (i in colnames(its18_div_tab[,7:ncol(its18_div_tab)])) {
   tmp_name <- purrr::map_chr(i, ~ paste0("its18_shap_asv_", .))
   its18_shap_tests_asv <- append(its18_shap_tests_asv, tmp_name)
   tmp_test <- eval(shapiro.test(its18_div_tab[[i]]))
   tmp_test$data.name <- tmp_name
   assign(tmp_name, tmp_test)
   rm(list = ls(pattern = "tmp_"))
}

objects(pattern = "shap_")

And then the Bartlett Test of Homogeneity of Variances.

Code
its18_div_tab_asv <- its_tab_alpha_div
its18_bart_tests_asv <- c()
for (i in colnames(its18_div_tab_asv[,7:ncol(its18_div_tab_asv)])) {
   tmp_name <- purrr::map_chr(i, ~ paste0("its18_bart_asv_", .))
   its18_bart_tests_asv <- append(its18_bart_tests_asv, tmp_name)
   tmp_test <- eval(bartlett.test(its18_div_tab_asv[[i]] ~ TEMP, data = its18_div_tab_asv))
   tmp_test$data.name <- tmp_name
   assign(tmp_name, tmp_test)
   rm(list = ls(pattern = "tmp_"))
}
objects(pattern = "_bart_")

Here we see which Shapiro-Wilk Normality and Bartlett tests were significant and which were not. So wherever the value of both p-values in > 0.05 we can use an ANOVA, otherwise we use Kruskal-Wallis.

Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3 rows [1, 5, 9].

(ITS) Table 2 | Summary of Normality Tests for unfiltered & filtered data sets. P-values in red indicate significance (p-value < 0.05).

Significance Tests

To begin, we need to create a hierarchy variable; a two-column matrix specifying the relationship between samples (first column) and groups (second column).

Code
its18_hill_hier <- its_tab_alpha_div
its18_hill_hier <- its18_hill_hier %>% dplyr::select("Sample_ID", "TEMP") %>%
  tibble::remove_rownames()
its18_hill_hier <- its18_hill_hier[order(its18_hill_hier$TEMP), ]
its18_hill_hier$TEMP = paste(its18_hill_hier$TEMP, 'C', sep='')
its18_hill_hier <- its18_hill_hier %>% tibble::remove_rownames()
saveRDS(its18_hill_hier, "files/alpha/rdata/its18_hill_hier.rds")

Again, we start by testing significance of ASV diversity for the data sets against each of the three metrics using the div_test function.

The command is as follows:

div_test(countable = x, qvalue = i, hierarchy = hier, 
         tree = ultrametric_tree, posthoc = TRUE)

where x is ASV by sample table, i is the q-value corresponding to the metric of interest, hier is the hierarchy matrix, tree is an ultrametric formatted phylogenetic tree (if you choose to explore lineage diversity), and posthoc indicates whether to run post hoc pairwise analyses.

Code
qvalue <- c(0,1,2)
for (i in its18_alpha_ds) {
     for (j in qvalue) {
         tmp_get <- get(purrr::map_chr(i, ~ paste0(., "_tu")))
         tmp_test <- div_test(tmp_get, qvalue = j,
                      hierarchy = its18_hill_hier,
                      posthoc = TRUE)
         tmp_name <- purrr::map_chr(i, ~ paste0(., "_q", j, "_adt"))
         print(tmp_name)
         assign(tmp_name, tmp_test)
         rm(list = ls(pattern = "tmp_"))
 }
}

Summary

(ITS) Table 3 | Summary of significance tests for unfiltered & filtered data sets. P-values in red for post hoc analysis indicate significance (p-value < 0.05).

PostHoc Analyses

First let’s check the results of each posthoc analysis.

[1] "FULL (Observed)"
[1] "Tukey post-hoc test"
         diff       lwr        upr       p.adj
3C-0C  -68.70 -337.2489  199.84893 0.768173881
8C-0C -394.45 -662.9989 -125.90107 0.006201167
8C-3C -325.75 -608.8254  -42.67457 0.025372568
[1] "FILT (Observed)"
[1] "Tukey post-hoc test"
         diff       lwr       upr       p.adj
3C-0C   11.00 -101.5783 123.57828 0.961373753
8C-0C -173.25 -285.8283 -60.67172 0.004588355
8C-3C -184.25 -302.9179 -65.58208 0.004327805
[1] "PIME (Observed)"
[1] "Tukey post-hoc test"
        diff        lwr        upr          p.adj
3C-0C  -28.3  -59.02587   2.425871 0.071121744588
8C-0C -107.8 -138.52587 -77.074129 0.000006170051
8C-3C  -79.5 -111.88791 -47.112088 0.000138893963
[1] "PERfect (Observed)"
[1] "Tukey post-hoc test"
        diff       lwr       upr       p.adj
3C-0C  -5.35 -38.46046  27.76046 0.898558605
8C-0C -60.60 -93.71046 -27.48954 0.001378091
8C-3C -55.25 -90.15149 -20.34851 0.003804791
[1] "FULL (Shannon exponential)"
[1] "Tukey post-hoc test"
           diff        lwr       upr     p.adj
3C-0C  68.52857  -31.79969 168.85684 0.1967805
8C-0C  34.59038  -65.73789 134.91864 0.6257554
8C-3C -33.93820 -139.69348  71.81708 0.6644870
[1] "FILT (Shannon exponential)"
[1] "Tukey post-hoc test"
           diff       lwr      upr     p.adj
3C-0C  42.89239 -10.62278 96.40756 0.1199483
8C-0C  18.50838 -35.00679 72.02355 0.6240029
8C-3C -24.38401 -80.79395 32.02593 0.4878559
[1] "PIME (Shannon exponential)"
[1] "Tukey post-hoc test"
             diff       lwr      upr     p.adj
3C-0C  15.1283988 -14.51712 44.77392 0.3777877
8C-0C   0.5613833 -29.08414 30.20690 0.9985157
8C-3C -14.5670155 -45.81614 16.68211 0.4381419
[1] "PERfect (Shannon exponential)"
[1] "Tukey post-hoc test"
             diff        lwr       upr     p.adj
3C-0C  13.7449005  -7.058747 34.548549 0.2152611
8C-0C   0.3995566 -20.404091 21.203205 0.9984731
8C-3C -13.3453440 -35.274314  8.583626 0.2639185
[1] "FULL (Inverse Simpson)"
[1] "Tukey post-hoc test"
           diff       lwr      upr     p.adj
3C-0C  30.12853 -10.93247 71.18953 0.1597780
8C-0C  15.09562 -25.96538 56.15662 0.5888250
8C-3C -15.03291 -58.31500 28.24918 0.6216188
[1] "FILT (Inverse Simpson)"
[1] "Tukey post-hoc test"
            diff       lwr      upr     p.adj
3C-0C  19.474580  -7.08637 46.03553 0.1601425
8C-0C   8.362193 -18.19876 34.92314 0.6743506
8C-3C -11.112388 -39.11009 16.88531 0.5422962
[1] "PIME (Inverse Simpson)"
[1] "Tukey post-hoc test"
            diff        lwr       upr     p.adj
3C-0C  8.4231399  -7.904659 24.750938 0.3705643
8C-0C  0.3894783 -15.938320 16.717277 0.9976460
8C-3C -8.0336616 -25.244672  9.177349 0.4372626
[1] "PERfect (Inverse Simpson)"
[1] "Tukey post-hoc test"
           diff        lwr       upr     p.adj
3C-0C  9.662568  -4.771842 24.096979 0.2078889
8C-0C  1.449038 -12.985373 15.883449 0.9592809
8C-3C -8.213530 -23.428735  7.001674 0.3405328

Now we can plot the results from the posthoc analyses for each metric and data set using the function div_test_plot_jjs. I modified the original function (div_test_plot) to control a little of the formatting.

The command is as follows:

div_test_plot(divtest = x, chart = "type", colour = col.pal, 
              posthoc = TRUE, threshold = value))

where x is the results from the div_test function, "type" is chart type (box, jitter, or violin), colour is is a color palette, posthoc indicates whether to run posthoc pairwise analyses, and value is the maximum p-value to show in pairwise posthoc results. WARNING if none of the posthoc results are below the specified threshold, the function will throw an error. Therefore, until this is fixed, all posthoc values are shown.

Code
source("hack_code/div_test_plot_jjs.R")
rm(list = ls(pattern = "_adt_plot"))
for (i in objects(pattern = "_adt")) {
     tmp_name <- purrr::map_chr(i, ~ paste0(., "_plot"))
     tmp_get <- get(i)
     tmp_df <- div_test_plot_jjs(tmp_get, chart = "box",
                                 colour = swel_col, posthoc = TRUE)
     tmp_df <- ggpar(tmp_df, legend = "none")
     print(tmp_name)
     assign(tmp_name, tmp_df)
     rm(list = ls(pattern = "tmp_"))
}
its18_ps_work_q0_adt_plot

Alpha Diversity Plots

Posthoc adjusted p-values given for each pairwise comparison.

(ITS) Figure 1 | Alpha diversity plots. Top row = Observed; middle row = Shannon exponential; bottom row = Inverse Simpson. From left to right, Full (unfiltered) data, Arbitrary filtered, PERfect filtered, PIME filtered.

Workflow Output

Data products generated in this workflow can be downloaded from figshare.

Next workflow:
6. Beta Diversity & Dispersion Estimates
Previous workflow:
4. Taxonomic Diversity


Source Code

The source code for this page can be accessed on GitHub by clicking this link.

Data Availability

Data generated in this workflow and the Rdata need to run the workflow can be accessed on figshare at 10.25573/data.16826779.

Last updated on

[1] "2022-07-05 14:41:14 EST"

References

Alberdi, Antton, and M Thomas P Gilbert. 2019a. “A Guide to the Application of Hill Numbers to DNA-Based Diversity Analyses.” Molecular Ecology Resources 19 (4): 804–17. https://doi.org/10.1111/1755‐0998.13014.
———. 2019b. “Hilldiv: An r Package for the Integral Analysis of Diversity Based on Hill Numbers.” bioRxiv, 545665. https://doi.org/10.1101/545665.