Downstream Analysis Pipeline: Diversity

The Downstream Analysis pipeline provides sample observation and taxonomic summaries and diversity analyses of an OTU table using QIIME 2.

Input Files and Parameters

  • BIOM File: The biom file contains the OTU and taxonomy tables to be analyzed. This pipeline accepts BIOM V1 or QIIME's BIOM V2 format. All biom files produced by Nephele's amplicon pipelines should work.

  • Mapping File: The mapping file contains the metadata about the samples which will be used in the analysis. The mapping file format is the same as that used by the amplicon pipelines (SE or PE templates). The FASTQ file columns will be ignored. Only samples listed in the mapping file will be included in the pipeline results. So, if you want to analyze only a subset of the samples in your biom file, you can submit the mapping file with only those samples, and the others will be excluded from the analysis.

  • Tree File (Optional): The tree file contains a rooted phylogenetic tree with tip labels that correspond to the OTU names in the BIOM file. The pipeline accepts Newick format (.nwk). OTUs that are in the BIOM file and not in the tree will be filtered from the BIOM file before analysis. The tree file is optional; see "diversity core-metrics-phylogenetic" in Pipeline Steps for additional analyses that will be run if it is uploaded.

  • Sampling depth: The total frequency that each sample should be rarefied to prior to computing diversity metrics. Samples with counts below this value will be removed from the analysis. If you are using the output of a Nephele amplicon pipeline, you may find it useful to consult otu_summary_table.txt which lists the sample counts.

  • Alpha group significance: Run alpha diversity statistical comparisons between groups, and produce alpha diversity plots. This step runs after samples with counts below your sampling depth are filtered from your biom and mapping files. It requires the filtered mapping file's Treatment Group column to contain at least 2 groups, and each group to contain at least 2 samples.

    If you are not sure which samples and groups will remain after the sampling depth filtering, run the job first with this option unchecked, and review the summary.qzv file. If you are using the output of a Nephele amplicon pipeline, you may find it useful to consult otu_summary_table.txt which lists the sample counts.

QIIME 2

Nephele uses the QIIME 2 v2022.2 Artifact API.

Pipeline Steps & Output files

The output .qza and .qzv files can be viewed on QIIME 2's view page. See QIIME 2's information about the output formats and for help with the view page. Where possible, the .qza artifacts are also exported to their native format in directories of the same name. The following plugin methods are used:

  1. biom add-metadata which adds metadata annotations from the mapping file to the input biom file.
    • input_with_annotations.biom
  2. feature-table summarize which produces a summary of the counts along with the sample metadata.
    • summary.qzv: summary visualization
  3. taxa barplot which produces barplots of the taxonomies of the samples with counts above the sampling depth. filter-samples is used to filter these samples.
    • barplot.qzv
  4. diversity core-metrics which applies a set of non-phylogenetic diversity metrics to the count data after rarefaction.
    • rarefied_table.qza, rarefied_table/feature-table.biom: rarefied feature table
    • observed_otus_vector.qza: vector of observed OTU/ASV values by sample
    • shannon_vector.qza: vector of Shannon diversity values by sample
    • evenness_vector.qza: vector of Pielou's evenness values by sample
    • jaccard_distance_matrix.qza, jaccard_pcoa_results.qza, jaccard_emperor.qzv: Jaccard distance matrix and coordinates for the resulting PCoA ordination. The graph is plotted using Emperor.
    • bray_curtis_distance_matrix.qza, bray_curtis_pcoa_results.qza, bray_curtis_emperor.qzv: Bray-Curtis distance matrix, coordinates for the resulting PCoA ordination, and Emperor graph.
    If you upload a tree file, diversity core-metrics-phylogenetic will be run instead which produces the same output as core-metrics and additionally:
    • faith_pd_vector.qza: vector of Faith's phylogenetic diversity values by sample
    • unweighted/weighted_unifrac_distance_matrix.qza, unweighted/weighted_unifrac_pcoa_results.qza, unweighted/weighted_unifrac_emperor.qzv: Unweighted and weighted UniFrac distance matrices, coordinates for the resulting PCoA ordination, and Emperor graph.
  5. diversity alpha-group-significance which does statistical comparisons of the alpha diversity indexes based on the sample metadata groups.
    • alpha_group_significance_evenness.qzv: comparison of Pielou (evenness) index between groups
    • alpha_group_significance_shannon.qzv: comparison of Shannon index between groups

Tools and References

  • Bolyen, Evan, et al. "Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using QIIME 2." Nature Biotechnology, July 2019, doi:10.1038/s41587-019-0209-9.
  • McDonald, Daniel, et al. "The Biological Observation Matrix (BIOM) Format or: How I Learned to Stop Worrying and Love the Ome-Ome." GigaScience, vol. 1, July 2012, p. 7. BioMed Central, doi:10.1186/2047-217X-1-7.
  • Kruskal, William H., and W. Allen Wallis. "Use of Ranks in One-Criterion Variance Analysis." Journal of the American Statistical Association, vol. 47, no. 260, 1952, pp. 583-621. JSTOR, doi:10.2307/2280779.
  • Vázquez-Baeza, Yoshiki, Antonio Gonzalez, et al. "Bringing the Dynamic Microbiome to Life with Animations." Cell Host & Microbe, vol. 21, no. 1, Jan. 2017, pp. 7-10. ScienceDirect, doi:10.1016/j.chom.2016.12.009.
  • Vázquez-Baeza, Yoshiki, Meg Pirrung, et al. "EMPeror: A Tool for Visualizing High-Throughput Microbial Community Data." GigaScience, vol. 2, no. 1, Dec. 2013. academic.oup.com, doi:10.1186/2047-217X-2-16.
  • Weiss, Sophie, et al. "Normalization and Microbial Differential Abundance Strategies Depend upon Data Characteristics." Microbiome, vol. 5, no. 1, Mar. 2017, p. 27. Crossref, doi:10.1186/s40168-017-0237-y.