DADA2 ITS Pipeline

Packages

Nephele runs the DADA2 R package v1.28, following the steps in the package authors’ DADA2 ITS workflow and Big Data workflow with minor modifications to the parameters. The pipeline is outlined below. If you are new to DADA2, it may help to read through the DADA2 Tutorial and the DADA2 ITS tutorial first.

User Options

Primer removal with cutadapt

  • Forward primer: Forward primer sequence which was used to amplify the dataset. (Default: ACCTGCGGARGGATCA – BITS3 primer).
  • Reverse primer: Reverse primer sequence which was used to amplify the dataset. (Default: GAGATCCRTTGYTRAAAGTT – B58S3 primer).

The primers above are specific to the ITS1 region (3). Alternatively, you can use primers specific to the ITS2 region (4):

  • GCATCGATGAAGAACGCAGC – ITS3 (forward)
  • TCCTCCGCTTATTGATATGC – ITS4 (reverse)
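
The default primers contain IUPAC degenerate codes (R stands for A or G, Y for C or T), so each primer represents several concrete sequences. The following is a minimal Python sketch of how such a primer can be matched against a read. It is an illustration only, not the code Nephele or cutadapt runs; the function names and the example reads are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Turn a degenerate primer into a regular expression (hypothetical helper)."""
    return "".join(
        IUPAC[b] if len(IUPAC[b]) == 1 else "[" + IUPAC[b] + "]"
        for b in primer.upper()
    )

def expand_count(primer):
    """Number of distinct unambiguous sequences the primer represents."""
    n = 1
    for b in primer.upper():
        n *= len(IUPAC[b])
    return n

def find_primer(primer, read):
    """Return the 0-based start of the first exact primer match, or -1."""
    m = re.search(primer_to_regex(primer), read.upper())
    return m.start() if m else -1

BITS3 = "ACCTGCGGARGGATCA"      # default forward primer
B58S3 = "GAGATCCRTTGYTRAAAGTT"  # default reverse primer

print(primer_to_regex(BITS3))
print(expand_count(BITS3), expand_count(B58S3))
# Made-up read with the primer starting at position 2.
print(find_primer(BITS3, "TTACCTGCGGAGGGATCATT"))
```

Unlike this exact-match sketch, cutadapt also tolerates a configurable number of errors when locating primers.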

Filter and Trim

  • Truncation quality score: Truncate reads at the first instance of a quality score less than or equal to this value. (Default: 2).
  • Minimum length: Remove reads with length less than this value. It is enforced after trimming and truncation. (Default: 50)
  • Maximum expected errors (maxEE): After truncation, reads with more than this many “expected errors” will be discarded. Expected errors are calculated from the nominal definition of the quality score: \(EE = \sum 10^{-Q/10}\). (Default: 5).
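
To make the three options concrete, here is a minimal Python sketch of how they interact for a single read, using the default values listed above. It assumes Phred+33 quality encoding and is only an illustration of the logic described here, not the filterAndTrim implementation; all function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(c) - offset for c in quality_string]

def truncate_at_quality(seq, quals, trunc_q=2):
    """Cut the read at the first base whose quality is <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    return seq, quals

def expected_errors(quals):
    """EE = sum over bases of 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)

def passes_filter(seq, quality_string, trunc_q=2, min_len=50, max_ee=5):
    """Apply truncation first, then the length and expected-error checks."""
    quals = phred_scores(quality_string)
    seq, quals = truncate_at_quality(seq, quals, trunc_q)
    if len(seq) < min_len:
        return False
    return expected_errors(quals) <= max_ee

# Three bases with Q = 10, 20, 30 contribute 0.1 + 0.01 + 0.001 expected errors.
print(round(expected_errors([10, 20, 30]), 3))
```

For example, a 60-base read in which every base has Q = 10 has 60 × 0.1 = 6 expected errors and would be discarded at the default maxEE of 5.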

Merge Pairs

For paired-end data only.

  • Just concatenate: Concatenate paired reads instead of merging. (Default: FALSE)
  • Maximum mismatches: The maximum number of mismatches allowed in the overlap region when merging read pairs. (Default: 0).
  • Trim overhanging sequence: After merging paired-end reads, trim any sequence that overhangs the start of each read. If amplicons are shorter than the read length, we suggest checking this option. (Logical. Default: FALSE).
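
The sketch below illustrates, in simplified form, what the merge options control. It assumes an ungapped comparison of the end of the forward read with the start of the reverse-complemented reverse read, whereas mergePairs performs a proper alignment; overhang trimming is not modeled. The function names and the example amplicon are hypothetical.

```python
_COMP = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA sequence."""
    return seq.translate(_COMP)[::-1]

def merge_pair(fwd, rev, min_overlap=12, max_mismatch=0, just_concatenate=False):
    """Merge a read pair; return None when no acceptable overlap exists."""
    rev_rc = reverse_complement(rev)
    if just_concatenate:
        # Join the reads with a spacer of 10 Ns instead of overlapping them.
        return fwd + "N" * 10 + rev_rc
    longest = min(len(fwd), len(rev_rc))
    # Try the longest candidate overlap first, down to the minimum allowed.
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = fwd[-overlap:], rev_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            return fwd + rev_rc[overlap:]
    return None

# Made-up 40-base amplicon read from both ends with a 16-base overlap.
amplicon = "ACGTTGCAAGCTTAGGCTAACCGGTTAGCATCGGATCCTA"
fwd = amplicon[:28]
rev = reverse_complement(amplicon[12:])
print(merge_pair(fwd, rev) == amplicon)
```

With the default of 0 mismatches, a single disagreement in the overlap region is enough for the pair to be rejected.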

Analysis

  • Chimera removal: Remove chimeric sequences. If primers are not trimmed (either prior to submission or using the trim left option), then we suggest unchecking this option. (Default: TRUE).
  • Reference database: Reference database to be used for taxonomic assignment. See Databases below.

Pipeline steps

  1. Filter ambiguous bases. The presence of ambiguous bases in the sequencing reads makes accurate mapping of short primer sequences difficult. This step pre-filters the sequences to remove those with ambiguous bases, but performs no other filtering. N-filtered files are saved in the filtN subdirectory.

  2. Identify and remove primers. In the standard 16S workflow, primers that are included on the reads can generally be removed by trimming from the left, because they appear only at the start of the reads and have a fixed length. However, the ITS region is highly variable in length, so a read can run through the amplicon into the primer at the opposite end; handling these read-through cases requires an external tool. Here we use cutadapt to remove primers from the ITS amplicon sequencing data. Reads with primers removed are saved in the cutadapt subdirectory.

  3. Plot quality profiles of forward and reverse reads. These graphs are saved as qualityProfile_R1.pdf and qualityProfile_R2.pdf.

  4. Preprocess sequence data with filterAndTrim. The maxEE, truncQ, and minLen parameters can be set by the user (see Filter and Trim above). The filtered sequence files, *_trim.fastq.gz, are output to the filtered_data directory.

  5. Learn the error rates with learnErrors. The nBases parameter is set to 1e+08. The error rate graphs made with plotErrors are saved as errorRate_R1.pdf and errorRate_R2.pdf. The error profiles, err, are also saved as a list in an R binary object in the intermediate_files directory.

  6. Dereplicate reads with derepFastq and run the dada sequence-variant inference algorithm.

  7. For paired-end data, merge the overlapping denoised reads with mergePairs. The minOverlap parameter is set to 12. trimOverhang, justConcatenate, and maxMismatch are set by the user. The sequence table, seqtab, containing the final amplicon sequence variants (ASVs), is saved as an R binary object to the intermediate_files directory.

  8. Classify the remaining ASVs taxonomically using assignTaxonomy. The minBoot parameter for minimum bootstrap confidence is set to 80 and tryRC is set to TRUE. The final result is saved as a BIOM file, taxa.biom. For ITS_PE data, if the mergePairs justConcatenate option is checked, species annotation is done using only the forward reads (R1).

  9. The final results are also saved as a tab-separated text file OTU_table.txt. The final sequence variants used for taxonomic classification are output as seq.fasta.
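
As an illustration of step 2, the following Python sketch builds a paired-end cutadapt command line that removes a primer from the start of each read and the reverse complement of the opposite primer from the end, which covers the read-through case. It follows the convention used in the DADA2 ITS workflow and is an assumption about the general approach, not the exact command Nephele runs; the file names are hypothetical and the command is only constructed, not executed.

```python
# Complement table that also covers the IUPAC degenerate codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    """Reverse-complement a primer that may contain IUPAC codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def cutadapt_command(fwd_primer, rev_primer, r1_in, r2_in, r1_out, r2_out):
    """Build (but do not run) a paired-end cutadapt argument list."""
    return [
        "cutadapt",
        # R1: forward primer at the 5' end, reverse-complemented reverse primer at the 3' end.
        "-g", fwd_primer, "-a", reverse_complement(rev_primer),
        # R2: reverse primer at the 5' end, reverse-complemented forward primer at the 3' end.
        "-G", rev_primer, "-A", reverse_complement(fwd_primer),
        # Allow two removal rounds so both primers can be trimmed from one read.
        "-n", "2",
        "-o", r1_out, "-p", r2_out,
        r1_in, r2_in,
    ]

cmd = cutadapt_command(
    "ACCTGCGGARGGATCA", "GAGATCCRTTGYTRAAAGTT",
    "filtN/sample_R1.fastq.gz", "filtN/sample_R2.fastq.gz",        # hypothetical inputs
    "cutadapt/sample_R1.fastq.gz", "cutadapt/sample_R2.fastq.gz",  # hypothetical outputs
)
print(" ".join(cmd))
```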

Output Files

See Pipeline Steps above for more details on how these files were made.

  • OTU_table.txt: tab-separated text file of ASV counts and taxonomic assignment
  • seq.fasta: FASTA file of amplicon sequence variants
  • taxa.biom: taxonomic assignment in BIOM V1 format, at the genus or species level depending on the choice of database or method of assignment
  • taxonomy_table.txt: tab-separated taxonomy file suitable for importing into QIIME2
  • errorRate_R1/2.pdf: error profile plots
  • qualityProfile_R1/2.pdf: quality profile plots
  • filtered_data: trimmed sequence files
  • intermediate_files: intermediate files produced by the pipeline; useful for debugging
  • graphs: output of the visualization pipeline

Tools and References

Pipeline

  1. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA and Holmes SP (2016). “DADA2: High-resolution sample inference from Illumina amplicon data.” Nature Methods, 13, pp. 581-583. doi: 10.1038/nmeth.3869.

  2. Microsoft and Weston S (2017). foreach: Provides Foreach Looping Construct for R. R package version 1.4.4, https://CRAN.R-project.org/package=foreach.

Primers

  3. Bakker MG. A fungal mock community control for amplicon sequencing experiments. Mol Ecol Resour. 2018; 18: 541-556. doi: 10.1111/1755-0998.12760.

  4. Robinson K, Xiao Y, Johnson TJ, et al. Chicken Intestinal Mycobiome: Initial Characterization and Its Response to Bacitracin Methylene Disalicylate. Applied and Environmental Microbiology. 2020 Jun;86(13). doi: 10.1128/aem.00304-20.

Databases

  5. Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2021): UNITE general FASTA release for Fungi. Version 10.05.2021. UNITE Community. https://doi.org/10.15156/BIO/1280049