Nephele runs the DADA2 R package v1.18, following the steps in the package authors’ DADA2 ITS workflow and Big Data workflow, with minor modifications to the parameters used. Our pipeline is outlined below. If you are new to DADA2, it may be helpful to read through the DADA2 Tutorial and DADA2 ITS tutorial first.
ACCTGCGGARGGATCA – BITS3 primer).
GAGATCCRTTGYTRAAAGTT – B58S3 primer).
The primers above are specific to the ITS1 region (3). Alternatively, you can use primers specific to the ITS2 region (4):
GCATCGATGAAGAACGCAGC – ITS3 (forward)
TCCTCCGCTTATTGATATGC – ITS4 (reverse)
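The BITS3 and B58S3 primers contain IUPAC degenerate codes (R stands for A or G, Y for C or T), so they cannot be matched against reads as literal strings. A primer can also show up reverse-complemented when a read runs through a short ITS amplicon. The following is a minimal Python sketch, not part of the Nephele pipeline, of how a degenerate primer can be reverse-complemented and turned into a regular expression. The helper names `reverse_complement` and `primer_to_regex` and the example read are made up for illustration.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (R = A/G pairs with Y = C/T, and so on).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence with IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a primer into a regex, expanding degenerate codes into classes."""
    parts = []
    for base in primer.upper():
        choices = IUPAC[base]
        parts.append(choices if len(choices) == 1 else f"[{choices}]")
    return re.compile("".join(parts))


bits3 = "ACCTGCGGARGGATCA"
b58s3 = "GAGATCCRTTGYTRAAAGTT"

# Hypothetical forward read that runs through the whole amplicon, so it ends
# with the reverse complement of the reverse primer.
read = "TTACCTGCGGAAGGATCATTACCGAGTGAACTTTCAACAACGGATCTC"

fwd_hit = primer_to_regex(bits3).search(read)
rev_hit = primer_to_regex(reverse_complement(b58s3)).search(read)
print(reverse_complement(b58s3))  # AACTTTYARCAAYGGATCTC
print(fwd_hit.start(), rev_hit.start())
```

Checking both the primer and its reverse complement is what makes read-through detectable: a forward read that is longer than the amplicon contains the reverse primer only in reverse-complemented form.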
For paired-end data only.
Filter ambiguous bases. The presence of ambiguous bases (Ns) in the
sequencing reads makes accurate mapping of short primer sequences
difficult. This step pre-filters the sequences only to remove those
with ambiguous bases and performs no other filtering. N-filtered files
are saved in
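As a rough illustration of the pre-filtering step above, here is a minimal Python sketch (not the pipeline's own R code) that keeps a read pair only when neither mate contains an N. In the DADA2 ITS tutorial the equivalent operation is a call to `filterAndTrim` with `maxN = 0`. The tuple layout, the function names, and the example reads below are assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple

# A read is represented here as (header, sequence, quality string).
Read = Tuple[str, str, str]
ReadPair = Tuple[Read, Read]


def has_ambiguous_base(seq: str) -> bool:
    """True if the sequence contains at least one N."""
    return "N" in seq.upper()


def filter_pairs_without_n(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield only the pairs in which both mates are free of Ns.

    Pairs are dropped together so that the forward and reverse files
    stay in the same order.
    """
    for fwd, rev in pairs:
        if not has_ambiguous_base(fwd[1]) and not has_ambiguous_base(rev[1]):
            yield fwd, rev


# Hypothetical toy input: only the first pair has no N in either mate.
example_pairs = [
    (("@r1", "ACGT", "IIII"), ("@r1", "TTGA", "IIII")),
    (("@r2", "ACNT", "IIII"), ("@r2", "TTGA", "IIII")),
    (("@r3", "ACGT", "IIII"), ("@r3", "TNGA", "IIII")),
]
kept = list(filter_pairs_without_n(example_pairs))
print([fwd[0] for fwd, _ in kept])
```

No quality trimming or length filtering happens here, which mirrors the point of this step: the reads are left otherwise untouched so that primers can still be located at their original positions.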
Identify and remove primers. In the standard 16S workflow, it is
generally possible to remove primers (when included on the reads) by
trimming from the left, as they only appear at the start of the reads and
have a fixed length. However, the more complex read-through scenarios
that are encountered when sequencing the highly length-variable ITS
region require the use of external tools. Here we use the cutadapt
tool to remove primers from the ITS amplicon sequencing data.
Reads with removed primers are saved in
Plot quality profiles
of forward and reverse reads. These graphs are saved as
Preprocess sequence data with filterAndTrim.
truncLen parameters can be set by the user. The filtered
*_trim.fastq.gz, are output to the
Learn the error rates with learnErrors. The
nBases parameter is set to 1e+08. The error rate
graphs made with plotErrors
are saved as
errorRate_R2.pdf. The error profiles,
are also saved as a list R binary object in the
For paired-end data, merge the overlapping denoised reads with mergePairs. The
minOverlap parameter is set to 12.
maxMismatch are set by the user. The sequence table,
seqtab, containing the final amplicon sequence variants
(ASVs), is saved as an R binary object to the
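To make the roles of minOverlap and maxMismatch in the merging step concrete, the sketch below merges one read pair in plain Python. It is a deliberately simplified, ungapped illustration and not DADA2's implementation, which aligns the denoised reads rather than sliding them past each other. The function name `merge_pair`, its defaults, and the 40-base amplicon are assumptions for this example only.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12,
               max_mismatch: int = 0) -> Optional[str]:
    """Merge a forward read with the reverse complement of its mate.

    Tries the longest possible overlap first and accepts the first one that
    is at least min_overlap long with at most max_mismatch mismatches.
    Returns None when no acceptable overlap exists.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = fwd[-overlap:]    # end of the forward read
        head = rev_rc[:overlap]  # start of the reverse-complemented mate
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            # Simplification: bases in the overlap are taken from the forward read.
            return fwd + rev_rc[overlap:]
    return None


# Hypothetical amplicon; the two reads share a 16-base overlap.
amplicon = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATTCCAGTGAC"
fwd_read = amplicon[:28]
rev_read = reverse_complement(amplicon[12:])

merged = merge_pair(fwd_read, rev_read, min_overlap=12, max_mismatch=0)
too_strict = merge_pair(fwd_read, rev_read, min_overlap=20, max_mismatch=0)
print(merged == amplicon, too_strict)
```

In this toy case the pair merges with the 12-base minimum but is rejected when 20 overlapping bases are required. Short overlaps are the usual reason ITS pairs fail to merge when reads are truncated too aggressively.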
Classify the remaining ASVs taxonomically with assignTaxonomy using
minBoot parameter for minimum bootstrap confidence
is set to 80 and
tryRC is set to
final result is saved as a biom file
ITS_PE data, if the
justConcatenate option is checked, species annotation
will only be done
using the forward reads (R1).
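The effect of the minBoot threshold described above can be pictured with a small Python sketch. This is a simplified model rather than the classifier itself: it assumes each rank comes with a bootstrap confidence between 0 and 100, treats the threshold as inclusive, and blanks every rank from the first one that falls below it. The function name `apply_min_boot` and all taxon names and confidence values are hypothetical.

```python
from typing import List, Optional, Sequence

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def apply_min_boot(taxa: Sequence[str], boots: Sequence[float],
                   min_boot: float = 80) -> List[Optional[str]]:
    """Keep ranks whose bootstrap confidence reaches min_boot.

    Once one rank falls below the threshold, it and all lower ranks are
    reported as None (unassigned).
    """
    kept: List[Optional[str]] = []
    confident = True
    for name, boot in zip(taxa, boots):
        confident = confident and boot >= min_boot
        kept.append(name if confident else None)
    return kept


# Hypothetical assignment for one ASV with made-up bootstrap values.
taxa = ["Fungi", "Ascomycota", "Saccharomycetes", "Saccharomycetales",
        "Saccharomycetaceae", "Saccharomyces", "cerevisiae"]
boots = [100, 100, 98, 98, 91, 85, 42]

assignment = dict(zip(RANKS, apply_min_boot(taxa, boots, min_boot=80)))
print(assignment)
```

With a threshold of 80 this ASV keeps its genus-level name but is left unassigned at species level, which is the kind of partially resolved lineage that appears in the final taxonomy table.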
The final results are also saved as a tab-separated text file
OTU_table.txt. The final sequence variants used for
taxonomic classification are output as
See Pipeline Steps above for more details on how these files were made.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA and Holmes SP (2016). “DADA2: High-resolution sample inference from Illumina amplicon data.” Nature Methods, 13, pp. 581-583. doi: 10.1038/nmeth.3869.
Microsoft and Weston S (2017). foreach: Provides Foreach Looping Construct for R. R package version 1.4.4, https://CRAN.R-project.org/package=foreach.
Bakker MG (2018). “A fungal mock community control for amplicon sequencing experiments.” Molecular Ecology Resources, 18, pp. 541-556. doi: 10.1111/1755-0998.12760.
Robinson K, Xiao Y, Johnson TJ, et al. (2020). “Chicken Intestinal Mycobiome: Initial Characterization and Its Response to Bacitracin Methylene Disalicylate.” Applied and Environmental Microbiology, 86(13). doi: 10.1128/aem.00304-20.