Nephele runs the DADA2 R package v1.28, following the steps in the package authors’ DADA2 ITS workflow and Big Data workflow, with minor modifications to the parameters. Our pipeline is outlined below. If you are new to DADA2, it may help to read through the DADA2 Tutorial and DADA2 ITS tutorial first.
cutadapt
ACCTGCGGARGGATCA (BITS3 primer)
GAGATCCRTTGYTRAAAGTT (B58S3 primer)
The primers above are specific to the ITS1 region (3). Alternatively, you can use primers specific to the ITS2 region (4):
GCATCGATGAAGAACGCAGC (ITS3, forward)
TCCTCCGCTTATTGATATGC (ITS4, reverse)
2
50
5
For paired-end data only.
0
FALSE
TRUE
Filter ambiguous bases. The presence of ambiguous bases (Ns) in the sequencing reads makes accurate mapping of short primer sequences difficult. This step pre-filters the sequences to remove those with ambiguous bases, but performs no other filtering. N-filtered files are saved in the filtN subdirectory.
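As a rough illustration of this pre-filter (Nephele itself runs the step in R with DADA2), the Python sketch below drops every read that contains an N and leaves the others unchanged. The `FastqRecord` type, the `drop_ambiguous` function, and the example reads are hypothetical; a real run would stream records from the FASTQ files and, for paired-end data, keep a pair only if both reads pass.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ read (hypothetical container for this sketch)."""
    name: str
    sequence: str
    quality: str


def drop_ambiguous(records):
    """Keep only reads whose sequence contains no ambiguous base (N)."""
    return [r for r in records if "N" not in r.sequence.upper()]


# Made-up example reads; read2 contains an N and is removed.
reads = [
    FastqRecord("read1", "ACGTACGT", "IIIIIIII"),
    FastqRecord("read2", "ACGTNCGT", "IIII#III"),
    FastqRecord("read3", "TTGCAAGC", "IIIIIIII"),
]

kept = drop_ambiguous(reads)
print([r.name for r in kept])
```

No trimming or quality filtering happens here, which mirrors the intent of the step: the only goal is to make primer matching reliable in the next stage.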
Identify and remove primers. In the standard 16S workflow, it is generally possible to remove primers (when included on the reads) by trimming from the left, as they only appear at the start of the reads and have a fixed length. However, the more complex read-through scenarios encountered when sequencing the highly length-variable ITS region require the use of external tools. Here we use the cutadapt tool to remove primers from the ITS amplicon sequencing data. Reads with primers removed are saved in the cutadapt subdirectory.
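Read-through means a short ITS amplicon can carry the reverse complement of the opposite primer near the 3' end of a read, so primers have to be searched for in more than one orientation. The Python sketch below is an illustration only, not Nephele's implementation: it builds the orientations of a primer and counts reads that contain it, expanding IUPAC ambiguity codes such as R and Y. The function names and the example reads are hypothetical; the primer is the BITS3 sequence listed earlier.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also handles ambiguity codes (R<->Y, K<->M, ...).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence with IUPAC codes."""
    return seq.translate(_COMPLEMENT)[::-1]


def all_orientations(primer):
    """Return the four orientations in which a primer could appear."""
    return {
        "forward": primer,
        "complement": primer.translate(_COMPLEMENT),
        "reverse": primer[::-1],
        "rev_comp": reverse_complement(primer),
    }


def contains_primer(read, primer):
    """True if the read contains the primer, honoring ambiguity codes."""
    pattern = re.compile("".join(IUPAC[base] for base in primer))
    return pattern.search(read) is not None


def count_hits(primer, reads):
    """Count, per orientation, how many reads contain the primer."""
    return {
        name: sum(contains_primer(r, oriented) for r in reads)
        for name, oriented in all_orientations(primer).items()
    }


fwd_primer = "ACCTGCGGARGGATCA"
example_reads = ["TTACCTGCGGAAGGATCATT", "GGTGATCCTTCCGCAGGTAA"]
print(count_hits(fwd_primer, example_reads))
```

A table of counts like this is useful as a sanity check before and after primer removal: after trimming, every orientation should drop to zero or close to it.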
Plot quality profiles of forward and reverse reads. These graphs are saved as qualityProfile_R1.pdf and qualityProfile_R2.pdf.
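To give a sense of what a quality profile summarizes, the Python sketch below computes the mean Phred score at each read position from FASTQ quality strings. It is a simplified illustration, not the plotting code the pipeline uses: it assumes Phred+33 encoding, and `mean_quality_by_position` is a hypothetical helper name.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each position across reads of varying length.

    Assumes Phred+33 encoded quality strings (an assumption of this sketch).
    """
    totals = []  # sum of scores seen at each position
    counts = []  # number of reads that reach each position
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(mean_quality_by_position(["II5", "I5"]))
```

A profile that falls off sharply toward the end of the reads is the usual reason to tighten the quality-related filtering parameters in the next step.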
Preprocess sequence data with filterAndTrim. The maxEE, truncQ, and truncLen parameters can be set by the user. The filtered sequence files, *_trim.fastq.gz, are output to the filtered_data directory.
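The two quality-based parameters can be illustrated with a short Python sketch. In DADA2, truncQ truncates a read at the first base whose quality score is less than or equal to that value, and maxEE discards reads whose expected errors, EE = sum(10^(-Q/10)), exceed the threshold after truncation. The helper names below are hypothetical, the threshold values passed in the example are illustrative rather than Nephele's defaults, and Phred+33 encoding is assumed.

```python
def phred_scores(quality, offset=33):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]


def truncate_at_quality(sequence, quality, trunc_q):
    """Cut the read at the first base with quality <= trunc_q."""
    for i, q in enumerate(phred_scores(quality)):
        if q <= trunc_q:
            return sequence[:i], quality[:i]
    return sequence, quality


def expected_errors(quality):
    """Expected number of errors in a read: sum of 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality))


def passes_filter(sequence, quality, trunc_q, max_ee):
    """Apply truncation first, then the expected-error threshold."""
    _, trimmed_quality = truncate_at_quality(sequence, quality, trunc_q)
    return expected_errors(trimmed_quality) <= max_ee


# '#' encodes Q2, so this read is cut to its first four bases.
print(truncate_at_quality("ACGTAC", "IIII#I", trunc_q=2))
# '+' encodes Q10: each base contributes 0.1 expected errors.
print(passes_filter("ACGT", "++++", trunc_q=2, max_ee=0.5))
```

Because the expected-error sum grows with read length, a looser maxEE is often paired with long, variable-length amplicons such as ITS; the right value depends on the data.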
Learn the error rates with learnErrors. The nBases parameter is set to 1e+08. The error rate graphs made with plotErrors are saved as errorRate_R1.pdf and errorRate_R2.pdf. The error profiles, err, are also saved as a list in an R binary object in the intermediate_files directory.
Dereplicate reads with derepFastq and run the dada sequence-variant inference algorithm.
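Dereplication collapses identical reads into unique sequences with abundance counts, so the inference step works on far fewer sequences than there are reads. The Python sketch below shows only that counting idea; it is not the DADA2 implementation, which also keeps quality information for each unique sequence. The `dereplicate` function and the example reads are hypothetical.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical reads into (sequence, abundance) pairs,
    ordered from most to least abundant."""
    return Counter(sequences).most_common()


# Made-up reads: six reads collapse to three unique sequences.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
for sequence, abundance in dereplicate(reads):
    print(sequence, abundance)
```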
For paired-end data, merge the overlapping denoised reads with mergePairs. The minOverlap parameter is set to 12. trimOverhang, justConcatenate, and maxMismatch are set by the user. The sequence table, seqtab, containing the final amplicon sequence variants (ASVs), is saved as an R binary object to the intermediate_files directory.
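The roles of minOverlap, maxMismatch, and justConcatenate can be sketched in Python as follows. This is a deliberately simplified, ungapped illustration and not how mergePairs is implemented (DADA2 aligns the reads); it does not model trimOverhang. The function names are hypothetical, `max_mismatch=0` is an illustrative value, and the 10-N spacer follows DADA2's documented behavior when reads are concatenated rather than merged.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=12, max_mismatch=0):
    """Merge a read pair on an ungapped overlap, or return None.

    The reverse read is reverse-complemented, then the longest overlap
    of at least min_overlap bases with at most max_mismatch mismatches
    is used. Assumes min_overlap >= 1.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        pairs = zip(fwd[-overlap:], rev_rc[:overlap])
        mismatches = sum(a != b for a, b in pairs)
        if mismatches <= max_mismatch:
            # Keep the forward read and append the non-overlapping tail.
            return fwd + rev_rc[overlap:]
    return None


def concatenate_pair(fwd, rev):
    """Join reads without merging, separated by a spacer of 10 Ns."""
    return fwd + "N" * 10 + reverse_complement(rev)


# Made-up 40-base amplicon sequenced as a 28-base forward read and a
# 24-base reverse read, which overlap by exactly 12 bases.
amplicon = "ACGTTGCAATCGGATCCTAGGCATTACGGTCAAGTCTGAC"
fwd_read = amplicon[:28]
rev_read = reverse_complement(amplicon[16:])
print(merge_pair(fwd_read, rev_read) == amplicon)
```

Concatenation is the fallback for amplicons too long for the reads to overlap; the resulting sequence contains the N spacer instead of the unsequenced middle of the amplicon.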
Classify the remaining ASVs taxonomically with assignTaxonomy. The minBoot parameter for minimum bootstrap confidence is set to 80 and tryRC is set to TRUE. This final result is saved as a biom file, taxa.biom. For ITS_PE data, if the mergePairs justConcatenate option is checked, species annotation will only be done using the forward reads (R1).
The final results are also saved as a tab-separated text file, OTU_table.txt. The final sequence variants used for taxonomic classification are output as seq.fasta.
See Pipeline Steps above for more details on how these files were made.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA and Holmes SP (2016). “DADA2: High-resolution sample inference from Illumina amplicon data.” Nature Methods, 13, pp. 581-583. doi: 10.1038/nmeth.3869.
Microsoft and Weston S (2017). foreach: Provides Foreach Looping Construct for R. R package version 1.4.4, https://CRAN.R-project.org/package=foreach.
Bakker MG (2018). “A fungal mock community control for amplicon sequencing experiments.” Molecular Ecology Resources, 18, pp. 541-556. doi: 10.1111/1755-0998.12760.
Robinson K, Xiao Y, Johnson TJ, et al. (2020). “Chicken Intestinal Mycobiome: Initial Characterization and Its Response to Bacitracin Methylene Disalicylate.” Applied and Environmental Microbiology, 86(13). doi: 10.1128/aem.00304-20.