Nephele provides mothur and DADA2 pipelines for amplicon data and the bioBakery pipeline for metagenome shotgun data. In addition, we have added a sequence data quality check pipeline so that you can inspect and control the quality of your data before analysis.
We also like learning about new and different pipelines that could better serve your research and educational needs. If you have a suggestion for a tool or analysis for Nephele, please fill out this form.
*Note: your request will be considered in planning new features, but this does not guarantee implementation.
Nephele v2 supports demultiplexed paired-end, single-end, and Nanopore FASTQ files. In addition, BIOM, FASTA, and BAM files are used in certain pipelines. Please see the supported data type per pipeline below.
Short Read QC | Nanopore QC | QIIME 2 | mothur v1.46.1 | DADA2 v1.18 | DADA2 ITS v1.18 | bioBakery | SARS-CoV-2 | Downstream Analysis: Diversity (amplicon only) | Metagenomics Inference | DiscoVir | |
---|---|---|---|---|---|---|---|---|---|---|---|
Paired End FASTQ |
|
|
|
|
|
|
|
||||
Single End FASTQ |
|
|
|
|
|
|
|||||
Nanopore FASTQ |
|
||||||||||
BIOM |
Mapping file from your 16S or ITS pipeline output
|
Mapping file from your 16S or ITS pipeline output
|
|||||||||
FASTA |
Reads file from ASVs or representative sequences from OTUs
|
||||||||||
FASTA + BAM |
|
If you need help figuring out what type of data you have, please see the FAQ.
Nephele provides a pre-processing quality check pipeline for demultiplexed paired-end and single-end short read FASTQ files. Please see this FAQ on why you may want to run the QC pipeline before you run a microbiome analysis. The Nephele Short Read QC pipeline can run a quality control check (FastQC), trim primers and/or adapters, trim and/or filter reads based on quality scores, merge read pairs, and provide summary graphs of the QC steps.
The features list below is a summary of the Nephele QC pipeline workflow.
Pipeline Features | Pre-processing Quality Check |
---|---|
FastQC sequence quality check | Always run |
Trim primers and/or adapters | Run if selected |
Trim reads based on quality scores | Run if selected |
Filter reads based on quality scores | Run if selected |
Merge read pairs | Run if selected |
Summary graphs of QC steps | Always run |
* Please note that the Trim primers and/or adapters, Trim reads based on quality scores, Filter reads based on quality scores, and Merge read pairs options are executed ONLY IF they are selected.
** Merge read pairs is available for paired-end data only.
More information about the pipeline, including output files and user options, can be found on the Pre-processing QC Pipeline help page.
You can use our 16S example files to try out this pipeline.
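To make the "filter reads based on quality scores" step concrete, here is a minimal Python sketch of a mean-quality filter. It is an illustration only, not the Nephele implementation: the function names, the Phred+33 encoding, and the threshold of 20 are assumptions chosen for the example.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records, min_mean_q=20.0):
    """Keep only reads whose mean quality meets the (hypothetical) threshold."""
    return [r for r in records if mean_phred(r[2]) >= min_mean_q]


# Two toy reads: 'I' encodes Q40, '!' encodes Q0.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
kept = filter_reads(read_fastq(StringIO(example)))
print([header for header, _, _ in kept])
```

In this toy input only `@read1` passes the filter; production tools typically combine such a filter with per-base trimming and adapter removal.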
Nephele provides a pre-processing quality check pipeline for Oxford Nanopore Technology (Nanopore) long read sequences. The Nephele Nanopore QC pipeline runs a quality control check (NanoPlot), trims known or unknown primers and/or adapters (Porechop_ABI), trims and/or filters reads (nanoq), and provides summary graphs of the QC steps.
The features list below is a summary of the Nephele Nanopore QC pipeline workflow.
Pipeline Features | Pre-processing Quality Check Nanopore Long Read |
---|---|
NanoPlot sequence quality check | Always run |
Trim barcodes and/or adapters | Run if selected |
Trim reads based on quality scores | Run if selected |
Trim fixed length off ends of reads | Run if selected |
Filter reads based on quality scores | Run if selected |
Filter based on length | Run if selected |
NanoPlot sequence quality after QC | Always run |
*Please note that adapter trimming and additional read trimming and filtering are executed only if the options are selected.
You can use our Nanopore QC example files to try out this pipeline.
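The "trim fixed length off ends of reads" and "filter based on length" options can be pictured with the short Python sketch below. It is a hedged illustration rather than what nanoq does internally; the function names and parameter values are hypothetical.

```python
def trim_ends(seq, qual, head=0, tail=0):
    """Remove `head` bases from the start and `tail` bases from the end of a read."""
    end = max(head, len(seq) - tail)  # guard against over-trimming short reads
    return seq[head:end], qual[head:end]


def passes_length(seq, min_len=0, max_len=None):
    """True if the read length lies within [min_len, max_len]."""
    return len(seq) >= min_len and (max_len is None or len(seq) <= max_len)


# Toy read of length 10; trim 2 bases from the start and 3 from the end.
seq, qual = trim_ends("ACGTACGTAC", "0123456789", head=2, tail=3)
print(seq, qual, passes_length(seq, min_len=5))
```

Trimming is applied before the length filter here, so a read that becomes too short after trimming is discarded; this ordering is an assumption of the sketch.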
Nephele is currently running QIIME2 v2022.2, mothur v1.46.1, and DADA2 v1.18. You can use any of these pipelines to run 16S analysis. For ITS, DADA2 ITS is available.
Amplicon | QIIME2 v2022.2 | mothur v1.46.1 | DADA2 v1.18 | DADA2 ITS v1.18 |
---|---|---|---|---|
16S | ||||
ITS |
Each pipeline has different features and steps. Please see the table below to see the different features each 16S pipeline provides in Nephele.
Pipeline Features | mothur | QIIME2 | DADA2 | DADA2 ITS |
---|---|---|---|---|
Join forward and reverse short reads as contigs | ||||
Screen contigs to reduce sequencing errors | ||||
Denoising with Deblur | ||||
Dereplicate contig sequences | ||||
Taxonomic assignment based on selected database | ||||
Remove sequences likely due to sequencing errors | ||||
Identify and remove chimeric sequences | ||||
Classify sequences based on k-nearest neighbor | ||||
Classify sequences based on vsearch and sklearn | ||||
Remove sequences belonging to undesirable lineages | ||||
Remove rare OTUs in the samples | ||||
Detect differentially abundant features in samples | ||||
Construct phylogenetic tree | ||||
Calculate various measures of diversity | ||||
Calculate various measures of diversity for rarefaction curve only | ||||
Ion Torrent Processing - Beta | ||||
Create biom file that can be used in MicrobiomeDB | ||||
Create biom file that can be used in Downstream Analysis pipeline |
More detailed information about each step can be found on the individual pipeline help pages:
Follow our tutorial to try it out.
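Two of the features listed above, dereplication and removal of rare OTUs, can be illustrated with a small Python sketch. This is a conceptual example only and does not reproduce how mothur, QIIME 2, or DADA2 implement these steps; the function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, abundance) pairs, most abundant first."""
    return Counter(sequences).most_common()


def drop_rare(unique_seqs, min_count=2):
    """Discard unique sequences seen fewer than `min_count` times."""
    return [(seq, n) for seq, n in unique_seqs if n >= min_count]


reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
uniques = dereplicate(reads)
print(uniques)
print(drop_rare(uniques))
```

Dereplication reduces the number of sequences that later steps must process, while the abundance counts preserve the information needed to build the final feature table.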
Nephele2 is currently running 2 shotgun metagenomics pipelines:
The pipeline and modules of your choice will depend on:
Pipeline Features | Pipeline Name | ||
---|---|---|---|
Module | Steps | bioBakery | WGSA2 |
Data processing | Accommodation of single-end (SE) data | kneadData | |
Trimming, Filtering & Error-correction | fastp | ||
Decontamination against host genome of choice | kraken2 | ||
Read-based taxonomic profiling | Taxonomic Assignments & composition assessment | MetaPhlAn3 + StrainPhlAn3 | kraken2 |
Visualizations of taxonomic community composition | bioBakery/R | KronaTools | |
Read-based functional profiling | Functional Annotation | HUMAnN3 | |
Pathway Inference based on functional annotations | |||
Visualizations of functional community composition | bioBakery/R | ||
Read assembly and mapping | Assembly to scaffolds | metaSpades | |
Read mapping and alignment processing | samtools | ||
Gene-based functional profiling | Gene Prediction & abundance scoring | Prodigal + verse | |
Functional Annotation | eggNog-mapper2 | ||
Pathway Inference based on functional annotations | MinPath | ||
Visualizations of functional community composition | KronaTools | ||
Gene-based taxonomic profiling (user elective) | Taxonomic Assignments of predicted genes | kraken2 | |
Visualizations of gene-based taxonomic composition | KronaTools | ||
MAG-based taxonomic profiling (user elective) | Scaffold binning into draft genomes (MAGs) | metabat2 | |
Bin quality, abundance & taxonomic assessments | CheckM | ||
Visualizations of taxonomic community composition | KronaTools | ||
Dataset-wide community stats & visualizations | Collective TAX & FUNC abundance matrix | bioBakery/R | WGSA2/R |
Collective TAX & FUNC profile heatmap | |||
Collective Alpha diversity visualization plot | |||
Collective Beta diversity ordination (PCoA) plot | |||
Functional feature vs sequencing depth plot | |||
Biom file creation (for MicrobiomeDB) | WGSA2/R |
Follow our tutorial to try it out.
You can use the biom and tree files from your 16S or ITS pipeline outputs to run the downstream analysis (DA) pipeline. Nephele's DA pipeline uses QIIME 2 to provide sample observation and taxonomic summaries and diversity analyses of an OTU table. You can also use the biom and fasta files generated by the 16S QIIME 2 or DADA2 pipeline to run metagenomics inference using PICRUSt2.
Pipeline Features | Downstream Analysis: Diversity | Metagenomic Inference: PICRUSt2 |
---|---|---|
Summarize sample metadata | ||
Calculate alpha diversity measures | ||
Plot PCoA ordination of beta diversity | ||
Plot bar graphs of taxonomic abundance | ||
Annotate biom file with metadata | ||
Create biom file that can be used in MicrobiomeDB | ||
Infer genes and pathway abundances from 16S data | ||
Generate interactive heatmap of pathways per sample |
More detailed information about each step can be found on the Downstream Analysis pipeline help page.
Follow our tutorial to try it out.
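As a rough guide to what the alpha and beta diversity measures in the table above capture, the following Python sketch computes the Shannon index, observed feature count, and Bray-Curtis dissimilarity from raw counts. It is not the QIIME 2 code used by the pipeline; the function names are hypothetical, and the natural logarithm is an assumption (tools differ in the log base they use).

```python
import math


def shannon(counts):
    """Shannon diversity index using the natural log (an assumption of this sketch)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def observed_features(counts):
    """Number of features (e.g. OTUs or ASVs) with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same feature order."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


sample_a = [10, 10, 0]
sample_b = [0, 5, 15]
print(shannon(sample_a), observed_features(sample_a), bray_curtis(sample_a, sample_b))
```

Alpha diversity summarizes a single sample, whereas a matrix of pairwise dissimilarities such as Bray-Curtis is the input to a PCoA ordination.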
You can use the biom and fasta files generated by the 16S amplicon pipelines (DADA2, QIIME2) as input for this Metagenomics Inference pipeline. The Nephele implementation leverages the PICRUSt2 code and documentation released by the Huttenhower lab; learn more here: https://github.com/picrust/picrust2/wiki/Full-pipeline-script and here https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.4.1). This implementation is only meant to be used with the outputs of 16S analysis.
The PICRUSt2 pipeline performs the four key steps outlined on this wiki: (1) sequence placement, (2) hidden-state prediction of genomes, (3) metagenome prediction, and (4) pathway-level predictions. The outputs are further annotated with descriptions of the functional categories.
Finally, the predicted pathways are also plotted in an interactive heatmap using Morpheus.
More detailed information about each step can be found on the Metagenome Inference PICRUSt2 pipeline help page.
Follow our tutorial to try it out.
This pipeline allows processing of Illumina sequence data generated using a tiling multiplexed primer design such as the popular SARS-CoV-2 ARTIC protocol released by the ARTIC network, which merges two pools of primer pairs prior to sequencing.
ARTICplus method: This pipeline version gives the user an option to indicate which of the various primer sets was used during library preparation (see sequences of primer sets ARTIC v1, v2, v3 and v4 here: https://github.com/joshquick/artic-ncov2019/tree/master/primer_schemes/nCoV-2019, NEB VarSkip Short and Long here: https://github.com/nebiolabs/VarSkip and IDT's Midnight 1200 here: https://zenodo.org/record/3897530#.Xuk7oGpLjep). For example, libraries generated using the v3 primer set are expected to yield 400 nt amplicons tiling the 30 kb viral genome, based on the published primer design. If the user prepared the library using a custom primer strategy, this pipeline provides a way of importing a BED file to describe the primer locations based on the SARS-CoV-2 reference (NC_045512.2).
The SARS-CoV-2 ARTICplus workflow first takes single-end or paired-end data as input and aligns the reads to the genome using BWA. The primer sequences are then masked using iVar, and the masked alignment BAM file is downsampled using the jvarkit downsample tool before variants are called with the GATK HaplotypeCaller. Filtered variants are then annotated, and consensus genomes are generated that incorporate both SNPs and indels and include masked bases to indicate low-coverage regions.
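The idea of a consensus genome with masked low-coverage bases can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline's implementation: it handles SNPs only (no indels), and the function name, the 0-based `snps` mapping, and the `min_depth` threshold of 10 are all assumptions made for the example.

```python
def build_consensus(reference, depth, snps, min_depth=10):
    """Apply SNPs to the reference and mask positions with depth below `min_depth` as 'N'.

    reference: reference sequence as a string
    depth:     per-position read depth, same length as `reference`
    snps:      dict mapping 0-based position -> alternate base (hypothetical format)
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one value per reference base")
    bases = []
    for pos, ref_base in enumerate(reference):
        if depth[pos] < min_depth:
            bases.append("N")  # too little coverage to trust any call
        else:
            bases.append(snps.get(pos, ref_base))
    return "".join(bases)


reference = "ACGTAC"
depth = [20, 20, 3, 20, 20, 0]
snps = {1: "T", 2: "A"}
print(build_consensus(reference, depth, snps))
```

Note that the SNP at position 2 is not reported because that position is masked; masking takes precedence over variant calls in this sketch.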
Pipeline Features | SARS-CoV-2 ARTICplus |
---|---|
Trim data with Trimmomatic | |
Align data with BWA | |
Call variants using haplotypeCaller from GATK | |
Consensus Sequence | |
Generate QC metrics | |
Generate coverage plot | |
Primer sequences masked using iVar |
More detailed information about each step can be found on the SARS-CoV-2 pipeline help page.
Follow our tutorial to try it out.
DiscoVir is a pipeline for exploring viruses (ssDNA, dsDNA phage, and giant DNA viruses) and viral diversity in metagenomes. You can use metagenome FASTA assembly files and BAM files of reads mapped back to the assemblies generated from Nephele's WGSA2 pipeline or elsewhere as input to DiscoVir.
Pipeline Features | |
---|---|
Identification of viral sequences with geNomad | Always run |
Quality check and filtering with CheckV | Always run |
Gene annotation with DRAM-v | Always run |
vOTU clustering | Always run |
vOTU and functional gene abundances | Always run |
Host identification with iPHoP | Run if selected |
Gene annotation with NCBI nr and Diamond | Run if selected |
AMG annotation with DRAM-v | Run if selected |
More detailed information about each step can be found on the DiscoVir pipeline help page.
After the pipeline has completed, you will receive an email with a link to a results page. The results pages are organized into sections, depending on the result type, e.g. table, image, file. Conveniently, each result can be viewed in the browser or downloaded. On top of that, you can download all results in a single archive.
We encourage you to explore the example DADA2 pipeline results page to get a sense of the information that is provided.
Below are links to example result archives for each pipeline. The archives contain all the results that are generated by the pipeline.