Nephele

Nephele provides mothur and DADA2 pipelines for amplicon data and the bioBakery pipeline for metagenome shotgun data. In addition, we have added a sequence data quality check pipeline so that you can inspect and control the quality of your data before analysis.

We also like learning about new and different pipelines that could better serve your research and educational needs. If you have a suggestion for a tool or analysis to add to Nephele, please fill out this form.
*Note: your request will be considered in planning new features, but consideration does not guarantee implementation.

Data Types

Nephele v2 supports demultiplexed paired-end, single-end, and Nanopore FASTQ files. In addition, BIOM, FASTA, and BAM files are used in certain pipelines. Please see the supported data type per pipeline below.

	Short Read QC	Nanopore QC	QIIME 2	mothur v1.46.1	DADA2 v1.18	DADA2 ITS v1.18	bioBakery	SARS-CoV-2	Downstream Analysis: Diversity (amplicon only)	Metagenomics Inference	DiscoVir
Paired End FASTQ	Mapping file template		Mapping file template	Mapping file template	Mapping file template	Mapping file template	Mapping file template	Mapping file template
Single End FASTQ	Mapping file template		Mapping file template		Mapping file template	Mapping file template	Mapping file template	Mapping file template
Nanopore FASTQ		Mapping file template
BIOM									Mapping file from your 16S or ITS pipeline output	Mapping file from your 16S or ITS pipeline output
FASTA										Reads file of ASVs or representative sequences from OTUs
FASTA + BAM											Mapping file template

If you need help figuring out what type of data you have, please see the FAQ.

Pre-processing Short Read Quality Check (QC) Pipeline

Nephele provides a pre-processing quality check pipeline for demultiplexed paired-end and single-end short read FASTQ files. Please see this FAQ on why you may want to run the QC pipeline before you run a microbiome analysis. The Nephele Short Read QC pipeline can run a quality control check (FastQC), trim primers and/or adapters, trim and/or filter reads based on quality scores, and merge read pairs, and it provides summary graphs of the QC steps.

The features list below is a summary of the Nephele QC pipeline workflow.

Pipeline Features	Pre-processing Quality Check
FastQC sequence quality check	Always run
Trim primers and/or adapters	Run if selected
Trim reads based on quality scores	Run if selected
Filter reads based on quality scores	Run if selected
Merge read pairs	Run if selected
Summary graphs of QC steps	Always run

* Please note that the Trim primers and/or adapters, Trim reads based on quality scores, Filter reads based on quality scores, and Merge read pairs steps are executed ONLY IF the corresponding option is selected.
** Merge read pairs is available for Paired End data only.
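
To make the trim and filter steps concrete, here is a minimal Python sketch of quality-based trimming and filtering of a single read. It is an illustration only and not the code Nephele runs: the function names, the Phred+33 encoding, and all thresholds are assumptions chosen for the example.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end until a base with score >= min_q is reached."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual, min_mean_q=25, min_len=50):
    """Keep a read only if it is long enough and its mean score is high enough."""
    scores = phred_scores(qual)
    if len(scores) < min_len:
        return False
    return sum(scores) / len(scores) >= min_mean_q


# Hypothetical read: five high-quality bases followed by three poor ones.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII###")
print(trimmed_seq, passes_filter(trimmed_qual, min_len=5))
```

Trimming shortens a read but keeps it, whereas filtering discards the whole read, which is why the two appear as separate options in the table above.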

More information about the pipeline, including output files and user options, can be found on the Pre-processing QC Pipeline help page.

Try it out

You can use our 16S example files to try out this pipeline.

Pre-processing Nanopore Quality Check (QC) Pipeline

Nephele provides a pre-processing quality check pipeline for Oxford Nanopore Technologies (Nanopore) long read sequences. The Nephele Nanopore QC pipeline runs a quality control check (NanoPlot), trims known or unknown primers and/or adapters (Porechop_ABI), trims and/or filters reads (nanoq), and provides summary graphs of the QC steps.

The features list below is a summary of the Nephele Nanopore QC pipeline workflow.

Pipeline Features	Pre-processing Quality Check Nanopore Long Read
NanoPlot sequence quality check	Always run
Trim barcodes and/or adapters	Run if selected
Trim reads based on quality scores	Run if selected
Trim fixed length off ends of reads	Run if selected
Filter reads based on quality scores	Run if selected
Filter based on length	Run if selected
NanoPlot sequence quality after QC	Always run

*Please note that adapter trimming and additional read trimming and filtering will only be executed if the corresponding options are selected.
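
The length and quality options can be illustrated with a small Python sketch. It is not the nanoq or NanoPlot implementation; the function names and thresholds are hypothetical. One assumption worth stating: long-read tools commonly average quality through the per-base error probabilities rather than taking the arithmetic mean of the Phred scores, and the sketch follows that convention.

```python
import math


def mean_read_quality(scores):
    """Average Phred scores through their error probabilities."""
    if not scores:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


def crop_ends(seq, scores, head=0, tail=0):
    """Trim a fixed number of bases off the start (head) and end (tail) of a read."""
    end = max(head, len(seq) - tail)
    return seq[head:end], scores[head:end]


def keep_read(seq, scores, min_len=1000, min_q=10.0):
    """Filter on length and on mean read quality (thresholds are illustrative)."""
    return len(seq) >= min_len and mean_read_quality(scores) >= min_q


# A read with scores 10 and 30 averages to about Q13, not Q20,
# because the low-quality base dominates the mean error probability.
print(round(mean_read_quality([10, 30]), 2))
```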

Try it out

You can use our Nanopore QC example files to try out this pipeline.

Amplicon Pipelines

Nephele is currently running QIIME2 v2022.2, mothur v1.46.1, and DADA2 v1.18. You can use any of these pipelines to run 16S analysis. For ITS, DADA2 ITS is available.

Amplicon	QIIME2 v2022.2	mothur v1.46.1	DADA2 v1.18	DADA2 ITS v1.18
16S
ITS

Each pipeline has different features and steps. The table below shows the features each pipeline provides in Nephele.

Pipeline Features	mothur	QIIME2	DADA2	DADA2 ITS
Join forward and reverse short reads as contigs
Screen contigs to reduce sequencing errors
Denoising with Deblur
Dereplicate contig sequences
Taxonomic assignment based on selected database
Remove sequences likely due to sequencing errors
Identify and remove chimeric sequences
Classify sequences based on k-nearest neighbor
Classify sequences based on vsearch and sklearn
Remove sequences belonging to undesirable lineages
Remove rare OTUs in the samples
Detect differentially abundant features in samples
Construct phylogenetic tree
Calculate various measures of diversity
Calculate various measures of diversity for rarefaction curve only
Ion Torrent Processing - Beta
Create biom file that can be used in MicrobiomeDB
Create biom file that can be used in Downstream Analysis pipeline
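
Two of the steps listed above, dereplication and removal of rare features, are simple enough to sketch in a few lines of Python. This is a conceptual illustration, not what mothur, QIIME2, or DADA2 execute internally; the function names and the `min_count` threshold are assumptions.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into unique sequences with abundance counts."""
    return Counter(sequences)


def remove_rare(counts, min_count=2):
    """Drop unique sequences (or OTUs) observed fewer than min_count times."""
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical reads: one sequence seen three times, two singletons.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
print(remove_rare(dereplicate(reads)))
```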

More detailed information about each step can be found on the individual pipeline help pages:

Follow our tutorial to try it out.

Metagenome Pipelines

Nephele v2 is currently running two shotgun metagenomics pipelines:

bioBakery Workflows developed by the Huttenhower Lab (Learn more here)
WGSA2: Nephele's in-house assembly-based pipeline (Learn more here)

The pipeline and modules of your choice will depend on:

The type of sequence you have (Single End or Paired End data)
The type of community you have (prokaryotic only or complex microbial community)
The scientific questions you want answered (Taxonomic or Functional exploration)
The level of desired detail (initial familiarization with the dataset or ready for detailed exploration)

Pipeline Features		Pipeline Name
Module	Steps	bioBakery	WGSA2
Data processing	Accommodation of single-end (SE) data	kneadData
	Trimming, Filtering & Error-correction		fastp
	Decontamination against host genome of choice		kraken2
Read-based taxonomic profiling	Taxonomic Assignments & composition assessment	MetaPhlAn3 + StrainPhlAn3	kraken2
Read-based taxonomic profiling	Visualizations of taxonomic community composition	bioBakery/R	KronaTools
Read-based functional profiling	Functional Annotation	HUMAnN3
	Pathway Inference based on functional annotations	HUMAnN3
	Visualizations of functional community composition	bioBakery/R
Read assembly and mapping	Assembly to scaffolds		metaSpades
Read assembly and mapping	Read mapping and alignment processing		samtools
Gene-based functional profiling	Gene Prediction & abundance scoring		Prodigal + verse
	Functional Annotation		eggNog-mapper2
	Pathway Inference based on functional annotations		MinPath
	Visualizations of functional community composition		KronaTools
Gene-based taxonomic profiling (user elective)	Taxonomic Assignments of predicted genes		kraken2
Gene-based taxonomic profiling (user elective)	Visualizations of gene-based taxonomic composition		KronaTools
MAG-based taxonomic profiling (user elective)	Scaffold binning into draft genomes (MAGs)		metabat2
	Bin quality, abundance & taxonomic assessments		CheckM
	Visualizations of taxonomic community composition		KronaTools
Dataset-wide community stats & visualizations	Collective TAX & FUNC abundance matrix	bioBakery/R	WGSA2/R
	Collective TAX & FUNC profile heatmap
	Collective Alpha diversity visualization plot
	Collective Beta diversity ordination (PCoA) plot
	Functional feature vs sequencing depth plot
	Biom file creation (for MicrobiomeDB)		WGSA2/R

Follow our tutorial to try it out.

Downstream Analysis Pipeline: Diversity

You can use the biom and tree files from your 16S or ITS pipeline outputs to run the downstream analysis (DA) pipeline. Nephele's DA pipeline uses QIIME 2 to provide sample observation and taxonomic summaries and diversity analyses of an OTU table. You can also use the biom and fasta files generated by the 16S QIIME 2 or DADA2 pipelines to run metagenomics inference using PICRUSt2.

Pipeline Features	Downstream Analysis: Diversity	Metagenomic Inference: PICRUSt2
Summarize sample metadata
Calculate alpha diversity measures
Plot PCoA ordination of beta diversity
Plot bar graphs of taxonomic abundance
Annotate biom file with metadata
Create biom file that can be used in MicrobiomeDB
Infer genes and pathway abundances from 16S data
Generate interactive heatmap of pathways per sample
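
If you are new to the diversity measures named above, the following Python sketch shows one alpha diversity measure (the Shannon index) and one beta diversity measure (Bray-Curtis dissimilarity) computed from raw counts. It is for intuition only and is not the QIIME 2 code the pipeline runs. The natural logarithm used here is an assumption; tools differ in the log base they report.

```python
import math


def shannon(counts):
    """Shannon index of one sample, using the natural logarithm."""
    total = sum(counts)
    if total == 0:
        return 0.0
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples with the same feature order."""
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / denominator


# Hypothetical counts for two samples over the same three features.
gut = [50, 30, 20]
skin = [5, 5, 90]
print(round(shannon(gut), 3), round(bray_curtis(gut, skin), 3))
```

A PCoA plot of beta diversity is an ordination of the matrix of such pairwise dissimilarities across all samples.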

More detailed information about each step can be found on the Downstream Analysis pipeline help page.

Follow our tutorial to try it out.

Metagenome Inference PICRUSt2 Pipeline

You can use the biom and fasta files generated by the 16S amplicon pipelines (DADA2, QIIME2) as input for this Metagenomics Inference pipeline. The Nephele implementation leverages the PICRUSt2 code and documentation released by the Huttenhower lab; learn more here: https://github.com/picrust/picrust2/wiki/Full-pipeline-script and here: https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.4.1). This implementation is only meant to be used with the outputs of 16S analysis.

The PICRUSt2 pipeline performs the four key steps outlined on this wiki: (1) sequence placement, (2) hidden-state prediction of genomes, (3) metagenome prediction, and (4) pathway-level predictions. The outputs are further annotated with descriptions of the functional categories.
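
Step (3), metagenome prediction, can be summarized with a simplified Python sketch: each ASV's read count is divided by its predicted 16S copy number, and the result is multiplied by the predicted copy number of each gene family in that ASV's inferred genome. This is a conceptual outline under those assumptions, not PICRUSt2's own code; the function name and all identifiers and values below are hypothetical.

```python
def predict_metagenome(asv_counts, marker_copies, gene_copies):
    """Predict gene family abundances for one sample.

    asv_counts:    {asv_id: read count in the sample}
    marker_copies: {asv_id: predicted 16S copy number}
    gene_copies:   {asv_id: {gene_family: predicted copy number}}
    """
    totals = {}
    for asv, reads in asv_counts.items():
        # Normalize reads by 16S copy number to approximate organism abundance.
        organisms = reads / marker_copies[asv]
        for gene, copies in gene_copies[asv].items():
            totals[gene] = totals.get(gene, 0.0) + organisms * copies
    return totals


# Hypothetical input for a single sample with two ASVs.
counts = {"ASV1": 100, "ASV2": 40}
marker = {"ASV1": 2, "ASV2": 4}
genes = {"ASV1": {"K00001": 1, "K00002": 3}, "ASV2": {"K00001": 2}}
print(predict_metagenome(counts, marker, genes))
```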

Finally, the predicted pathways are also plotted in an interactive heatmap using Morpheus.

More detailed information about each step can be found on the Metagenome Inference PICRUSt2 pipeline help page.

Follow our tutorial to try it out.

SARS-CoV-2 ARTICplus and SGS methods (beta)

This pipeline processes Illumina sequence data generated using a tiling multiplexed primer design such as the popular SARS-CoV-2 ARTIC protocol released by the ARTIC network, which merges two pools of primer pairs prior to sequencing. In addition, this pipeline supports a variation of the ARTIC protocol that uses an alternative set of primer pairs for the two pools and allows each pool to be sequenced independently (learn more about the Ghedin laboratory protocol here). The original version of this pipeline, which processes the raw data of each sequencing run and then merges the aligned data, was written by Elodie Ghedin's lab with the goal of facilitating deep sequencing. To distinguish the version of the pipeline that supports the ARTIC protocol workflow from the version written to support the Ghedin Lab (Systems Genomics Section) protocol, we have named them the ARTICplus method and the SGS method.

ARTICplus method: This pipeline version lets the user indicate which primer set was used during library preparation (see sequences of primer sets ARTIC v1, v2, v3 and v4 here: https://github.com/joshquick/artic-ncov2019/tree/master/primer_schemes/nCoV-2019, NEB VarSkip Short and Long here: https://github.com/nebiolabs/VarSkip and IDT's Midnight 1200 here: https://zenodo.org/record/3897530#.Xuk7oGpLjep). For example, for libraries generated using the v3 primer set, users should expect to see 400 nt amplicons tiling the 30 kb viral genome, based on the published primer design. If the library was prepared using a custom primer strategy, the user can import a BED file that describes the primer locations relative to the SARS-CoV-2 reference (NC_045512.2).
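
For readers preparing a custom primer BED file, the sketch below shows how primer coordinates translate into amplicon spans. It assumes a tab-separated BED with at least four columns (chromosome, 0-based start, end, primer name) and primer names ending in `_LEFT` or `_RIGHT`; these conventions, the primer names, and the coordinates are illustrative assumptions, so check the SARS-CoV-2 pipeline help page for the exact layout Nephele expects.

```python
# Hypothetical primer BED content; names and coordinates are illustrative only.
EXAMPLE_BED = (
    "NC_045512.2\t30\t54\tamp1_LEFT\n"
    "NC_045512.2\t385\t410\tamp1_RIGHT\n"
    "NC_045512.2\t320\t342\tamp2_LEFT\n"
    "NC_045512.2\t704\t726\tamp2_RIGHT\n"
)


def parse_primer_bed(text):
    """Read (chrom, start, end, name) tuples from tab-separated BED text."""
    primers = []
    for line in text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        primers.append((chrom, int(start), int(end), name))
    return primers


def amplicon_spans(primers):
    """Pair LEFT and RIGHT primers by name prefix into (start, end) amplicon spans."""
    left, right = {}, {}
    for _chrom, start, end, name in primers:
        prefix, _, side = name.rpartition("_")
        if side == "LEFT":
            left[prefix] = start
        elif side == "RIGHT":
            right[prefix] = end
    return {p: (left[p], right[p]) for p in left if p in right}


for name, (start, end) in amplicon_spans(parse_primer_bed(EXAMPLE_BED)).items():
    print(name, start, end, end - start)
```

Adjacent amplicons overlap (here `amp2` starts before `amp1` ends), which is what lets a tiled design cover the whole genome.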

The SARS-CoV-2 ARTICplus method workflow takes single or paired-end data as input and first aligns the reads to the genome using BWA. The primer sequences are then masked using iVar, and the masked alignment BAM file is downsampled using the jvarkit downsample tool prior to calling variants with the GATK HaplotypeCaller. Filtered variants are then annotated, and consensus genomes are generated that incorporate both SNPs and indels and include masked bases to indicate low-coverage regions.
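
The idea of a consensus genome with masked low-coverage positions can be shown in a short Python sketch. For simplicity it applies single-base substitutions only and leaves out indels, and the `min_depth` threshold is an assumption rather than the pipeline's actual setting; the function name is hypothetical.

```python
def build_consensus(reference, snps, depth, min_depth=10):
    """Apply SNPs to a reference and mask poorly covered positions with N.

    reference: reference sequence as a string
    snps:      {0-based position: alternate base}
    depth:     per-base read depth, same length as reference
    """
    bases = []
    for pos, ref_base in enumerate(reference):
        if depth[pos] < min_depth:
            bases.append("N")  # too little coverage to trust any call
        else:
            bases.append(snps.get(pos, ref_base))
    return "".join(bases)


# Hypothetical 6-base reference with two SNPs and two low-coverage positions.
print(build_consensus("ACGTAC", {1: "T", 4: "G"}, [20, 20, 3, 20, 20, 0]))
```

An N in the consensus therefore means "not enough data at this position", not "matches the reference".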

SGS method: For the SGS method version, the amplicons should be generated in two separate pools, A and B, from amplification through library generation and sequencing. The pipeline takes single or paired-end fastq files as input, trims both adapters and primers, aligns the reads to the provided reference file, and then merges the A and B files to create merged BAM files as output. The pipeline additionally calls variants using GATK's HaplotypeCaller. It will work on any number of samples, provided that each sample has 2 (single end) or 4 (paired end) input fastq files (read files from both A and B amplicon preparations). A consensus sequence is generated, but in the current version this consensus is built only with the SNPs (not indels).
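
Before submitting an SGS job, you may want to confirm that every sample has the expected number of fastq files. The helper below is a hypothetical pre-submission check, not part of Nephele; the function name and the file names in the example are made up for illustration.

```python
def check_sgs_inputs(files_by_sample, paired_end):
    """Return {sample: file count} for samples with an unexpected number of files.

    Each sample needs reads from both pools (A and B): 2 files for single-end
    data, 4 files for paired-end data.
    """
    expected = 4 if paired_end else 2
    return {
        sample: len(files)
        for sample, files in files_by_sample.items()
        if len(files) != expected
    }


# Hypothetical paired-end run in which sample s2 is missing its pool B files.
files = {
    "s1": ["s1_A_R1.fastq", "s1_A_R2.fastq", "s1_B_R1.fastq", "s1_B_R2.fastq"],
    "s2": ["s2_A_R1.fastq", "s2_A_R2.fastq"],
}
print(check_sgs_inputs(files, paired_end=True))
```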

Pipeline Features	SARS-CoV-2 ARTICplus method	SARS-CoV-2 SGS method
Trim data with Trimmomatic
Align data with BWA
Merge alignment data with Picard MergeSamFiles
Call variants using HaplotypeCaller from GATK
Consensus Sequence
Generate QC metrics
Generate coverage plot
Primer sequences masked using iVar
Trim primers using Trimmomatic

More detailed information about each step can be found on the SARS-CoV-2 pipeline help page.

Follow our tutorial to try it out.

DiscoVir

DiscoVir is a pipeline for exploring viruses (ssDNA, dsDNA phage, and giant DNA viruses) and viral diversity in metagenomes. You can use metagenome FASTA assembly files and BAM files of reads mapped back to the assemblies generated from Nephele's WGSA2 pipeline or elsewhere as input to DiscoVir.

Pipeline Features
Identification of viral sequences with geNomad	Always run
Quality check and filtering with CheckV	Always run
Gene annotation with DRAM-v	Always run
vOTU clustering	Always run
vOTU and functional gene abundances	Always run
Host identification with iPHoP	Run if selected
Gene annotation with NCBI nr and Diamond	Run if selected
AMG annotation with DRAM-v	Run if selected

More detailed information about each step can be found on the DiscoVir pipeline help page.

Nephele Output

Example result page

After the pipeline has completed, you will receive an email with a link to a results page. The results pages are organized into sections, depending on the result type, e.g. table, image, file. Conveniently, each result can be viewed in the browser or downloaded. In addition, you can download all results in a single archive.

We encourage you to explore the example DADA2 pipeline results page to get a sense of the information that is provided.

Example result archives

Below are links to example result archives for each pipeline. The archives contain all the results that are generated by the pipeline.
