Nephele User Guide

Nephele Pipelines

Pipelines

Nephele provides QIIME1, QIIME2, mothur, and DADA2 pipelines for amplicon data, and the bioBakery and WGSA pipelines for shotgun metagenome data. In addition, we provide a sequence data quality check pipeline so that you can inspect and control your data quality before analysis.

We also like learning about new and different pipelines that could better serve your research and educational needs. If you have a suggestion for a tool or analysis to add to Nephele, please fill out this form.
*Note: your request will be considered when planning new features, but submission does not guarantee implementation.

Data Types

Nephele v2 supports demultiplexed paired-end and single-end FASTQ files. In addition, BIOM and FASTA files are used in certain pipelines. Please see the supported data types for each pipeline below.

If you need help figuring out what type of data you have, please see the FAQ.
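If you are comfortable with a little scripting, the sketch below is one quick way to check whether a set of demultiplexed FASTQ files looks paired-end (matching R1/R2 files) or single-end. It is not part of Nephele, and the folder name and file naming pattern are assumptions about your data.

    # Minimal sketch (assumed folder and file names) to peek at demultiplexed
    # FASTQ files and check whether they look like a paired-end R1/R2 set.
    import gzip
    from pathlib import Path

    def first_header(fastq_gz):
        """Return the first read header of a gzipped FASTQ file."""
        with gzip.open(fastq_gz, "rt") as handle:
            return handle.readline().strip()

    sample_dir = Path("raw_reads")                 # hypothetical folder of FASTQ files
    for r1 in sorted(sample_dir.glob("*_R1*.fastq.gz")):
        r2 = Path(str(r1).replace("_R1", "_R2"))   # expected mate file
        status = "paired-end" if r2.exists() else "single-end (no R2 found)"
        print(r1.name, status, first_header(r1))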

Pre-processing Quality Check (QC) Pipeline

Nephele provides a pre-processing quality check pipeline for demultiplexed paired-end and single-end FASTQ files. Please see this FAQ on why you may want to run the QC pipeline before you run a microbiome analysis. The Nephele QC pipeline can run a sequence quality check (FastQC), trim primers and/or adapters, trim and/or filter reads based on quality scores, merge read pairs, and produce summary graphs of the QC steps.

The feature list below summarizes the Nephele QC pipeline workflow; a command-level sketch follows the table.

Pipeline Features                        Pre-processing Quality Check
FastQC sequence quality check            Always run
Trim primers and/or adapters             Run if selected
Trim reads based on quality scores       Run if selected
Filter reads based on quality scores     Run if selected
Merge read pairs **                      Run if selected
Summary graphs of QC steps               Always run

* Please note that the Trim primers and/or adapters, Trim reads based on quality scores, Filter reads based on quality scores, and Merge read pairs steps are executed ONLY IF the corresponding option is selected.
** Merge read pairs applies to paired-end data only.
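For orientation only, here is a minimal Python sketch of what comparable QC steps look like with commonly used command-line tools: FastQC for the quality check, cutadapt for primer/adapter trimming plus quality trimming and filtering, and vsearch for merging read pairs. Nephele runs its own implementation of these steps; the file names, primer sequences, thresholds, and the choice of cutadapt and vsearch here are assumptions, not Nephele's exact commands.

    # Orientation-only sketch of QC-style steps with common tools
    # (FastQC, cutadapt, vsearch). Paths, primers, and thresholds are
    # placeholders, not Nephele's exact commands.
    import subprocess
    from pathlib import Path

    r1, r2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"   # assumed input files
    Path("fastqc_out").mkdir(exist_ok=True)

    # 1. Sequence quality check
    subprocess.run(["fastqc", r1, r2, "-o", "fastqc_out"], check=True)

    # 2-3. Trim primers/adapters and quality-trim/filter reads
    subprocess.run([
        "cutadapt",
        "-g", "GTGYCAGCMGCCGCGGTAA",      # example 16S forward primer (515F)
        "-G", "GGACTACNVGGGTWTCTAAT",     # example 16S reverse primer (806R)
        "-q", "20,20",                    # quality-trim both read ends
        "-m", "50",                       # drop reads shorter than 50 bp
        "-o", "trimmed_R1.fastq.gz", "-p", "trimmed_R2.fastq.gz",
        r1, r2,
    ], check=True)

    # 4. Merge read pairs (paired-end data only)
    subprocess.run([
        "vsearch", "--fastq_mergepairs", "trimmed_R1.fastq.gz",
        "--reverse", "trimmed_R2.fastq.gz",
        "--fastqout", "merged.fastq",
    ], check=True)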

More information about the pipeline, including output files and user options, can be found on the Pre-processing QC Pipeline help page.

Try it out

You can use our 16S example files to try out this pipeline.

Amplicon Pipelines

Nephele is currently running QIIME v1.9.1, QIIME2 v2020.11, mothur v1.46.1, and DADA2 v1.18. You can use any of these pipelines to run a 16S analysis. For ITS, the QIIME1 and DADA2 ITS pipelines are available.

Each pipeline has different features and steps. The table below lists the features each 16S pipeline provides in Nephele; a minimal command sketch of one route follows the table.

Pipeline Features mothur QIIME1 QIIME2 DADA2 DADA2 ITS
Join forward and reverse short reads as contigs
Screen contigs to reduce sequencing errors
Denoising with Deblur
Dereplicate contig sequences
Taxonomic assignment based on selected database
Remove sequences likely due to sequencing errors
Identify and remove chimeric sequences
Classify sequences based on k-nearest neighbor
Classify sequences based on vsearch and sklearn
Remove sequences belonging to undesirable lineages
Remove rare OTUs in the samples
Detect differentially abundant features in samples
Construct phylogenetic tree
Calculate various measures of diversity
Calculate various measures of diversity for rarefaction curve only
Ion Torrent Processing - Beta
Create biom file that can be used in MicrobiomeDB
Create biom file that can be used in Downstream Analysis pipeline
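As one concrete illustration of the kinds of steps in the table, the hedged sketch below strings together standard QIIME 2 plugin commands for a paired-end 16S workflow: denoise with DADA2, assign taxonomy with the sklearn classifier, build a phylogenetic tree, and compute core diversity metrics. The artifact file names, the classifier, and the sampling depth are placeholder assumptions and do not reproduce Nephele's exact parameters.

    # Minimal QIIME 2 sketch of a 16S workflow: denoise, classify, tree, diversity.
    # File names, classifier, and sampling depth are assumptions.
    import subprocess

    def qiime(*args):
        subprocess.run(["qiime", *args], check=True)

    # Denoise paired-end reads into a feature table and representative sequences
    qiime("dada2", "denoise-paired",
          "--i-demultiplexed-seqs", "demux.qza",
          "--p-trunc-len-f", "0", "--p-trunc-len-r", "0",
          "--o-table", "table.qza",
          "--o-representative-sequences", "rep-seqs.qza",
          "--o-denoising-stats", "stats.qza")

    # Taxonomic assignment with a pre-trained sklearn classifier
    qiime("feature-classifier", "classify-sklearn",
          "--i-classifier", "classifier.qza",
          "--i-reads", "rep-seqs.qza",
          "--o-classification", "taxonomy.qza")

    # Phylogenetic tree for diversity analyses
    qiime("phylogeny", "align-to-tree-mafft-fasttree",
          "--i-sequences", "rep-seqs.qza",
          "--output-dir", "tree_out")

    # Alpha and beta diversity at an assumed rarefaction depth
    qiime("diversity", "core-metrics-phylogenetic",
          "--i-phylogeny", "tree_out/rooted_tree.qza",
          "--i-table", "table.qza",
          "--p-sampling-depth", "10000",
          "--m-metadata-file", "metadata.tsv",
          "--output-dir", "core_metrics")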

More detailed information about each step can be found on the individual pipeline help pages.

Follow our tutorial to try it out.

Metagenome Pipelines

Nephele v2 is currently running two whole metagenome sequencing pipelines: bioBakery and WGSA.

You can use either of these pipelines to analyze whole metagenome sequence data, depending on the kind of dataset you have and the scientific questions you want to address. A rough sketch of the bioBakery-style steps follows the table below.

Pipeline Features bioBakery WGSA
Accommodates single-end (SE) data
Trim & Filter kneadData fastp
Decontaminate against genome of choice kneadData bbmap
Assembly to contigs & scaffolds metaSpades
Draft genome creation (bins) & quality assessments metabat2, CheckM
Taxonomic Assignments MetaPhlAn3 CheckM
Strain Assignments StrainPhlAn3 CheckM
Community composition taxonomic visualizations bioBakery/R KronaTools
Gene Prediction Prodigal
Functional Annotation HUMAnN3 metaProkka
Pathway Inference based on functional annotations HUMAnN3 MinPath
Visualizations of functional community composition bioBakery/R KronaTools
PCoA ordination of species composition bioBakery/R
Plots of functional feature detection vs sequencing depth bioBakery/R
Create biom file that can be used in MicrobiomeDB
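As a rough illustration of the bioBakery-style route in the table, the sketch below chains host-read removal (KneadData), taxonomic profiling (MetaPhlAn 3), and functional profiling (HUMAnN 3) for a single sample. The file names, the decontamination database, and the single-end shortcut are assumptions, not Nephele's exact settings.

    # Rough sketch of bioBakery-style WGS steps: host decontamination (KneadData),
    # taxonomic profiling (MetaPhlAn 3), and functional profiling (HUMAnN 3).
    # File names and databases are placeholders.
    import subprocess

    sample = "sample.fastq.gz"   # assumed single-end input for brevity

    # Quality control and removal of host reads
    subprocess.run(["kneaddata", "--input", sample,
                    "--reference-db", "host_genome_db",   # e.g. a host genome index
                    "--output", "kneaddata_out"], check=True)

    cleaned = "kneaddata_out/sample_kneaddata.fastq"      # assumed output name

    # Taxonomic profiling
    subprocess.run(["metaphlan", cleaned, "--input_type", "fastq",
                    "--bowtie2out", "sample.bowtie2.bz2",
                    "-o", "metaphlan_profile.txt"], check=True)

    # Functional profiling, reusing the taxonomic profile
    subprocess.run(["humann", "--input", cleaned,
                    "--taxonomic-profile", "metaphlan_profile.txt",
                    "--output", "humann_out"], check=True)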

Follow our tutorial to try it out.

Downstream Analysis Pipeline: Diversity

You can use the biom and tree files from your 16S or ITS pipeline outputs to run the downstream analysis (DA) pipeline. Nephele's DA pipeline uses QIIME 2 to provide sample observation and taxonomic summaries and diversity analyses of an OTU table. You can also use the biom and FASTA files generated by the 16S QIIME2 or DADA2 pipelines to run metagenome inference using PICRUSt2. A small stand-alone example of working with a biom file follows the table below.

Pipeline Features Downstream Analysis: Diversity Metagenomic Inference: PICRUSt2
Summarize sample metadata
Calculate alpha diversity measures
Plot PCoA ordination of beta diversity
Plot bar graphs of taxonomic abundance
Annotate biom file with metadata
Create biom file that can be used in MicrobiomeDB
Infer genes and pathway abundances from 16S data
Generate interactive heatmap of pathways per sample
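If you would like to explore a pipeline's biom output yourself, the small stand-alone example below loads a biom table and computes Shannon alpha diversity with the biom-format and scikit-bio Python packages. The file name is a placeholder, and this is not how the DA pipeline itself is implemented (it uses QIIME 2).

    # Sketch: load an OTU/ASV biom table and compute Shannon alpha diversity
    # with the biom-format and scikit-bio packages. 'otu_table.biom' is a placeholder.
    import biom
    from skbio.diversity import alpha_diversity

    table = biom.load_table("otu_table.biom")

    # biom stores observations (OTUs/ASVs) as rows and samples as columns,
    # so transpose to get one row of counts per sample.
    counts = table.matrix_data.toarray().T.astype(int)
    sample_ids = table.ids(axis="sample")

    shannon = alpha_diversity("shannon", counts, ids=sample_ids)
    print(shannon.sort_values(ascending=False).head())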

More detailed information about each step can be found on the Downstream Analysis pipeline help page.

Follow our tutorial to try it out.

Metagenome Inference PICRUSt2 Pipeline

You can use the biom and FASTA files generated by the 16S amplicon pipelines (DADA2, QIIME2) as input for this Metagenome Inference pipeline. The Nephele implementation leverages the PICRUSt2 code and documentation released by the Huttenhower lab; learn more here: https://github.com/picrust/picrust2/wiki/Full-pipeline-script and here: https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.4.1). This implementation is only meant to be used with the outputs of 16S analysis.

PICRUSt2 performs the four key steps outlined on this wiki: (1) sequence placement, (2) hidden-state prediction of genomes, (3) metagenome prediction, and (4) pathway-level predictions. The outputs are further annotated with descriptions of the functional categories.
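For reference, the sketch below shows roughly how the full pipeline script from the PICRUSt2 documentation is invoked on 16S outputs and how the resulting pathway table can be annotated with descriptions. The file names and thread count are placeholders; Nephele runs this for you, so the sketch is only to show which inputs the steps above consume.

    # Sketch of invoking the PICRUSt2 full-pipeline script on 16S outputs.
    # rep_seqs.fasta and feature_table.biom stand in for the FASTA/biom files
    # produced by the DADA2 or QIIME2 pipelines.
    import subprocess

    subprocess.run([
        "picrust2_pipeline.py",
        "-s", "rep_seqs.fasta",      # representative ASV/OTU sequences
        "-i", "feature_table.biom",  # feature (OTU/ASV) abundance table
        "-o", "picrust2_out",        # output directory (must not already exist)
        "-p", "4",                   # number of processes
    ], check=True)

    # Annotate the predicted MetaCyc pathway abundances with descriptions
    subprocess.run([
        "add_descriptions.py",
        "-i", "picrust2_out/pathways_out/path_abun_unstrat.tsv.gz",
        "-m", "METACYC",
        "-o", "picrust2_out/pathways_out/path_abun_unstrat_descrip.tsv.gz",
    ], check=True)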

Finally, the predicted pathways are also plotted in an interactive heatmap using Morpheus.

More detailed information about each step can be found on the Metagenome Inference PICRUSt2 pipeline help page.

Follow our tutorial to try it out.

SARS-CoV-2 Pipeline (beta)

The SARS-CoV-2 pipeline was originally written by Elodie Ghedin's lab using the Nextflow workflow manager. The pipeline takes raw data files as input and proceeds to trim reads, align them, and call variants against a SARS-CoV-2 reference sequence. The pipeline was designed to work with data generated from adapted versions of the ARTIC consortium protocol. We specifically recommend the protocol developed by the Ghedin Lab, which is available for download here.

The amplicons should be generated in two separate pools, A and B, from amplification through library generation and sequencing. The pipeline takes single-end or paired-end FASTQ files, trims both adapters and primers, aligns the reads to the provided reference file, and then merges the A and B alignments to create merged BAM files as output. The pipeline additionally calls minority variants using GATK's HaplotypeCaller. The pipeline will work on any number of samples, provided that each sample has 2 (single-end) or 4 (paired-end) input FASTQ files (read files from both the A and B amplicon preparations). A rough sketch of these stages follows the table below.

Pipeline Features SARS-CoV-2
Trim data with Trimmomatic
Align data with BWA
Merge data with Picard
Call minority variants with GATK's HaplotypeCaller
Generate consensus sequence
Generate QC metrics
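For orientation, the sketch below strings together the same classes of tools for one hypothetical paired-end sample with separate A and B pools: align with BWA, merge with Picard, and call variants with GATK's HaplotypeCaller (the Trimmomatic and consensus/QC steps are omitted for brevity). The reference and read file names are placeholders, and this is not the pipeline's actual Nextflow code.

    # Sketch of the align / merge / call-variants stages for one paired-end
    # sample with separate A and B amplicon pools. Reference and read files
    # are placeholders; Nephele's pipeline is implemented in Nextflow.
    import subprocess

    def sh(cmd):
        subprocess.run(cmd, shell=True, check=True)

    ref = "sars_cov_2_ref.fasta"                 # assumed reference sequence
    sh(f"bwa index {ref}")

    # Align each amplicon pool separately, adding a read group for GATK
    for pool in ("A", "B"):
        sh(f"bwa mem -R '@RG\\tID:{pool}\\tSM:sample1' {ref} "
           f"sample1_{pool}_R1.fastq.gz sample1_{pool}_R2.fastq.gz "
           f"| samtools sort -o sample1_{pool}.bam")
        sh(f"samtools index sample1_{pool}.bam")

    # Merge the A and B alignments into one BAM
    sh("picard MergeSamFiles I=sample1_A.bam I=sample1_B.bam O=sample1_merged.bam")
    sh("samtools index sample1_merged.bam")

    # Call variants against the reference
    sh(f"samtools faidx {ref}")
    sh(f"gatk CreateSequenceDictionary -R {ref}")
    sh(f"gatk HaplotypeCaller -R {ref} -I sample1_merged.bam -O sample1.vcf.gz")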

More detailed information about each step can be found on the SARS-CoV-2 pipeline help page.

Follow our tutorial to try it out.