Nephele User Guide

Nephele Pipelines

Pipelines

Nephele provides mothur, QIIME 2, and DADA2 pipelines for amplicon data and the bioBakery and WGSA2 pipelines for shotgun metagenome data. In addition, we have added a pre-processing quality check pipeline so that you can inspect and control your data quality before analysis.

We also like learning about new and different pipelines that could better serve your research and educational needs. If you have a suggestion for a tool or analysis for Nephele, please fill out this form.
*Note: your request will be considered when planning new features, but implementation is not guaranteed.

Data Types

Nephele v2 supports demultiplexed paired-end and single-end FASTQ files. In addition, BIOM and FASTA files are used in certain pipelines. Please see the supported data types for each pipeline below.

If you need help figuring out what type of data you have, please see the FAQ.
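If you are unsure whether your demultiplexed FASTQ files form complete pairs, a quick local check can help before uploading. The sketch below assumes a hypothetical naming convention of `<sample>_R1.fastq[.gz]` and `<sample>_R2.fastq[.gz]`; this is not necessarily the convention Nephele requires, so please check the FAQ for the actual rules. The function `pair_fastq_files` is illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (assumption): <sample>_R1.fastq[.gz] / <sample>_R2.fastq[.gz]
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.f(ast)?q(\.gz)?$")


def pair_fastq_files(filenames):
    """Group file names into (R1, R2) pairs per sample.

    Returns (pairs, unmatched): pairs maps sample -> (R1 file, R2 file);
    unmatched lists files that do not fit the pattern or lack a mate.
    """
    reads = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            unmatched.append(name)
            continue
        reads[match.group("sample")][match.group("read")] = name

    pairs = {}
    for sample, files in reads.items():
        if set(files) == {"1", "2"}:
            pairs[sample] = (files["1"], files["2"])
        else:
            # Only one mate was found, so this sample cannot be treated as paired-end
            unmatched.extend(files.values())
    return pairs, sorted(unmatched)


if __name__ == "__main__":
    pairs, unmatched = pair_fastq_files(
        ["A1_R1.fastq.gz", "A1_R2.fastq.gz", "B2_R1.fq", "notes.txt"]
    )
    print(pairs)
    print(unmatched)
```

Files that end up in `unmatched` either do not follow the assumed pattern or are missing their mate.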

Pre-processing Quality Check (QC) Pipeline

Nephele provides a pre-processing quality check pipeline for demultiplexed paired-end and single-end FASTQ files. Please see this FAQ on why you may want to run the QC pipeline before running a microbiome analysis. The Nephele QC pipeline runs a quality control check (FastQC); can trim primers and/or adapters, trim and/or filter reads based on quality scores, and merge read pairs; and provides summary graphs of the QC steps.
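To illustrate what quality-based trimming and filtering do to a single read, here is a minimal sketch. It assumes Phred+33 quality encoding, and the thresholds (`min_q`, `min_mean_q`, `min_len`) are hypothetical values chosen for the example; the QC pipeline uses its own tools and user-selected settings, described on its help page.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_q=20):
    """Trim bases from the 3' end of a read while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual, min_mean_q=25, min_len=50):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    scores = phred_scores(qual)
    if len(scores) < min_len:
        return False
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33
    seq, qual = trim_3prime("ACGTACGT", "IIIIII##")
    print(seq, qual)                          # trimmed read
    print(passes_filter(qual, min_len=5))     # long enough for this toy threshold
```

Trimming shortens a read but keeps it; filtering decides whether the read is kept at all, which is why the two are offered as separate options.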

The feature list below summarizes the Nephele QC pipeline workflow.

Pre-processing Quality Check pipeline features:
  • FastQC sequence quality check: Always run
  • Trim primers and/or adapters: Run if selected
  • Trim reads based on quality scores: Run if selected
  • Filter reads based on quality scores: Run if selected
  • Merge read pairs: Run if selected
  • Summary graphs of QC steps: Always run

* Please note that the Trim primers and/or adapters, Trim reads based on quality scores, Filter reads based on quality scores, and Merge read pairs steps are executed ONLY IF the corresponding option is selected.
** Merge read pairs is available for paired-end data only.
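Merging read pairs works because, for a short enough amplicon, the 3' end of the forward read overlaps the reverse complement of the reverse read. The sketch below shows that idea under simplifying assumptions: it ignores quality scores, keeps the forward-read base wherever the two reads disagree, and does not handle reads that run past the end of the amplicon. `merge_pair`, `min_overlap`, and `max_mismatches` are hypothetical names and values, not the settings of the tool Nephele uses.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatches=1):
    """Merge R1 with the reverse complement of R2 using the longest acceptable overlap.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases has max_mismatches or fewer mismatches.
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = r1[-overlap:]
        head = r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            # Keep R1 bases in the overlap, then append the rest of R2
            return r1 + r2_rc[overlap:]
    return None


if __name__ == "__main__":
    fragment = "AAAACCCCGGGGTTTTACGT"          # toy 20 bp amplicon
    r1 = fragment[:14]                          # forward read
    r2 = reverse_complement(fragment[6:])       # reverse read (8 bp overlap)
    print(merge_pair(r1, r2, min_overlap=5))    # reconstructs the amplicon
    print(merge_pair(r1, r2))                   # overlap too short for the default
```

This is also why merging only applies to paired-end data: single-end runs have no second read to overlap with.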

More information about the pipeline, including output files and user options, can be found on the Pre-processing QC Pipeline help page.

Try it out

You can use our 16S example files to try out this pipeline.

Amplicon Pipelines

Nephele is currently running QIIME2 v2020.11, mothur v1.46.1, and DADA2 v1.18. You can use any of these pipelines to run 16S analysis. For ITS data, the DADA2 ITS pipeline is available.

Each pipeline has different features and steps. The table below lists the features each amplicon pipeline provides in Nephele.

Pipeline Features mothur QIIME2 DADA2 DADA2 ITS
Join forward and reverse short reads as contigs
Screen contigs to reduce sequencing errors
Denoising with Deblur
Dereplicate contig sequences
Taxonomic assignment based on selected database
Remove sequences likely due to sequencing errors
Identify and remove chimeric sequences
Classify sequences based on k-nearest neighbor
Classify sequences based on vsearch and sklearn
Remove sequences belonging to undesirable lineages
Remove rare OTUs in the samples
Detect differentially abundant features in samples
Construct phylogenetic tree
Calculate various measures of diversity
Calculate various measures of diversity for rarefaction curve only
Ion Torrent Processing - Beta
Create biom file that can be used in MicrobiomeDB
Create biom file that can be used in Downstream Analysis pipeline
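Several of these steps, such as dereplication, are simple to picture in code. Dereplication collapses identical sequences into one representative plus a count, so later steps (chimera checking, classification) run on far fewer sequences. The sketch below is a minimal illustration of that idea only; mothur, QIIME 2, and DADA2 each implement it with their own commands and data formats.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, count) pairs.

    Sequences are compared case-insensitively. Results are sorted by
    decreasing abundance, with ties broken alphabetically for stable output.
    """
    counts = Counter(seq.upper() for seq in sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    reads = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
    for seq, count in dereplicate(reads):
        print(seq, count)
```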

More detailed information about each step can be found on the individual pipeline help pages.

Follow our tutorial to try it out.

Metagenome Pipelines

Nephele2 is currently running two shotgun metagenomics pipelines: bioBakery and WGSA2.

The pipeline and modules of your choice will depend on:

  • The type of sequence you have (Single End or Paired End data)
  • The type of community you have (prokaryotic only or complex microbial community)
  • The scientific questions you want answered (Taxonomic or Functional exploration)
  • The level of desired detail (initial familiarization with the dataset or detailed exploration)
Pipeline Features Pipeline Name
Module Steps bioBakery WGSA2
Data processing Accommodation of single-end (SE) data kneadData
Trimming, Filtering & Error-correction fastp
Decontamination against host genome of choice kraken2
Read-based taxonomic profiling Taxonomic Assignments & composition assessment MetaPhlAn3 + StrainPhlAn3 kraken2
Visualizations of taxonomic community composition bioBakery/R KronaTools
Read-based functional profiling Functional Annotation HUMAnN3
Pathway Inference based on functional annotations
Visualizations of functional community composition bioBakery/R
Read assembly and mapping Assembly to scaffolds metaSPAdes
Read mapping and alignment processing samtools
Gene-based functional profiling Gene Prediction & abundance scoring Prodigal + verse
Functional Annotation eggNOG-mapper2
Pathway Inference based on functional annotations MinPath
Visualizations of functional community composition KronaTools
Gene-based taxonomic profiling (user elective) Taxonomic Assignments of predicted genes kraken2
Visualizations of gene-based taxonomic composition KronaTools
MAG-based taxonomic profiling (user elective) Scaffold binning into draft genomes (MAGs) metabat2
Bin quality, abundance & taxonomic assessments CheckM
Visualizations of taxonomic community composition KronaTools
Dataset-wide community stats & visualizations Collective TAX & FUNC abundance matrix bioBakery/R WGSA2/R
Collective TAX & FUNC profile heatmap
Collective Alpha diversity visualization plot
Collective Beta diversity ordination (PCoA) plot
Functional feature vs sequencing depth plot
Biom file creation (for MicrobiomeDB) WGSA2/R
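The "collective" abundance matrix in the dataset-wide step combines per-sample profiles into one feature-by-sample table. A minimal sketch of that idea follows, assuming a hypothetical input of plain count dictionaries per sample; the real pipelines read their own profiler outputs (for example, MetaPhlAn3 or kraken2 reports) rather than this structure. Counts are converted to relative abundances so that samples with different sequencing depths can be compared.

```python
def abundance_matrix(profiles):
    """Build a feature-by-sample relative-abundance matrix.

    profiles: {sample_name: {feature_name: count}} (hypothetical input format)
    Returns (features, samples, matrix), where matrix[i][j] is the relative
    abundance of features[i] in samples[j]. Features absent from a sample count as 0.
    """
    samples = sorted(profiles)
    features = sorted({f for counts in profiles.values() for f in counts})
    totals = {s: sum(profiles[s].values()) for s in samples}

    matrix = []
    for feature in features:
        row = []
        for sample in samples:
            count = profiles[sample].get(feature, 0)
            # Guard against empty samples to avoid dividing by zero
            row.append(count / totals[sample] if totals[sample] else 0.0)
        matrix.append(row)
    return features, samples, matrix


if __name__ == "__main__":
    toy = {
        "S1": {"Bacteroides": 30, "Prevotella": 10},
        "S2": {"Bacteroides": 5, "Escherichia": 15},
    }
    features, samples, matrix = abundance_matrix(toy)
    print(samples)
    for name, row in zip(features, matrix):
        print(name, row)
```

The same layout (features as rows, samples as columns) underlies the heatmap and diversity plots listed in the table.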

Follow our tutorial to try it out.

Downstream Analysis Pipeline: Diversity

You can use the biom and tree files from your 16S or ITS pipeline outputs to run the downstream analysis (DA) pipeline. Nephele's DA pipeline uses QIIME 2 to provide sample observation and taxonomic summaries and diversity analyses of an OTU table. You can also use the biom and fasta files generated by the 16S QIIME 2 or DADA2 pipelines to run metagenome inference using PICRUSt2.

Pipeline Features Downstream Analysis: Diversity Metagenomic Inference: PICRUSt2
Summarize sample metadata
Calculate alpha diversity measures
Plot PCoA ordination of beta diversity
Plot bar graphs of taxonomic abundance
Annotate biom file with metadata
Create biom file that can be used in MicrobiomeDB
Infer genes and pathway abundances from 16S data
Generate interactive heatmap of pathways per sample
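Alpha diversity summarizes how many features a sample contains and how evenly reads are spread among them. As a minimal illustration, the sketch below computes two common measures from a single sample's counts: observed features and the Shannon index, H = -Σ p_i log(p_i). The logarithm base differs between tools, so it is left as a parameter here; this is not the QIIME 2 implementation used by the DA pipeline.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) over features with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for c in counts if c > 0)


if __name__ == "__main__":
    even = [10, 10, 10, 10]
    skewed = [37, 1, 1, 1]
    # Same richness, but the evenly distributed sample has higher Shannon diversity
    print(observed_features(even), shannon_index(even, base=2))
    print(observed_features(skewed), shannon_index(skewed, base=2))
```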

More detailed information about each step can be found on the Downstream Analysis pipeline help page.

Follow our tutorial to try it out.

Metagenome Inference PICRUSt2 Pipeline

You can use the biom and fasta files generated by the 16S amplicon pipelines (DADA2, QIIME2) as input for this Metagenome Inference pipeline. The Nephele implementation leverages the PICRUSt2 code and documentation released by the Huttenhower lab; learn more here: https://github.com/picrust/picrust2/wiki/Full-pipeline-script and here: https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.4.1). This implementation is meant to be used only with the outputs of 16S analysis.

The pipeline performs the four key steps outlined on the PICRUSt2 wiki: (1) sequence placement, (2) hidden-state prediction of genomes, (3) metagenome prediction, and (4) pathway-level predictions. The outputs are further annotated with descriptions of the functional categories.
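Conceptually, step (3) combines the outputs of step (2): each ASV's read count is divided by its predicted 16S copy number (to approximate how many genomes it represents) and then multiplied by that ASV's predicted gene copy numbers, and the results are summed per gene family. The sketch below shows only that arithmetic, with hypothetical ASV names, gene families, and values; it is not the PICRUSt2 code, which also handles stratified output, NSTI filtering, and other details.

```python
def predict_metagenome(asv_abundance, copy_16s, gene_copies):
    """Sketch of the metagenome-prediction arithmetic for one sample.

    asv_abundance: {asv: read count in the sample}
    copy_16s:      {asv: predicted 16S copy number}
    gene_copies:   {asv: {gene_family: predicted copies per genome}}
    Returns {gene_family: predicted abundance}.
    """
    predicted = {}
    for asv, reads in asv_abundance.items():
        # Normalize by 16S copy number to approximate genome abundance
        genomes = reads / copy_16s[asv]
        for gene, copies in gene_copies.get(asv, {}).items():
            predicted[gene] = predicted.get(gene, 0.0) + genomes * copies
    return predicted


if __name__ == "__main__":
    # Hypothetical toy values for illustration only
    abundance = {"ASV1": 40, "ASV2": 10}
    copies_16s = {"ASV1": 4, "ASV2": 1}
    genes = {"ASV1": {"K00001": 2}, "ASV2": {"K00001": 1, "K00002": 3}}
    print(predict_metagenome(abundance, copies_16s, genes))
```

Note how ASV1 has four times the reads of ASV2 but contributes the same number of genomes once its higher 16S copy number is taken into account.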

Finally, the predicted pathways are also plotted in an interactive heatmap using Morpheus.

More detailed information about each step can be found on the Metagenome Inference PICRUSt2 pipeline help page.

Follow our tutorial to try it out.

SARS-CoV-2 ARTICplus and SGS methods (beta)

This pipeline processes Illumina sequence data generated with a tiling multiplexed primer design, such as the popular SARS-CoV-2 ARTIC protocol released by the ARTIC network, which merges two pools of primer pairs prior to sequencing. In addition, this pipeline supports a variation of the ARTIC protocol that uses an alternative set of primer pairs for the two pools and allows each pool to be sequenced independently (learn more about the Ghedin laboratory protocol here). The original version of this pipeline, which processes the raw data of each sequencing run and then merges the aligned data, was written by Elodie Ghedin's lab with the goal of facilitating deep sequencing. To distinguish the version of the pipeline that supports the ARTIC protocol workflow from the version written for the Ghedin Lab (Systems Genomics Section) protocol, we call them the ARTICplus method and the SGS method, respectively.

ARTICplus method: This version of the pipeline lets the user indicate which primer set was used during library preparation (see the sequences of primer sets ARTIC v1, v2, v3 and v4 here: https://github.com/joshquick/artic-ncov2019/tree/master/primer_schemes/nCoV-2019, NEB VarSkip Short and Long here: https://github.com/nebiolabs/VarSkip, and IDT's Midnight 1200 here: https://zenodo.org/record/3897530#.Xuk7oGpLjep). For example, libraries generated with the v3 primer set are expected to contain 400 nt amplicons tiling the ~30 kb viral genome, based on the published primer design. If you prepared the library with a custom primer strategy, the pipeline lets you import a BED file describing the primer locations relative to the SARS-CoV-2 reference (NC_045512.2).
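If you need to write a BED file for custom primers, remember that BED intervals are 0-based and half-open, while primer positions are often reported as 1-based, inclusive coordinates. The sketch below converts such positions into tab-separated BED lines. The six-column layout (chrom, start, end, name, pool, strand) follows the style of the ARTIC primer scheme files but is an assumption here; please check the SARS-CoV-2 pipeline help page for the exact columns Nephele expects. The primer names and coordinates in the example are hypothetical.

```python
REFERENCE = "NC_045512.2"


def primer_bed_lines(primers, chrom=REFERENCE):
    """Format primers as tab-separated BED lines.

    primers: list of (name, first_base, last_base, pool, strand) tuples using
    1-based, inclusive coordinates on the reference. BED uses 0-based,
    half-open intervals, so start = first_base - 1 and end = last_base.
    The 6-column layout is an assumption modeled on ARTIC scheme files.
    """
    lines = []
    for name, first, last, pool, strand in primers:
        if first < 1 or last < first:
            raise ValueError(f"invalid coordinates for {name}: {first}-{last}")
        lines.append("\t".join([chrom, str(first - 1), str(last), name, str(pool), strand]))
    return lines


if __name__ == "__main__":
    # Hypothetical custom primers, for illustration only
    custom = [
        ("custom_1_LEFT", 31, 54, 1, "+"),
        ("custom_1_RIGHT", 386, 410, 1, "-"),
    ]
    with open("custom_primers.bed", "w") as handle:
        handle.write("\n".join(primer_bed_lines(custom)) + "\n")
```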

The SARS-CoV-2 ARTICplus method workflow takes single- or paired-end data as input and aligns the reads to the genome using BWA. The primer sequences are then masked using iVar, and the masked alignment BAM file is downsampled using the jvarkit downsample tool before variants are called with the GATK HaplotypeCaller. Filtered variants are then annotated, and consensus genomes are generated that incorporate both SNPs and indels and include masked bases to indicate low-coverage regions.
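To make the consensus step more concrete, here is a minimal sketch of building a consensus from a reference, per-base read depth, and a list of filtered variants. It assumes 0-based variant positions, non-overlapping variants, and a hypothetical `min_depth` threshold; positions below that depth are masked with N, and variants touching masked positions are skipped. This illustrates the idea only and is not the code or the thresholds the ARTICplus pipeline uses.

```python
def build_consensus(reference, depth, variants, min_depth=10, mask_char="N"):
    """Sketch of consensus generation with low-coverage masking.

    reference: reference sequence (string)
    depth:     read depth for each reference position (0-based list)
    variants:  list of (pos, ref, alt) with 0-based pos; len(ref) != len(alt)
               encodes an indel. Variants are assumed not to overlap.
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one entry per reference base")
    # Mask low-coverage positions in reference coordinates first
    consensus = [mask_char if d < min_depth else base for base, d in zip(reference, depth)]
    # Apply variants right to left so indels do not shift positions still to be edited
    for pos, ref, alt in sorted(variants, reverse=True):
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        if any(depth[i] < min_depth for i in range(pos, pos + len(ref))):
            continue  # leave masked bases in place
        consensus[pos:pos + len(ref)] = list(alt)
    return "".join(consensus)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    cov = [20] * 8 + [2, 2]                          # last two bases are low coverage
    calls = [(1, "C", "T"), (4, "AC", "A"), (8, "A", "G")]
    print(build_consensus(ref, cov, calls))           # SNP + deletion applied, tail masked
```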

SGS method: For the SGS method, the amplicons should be generated in two separate pools, A and B, from amplification through library generation and sequencing. The pipeline takes single- or paired-end FASTQ files as input, trims adapters and primers, aligns the reads to the provided reference file, and then merges the A and B alignments to create merged BAM files as output. The pipeline additionally calls variants using GATK's HaplotypeCaller. It works on any number of samples, provided that each sample has 2 (single-end) or 4 (paired-end) input FASTQ files (read files from both the A and B amplicon preparations). A consensus sequence is generated, but in the current version this consensus is built only with SNPs (not indels).
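Because each SGS sample needs read files from both the A and B pools, it can save time to check your file set locally before submitting. The sketch below assumes a hypothetical naming convention of `<sample>_<A|B>_R<1|2>.fastq.gz` (with only `R1` files for single-end data); Nephele's actual naming requirements are described on the SARS-CoV-2 pipeline help page. Files that do not match the assumed pattern are ignored by this sketch.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (assumption): <sample>_<A or B>_R<1 or 2>.fastq.gz
SGS_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<pool>[AB])_R(?P<read>[12])\.fastq\.gz$")


def check_sgs_inputs(filenames, paired_end=True):
    """Report samples that lack the expected A/B pool files.

    Paired-end samples need 4 files (A_R1, A_R2, B_R1, B_R2); single-end
    samples need 2 (A_R1, B_R1). Returns {sample: {"missing": [...], "unexpected": [...]}}
    for problem samples only.
    """
    reads = {"1", "2"} if paired_end else {"1"}
    expected = {(pool, read) for pool in "AB" for read in reads}

    found = defaultdict(set)
    for name in filenames:
        match = SGS_PATTERN.match(name)
        if match:
            found[match.group("sample")].add((match.group("pool"), match.group("read")))

    problems = {}
    for sample, files in found.items():
        missing = expected - files
        extra = files - expected
        if missing or extra:
            problems[sample] = {"missing": sorted(missing), "unexpected": sorted(extra)}
    return problems


if __name__ == "__main__":
    files = [
        "S1_A_R1.fastq.gz", "S1_A_R2.fastq.gz", "S1_B_R1.fastq.gz", "S1_B_R2.fastq.gz",
        "S2_A_R1.fastq.gz", "S2_A_R2.fastq.gz", "S2_B_R1.fastq.gz",
    ]
    print(check_sgs_inputs(files))  # S2 is missing its pool B reverse reads
```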

Pipeline Features SARS-CoV-2 ARTICplus method SARS-CoV-2 SGS method
Trim data with Trimmomatic
Align data with BWA
Merge alignment data with Picard MergeSamFiles
Call variants using HaplotypeCaller from GATK
Consensus Sequence
Generate QC metrics
Generate coverage plot
Primer sequences masked using iVar
Trim primers using Trimmomatic

More detailed information about each step can be found on the SARS-CoV-2 pipeline help page.

Follow our tutorial to try it out.