Nephele User Guide

Test datasets

Thank you for choosing Nephele. To try a submission with a small dataset, use the sample input files listed below for each pipeline type.

Note: please unzip the file and upload individual *.fastq.gz files when submitting.
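
The unzip step in the note above can also be scripted. The following is a minimal sketch, not part of Nephele itself: the function name extract_fastq_files and any archive or folder names passed to it are hypothetical, and it assumes the download is a standard .zip archive.

```python
import zipfile
from pathlib import Path


def extract_fastq_files(archive_path, dest_dir):
    """Unzip archive_path into dest_dir and return the *.fastq.gz files found.

    Hypothetical helper for illustration; the names are not defined by Nephele.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    # Search recursively, because the archive may wrap the files in a folder.
    return sorted(dest.rglob("*.fastq.gz"))
```

Calling the function with the path of the downloaded archive and a destination folder returns the paths of the individual *.fastq.gz files, which are the files to select when uploading.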
Quality Check (Short Read) and 16S Amplicon Data (Mothur, QIIME, DADA2)
File (Type) | Size | Description | Reference
Sequences | 72MB | Contains paired-end data (forward and reverse) from 10 samples sequenced on an Illumina MiSeq | Experimental Microbial Dysbiosis Does Not Promote Disease Progression in SIV-Infected Macaques. NCBI BioProject: PRJNA417022
Mapping File (Excel) | 9KB | Metadata file used in Nephele job submissions that describes samples, groups, etc. for analysis |
Quality Check Nanopore Data
File (Type) | Size | Description | Reference
Sequences | 142.4MB | Contains subsampled fastq files from 2 samples sequenced using MinION | Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians. NCBI BioProject: PRJEB49168
Mapping File (Excel) | 17KB | Metadata file used in Nephele job submissions that describes samples for analysis |
ITS Amplicon Data (QIIME, DADA2 ITS)
File (Type) | Size | Description | Reference
Sequences | 24MB | Contains paired-end data (forward and reverse) from 3 samples sequenced on an Illumina MiSeq | A fungal mock community control for amplicon sequencing experiments. NCBI BioProject: PRJNA377530
Mapping File (Excel) | 10KB | Metadata file used in Nephele job submissions that describes samples, groups, etc. for analysis |
Downstream Analysis: Diversity
File (Type) | Size | Description | Reference
Biom | 252KB | Contains abundance and taxonomy assignments; generated by the analysis pipelines | Abundance table of a DADA2 analysis of a dataset from 2017. NCBI BioProject: PRJNA417022
Tree | 25KB | Rooted phylogenetic tree in Newick format |
Metagenome Inference: PICRUSt2
File (Type) | Size | Description | Reference
Biom | 99KB | Abundance table generated by the DADA2 pipeline | Peluso et al. (2020)
Fasta | 130KB | Sequences corresponding to sequence variants identified by the DADA2 pipeline in Nephele |
Mapping File (Excel) | 40KB | Metadata file used in Nephele job submissions that describes samples, groups, etc. for analysis |
WGSA and bioBakery
File (Type) | Size | Description | Reference
Sequences | 856MB | Sequence files (fastq.gz) derived from a sequencing run using the Illumina HiSeq platform | The example dataset is a subsampled version of HiSeq sample data collected from the 2nd CAMI Toy Human Microbiome Project Dataset. Sczyrba et al. (2017)
Mapping File (Excel) | 9KB | Metadata file used in Nephele submissions that describes samples for analysis |
SARS-CoV-2 SGS
File (Type) | Size | Description | Reference
Sequences | 730MB | Directory with four fastq.gz files (pairs) corresponding to sequencing with Pool A and B primers | Elodie Ghedin's Lab
Primers | 1KB | A directory with two fasta files (new_A.fa and new_B.fa), each with all primers for Pools A and B | Elodie Ghedin's Lab
Mapping File (Excel) | 9KB | |
SARS-CoV-2 ARTICplus
File (Type) | Size | Description | Reference
Sequences | 137MB | Directory with four sample fastq.gz files (pairs) |
Mapping File (Excel) | 11KB | |
Primer File | 8KB | |
DiscoVir
File (Type) | Size | Description | Reference
.fasta and .bam files | 632.9MB | Contains assemblies and bam files, subsampled after being generated using WGSA2 on paired reads from 10 samples | Shkoporov, A. N. et al. (2019). NCBI BioProject: PRJNA545408
Mapping File (Excel) | 10KB | Metadata file used for the DiscoVir job that describes samples for analysis |