Nephele User Guide

Test datasets

Thank you for choosing Nephele. If you would like to try submitting with a small dataset, below are the sample input files for each pipeline type.

Note: please unzip the file and upload individual *.fastq.gz files when submitting.
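The unzip step in the note above can be scripted. The following is a minimal sketch, not part of Nephele: the function name `extract_fastq_files` and the file names in the usage note are hypothetical. It uses only the Python standard library and assumes the download is a regular zip archive.

```python
import zipfile
from pathlib import Path


def extract_fastq_files(archive_path, dest_dir):
    """Unzip archive_path into dest_dir and return the *.fastq.gz files found.

    Hypothetical helper for illustration; it is not provided by Nephele.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    # Search recursively, since the archive may contain a top-level folder.
    return sorted(dest.rglob("*.fastq.gz"))
```

For example, `extract_fastq_files("test_dataset.zip", "test_dataset")` (both names are placeholders) returns the paths of the individual *.fastq.gz files, which are the files to upload when submitting.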

Quality Check (Short Read) and 16S Amplicon Data (Mothur, QIIME, DADA2)
File (Type)	Size	Description	Reference
Sequences	72MB	Contains paired-end data (forward and reverse) from 10 samples sequenced on an Illumina MiSeq	Experimental Microbial Dysbiosis Does Not Promote Disease Progression in SIV-Infected Macaques. NCBI BioProject: PRJNA417022
Mapping File (Excel)	9KB	Metadata file used in Nephele job submissions that describes samples, groups, etc. for analysis

Quality Check Nanopore Data
File (Type)	Size	Description	Reference
Sequences	142.4MB	Contains subsampled fastq files from 2 samples sequenced using MinION	Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians. NCBI BioProject: PRJEB49168
Mapping File (Excel)	17KB	Metadata file used in Nephele job submissions that describes samples for analysis

ITS Amplicon Data (QIIME, DADA2 ITS)
File (Type)	Size	Description	Reference
Sequences	24MB	Contains paired-end data (forward and reverse) from 3 samples sequenced on an Illumina MiSeq	A fungal mock community control for amplicon sequencing experiments. NCBI BioProject: PRJNA377530
Mapping File (Excel)	10KB	Metadata file used in Nephele job submissions that describes samples, groups, etc. for analysis

Downstream Analysis: Diversity
File (Type)	Size	Description	Reference
Biom	252KB	Contains abundance and taxonomy assignments; generated by the analysis pipelines	Abundance table from a DADA2 analysis of a 2017 dataset. NCBI BioProject: PRJNA417022
Tree	25KB	Rooted phylogenetic tree in Newick format

Metagenome Inference: PICRUSt2
File (Type)	Size	Description	Reference
Biom	99KB	Abundance table generated by the DADA2 pipeline	Peluso et al. (2020)
Fasta	130KB	Sequences corresponding to sequence variants identified by the DADA2 pipeline in Nephele	Peluso et al. (2020)
Mapping File (Excel)	40KB	Metadata file used in Nephele job submissions that describes samples, groups, etc. for analysis

WGSA and bioBakery
File (Type)	Size	Description	Reference
Sequences	856MB	Sequence files (fastq.gz) derived from a sequencing run using the Illumina HiSeq platform	The example dataset is a subsampled version of HiSeq sample data from the 2nd CAMI Toy Human Microbiome Project Dataset, Sczyrba et al. (2017)
Mapping File (Excel)	9KB	Metadata file used in Nephele submissions that describes samples for analysis

SARS-CoV-2 SGS
File (Type)	Size	Description	Reference
Sequences	730MB	Directory with four fastq.gz files (pairs) corresponding to sequencing with Pool A and B primers.	Elodie Ghedin's Lab
Primers	1KB	A directory with two fasta files (new_A.fa and new_B.fa) each with all primers for Pools A and B	Elodie Ghedin's Lab
Mapping File (Excel)	9KB

SARS-CoV-2 ARTICplus
File (Type)	Size	Description	Reference
Sequences	137MB	Directory with four sample fastq.gz files (pairs)
Mapping File (Excel)	11KB
Primer File	8KB

DiscoVir
File (Type)	Size	Description	Reference
.fasta and .bam files	632.9MB	Contains assemblies and bam files, subsampled after being generated using WGSA2 on paired reads from 10 samples	Shkoporov, A. N. et al. (2019). NCBI BioProject: PRJNA545408
Mapping File (Excel)	10KB	Metadata file used in DiscoVir job submissions that describes samples for analysis