Nephele User Guide
Frequently Asked Questions
Pipeline Submission Tips
- Paired End: Paired-end sequencing involves sequencing DNA from both ends of a fragment. Choose this option if your sequence files are paired and have been demultiplexed. For example, Illumina MiSeq Paired End FASTQ files consist of two files for each sample, with names ending in "R1_001.fastq" and "R2_001.fastq".
- Single End: Single-end sequencing involves sequencing DNA from only one end. Choose this option if your samples were only sequenced from one end or if you don’t want to use the reverse FASTQ file.
- If you need further information to help you decide what type of file you have, see the SRA file type information page.
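As a rough aid for deciding between the two options, the sketch below groups a list of file names into paired and unpaired samples. It assumes the Illumina MiSeq naming convention from the example above ("R1_001.fastq" / "R2_001.fastq", optionally gzipped); the function name and the sample file names are hypothetical, and files named differently will simply be reported as unmatched.

```python
import re
from collections import defaultdict

# Assumed naming convention (taken from the Illumina MiSeq example):
# <prefix>_R1_001.fastq and <prefix>_R2_001.fastq, optionally ending in .gz
READ_PATTERN = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])_001\.fastq(\.gz)?$")


def classify_fastq_files(filenames):
    """Group file names by sample prefix and split them into paired and single."""
    groups = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            unmatched.append(name)  # does not follow the assumed convention
            continue
        groups[match.group("prefix")]["R" + match.group("read")] = name

    # A sample is "paired" only when both its R1 and R2 files are present.
    paired = {p: (f["R1"], f["R2"]) for p, f in groups.items() if len(f) == 2}
    single = {p: next(iter(f.values())) for p, f in groups.items() if len(f) == 1}
    return paired, single, unmatched


# Hypothetical file names, for illustration only.
files = [
    "A_S1_L001_R1_001.fastq",
    "A_S1_L001_R2_001.fastq",
    "B_S2_L001_R1_001.fastq.gz",
    "notes.txt",
]
paired, single, unmatched = classify_fastq_files(files)
print("paired:", paired)
print("single:", single)
print("unmatched:", unmatched)
```

If every sample ends up in the paired group, the Paired End option is likely the right choice; samples in the single group have only one read file under this naming scheme.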
The optional parameters were carefully chosen based on (1) the most common scenarios of NGS data analysis, (2) the suggestions from the developers, and (3) published results. The different pipelines available on Nephele target different kinds of NGS studies, such as whole genome shotgun sequencing, 16S microbiome surveys, and functional annotation of microbial communities.
Most users submit their jobs with the default values of the optional parameters. In our experience, more experienced bioinformaticians change the parameters to suit their input data. We have also received feedback from novice microbiome researchers and students that they study the optional parameters (reading the help text and testing different values, even if a run fails) as part of learning microbiome analysis.
- Make sure your files (fastq or fastq.gz) are already stored in your Google Drive account.
- Download the Google Drive app to your device.
- Once the download is completed, log into Google Drive with your account.
- Go to https://nephele.niaid.nih.gov in your browser.
- Tap on analysis type, for example, 16S, enter your email address and tap on Upload from Local.
- Tap the Add files button.
- In the menu that pops up, tap on Browse.
- You might see "iCloud Drive" as an option. But you should tap on Locations on the top left, and enable Google Drive/Drive under Locations (for Android, it may say Documents instead).
- Once you enable "Drive," you can navigate to your Google Drive account and select a file.
- Once you select a file, tap Start Upload. Please note that you will need to upload one file at a time. Selecting all files from Drive and uploading them in a batch will not work.
- Once you upload all the files from Drive, you can submit your job to the pipeline!
Studies show that quality filtering can greatly improve microbiome analysis results. Best practices on working with sequencing data include doing a series of QC steps to verify and even improve the quality of the data. Our Pre-processing QC pipeline was designed to run a quality check by default, so the user can run it without choosing any options and receive FastQC tables and graphs providing information on the quality of individual samples. After evaluating these results, the user can submit their files to an analysis pipeline or return to the QC pipeline to trim reads and merge read pairs as needed.
Even though our 16S and WGS pipelines include quality filtering, trimming and merging steps, it may be best to run those processing steps separately ahead of time. We have incorporated the tools cutadapt and Trimmomatic in our Pre-processing QC pipeline steps to give users more control over parameters, which can be helpful for some datasets, especially if the amplicon region is variable in length. For the read merging step, we have integrated the FLASH merger, which some results show might provide better precision and recall than the native tools used by QIIME 1 and mothur. For longer amplicon regions with a short overlap between paired reads, FLASH may perform better than the DADA2 merger. We therefore designed the QC pipeline to offer these programs to our users as well, to help them get better results. For more information about the tools we use, see the details page.
Some usage examples:
- Run paired-end data through this pipeline, choosing to merge the reads, and then submit the resulting FASTQ files to the DADA2 or QIIME 1 Single End pipelines.
- Examine the average per-base quality scores from the FastQC results of the pipeline, and use that information to set the Truncation length parameter in DADA2 or the Minimum Phred quality score parameter in QIIME 1.
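To illustrate the idea behind the second example, here is a minimal sketch that computes the mean Phred score at each read position directly from a FASTQ stream and reports how many leading positions stay at or above a threshold. This is not how FastQC or DADA2 work internally. The function names, the threshold of 25, the Phred+33 encoding, and the strict four-lines-per-record layout are all assumptions made for illustration.

```python
import io


def mean_quality_per_position(fastq_handle, phred_offset=33):
    """Return the mean Phred score at each read position.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    totals, counts = [], []
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of a record
            continue
        for pos, char in enumerate(line.strip()):
            if pos == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - phred_offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def suggest_truncation_length(mean_quals, min_mean_quality=25):
    """Count leading positions whose mean quality is at or above the threshold."""
    for pos, quality in enumerate(mean_quals):
        if quality < min_mean_quality:
            return pos
    return len(mean_quals)


# Two tiny made-up reads: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
example = io.StringIO("@r1\nACGT\n+\nIII#\n@r2\nACGT\n+\nII5#\n")
means = mean_quality_per_position(example)
print(means)                             # [40.0, 40.0, 30.0, 2.0]
print(suggest_truncation_length(means))  # 3
```

The suggested value is only a starting point. The FastQC plots from the QC pipeline remain the better guide, and for paired-end data the truncated reads must still overlap enough to merge.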
Troubleshooting Nephele Errors
Check the logfile.txt file, which can be found directly on the results download page as well as in the PipelineResults.JOBID.tar.gz directories. Specifically, you can do a text search for ERROR to see some common errors that can arise with data analyses on Nephele. Many of these errors are described further in additional FAQs here, which provide detailed suggestions or solutions. If you continue to have issues, please do not hesitate to send us a support request.
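The text search can also be scripted. The sketch below is a minimal example that assumes logfile.txt has already been downloaded to your computer; the function name is hypothetical, and the exact wording of the log messages is not assumed beyond the keyword ERROR mentioned above.

```python
def find_error_lines(log_path, keyword="ERROR"):
    """Return (line_number, text) pairs for every log line containing keyword."""
    hits = []
    # errors="replace" keeps the scan going if the log has unexpected bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if keyword in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling `find_error_lines("logfile.txt")` returns the matching lines with their line numbers, which makes it easier to quote the relevant part of the log in a support request.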
Problems with Output Files
Check logfile.txt. It is possible that one or more of your samples did not have the minimum number of OTUs or reads and was excluded from further analysis. This will be indicated in the samples_being_ignored.txt file. You may also look at logfile.txt to see why those samples were excluded. Samples that have low OTU or sequence variant counts are sometimes removed because of the Sampling depth cutoff parameter. If you do not specify the parameter, Nephele will use the default value of 10,000; see FAQ: How is the sampling depth calculated? for more information. If you open the otu_summary_table.txt file, you can see OTU counts for all of your samples. Adjusting the Sampling depth parameter accordingly (i.e., entering a value that will include all of your samples) in a new run with the same data will resolve this issue. The parameter can be set under the Analysis tab of the job submission page, and you can use the job resubmission feature of Nephele to more easily resubmit your data with a different value.
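Once you have the per-sample counts, checking a candidate Sampling depth is simple arithmetic. The sketch below assumes the counts have been copied by hand into a dictionary, since the layout of otu_summary_table.txt is not described here. The sample names and counts are made up, and the rule that a sample is dropped when its count is below the depth is an assumption about how the cutoff is applied.

```python
def samples_below_depth(counts, sampling_depth):
    """Names of samples whose count is below the given sampling depth."""
    return sorted(name for name, n in counts.items() if n < sampling_depth)


def depth_keeping_all_samples(counts):
    """Largest sampling depth that would still include every sample."""
    return min(counts.values())


# Hypothetical per-sample counts, for illustration only.
counts = {"sampleA": 15230, "sampleB": 8400, "sampleC": 12010}

print(samples_below_depth(counts, 10000))  # ['sampleB']
print(depth_keeping_all_samples(counts))   # 8400
```

In this made-up case, resubmitting with a Sampling depth of 8,400 or lower would keep all three samples, at the cost of subsampling the deeper samples more heavily.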
The DADA2 pipeline is highly sensitive to sequence quality and primer trimming. It is very important to specify the correct primer lengths at job submission (or remove the primers from the data before submitting), as these sequences may interfere with the denoising of the reads as well as with chimera removal (if you are in doubt about the primer lengths, we advise you not to choose the chimera removal option). See this DADA2 FAQ for more information.
The DADA2 pipeline produces quality profile plots that you can look at to gauge the quality of your data (qualityProfile_R1/2.pdf). If the data is poor quality, the reads may be filtered out during the filterAndTrim step. You can also see a table in the log file of how many reads pass this step. Additionally, if the data is poor quality, reads that pass the filter may be trimmed too much in the filterAndTrim step, and may not merge properly in the mergePairs step. You can search the log file for paired-reads to see how many reads successfully merged for each sample. Sometimes, it is helpful to use a trimming program such as cutadapt, Trimmomatic, or BBDuk to trim for quality (and/or primers) prior to running DADA2. You can use Nephele's QC pipeline to do this pre-processing of your data; see here for more information.
In articles and other publications, please cite Nephele as follows:
Office of Cyber Infrastructure and Computational Biology (OCICB), National Institute of Allergy and Infectious Diseases (NIAID). Nephele v2. http://nephele.niaid.nih.gov (2018)
Alternatively, use the following acknowledgement:
This study used the Nephele platform from the National Institute of Allergy and Infectious Diseases (NIAID) Office of Cyber Infrastructure and Computational Biology (OCICB) in Bethesda, MD.
Please also report the publication of articles which made use of Nephele by providing the reference in an email to the Nephele Project Team.
Please refer to the Release Notes to see when Nephele updates were made. Also, in the initial email you receive for each job, you will find the version of Nephele that corresponds to the Release Notes, as well as a copy of all the parameters that were selected for that job. Software package versions for the pipelines are also listed in the log files.
Choosing a sampling depth is largely arbitrary. It is generally recommended to choose a value high enough to capture the diversity present in samples with high read counts, but low enough to include the majority of your samples. For a simple community with only a handful of abundant members, for example, a sampling depth of 5,000 or less may suffice for an accurate estimate of diversity. For a more complex community with many low-abundance members, however, a much higher sampling depth, 10,000 or more, is generally necessary.
Nephele specifies a sampling depth of 10,000 reads as the minimum requirement for all downstream analysis. The pipelines use the following logic to determine the sampling depth:
- apply the user-specified sampling depth, if available
- set the sampling depth to the read count of the sample with the fewest reads, if that count is greater than or equal to 10,000
- otherwise, no downstream analysis is performed.
Note: Users are encouraged to specify the sampling depth that is most appropriate for their studies. There is really no formula that can precisely determine the most appropriate value simply based on the distribution of read counts and the number of samples. If the pipeline does not generate any downstream analysis for your samples, it is most likely that the sample with the fewest reads is below 10,000. You will need to lower the sampling depth in order to run the downstream analysis.
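The three rules listed above can be written out as a short function. This is only a sketch of the logic as described in this FAQ, not Nephele's actual code; the function name, the constant name, and the sample counts are hypothetical.

```python
MINIMUM_DEPTH = 10_000  # the minimum stated in this FAQ


def choose_sampling_depth(read_counts, user_depth=None):
    """Apply the three rules in order; None means no downstream analysis."""
    # Rule 1: a user-specified sampling depth always wins.
    if user_depth is not None:
        return user_depth
    # Rule 2: otherwise use the smallest sample, if it reaches the minimum.
    smallest = min(read_counts.values())
    if smallest >= MINIMUM_DEPTH:
        return smallest
    # Rule 3: otherwise no downstream analysis is performed.
    return None


# Hypothetical read counts, for illustration only.
print(choose_sampling_depth({"s1": 12500, "s2": 30100}))           # 12500
print(choose_sampling_depth({"s1": 7200, "s2": 30100}))            # None
print(choose_sampling_depth({"s1": 7200, "s2": 30100}, 7000))      # 7000
```

The second call shows the situation described in the note: one sample falls below 10,000 reads, so no downstream analysis runs unless a lower sampling depth is supplied, as in the third call.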