Pre-processing Nanopore QC

Description

The NanoporeQC pipeline evaluates the quality of long reads from Oxford Nanopore Technologies sequencing, supplied in fastq or fastq.gz format, and pre-processes them before downstream analysis such as assembly or annotation. The pipeline uses NanoPlot [1] to report the quality of the raw data. Optionally, users can also run Porechop_ABI [2] to trim adapters and barcodes, and nanoq [3] for additional length and quality filtering. If Porechop_ABI or nanoq is selected, NanoPlot is run a second time to assess the final reads after QC.
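The order in which the steps run, and which reads each step consumes, can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline's implementation: Step, plan_steps and the two boolean flags are hypothetical names, and the folder names are taken from the Output files section.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tool: str         # tool name, as used in this document
    input_reads: str  # which reads the step consumes
    output: str       # output folder the step writes to


def plan_steps(run_porechop_abi: bool, run_nanoq: bool) -> list[Step]:
    """Return the ordered QC steps for one sample (hypothetical helper)."""
    steps = [Step("NanoPlot", "raw", "pre_QC_plots")]  # always runs on raw reads
    current = "raw"  # reads the next step will consume
    if run_porechop_abi:
        steps.append(Step("Porechop_ABI", current, "abi_trimmed"))
        current = "abi_trimmed"
    if run_nanoq:
        # nanoq takes adapter-trimmed reads if Porechop_ABI ran, else raw reads
        steps.append(Step("nanoq", current, "nanoq_trimmed"))
        current = "nanoq_trimmed"
    if run_porechop_abi or run_nanoq:
        # the final reads are re-assessed only when a QC step changed them
        steps.append(Step("NanoPlot", current, "post_QC_plots"))
    return steps


for step in plan_steps(run_porechop_abi=False, run_nanoq=True):
    print(step.tool, step.input_reads, step.output)
# NanoPlot raw pre_QC_plots
# nanoq raw nanoq_trimmed
# NanoPlot nanoq_trimmed post_QC_plots
```

Threading a single "current reads" value through the optional steps is what lets nanoq work on either raw or adapter-trimmed reads without a separate code path.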

User Options and Defaults

NanoPlot

NanoPlot is always run on the input data. It produces summary statistics and plots describing the quality of the reads. If Porechop_ABI and/or nanoq are selected, NanoPlot also produces summary statistics and plots for the trimmed and filtered reads. The user can:

  1. Set the maximum read length to include in the results; longer reads are hidden (default: 100000000)
  2. Choose the file type of the output plots, produced in addition to the HTML summary: pdf or png
  3. Choose the plot type for the bivariate plots: dot or kernel density estimate (kde)
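As a rough sketch, the three NanoPlot options above map onto NanoPlot command-line flags (--maxlength, --format, --plots) as shown below. nanoplot_command is a hypothetical helper, the file paths are placeholders, and the png and dot defaults in its signature are assumptions made for the example, since the pipeline's own defaults for these two options are not stated here.

```python
import shlex


def nanoplot_command(fastq: str, outdir: str, max_length: int = 100_000_000,
                     fmt: str = "png", plot: str = "dot") -> list[str]:
    """Build a NanoPlot argument list from the pipeline's user options (sketch)."""
    if fmt not in ("pdf", "png"):
        raise ValueError(f"unsupported plot format: {fmt}")
    if plot not in ("dot", "kde"):
        raise ValueError(f"unsupported bivariate plot type: {plot}")
    return [
        "NanoPlot",
        "--fastq", fastq,                # fastq or fastq.gz input
        "--maxlength", str(max_length),  # hide reads longer than this
        "--format", fmt,                 # plot file type, besides the HTML report
        "--plots", plot,                 # style of the bivariate plots
        "--outdir", outdir,
    ]


# Placeholder paths for illustration only.
cmd = nanoplot_command("sample1.fastq.gz", "pre_QC_plots/sample1")
print(shlex.join(cmd))
# NanoPlot --fastq sample1.fastq.gz --maxlength 100000000 --format png --plots dot --outdir pre_QC_plots/sample1
```

Passing the list to subprocess.run(cmd, check=True) would run NanoPlot, assuming it is installed and on PATH. Building an argument list rather than a shell string avoids quoting problems with unusual sample names.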

Porechop_ABI

Porechop_ABI detects and removes adapter and barcode sequences in Nanopore reads. Running Porechop_ABI is recommended if barcodes or adapters were not removed during demultiplexing or FASTQ generation in MinKNOW (barcode trimming is not performed by default during this process).

To learn more about Porechop_ABI, see the Porechop_ABI GitHub page.

If Porechop_ABI is selected to run:

  • By default, Porechop_ABI screens reads against a list of known Nanopore adapters and barcodes.
  • Optionally, the user can select adapter inference plus known adapters and barcodes (-abi) or adapter inference only (--guess_adapter_only).
  • Adapter inference plus known adapters and barcodes (-abi) screens reads against the list of known adapters and barcodes AND infers adapters directly from the reads.
  • Adapter inference only (--guess_adapter_only) performs adapter inference without screening reads against the list of known adapters and barcodes.
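The three adapter-search modes can be sketched as extra flags appended to a Porechop_ABI command. The mode names and porechop_abi_command are hypothetical; the -abi and --guess_adapter_only spellings are taken from the option labels above, and -i/-o are Porechop's input and output options. Check porechop_abi --help for the installed version before relying on these flags.

```python
import shlex

# Extra flags for each adapter-search mode described above (mode names are
# hypothetical). Flag spellings follow this document's option labels.
MODE_FLAGS = {
    "known": [],                             # default: known adapters/barcodes only
    "abi": ["-abi"],                         # inference + known adapters/barcodes
    "guess_only": ["--guess_adapter_only"],  # adapter inference only
}


def porechop_abi_command(fastq_in: str, fastq_out: str, mode: str = "known") -> list[str]:
    """Build a Porechop_ABI argument list for one sample (sketch)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}")
    return ["porechop_abi", "-i", fastq_in, "-o", fastq_out, *MODE_FLAGS[mode]]


# Placeholder paths for illustration only.
print(shlex.join(porechop_abi_command("sample1.fastq.gz", "abi_trimmed/pc.fastq.gz", mode="abi")))
# porechop_abi -i sample1.fastq.gz -o abi_trimmed/pc.fastq.gz -abi
```

Keeping the mode-to-flag mapping in one table makes the default (no extra flags) explicit and rejects unknown modes early instead of passing them through to the tool.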

nanoq

The user also has the option to run the reads through nanoq, a trimming and filtering tool for Oxford Nanopore reads. Adapter trimming with Porechop_ABI is not required beforehand; if it was performed, nanoq takes the adapter-trimmed reads as its input.

To learn more about nanoq, see the nanoq GitHub page.

If nanoq is selected, the user can set the following options:

  • Minimum read length (-l). Reads shorter than this value are discarded. Default: 0 (no minimum length filtering).
  • Maximum read length (-m). Reads longer than this value are discarded. Default: 0 (no maximum length filtering).
  • Minimum read quality (-q). Reads with an average quality below this value are discarded. Default: 5, which is the default in Guppy; increasing this value may be beneficial.
  • Number of bases to trim from the start of every read (-S). Default: 0 (no trimming at the start of the read).
  • Number of bases to trim from the end of every read (-E). Default: 0 (no trimming at the end of the read).
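The effect of these options on a single read can be illustrated with the pure-Python filter below. It is a sketch of the behaviour described above, not nanoq's code: the function names are hypothetical, the order of operations (trim first, then filter on the trimmed read) is an assumption, and read quality is computed by averaging per-base error probabilities and converting back to a Phred score, a common convention for Nanopore reads that nanoq may not follow exactly.

```python
import math
from typing import Optional


def mean_read_quality(quals: list[int]) -> float:
    """Mean Phred quality via averaged per-base error probabilities."""
    if not quals:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def filter_read(seq: str, quals: list[int], min_len: int = 0, max_len: int = 0,
                min_qual: float = 5.0, trim_start: int = 0,
                trim_end: int = 0) -> Optional[tuple[str, list[int]]]:
    """Apply nanoq-style options to one read; None means the read is discarded.

    Assumption: trimming (-S/-E) happens before length/quality filtering.
    """
    end = max(0, len(seq) - trim_end)  # clamp so over-trimming cannot wrap around
    seq, quals = seq[trim_start:end], quals[trim_start:end]
    if not seq:
        return None                            # read trimmed away entirely
    if len(seq) < min_len:                     # -l; 0 keeps every length
        return None
    if max_len > 0 and len(seq) > max_len:     # -m; 0 disables the maximum
        return None
    if mean_read_quality(quals) < min_qual:    # -q
        return None
    return seq, quals


print(filter_read("ACGTACGTAC", [10] * 10, trim_start=2, trim_end=3))
# ('GTACG', [10, 10, 10, 10, 10])
print(round(mean_read_quality([20, 10]), 1))
# 12.6
```

Averaging error probabilities rather than Phred scores means a few low-quality bases pull the read quality down sharply: a read with one Q20 base and one Q10 base scores about 12.6, not the arithmetic mean of 15.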

Output files

The pipeline produces up to four output folders. Only pre_QC_plots is always produced; the other three depend on which optional steps are selected.

1. pre_QC_plots

This folder contains the NanoPlot output describing the quality of the data before QC, with each sample in a separate subfolder named after the sample.

  • HistogramReadlength: Read length versus number of reads plot before QC
  • LengthvsQualityScatterPlot_dot: Read length versus average read quality plot before QC (the suffix reflects the selected plot type)
  • LogTransformed_HistogramReadlength: Read length (log-transformed x axis) versus number of reads plot before QC
  • NanoPlot_date.log: Verbose output from NanoPlot before QC
  • NanoPlot-report.html: Summary HTML report displaying all NanoPlot results before QC
  • NanoStats.txt: Summary statistics before QC
  • Weighted_HistogramReadlength: Read length versus number of bases plot before QC
  • Weighted_LogTransformed_HistogramReadlength: Read length (log-transformed x axis) versus number of bases plot before QC
  • Yield_by_length: Minimum read length versus cumulative yield (total bases in reads at least that long) plot before QC
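To make the weighted histograms and the Yield_by_length plot more concrete, the sketch below computes the underlying quantities from a list of read lengths. The helper names, the toy lengths and the 1000 bp bin width are hypothetical, and NanoPlot's own binning and plotting may differ.

```python
from collections import Counter


def yield_by_min_length(lengths: list[int], thresholds: list[int]) -> dict[int, int]:
    """Total bases in reads at least t long, for each minimum length t."""
    return {t: sum(n for n in lengths if n >= t) for t in thresholds}


def length_histograms(lengths: list[int], bin_width: int) -> tuple[dict[int, int], dict[int, int]]:
    """Reads per length bin (unweighted) and bases per length bin (weighted)."""
    reads, bases = Counter(), Counter()
    for n in lengths:
        lower = (n // bin_width) * bin_width  # lower edge of the read's bin
        reads[lower] += 1
        bases[lower] += n
    return dict(sorted(reads.items())), dict(sorted(bases.items()))


lengths = [500, 800, 1200, 15000]  # toy read lengths, not real data
print(yield_by_min_length(lengths, [0, 1000, 10000]))
# {0: 17500, 1000: 16200, 10000: 15000}
print(length_histograms(lengths, 1000))
# ({0: 2, 1000: 1, 15000: 1}, {0: 1300, 1000: 1200, 15000: 15000})
```

The single 15 kb read counts once in the unweighted histogram but contributes most of the bases in the weighted one, which is why long reads stand out in the weighted plots.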

2. abi_trimmed

This folder contains the reads after adapter and barcode trimming with Porechop_ABI. Only produced if run_Porechop_ABI is selected.

  • pc.fastq.gz: Adapter and barcode trimmed reads
  • abi_log.txt: Verbose output from Porechop_ABI

3. nanoq_trimmed

This folder contains the reads after trimming and filtering with nanoq (and after adapter trimming, if Porechop_ABI was run). Only produced if run_nanoq is selected.

  • trimmed.fastq.gz: Quality-filtered and trimmed reads; these have also been adapter-trimmed if Porechop_ABI was run
  • report.txt: Read summary statistics report from nanoq
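A quick sanity check after filtering is to count reads and bases before and after nanoq. The sketch below assumes standard four-line FASTQ records with no wrapped sequence lines; open_fastq and fastq_stats are hypothetical helpers, not part of the pipeline.

```python
import gzip
from typing import Iterable, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a .fastq or .fastq.gz file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def fastq_stats(lines: Iterable[str]) -> tuple[int, int]:
    """Count reads and bases, assuming unwrapped four-line FASTQ records."""
    reads = bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


record_lines = ["@read1\n", "ACGT\n", "+\n", "IIII\n", "@read2\n", "AC\n", "+\n", "II\n"]
print(fastq_stats(record_lines))  # (2, 6)
```

Running fastq_stats on open_fastq("nanoq_trimmed/trimmed.fastq.gz") and on the reads given to nanoq shows how many reads and bases the filters removed; the path is a placeholder for wherever the pipeline writes its output.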

4. post_QC_plots

This folder contains the NanoPlot output describing the quality of the data after QC. Only produced if run_Porechop_ABI and/or run_nanoq are selected.

  • HistogramReadlength: Read length versus number of reads plot after QC
  • LengthvsQualityScatterPlot_dot: Read length versus average read quality plot after QC (the suffix reflects the selected plot type)
  • LogTransformed_HistogramReadlength: Read length (log-transformed x axis) versus number of reads plot after QC
  • NanoPlot_date.log: Verbose output from NanoPlot after QC
  • NanoPlot-report.html: Summary HTML report displaying all NanoPlot results after QC
  • NanoStats.txt: Summary statistics after QC
  • Weighted_HistogramReadlength: Read length versus number of bases plot after QC
  • Weighted_LogTransformed_HistogramReadlength: Read length (log-transformed x axis) versus number of bases plot after QC

References

  1. De Coster, W., D’hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669.
  2. Bonenfant, Q., Noé, L., & Touzet, H. (2023). Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances, 3(1), vbac085.
  3. Steinig, E., & Coin, L. (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69), 2991.