bioBakery WGS Pipeline

bioBakery Workflows

User Options

  • Strainphlan: Whether to run strain profiling with StrainPhlAn. Strain profiling can greatly increase the runtime of your job, depending on the size and diversity of your samples. (Logical. Default: False)
  • Project name: A project name displayed at the top of the HTML graphical report.

Output Files

  • The bioBakery workflows tutorial walks through the pipelines step by step, including information about all intermediate and final output files. Here we list some of the output files that may interest our users, as well as any output files that Nephele creates or removes.
  • log files:
    • logfile.txt: contains messages from the Nephele backend, such as those about file transfers
    • anadama.log: produced by the bioBakery wmgx workflow; contains all the log information and error messages from the analysis
    • wmgx_vis/anadama.log: produced by the bioBakery wmgx_vis workflow
  • renamed_inputs: If you submit paired-end data, Nephele creates links in this directory to your sequence files, renamed to follow the workflow's file naming convention.
  • kneaddata:
    • main: Nephele removes all FASTQ files produced by KneadData, so this folder contains only the log file for each sample.
    • merged/kneaddata_read_count_table.tsv: merged data file containing the read counts at each step of the QC process for each input file
  • metaphlan2:
    • main: Nephele removes all Bowtie2 SAM files produced by MetaPhlAn2, so this folder contains only sample_name_taxonomic_profile.tsv, the taxonomic profile for each sample
    • merged/metaphlan2_taxonomic_profiles.tsv: merged taxonomic profiles for all samples
    • merged/metaphlan2_species_counts_table.tsv: total number of species identified for each sample
  • humann2:
    • main: for each sample, files of gene family abundances, pathway abundances, and pathway coverage, plus a log file
    • merged/*.tsv: gene family, EC (Enzyme Commission number), and pathway files for all samples, each merged into a single file
    • merged/*_relab.tsv: the merged data sets normalized to relative abundance
    • counts/humann2_feature_counts.tsv: feature counts of gene families, ecs, and pathways for all samples
    • counts/humann2_read_and_species_count_table.tsv: for each sample, the total number of species identified after filtering and the total number of reads aligning in the nucleotide and translated searches
  • strainphlan: if the Strainphlan option is selected, contains the core output from profiling up to 10 species found in the samples
    • RAxML.*: phylogenetic trees generated for each species; these files may not exist if the species are not found (more information can be found in the StrainPhlAn manual)
    • clade_name.fasta: the alignment file of all metagenomic strains
    • *.info: general information like the total length of the concatenated markers (full sequence length), number of used markers, etc.
    • *.polymorphic: statistics on the polymorphic sites (see the StrainPhlAn manual for details)
    • *.marker_pos: this file shows the starting position of each marker in the strains.
  • wmgx_vis: If you submit at least 3 samples, the HTML report from the visualization workflow is created here. The report lists the software versions as well as the individual commands used.
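As an illustration of working with the merged MetaPhlAn2 output, the sketch below counts, for each sample, the species-level clades with nonzero abundance in a table like merged/metaphlan2_taxonomic_profiles.tsv. It assumes (not confirmed here) that the file is tab-separated, that its first row is a header with the clade name followed by one column per sample, and that clade names use MetaPhlAn2's "|"-separated ranks (k__, p__, ..., s__, t__). The function name species_counts and the example table are hypothetical, and the workflow's own metaphlan2_species_counts_table.tsv may be computed differently.

```python
import csv
import io
from typing import Dict, List, TextIO


def species_counts(handle: TextIO) -> Dict[str, int]:
    """Count species-level clades with nonzero abundance in each sample.

    Assumption: tab-separated table; first row is a header whose first
    column is the clade name and whose remaining columns are sample names.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples: List[str] = header[1:]
    counts = {sample: 0 for sample in samples}
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # A row is species-level when its last rank starts with "s__";
        # this excludes higher ranks and strain-level (t__) rows.
        if not row[0].split("|")[-1].startswith("s__"):
            continue
        for sample, value in zip(samples, row[1:]):
            if float(value) > 0:
                counts[sample] += 1
    return counts


# Hypothetical two-sample table with made-up clade names and abundances.
EXAMPLE = (
    "ID\tsampleA\tsampleB\n"
    "k__Bacteria\t100.0\t100.0\n"
    "k__Bacteria|g__GenusX\t100.0\t100.0\n"
    "k__Bacteria|g__GenusX|s__GenusX_a\t60.0\t0.0\n"
    "k__Bacteria|g__GenusX|s__GenusX_a|t__Strain1\t60.0\t0.0\n"
    "k__Bacteria|g__GenusX|s__GenusX_b\t40.0\t100.0\n"
)

if __name__ == "__main__":
    print(species_counts(io.StringIO(EXAMPLE)))  # {'sampleA': 2, 'sampleB': 1}
```

To use it on real output, open the merged file and pass the handle, e.g. `species_counts(open("metaphlan2/merged/metaphlan2_taxonomic_profiles.tsv"))`, after checking that the header matches the assumed layout.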

Tools and References

  • McIver, L. J., Abu-Ali, G., Franzosa, E. A., Schwager, R., Morgan, X. C., Waldron, L., ... Huttenhower, C. (2018). "bioBakery: a meta'omic analysis environment." Bioinformatics, 34(7), 1235–1237. https://doi.org/10.1093/bioinformatics/btx754
  • Nephele runs the biobakery/workflows Docker image from September 2017, which lists the following software versions:
    • kneaddata v0.6.1
    • MetaPhlAn v2.6.0 (19 August 2016)
    • humann2 v0.11.1