teton

Tags: metagenomics taxonomy classification kraken bracken genome-size run-scope

Perform taxonomic classification and estimate bacterial genome sizes.

This subworkflow processes raw sequencing reads through a taxonomic classification pipeline using Kraken2 and Bracken to estimate bacterial genome sizes and separate bacterial from non-bacterial organisms. It first removes host reads using the scrubber subworkflow, then classifies reads, and finally creates sample sheets with genome size estimates for downstream Bactopia analysis.

Uses explicit positional record fields for reads:

  • Input: record(meta, r1, r2, se, lr) where each read slot is Path?
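As a rough illustration of the record shape described above, the following Python sketch models record(meta, r1, r2, se, lr) with each read slot optional. It is not Bactopia or Nextflow code: the ReadRecord class, the populated_slots helper, the sample IDs, and the file names are all hypothetical and exist only to show how a consumer might check which slots are filled.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ReadRecord:
    """Hypothetical stand-in for record(meta, r1, r2, se, lr)."""
    meta: dict
    r1: Optional[Path] = None  # Illumina R1 reads (paired-end)
    r2: Optional[Path] = None  # Illumina R2 reads (paired-end)
    se: Optional[Path] = None  # single-end Illumina reads
    lr: Optional[Path] = None  # long reads (ONT/PacBio)


def populated_slots(rec: ReadRecord) -> list[str]:
    """Return the names of the read slots that actually hold a path."""
    return [name for name in ("r1", "r2", "se", "lr")
            if getattr(rec, name) is not None]


# Made-up example records: one paired-end sample, one long-read sample.
paired = ReadRecord(meta={"id": "sample01"},
                    r1=Path("sample01_R1.fastq.gz"),
                    r2=Path("sample01_R2.fastq.gz"))
long_only = ReadRecord(meta={"id": "sample02"},
                       lr=Path("sample02.fastq.gz"))

print(populated_slots(paired))     # ['r1', 'r2']
print(populated_slots(long_only))  # ['lr']
```

Keeping every slot present but nullable means the record has the same shape for every sample, so downstream steps can branch on which slots are set rather than on the length of a file list.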

Take

reads: Channel<Record>
Field   Description
meta    Groovy Record containing sample information
r1      Illumina R1 reads (paired-end)
r2      Illumina R2 reads (paired-end)
se      Single-end Illumina reads
lr      Long reads (ONT/PacBio)
db: Path?
use_srascrubber: Boolean
use_nohuman: Boolean
nohuman_db: Path?
download_nohuman: Boolean
nohuman_save_as_tarball: Boolean
deacon_db: Path?
download_deacon: Boolean
Name                      Type      Description
db                        Path?     Optional Kraken2 database path for taxonomic classification
use_srascrubber           Boolean   Boolean flag to use SRA scrubber for host read removal
use_nohuman               Boolean   Boolean flag to use nohuman for host read removal
nohuman_db                Path?     Path to nohuman database directory or tarball
download_nohuman          Boolean   Boolean flag to download the nohuman database
nohuman_save_as_tarball   Boolean   Boolean flag to save downloaded nohuman database as tarball
deacon_db                 Path?     Path to deacon minimizer index file (.idx)
download_deacon           Boolean   Boolean flag to download the deacon index

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

No sample-scope outputs.

run_outputs

No run-scope outputs.

Subworkflow Composition

This subworkflow calls the following subworkflows:

  • scrubber - Remove contaminant sequences from metagenomic data.
  • bracken - Estimate species abundance from metagenomic reads.

Module Composition

This subworkflow calls the following modules:

  • bactopia_teton - Predict genome size and route samples based on taxonomic classification.
  • csvtk_join - Join two CSV or TSV files based on common fields.
  • csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
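To make the two table operations named above concrete, here is a small Python sketch of joining two tables on a common field and then concatenating tables with matching columns. It only illustrates the general idea and is not how csvtk or this subworkflow implements it: the column names (sample, genome_size, label), the values, and the inner-join behaviour are assumptions chosen for the example.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into a list of row dicts keyed by the header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def concat(*tables):
    """Stack tables that share the same columns into one list of rows."""
    return [row for table in tables for row in table]


# Placeholder tables; s3 has no match in the second table.
sizes = read_tsv("sample\tgenome_size\ns1\t1000\ns2\t2000\ns3\t3000\n")
labels = read_tsv("sample\tlabel\ns1\tA\ns2\tB\n")

joined = join_on(sizes, labels, "sample")
extra = read_tsv("sample\tgenome_size\tlabel\ns4\t4000\tC\n")
combined = concat(joined, extra)

print(len(joined))    # 2
print(len(combined))  # 3
```

In this sketch the unmatched row (s3) is dropped by the join, and the concatenation simply appends rows, so both inputs to concat are expected to carry the same columns.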

Used By

This subworkflow is used by the following workflows:

  • teton - Taxonomic classification and abundance profiling of metagenomic reads.

Citations

If you use this in your analysis, please cite the following.
