bactopia_gather

Tags: fastq validation sra ena download merging simulation art ncbi sample-scope

Search, validate, gather, or simulate input samples.

This process is the entry point for data ingestion. It handles:

  • Validation: Verifies FASTQ formatting and gzip integrity.
  • Merging: Combines multiple runs (lanes) into a single sample.
  • Downloading: Fetches reads (SRA/ENA) or assemblies (NCBI) from accessions.
  • Simulation: Generates synthetic reads from assemblies using ART to enable read-based analysis.
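
The validation step can be pictured with a short sketch. The helper below is hypothetical (`validate_fastq_gz` is not a Bactopia function) and only illustrates what "FASTQ formatting and gzip integrity" could mean in practice: the file must decompress cleanly, and every record must have four lines with matching sequence and quality lengths. The actual checks performed by the process may differ.

```python
import gzip
import zlib
from pathlib import Path


def validate_fastq_gz(path: Path) -> int:
    """Return the number of records in a gzipped FASTQ, or raise ValueError.

    Hypothetical helper for illustration only; not part of Bactopia.
    """
    records = 0
    try:
        with gzip.open(path, "rt") as handle:
            while True:
                header = handle.readline()
                if not header:
                    break  # clean end of file
                seq = handle.readline().rstrip("\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\n")
                # A FASTQ record is: @header, sequence, + separator, qualities
                if not header.startswith("@") or not plus.startswith("+"):
                    raise ValueError(f"{path}: malformed FASTQ record {records + 1}")
                if len(seq) != len(qual):
                    raise ValueError(
                        f"{path}: sequence/quality length mismatch in record {records + 1}"
                    )
                records += 1
    except (OSError, EOFError, zlib.error) as err:
        # Corrupt or truncated gzip data surfaces as one of these exceptions
        raise ValueError(f"{path}: gzip integrity check failed: {err}") from err
    return records
```

Streaming one record at a time keeps memory use flat, which matters for multi-gigabyte read files.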

Uses explicit named slots for input and output reads:

  • Input accepts Set<Path?> for each slot (pre-merge, supports multiple files)
  • Output emits Path? for each slot (post-merge, single consolidated file or null)
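
The slot contract (a set of optional paths in, one optional path out) can be sketched as follows. `merge_slot` is a hypothetical name, and sorting the inputs by path is an assumption made here for determinism; the process itself may order or merge files differently. The sketch relies on the fact that concatenated gzip members form a valid gzip stream.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def merge_slot(files: Iterable[Optional[Path]], out_path: Path) -> Optional[Path]:
    """Collapse one input slot into a single file, or None if the slot is empty.

    Hypothetical helper for illustration only; not part of Bactopia.
    """
    # Drop null elements; sort so repeated runs merge in the same order (assumption)
    present = sorted(f for f in files if f is not None)
    if not present:
        return None  # empty slot: the corresponding output is null
    with open(out_path, "wb") as out:
        for f in present:
            with open(f, "rb") as src:
                # Byte-level concatenation; no decompression needed for gzip inputs
                shutil.copyfileobj(src, out)
    return out_path
```

For paired-end data, the R1 and R2 slots would need to be merged in the same run order so that mates stay aligned; a shared sort key is one way to guarantee that.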

Inputs

record (
    meta: Record,
    r1_files: Set<Path?>,
    r2_files: Set<Path?>,
    se_files: Set<Path?>,
    lr_files: Set<Path?>,
    fna_files: Set<Path?>
)

| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| r1_files | Set<Path?> | Illumina R1 read files (Set, elements may be null) |
| r2_files | Set<Path?> | Illumina R2 read files (Set, elements may be null) |
| se_files | Set<Path?> | Single-end read files (Set, elements may be null) |
| lr_files | Set<Path?> | Long read files (ONT) or assembly for simulation (Set, elements may be null) |
| fna_files | Set<Path?> | Input or downloaded assembly file (Set, elements may be null) |

Outputs

record (
    meta: Record,
    r1: Path?,
    r2: Path?,
    se: Path?,
    lr: Path?,
    fna: Path?,
    tsv: Path,
    results: Set<Path>,
    logs: Set<Path?>,
    nf_logs: Set<Path>,
    versions: Set<Path?>
)

| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| r1 | Path? | Merged Illumina R1 read file |
| r2 | Path? | Merged Illumina R2 read file |
| se | Path? | Merged single-end read file |
| lr | Path? | Merged long read file (ONT) |
| fna | Path? | Assembly file |
| tsv | Path | A tab-delimited metadata file describing the valid samples |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path?> | A YAML formatted file with program versions |

Parameters

Gather Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| --skip_fastq_check | boolean | | Skip minimum requirement checks for input FASTQs |
| --min_basepairs | integer | 2241820 | The minimum amount of basepairs required to continue downstream analyses. |
| --min_reads | integer | 7472 | The minimum amount of reads required to continue downstream analyses. |
| --min_coverage | integer | 10 | The minimum amount of coverage required to continue downstream analyses. |
| --min_proportion | number | 0.5 | The minimum proportion of basepairs for paired-end reads to continue downstream analyses. |
| --min_genome_size | integer | 100000 | The minimum estimated genome size allowed for the input sequence to continue downstream analyses. |
| --max_genome_size | integer | 18040666 | The maximum estimated genome size allowed for the input sequence to continue downstream analyses. |
| --attempts | integer | 3 | Maximum times to attempt downloads |
| --use_ena | boolean | | Download FASTQs from ENA |
| --no_cache | boolean | | Skip caching the assembly summary file from ncbi-genome-download |
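
To make the thresholds concrete, here is a minimal sketch of how the minimum-requirement checks might combine. Everything beyond the parameter names and defaults is an assumption: `ReadStats` and `failed_checks` are hypothetical names, coverage is taken as total base pairs divided by estimated genome size, and the paired-end proportion is taken as the smaller of the R1/R2 base-pair totals divided by the larger. The process may compute these quantities differently.

```python
from dataclasses import dataclass
from typing import Dict, List

# Defaults copied from the Gather Parameters table
DEFAULTS: Dict[str, float] = {
    "min_basepairs": 2241820,
    "min_reads": 7472,
    "min_coverage": 10,
    "min_proportion": 0.5,
    "min_genome_size": 100000,
    "max_genome_size": 18040666,
}


@dataclass
class ReadStats:
    """Hypothetical per-sample summary; not a Bactopia type."""
    total_bp: int
    read_count: int
    genome_size: int  # estimated genome size in base pairs
    r1_bp: int = 0    # paired-end only
    r2_bp: int = 0    # paired-end only


def failed_checks(stats: ReadStats, limits: Dict[str, float] = DEFAULTS) -> List[str]:
    """Return the names of the thresholds a sample fails (empty list = pass)."""
    failures: List[str] = []
    if stats.total_bp < limits["min_basepairs"]:
        failures.append("min_basepairs")
    if stats.read_count < limits["min_reads"]:
        failures.append("min_reads")
    if not limits["min_genome_size"] <= stats.genome_size <= limits["max_genome_size"]:
        failures.append("genome_size")
    elif stats.genome_size > 0:
        # Assumed definition: coverage = total base pairs / estimated genome size
        if stats.total_bp / stats.genome_size < limits["min_coverage"]:
            failures.append("min_coverage")
    if stats.r1_bp > 0 and stats.r2_bp > 0:
        # Assumed definition: smaller mate total relative to the larger one
        proportion = min(stats.r1_bp, stats.r2_bp) / max(stats.r1_bp, stats.r2_bp)
        if proportion < limits["min_proportion"]:
            failures.append("min_proportion")
    return failures
```

Under these assumptions, a 2.8 Mb genome sequenced to 100 Mb of balanced paired-end reads passes every check, while the same genome with only 1 Mb of reads fails on base pairs, read count (if below 7472), and coverage.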

Used By

Subworkflows

  • bactopia_gather - Search, validate, gather, and standardize input samples.

Workflows

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
  • teton - Taxonomic classification and abundance profiling of metagenomic reads.

Citations

If you use this in your analysis, please cite the following.

Version

BACTOPIA_GATHER:
- bactopia-gather: 1.0.6