
bactopia_qc

Tags: fastq qc adapter-removal error-correction subsampling fastp bbduk lighter porechop nanoq fastqc nanoplot sample-scope

Automated quality control, error correction, and read subsampling.

A comprehensive QC pipeline that adapts to the input read type:

  • Illumina: adapter/PhiX removal (fastp or BBDuk), error correction (Lighter), and subsampling (Rasusa)
  • Nanopore: optional adapter removal (Porechop, enabled with --use_porechop), quality filtering (Nanoq), and subsampling (Rasusa)
  • Hybrid: Processes both short and long reads through their respective pipelines
  • Assembly: Passes through simulated reads from assemblies
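The branching above can be pictured as a small dispatch on the sample's runtype. The sketch below is illustrative only, not the subworkflow's Nextflow logic: the runtype strings ("paired-end", "single-end", "ont", "hybrid", "assembly") and the function names are assumptions, and it only returns the ordered list of tools a sample would pass through, using the flags documented under Parameters.

```python
from typing import List

# Hypothetical runtype labels; the exact strings stored in meta.runtype may differ.
ILLUMINA_RUNTYPES = {"paired-end", "single-end"}


def illumina_steps(use_bbmap: bool, skip_error_correction: bool) -> List[str]:
    """Illumina branch: adapter/PhiX removal, optional error correction, subsampling."""
    steps = ["bbduk" if use_bbmap else "fastp"]
    if not skip_error_correction:
        steps.append("lighter")
    steps.append("rasusa")
    return steps


def ont_steps(use_porechop: bool) -> List[str]:
    """Nanopore branch: optional adapter removal, quality filtering, subsampling."""
    steps = ["porechop"] if use_porechop else []
    return steps + ["nanoq", "rasusa"]


def qc_plan(runtype: str, use_bbmap: bool = False, use_porechop: bool = False,
            skip_error_correction: bool = False, skip_qc: bool = False) -> List[str]:
    """Return the ordered QC tools a sample would pass through (reports not included)."""
    if skip_qc or runtype == "assembly":
        # --skip_qc, or reads simulated from an assembly: passed through unchanged
        return []
    if runtype in ILLUMINA_RUNTYPES:
        return illumina_steps(use_bbmap, skip_error_correction)
    if runtype == "ont":
        return ont_steps(use_porechop)
    if runtype == "hybrid":
        # short and long reads each go through their own branch
        return illumina_steps(use_bbmap, skip_error_correction) + ont_steps(use_porechop)
    raise ValueError(f"unsupported runtype in this sketch: {runtype}")


print(qc_plan("paired-end"))              # ['fastp', 'lighter', 'rasusa']
print(qc_plan("ont", use_porechop=True))  # ['porechop', 'nanoq', 'rasusa']
```

Any runtype not listed here raises an error in the sketch; the real subworkflow may support others.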

Generates quality metrics using fastq-scan and optional quality reports using FastQC (Illumina) and NanoPlot (ONT).
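As a rough idea of what such per-sample metrics contain, the sketch below summarises an uncompressed FASTQ string into read count, total bases, mean length, mean per-base quality, and estimated coverage (total bases divided by the genome_size carried in meta). It is not fastq-scan's implementation; the field names are only loosely modeled on its JSON report and should be treated as assumptions.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class ReadStats:
    read_total: int   # number of reads
    total_bp: int     # total number of bases
    read_mean: float  # mean read length
    qual_mean: float  # mean per-base Phred score
    coverage: float   # total_bp / genome_size


def parse_fastq(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from uncompressed 4-line FASTQ records."""
    lines = text.splitlines()
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 1], lines[i + 3]


def read_stats(text: str, genome_size: int, phred_offset: int = 33) -> ReadStats:
    """Summarise a FASTQ string; coverage uses the genome size from meta.genome_size."""
    read_total = total_bp = qual_sum = 0
    for seq, qual in parse_fastq(text):
        read_total += 1
        total_bp += len(seq)
        qual_sum += sum(ord(c) - phred_offset for c in qual)
    return ReadStats(
        read_total=read_total,
        total_bp=total_bp,
        read_mean=total_bp / read_total if read_total else 0.0,
        qual_mean=qual_sum / total_bp if total_bp else 0.0,
        coverage=total_bp / genome_size,
    )


fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n"
print(read_stats(fastq, genome_size=5))
# ReadStats(read_total=2, total_bp=10, read_mean=5.0, qual_mean=40.0, coverage=2.0)
```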

Inputs

record (
meta: Record,
r1: Path?,
r2: Path?,
se: Path?,
lr: Path?,
fna: Path?
)
Field | Type | Description
--- | --- | ---
meta | Record | Groovy Record containing sample information (must include runtype, genome_size, species)
r1 | Path? | Illumina R1 reads (paired-end forward)
r2 | Path? | Illumina R2 reads (paired-end reverse)
se | Path? | Single-end Illumina reads
lr | Path? | Long reads (ONT)
fna | Path? | Assembly file (FASTA) for assembly-based simulations
adapters: Path?
phix: Path?
Name | Type | Description
--- | --- | ---
adapters | Path? | Filepath for custom adapter sequences (FASTA)
phix | Path? | Filepath for custom PhiX sequences (FASTA)
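To make the shape of the input record concrete, here is a Python stand-in for it. The dataclass, the "id" key, and the example paths and genome size are hypothetical. The checks are assumptions drawn from the field descriptions (meta must carry runtype, genome_size and species; r1 and r2 form a pair), not validation the subworkflow is documented to perform. adapters and phix are separate inputs and are left out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional

# Keys the meta record must carry, per the Inputs table.
REQUIRED_META = ("runtype", "genome_size", "species")


@dataclass
class QcInput:
    """Python stand-in for the Groovy input record; the checks are hypothetical."""
    meta: Dict[str, Any]
    r1: Optional[Path] = None
    r2: Optional[Path] = None
    se: Optional[Path] = None
    lr: Optional[Path] = None
    fna: Optional[Path] = None

    def __post_init__(self) -> None:
        missing = [key for key in REQUIRED_META if key not in self.meta]
        if missing:
            raise ValueError(f"meta is missing required keys: {missing}")
        if (self.r1 is None) != (self.r2 is None):
            # r1 and r2 are the forward and reverse files of one paired-end run
            raise ValueError("paired-end input needs both r1 and r2")
        if not any(p is not None for p in (self.r1, self.se, self.lr, self.fna)):
            raise ValueError("at least one of r1/r2, se, lr or fna must be set")


sample = QcInput(
    meta={"id": "sample1", "runtype": "paired-end",
          "genome_size": 2_800_000, "species": "Staphylococcus aureus"},
    r1=Path("sample1_R1.fastq.gz"),
    r2=Path("sample1_R2.fastq.gz"),
)
```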

Outputs

record (
meta: Record,
r1: Path?,
r2: Path?,
se: Path?,
lr: Path?,
fna: Path?,
reads_grouped: Set<Path?>,
error: Set<Path?>,
skipped: Path?,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
Field | Type | Description
--- | --- | ---
meta | Record | Sample information record
r1 | Path? | QC'd Illumina R1 reads (paired-end forward)
r2 | Path? | QC'd Illumina R2 reads (paired-end reverse)
se | Path? | QC'd single-end Illumina reads
lr | Path? | QC'd long reads (ONT)
fna | Path? | Assembly file (FASTA)
reads_grouped | Set<Path?> | All output FASTQs for publishing
error | Set<Path?> | Captured error messages if QC failed (e.g., reads empty after trimming)
skipped | Path? | Marker file indicating QC was skipped for this sample
results | Set<Path> | All output files to be published
logs | Set<Path?> | Optional program-specific log files
nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin
versions | Set<Path> | A YAML-formatted file with program versions
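One way to read the skipped and error fields is as three possible outcomes per sample. The sketch below is a hypothetical classifier, not the subworkflow's code: read_counts is an assumed mapping from output field name to post-QC read count, and the message text is invented for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class QcOutcome(Enum):
    SKIPPED = "skipped"  # reported through the `skipped` marker file
    ERROR = "error"      # reported through the `error` set
    PASSED = "passed"    # QC'd reads emitted in r1/r2, se and/or lr


def classify(skip_qc: bool, read_counts: Dict[str, int]) -> Tuple[QcOutcome, Optional[str]]:
    """read_counts maps an output field (r1, r2, se, lr) to its read count after QC."""
    if skip_qc:
        return QcOutcome.SKIPPED, None
    if not read_counts:
        return QcOutcome.ERROR, "no reads were provided"
    empty = sorted(name for name, count in read_counts.items() if count == 0)
    if empty:
        # e.g. every read was removed by trimming/filtering
        return QcOutcome.ERROR, "reads empty after QC: " + ", ".join(empty)
    return QcOutcome.PASSED, None


print(classify(False, {"r1": 120_000, "r2": 0}))
# (<QcOutcome.ERROR: 'error'>, 'reads empty after QC: r2')
```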

Parameters

QC Parameters

Parameter | Type | Default | Description
--- | --- | --- | ---
--use_bbmap | boolean | | Illumina reads will be QC'd using BBDuk (from BBMap) instead of fastp
--use_porechop | boolean | false | Use Porechop to remove adapters from ONT reads
--skip_qc | boolean | | The QC step will be skipped and it will be assumed the input sequences have already been QC'd.
--skip_qc_plots | boolean | | QC plot creation by FastQC or NanoPlot will be skipped
--skip_error_correction | boolean | | Lighter error correction of reads will be skipped.
--adapters | string | | A FASTA file containing adapters to remove
--adapter_k | integer | 23 | K-mer length used for finding adapters.
--phix | string | | phiX174 reference genome to remove
--phix_k | integer | 31 | K-mer length used for finding phiX174.
--ktrim | string | r | Trim reads to remove bases matching reference k-mers (choices: f, r, l)
--mink | integer | 11 | Look for shorter k-mers at read tips down to this length when k-trimming or masking.
--hdist | integer | 1 | Maximum Hamming distance for reference k-mers (substitutions only)
--tpe | string | t | When k-mer right-trimming, trim both reads to the minimum length of either (choices: f, t)
--tbo | string | t | Trim adapters based on where paired reads overlap (choices: f, t)
--qtrim | string | rl | Trim read ends to remove bases with quality below trimq (choices: rl, f, r, l, w)
--trimq | integer | 6 | Regions with average quality BELOW this will be trimmed if qtrim is set to something other than f
--maq | integer | 10 | Reads with average quality (after trimming) below this will be discarded
--minlength | integer | 35 | Reads shorter than this after trimming will be discarded
--ftm | integer | 5 | If positive, right-trim reads so their length is a multiple of this number (e.g. a 151 bp read becomes 150 bp with the default of 5)
--tossjunk | string | t | Discard reads with invalid characters as bases (choices: f, t)
--ain | string | f | When detecting pair names, allow identical names (choices: f, t)
--qout | string | 33 | PHRED offset to use for output FASTQs (choices: 33, 64)
--maxcor | integer | 1 | Maximum number of corrections (Lighter) within a 20 bp window
--sampleseed | integer | 42 | Set to a positive number to use as the random number generator seed for sampling
--ont_minlength | integer | 1000 | ONT reads shorter than this will be discarded
--ont_minqual | integer | 0 | Minimum average read quality for ONT reads
--porechop_opts | string | | Extra Porechop options in quotes
--nanoplot_opts | string | | Extra NanoPlot options in quotes
--bbduk_opts | string | | Extra BBDuk options in quotes
--fastp_opts | string | | Extra fastp options in quotes
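Several of these parameters describe simple per-read rules, sketched below in Python so their defaults are easy to reason about. This is a simplified illustration, not how BBDuk, Nanoq or Rasusa are implemented. Assumptions: quality trimming removes low-scoring bases one at a time from each end (BBDuk's own qtrim works on regions by average quality, which this does not reproduce); maq uses the arithmetic mean of Phred scores; ONT mean quality uses the mean per-base error probability, a common long-read convention assumed here for Nanoq; subsampling keeps shuffled reads until a target depth is reached, and the coverage argument is hypothetical since no target-depth parameter appears in this table. Paired-end mates are not kept in sync, and the order of steps is illustrative.

```python
import math
import random
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred scores)


def ftm_trim(read: Read, ftm: int = 5) -> Read:
    """--ftm: right-trim so the length is a multiple of ftm (151 bp -> 150 bp)."""
    seq, qual = read
    if ftm <= 0:
        return read
    keep = len(seq) - len(seq) % ftm
    return seq[:keep], qual[:keep]


def quality_trim_rl(read: Read, trimq: int = 6) -> Read:
    """--qtrim rl, simplified: drop bases scoring below trimq from both ends."""
    seq, qual = read
    start, end = 0, len(seq)
    while start < end and qual[start] < trimq:
        start += 1
    while end > start and qual[end - 1] < trimq:
        end -= 1
    return seq[start:end], qual[start:end]


def passes_illumina(read: Read, maq: int = 10, minlength: int = 35) -> bool:
    """--maq / --minlength: keep reads that are long enough with a high enough mean quality."""
    seq, qual = read
    return len(seq) >= minlength and len(qual) > 0 and sum(qual) / len(qual) >= maq


def mean_read_quality(qual: List[int]) -> float:
    """Mean quality via the mean per-base error probability (assumed Nanoq-style)."""
    mean_error = sum(10 ** (-q / 10) for q in qual) / len(qual)
    return -10 * math.log10(mean_error)


def passes_ont(read: Read, ont_minlength: int = 1000, ont_minqual: float = 0) -> bool:
    """--ont_minlength / --ont_minqual filter for a single long read."""
    seq, qual = read
    return len(seq) >= ont_minlength and len(qual) > 0 and mean_read_quality(qual) >= ont_minqual


def subsample(reads: List[Read], genome_size: int, coverage: float,
              sampleseed: int = 42) -> List[Read]:
    """Rasusa-like: shuffle with a fixed seed, keep reads until coverage * genome_size bases."""
    target = coverage * genome_size
    order = list(range(len(reads)))
    random.Random(sampleseed).shuffle(order)
    kept, bases = [], 0
    for i in order:
        if bases >= target:
            break
        kept.append(reads[i])
        bases += len(reads[i][0])
    return kept


read = ("ACGT" * 38, [2, 2] + [30] * 150)  # 152 bp, two low-quality leading bases
trimmed = quality_trim_rl(ftm_trim(read))
print(len(trimmed[0]), passes_illumina(trimmed))  # 148 True
```

Averaging error probabilities rather than Phred scores weights low-quality bases more heavily: a read with bases at Q10 and Q30 averages to about Q13, not Q20, which is why the ONT filter is written differently from the maq check.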

Used By

Subworkflows

  • bactopia_qc - Perform comprehensive quality control on sequencing reads.

Workflows

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.

Citations

If you use this in your analysis, please cite the following.


Version

BACTOPIA_QC:
- bactopia-qc: 1.0.4