staphopia

Tags: staphylococcus-aureus assembly annotation amr mlst spa-typing agr-typing sccmec named-workflow

Comprehensive analysis pipeline for Staphylococcus aureus isolates.

This workflow performs complete bacterial analysis, including quality control, assembly, annotation, antimicrobial resistance detection, MLST typing, and Staphylococcus-specific analysis using spaTyper, AgrVATE, SCCmecFinder, and StaphSCAN. It processes raw sequencing reads and produces a comprehensive genomic characterization of S. aureus isolates.

Usage

staphopia CLI:

staphopia \
  --input samples.csv \
  --outdir results/

Nextflow:

nextflow run bactopia/bactopia/workflows/staphopia/main.nf \
  --input samples.csv \
  --outdir results/

Outputs

Expected Output Files

<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   ├── main
│   │   ├── annotator
│   │   │   └── prokka
│   │   │       ├── <SAMPLE_NAME>-blastdb.tar.gz
│   │   │       ├── <SAMPLE_NAME>.faa.gz
│   │   │       ├── <SAMPLE_NAME>.ffn.gz
│   │   │       ├── <SAMPLE_NAME>.fna.gz
│   │   │       ├── <SAMPLE_NAME>.fsa.gz
│   │   │       ├── <SAMPLE_NAME>.gbk.gz
│   │   │       ├── <SAMPLE_NAME>.gff.gz
│   │   │       ├── <SAMPLE_NAME>.sqn.gz
│   │   │       ├── <SAMPLE_NAME>.tbl.gz
│   │   │       ├── <SAMPLE_NAME>.tsv
│   │   │       ├── <SAMPLE_NAME>.txt
│   │   │       └── logs
│   │   │           ├── <SAMPLE_NAME>.err
│   │   │           ├── <SAMPLE_NAME>.log
│   │   │           ├── nf.command.{begin,err,log,out,run,sh,trace}
│   │   │           └── versions.yml
│   │   ├── assembler
│   │   │   ├── <SAMPLE_NAME>.fna.gz
│   │   │   ├── <SAMPLE_NAME>.tsv
│   │   │   ├── logs
│   │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│   │   │   │   ├── shovill.log
│   │   │   │   └── versions.yml
│   │   │   └── supplemental
│   │   │       ├── flash.hist
│   │   │       ├── flash.histogram
│   │   │       ├── illumina.txt
│   │   │       └── shovill.corrections
│   │   ├── gather
│   │   │   ├── <SAMPLE_NAME>-meta.tsv
│   │   │   └── logs
│   │   │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│   │   │       └── versions.yml
│   │   ├── qc
│   │   │   ├── <SAMPLE_NAME>_R1.fastq.gz
│   │   │   ├── <SAMPLE_NAME>_R2.fastq.gz
│   │   │   ├── logs
│   │   │   │   ├── <SAMPLE_NAME>-fastp.log
│   │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│   │   │   │   └── versions.yml
│   │   │   └── supplemental
│   │   │       ├── <SAMPLE_NAME>.fastp.html
│   │   │       ├── <SAMPLE_NAME>.fastp.json
│   │   │       ├── <SAMPLE_NAME>_R1-final.json
│   │   │       ├── <SAMPLE_NAME>_R1-final_fastqc.html
│   │   │       ├── <SAMPLE_NAME>_R1-final_fastqc.zip
│   │   │       ├── <SAMPLE_NAME>_R1-original.json
│   │   │       ├── <SAMPLE_NAME>_R1-original_fastqc.html
│   │   │       ├── <SAMPLE_NAME>_R1-original_fastqc.zip
│   │   │       ├── <SAMPLE_NAME>_R2-final.json
│   │   │       ├── <SAMPLE_NAME>_R2-final_fastqc.html
│   │   │       ├── <SAMPLE_NAME>_R2-final_fastqc.zip
│   │   │       ├── <SAMPLE_NAME>_R2-original.json
│   │   │       ├── <SAMPLE_NAME>_R2-original_fastqc.html
│   │   │       └── <SAMPLE_NAME>_R2-original_fastqc.zip
│   │   └── sketcher
│   │       ├── <SAMPLE_NAME>-k21.msh
│   │       ├── <SAMPLE_NAME>-k31.msh
│   │       ├── <SAMPLE_NAME>-mash-refseq88-k21.txt
│   │       ├── <SAMPLE_NAME>-sourmash-gtdb-rs207-k31.txt
│   │       ├── <SAMPLE_NAME>.sig
│   │       └── logs
│   │           ├── nf.command.{begin,err,log,out,run,sh,trace}
│   │           └── versions.yml
│   └── tools
│       ├── agrvate
│       │   ├── <SAMPLE_NAME>.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       ├── <SAMPLE_NAME>-agr_gp.tab
│       │       ├── <SAMPLE_NAME>-blastn_log.txt
│       │       ├── <SAMPLE_NAME>-hmm-log.txt
│       │       ├── <SAMPLE_NAME>-hmm.tab
│       │       └── <SAMPLE_NAME>.fna-error-report.tab
│       ├── amrfinderplus
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── mlst
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── sccmec
│       │   ├── <SAMPLE_NAME>.regions.blastn.tsv
│       │   ├── <SAMPLE_NAME>.regions.details.tsv
│       │   ├── <SAMPLE_NAME>.targets.blastn.tsv
│       │   ├── <SAMPLE_NAME>.targets.details.tsv
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── spatyper
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       └── staphscan
│           ├── <SAMPLE_NAME>.tsv
│           └── logs
│               ├── nf.command.{begin,err,log,out,run,sh,trace}
│               └── versions.yml
└── bactopia-runs
    └── staphopia-<TIMESTAMP>
        ├── merged-results
        │   ├── agrvate.tsv
        │   ├── amrfinderplus.tsv
        │   ├── assembly-scan.tsv
        │   ├── logs
        │   │   ├── agrvate-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── amrfinderplus-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── assembly-scan-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── meta-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── mlst-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── sccmec-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── spatyper-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   └── staphscan-concat
        │   │       ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │       └── versions.yml
        │   ├── meta.tsv
        │   ├── mlst.tsv
        │   ├── sccmec.tsv
        │   ├── spatyper.tsv
        │   └── staphscan.tsv
        └── nf-reports
            ├── staphopia-dag.dot
            ├── staphopia-report.html
            └── staphopia-timeline.html
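
The per-sample layout above can be walked with a few lines of shell. The sketch below is only an illustration: `tool_results` is a hypothetical helper name, and it assumes the `<SAMPLE_NAME>/tools/<tool>/<SAMPLE_NAME>.tsv` layout shown in the tree. It prints one line per sample that has a result table for the requested tool.

```shell
# Print "<sample><TAB><path>" for each per-sample result table of one tool,
# following the <SAMPLE_NAME>/tools/<tool>/<SAMPLE_NAME>.tsv layout shown above.
# tool_results is a hypothetical helper name, not part of Bactopia.
tool_results() {
  outdir="$1"   # the directory passed to --outdir
  tool="$2"     # e.g. mlst, spatyper, agrvate, sccmec, staphscan, amrfinderplus
  for sample_dir in "$outdir"/*/; do
    sample=$(basename "$sample_dir")
    # bactopia-runs holds run-level results, not a sample
    [ "$sample" = "bactopia-runs" ] && continue
    tsv="${sample_dir}tools/${tool}/${sample}.tsv"
    [ -f "$tsv" ] && printf '%s\t%s\n' "$sample" "$tsv"
  done
  return 0
}

# Example (assumes results/ was produced by a previous run):
# tool_results results mlst
```

Samples without a table for that tool are skipped silently, so an empty result means no matching files were found.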

Quality Control

File	Description
`supplemental/*_fastqc.*`	FastQC quality control reports for raw and cleaned reads
`supplemental/*-NanoPlot.*`	NanoPlot reports for Nanopore reads
`supplemental/*.fastp.*`	Fastp quality reports (when applicable)

Assembly

File	Description
`*.fna`	Assembled genome sequences in FASTA format
`assembly-stats.tsv`	Assembly quality metrics per sample

Annotation

Note: Output format depends on the chosen annotation tool (Bakta or Prokka).

File	Description
`*.gff.gz`	Genome annotation in GFF3 format (compressed)
`*.gbk.gz`	Genome annotation in GenBank format (compressed)
`*.faa.gz`	Protein sequences (compressed)
`*.fna.gz`	Nucleotide sequences from annotation (compressed)
`annotation.tsv`	Annotation summary tables

Typing

File	Description
`mlst.tsv`	MLST sequence type results
`agrvate-*`	Agr locus typing results
`spatyper-*`	spa typing results
`sccmec-*`	SCCmec typing results (targets, regions, details)

Antimicrobial Resistance

File	Description
`amrfinderplus.tsv`	AMR gene detection results
`amrfinderplus.mutation.tsv`	AMR point mutation results

Comparative Analysis

File	Description
`*-k21.msh`	Mash sketch files (k=21)
`*-k31.msh`	Mash sketch files (k=31)
`*-mash-refseq88-*.txt`	Mash screening results against RefSeq
`*.sig`	Sourmash signatures
`sourmash-*.txt`	Sourmash classification results

Merged Results

Note: These are run-level results aggregated from all samples.

File	Description
`merged-assembly-stats.tsv`	Consolidated assembly statistics
`merged-mlst.tsv`	Consolidated MLST results
`staphtyper.tsv`	Consolidated Staphylococcus typing summary
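
As a quick sanity check on the aggregated tables, the sketch below counts data rows in each merged TSV of the most recent run. `summarize_merged` is a hypothetical helper name. It assumes the `bactopia-runs/staphopia-<TIMESTAMP>/merged-results` layout from the output tree, that the newest run is the most recently modified directory, and that every TSV has exactly one header line.

```shell
# Print "<file><TAB><data rows>" for each merged TSV in the most recently
# modified staphopia run directory.
# summarize_merged is a hypothetical helper name, not part of Bactopia.
summarize_merged() {
  outdir="$1"   # the directory passed to --outdir
  # Newest bactopia-runs/staphopia-* directory by modification time.
  run_dir=$(ls -td "$outdir"/bactopia-runs/staphopia-*/ 2>/dev/null | head -n 1)
  if [ -z "$run_dir" ]; then
    echo "no staphopia run found under $outdir" >&2
    return 1
  fi
  for tsv in "$run_dir"merged-results/*.tsv; do
    [ -e "$tsv" ] || continue
    # Assumes one header line; an empty file counts as zero rows.
    rows=$(awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$tsv")
    printf '%s\t%s\n' "$(basename "$tsv")" "$rows"
  done
}

# Example (assumes results/ was produced by a previous run):
# summarize_merged results
```

If every sample completed each step, the row counts should generally match the number of samples, although some tools may report more than one row per sample.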

Audit Trail

Below are files that can assist you in understanding which parameters and program versions were used.

Logs

Each process that is executed will have a folder named logs. In this folder are helpful files for you to review if the need ever arises.

Extension	Description
.begin	An empty file used to designate the process started
.err	Contains STDERR outputs from the process
.log	Contains both STDERR and STDOUT outputs from the process
.out	Contains STDOUT outputs from the process
.run	The script Nextflow uses to stage/unstage files and queue processes based on given profile
.sh	The script executed by bash for the process
.trace	The Nextflow trace report for the process
versions.yml	A YAML formatted file with program versions
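
To review program versions across a whole run, the `versions.yml` files can be gathered from every `logs` folder. This is a minimal sketch: `list_versions` is a hypothetical helper name, and it assumes only that each `versions.yml` sits somewhere below a `logs` directory, as in the output tree. The file contents are printed unchanged.

```shell
# Print every versions.yml found below a logs/ directory, each preceded by
# the folder that contains it.
# list_versions is a hypothetical helper name, not part of Bactopia.
list_versions() {
  outdir="$1"   # the directory passed to --outdir
  find "$outdir" -type f -name 'versions.yml' -path '*/logs/*' | sort |
  while IFS= read -r yml; do
    printf '== %s ==\n' "$(dirname "$yml")"
    cat "$yml"
  done
}

# Example (assumes results/ was produced by a previous run):
# list_versions results
```

The same `find` pattern with `-name 'nf.command.err'` is one way to locate STDERR output when a process fails.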

Nextflow Reports

These Nextflow reports provide a great summary of your run. They can be used to optimize resource usage and to estimate expected costs on cloud platforms.

Filename	Description
staphopia-dag.dot	The Nextflow DAG visualization
staphopia-report.html	The Nextflow Execution Report
staphopia-timeline.html	The Nextflow Timeline Report
staphopia-trace.txt	The Nextflow Trace report

Parameters

Required Parameters

Use the following parameters to provide local or remote samples for Bactopia to process.

Parameter	Type	Default	Description
`--samples`	string		A FOFN (via bactopia prepare) with sample names and paths to FASTQ/FASTAs to process

`--r1`	string		First set of compressed (gzip) Illumina paired-end FASTQ reads (requires --r2 and --sample)
`--r2`	string		Second set of compressed (gzip) Illumina paired-end FASTQ reads (requires --r1 and --sample)
`--se`	string		Compressed (gzip) Illumina single-end FASTQ reads (requires --sample)
`--ont`	string		Compressed (gzip) Oxford Nanopore FASTQ reads (requires --sample)
`--hybrid`	boolean	`false`	Create hybrid assembly using Unicycler. (requires --r1, --r2, --ont and --sample)
`--short_polish`	boolean	`false`	Create hybrid assembly from long-read assembly and short read polishing. (requires --r1, --r2, --ont and --sample)
`--sample`	string		Sample name to use for the input sequences

`--accessions`	string		A file containing ENA/SRA Experiment accessions or NCBI Assembly accessions to be processed
`--accession`	string		A single ENA/SRA Experiment accession or NCBI Assembly accession to be processed

`--assembly`	string		An assembled genome in compressed FASTA format. (requires --sample)
`--check_samples`	boolean	`false`	Validate the input FOFN provided by --samples
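
For a single paired-end sample, `--r1`, `--r2`, and `--sample` are used together, as the table notes. The sketch below only prints such a command so it can be reviewed before running. `staphopia_pe_cmd`, the sample name, and the FASTQ file naming are placeholders, and it assumes the `staphopia` CLI from the Usage section accepts these parameters directly.

```shell
# Print (not run) a paired-end staphopia command for one sample.
# staphopia_pe_cmd is a hypothetical helper; the <sample>_R1/_R2 file naming
# is an assumption about how the input reads are named.
staphopia_pe_cmd() {
  sample="$1"      # value for --sample
  reads_dir="$2"   # directory holding <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
  outdir="$3"      # value for --outdir
  printf 'staphopia --sample %s --r1 %s --r2 %s --outdir %s\n' \
    "$sample" \
    "$reads_dir/${sample}_R1.fastq.gz" \
    "$reads_dir/${sample}_R2.fastq.gz" \
    "$outdir"
}

staphopia_pe_cmd SAMPLE01 fastqs results
```

Printing first makes it easy to confirm the read paths before committing to a full run.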

AMRFinder+ Parameters

Parameter	Type	Default	Description
`--amrfinderplus_ident_min`	number	`-1`	Minimum proportion of identical amino acids in alignment for hit (0..1)
`--amrfinderplus_coverage_min`	number	`0.5`	Minimum coverage of the reference protein (0..1)
`--amrfinderplus_organism`	string		Taxonomy group to run additional screens against
`--amrfinderplus_translation_table`	integer	`11`	NCBI genetic code for translated BLAST
`--amrfinderplus_noplus`	boolean	`false`	Disable running AMRFinder+ with the --plus option
`--amrfinderplus_report_common`	boolean	`false`	Report proteins common to a taxonomy group
`--amrfinderplus_report_all_equal`	boolean	`false`	Report all equally-scoring BLAST and HMM matches
`--amrfinderplus_opts`	string		Extra AMRFinder+ options in quotes.
`--amrfinderplus_db`	string		A custom AMRFinder+ database to use, either a tarball or a folder

csvtk concat Parameters

Parameter	Type	Default	Description
`--csvtk_concat_opts`	string		Extra csvtk concat options in quotes

Assembler Parameters

Parameter	Type	Default	Description
`--shovill_assembler`	string	`skesa`	Assembler to be used by Shovill (choices: `skesa`, `megahit`, `spades`, `velvet`)
`--dragonflye_assembler`	string	`flye`	Assembler to be used by Dragonflye (choices: `flye`, `miniasm`, `raven`)
`--use_unicycler`	boolean		Use unicycler for paired end assembly
`--min_contig_len`	integer	`500`	Minimum contig length <0=AUTO>
`--min_contig_cov`	integer	`2`	Minimum contig coverage <0=AUTO>
`--contig_namefmt`	string		Format of contig FASTA IDs in 'printf' style
`--shovill_opts`	string		Extra assembler options in quotes for Shovill
`--shovill_kmers`	string		K-mers to use <blank=AUTO>
`--dragonflye_opts`	string		Extra assembler options in quotes for Dragonflye
`--trim`	boolean		Enable adaptor trimming
`--no_stitch`	boolean		Disable read stitching for paired-end reads
`--no_corr`	boolean		Disable post-assembly correction
`--unicycler_mode`	string	`normal`	Bridging mode used by Unicycler (choices: `conservative`, `normal`, `bold`)
`--min_component_size`	integer	`1000`	Graph components smaller than this size (bp) will be removed from the final graph
`--min_dead_end_size`	integer	`1000`	Graph dead ends smaller than this size (bp) will be removed from the final graph
`--nanohq`	boolean	`false`	For Flye, use '--nano-hq' instead of --nano-raw
`--medaka_model`	string		The model to use for Medaka polishing
`--medaka_rounds`	integer	`0`	The number of Medaka polishing rounds to conduct
`--racon_rounds`	integer	`1`	The number of Racon polishing rounds to conduct
`--no_polish`	boolean		Skip the assembly polishing step
`--no_miniasm`	boolean		Skip miniasm+Racon bridging
`--no_rotate`	boolean		Do not rotate completed replicons to start at a standard gene
`--reassemble`	boolean	`false`	If reads were simulated, they will be used to create a new assembly.
`--polypolish_rounds`	integer	`1`	Number of polishing rounds to conduct with Polypolish for short read polishing
`--pilon_rounds`	integer	`0`	Number of polishing rounds to conduct with Pilon for short read polishing

Gather Parameters

Parameter	Type	Default	Description
`--skip_fastq_check`	boolean		Skip minimum requirement checks for input FASTQs
`--min_basepairs`	integer	`2241820`	The minimum amount of basepairs required to continue downstream analyses.
`--min_reads`	integer	`7472`	The minimum amount of reads required to continue downstream analyses.
`--min_coverage`	integer	`10`	The minimum amount of coverage required to continue downstream analyses.
`--min_proportion`	number	`0.5`	The minimum proportion of basepairs for paired-end reads to continue downstream analyses.
`--min_genome_size`	integer	`100000`	The minimum estimated genome size allowed for the input sequence to continue downstream analyses.
`--max_genome_size`	integer	`18040666`	The maximum estimated genome size allowed for the input sequence to continue downstream analyses.
`--attempts`	integer	`3`	Maximum times to attempt downloads
`--use_ena`	boolean		Download FASTQs from ENA
`--no_cache`	boolean		Skip caching the assembly summary file from ncbi-genome-download

Sketcher Parameters

Parameter	Type	Default	Description
`--sketch_size`	integer	`10000`	Sketch size. Each sketch will have at most this many non-redundant min-hashes.
`--sourmash_scale`	integer	`10000`	Choose number of hashes as 1 in FRACTION of input k-mers
`--no_winner_take_all`	boolean		Disable winner-takes-all strategy for identity estimates
`--screen_i`	number	`0.8`	Minimum identity to report.

MLST Parameters

Parameter	Type	Default	Description
`--mlst_scheme`	string		Don't autodetect, force this scheme on all inputs
`--mlst_minid`	integer	`95`	Minimum DNA percent identity of full allele to consider 'similar'
`--mlst_mincov`	integer	`10`	Minimum DNA percent coverage to report partial allele at all
`--mlst_minscore`	integer	`50`	Minimum score out of 100 to match a scheme
`--mlst_nopath`	boolean	`false`	Strip filename paths from FILE column
`--mlst_db`	string		A custom MLST database to use, either a tarball or a directory

QC Parameters

Parameter	Type	Default	Description
`--use_bbmap`	boolean		Illumina reads will be QC'd using BBMap
`--use_porechop`	boolean	`false`	Use Porechop to remove adapters from ONT reads
`--skip_qc`	boolean		The QC step will be skipped and it will be assumed the input sequences have already been QCed.
`--skip_qc_plots`	boolean		QC Plot creation by FastQC or Nanoplot will be skipped
`--skip_error_correction`	boolean		FLASH error correction of reads will be skipped.
`--adapters`	string		A FASTA file containing adapters to remove
`--adapter_k`	integer	`23`	Kmer length used for finding adapters.
`--phix`	string		phiX174 reference genome to remove
`--phix_k`	integer	`31`	Kmer length used for finding phiX174.
`--ktrim`	string	`r`	Trim reads to remove bases matching reference kmers (choices: `f`, `r`, `l`)
`--mink`	integer	`11`	Look for shorter kmers at read tips down to this length, when k-trimming or masking.
`--hdist`	integer	`1`	Maximum Hamming distance for ref kmers (subs only)
`--tpe`	string	`t`	When kmer right-trimming, trim both reads to the minimum length of either (choices: `f`, `t`)
`--tbo`	string	`t`	Trim adapters based on where paired reads overlap (choices: `f`, `t`)
`--qtrim`	string	`rl`	Trim read ends to remove bases with quality below trimq. (choices: `rl`, `f`, `r`, `l`, `w`)
`--trimq`	integer	`6`	Regions with average quality BELOW this will be trimmed if qtrim is set to something other than f
`--maq`	integer	`10`	Reads with average quality (after trimming) below this will be discarded
`--minlength`	integer	`35`	Reads shorter than this after trimming will be discarded
`--ftm`	integer	`5`	If positive, right-trim length to be equal to zero, modulo this number
`--tossjunk`	string	`t`	Discard reads with invalid characters as bases (choices: `f`, `t`)
`--ain`	string	`f`	When detecting pair names, allow identical names (choices: `f`, `t`)
`--qout`	string	`33`	PHRED offset to use for output FASTQs (choices: `33`, `64`)
`--maxcor`	integer	`1`	Max number of corrections within a 20bp window
`--sampleseed`	integer	`42`	Set to a positive number to use as the random number generator seed for sampling
`--ont_minlength`	integer	`1000`	ONT Reads shorter than this will be discarded
`--ont_minqual`	integer	`0`	Minimum average read quality filter of ONT reads
`--porechop_opts`	string		Extra Porechop options in quotes
`--nanoplot_opts`	string		Extra NanoPlot options in quotes
`--bbduk_opts`	string		Extra BBDuk options in quotes
`--fastp_opts`	string		Extra fastp options in quotes

Bakta Download Parameters

Parameter	Type	Default	Description
`--bakta_db`	string		Tarball or path to the Bakta database
`--bakta_db_type`	string	`full`	Which Bakta DB to download 'full' (~30GB) or 'light' (~2GB) (choices: `full`, `light`)
`--bakta_save_as_tarball`	boolean	`false`	Save the Bakta database as a tarball
`--download_bakta`	boolean	`false`	Download the Bakta database to the path given by --bakta_db

Bakta Parameters

Parameter	Type	Default	Description
`--bakta_proteins`	string		FASTA file of trusted proteins to first annotate from
`--bakta_prodigal_tf`	string		Training file to use for Prodigal
`--bakta_replicons`	string		Replicon information table (tsv/csv)
`--bakta_min_contig_length`	integer	`1`	Minimum contig size to annotate
`--bakta_keep_contig_headers`	boolean	`false`	Keep original contig headers
`--bakta_compliant`	boolean	`false`	Force GenBank/ENA/DDBJ compliance
`--bakta_skip_trna`	boolean	`false`	Skip tRNA detection & annotation
`--bakta_skip_tmrna`	boolean	`false`	Skip tmRNA detection & annotation
`--bakta_skip_rrna`	boolean	`false`	Skip rRNA detection & annotation
`--bakta_skip_ncrna`	boolean	`false`	Skip ncRNA detection & annotation
`--bakta_skip_ncrna_region`	boolean	`false`	Skip ncRNA region detection & annotation
`--bakta_skip_crispr`	boolean	`false`	Skip CRISPR array detection & annotation
`--bakta_skip_cds`	boolean	`false`	Skip CDS detection & annotation
`--bakta_skip_sorf`	boolean	`false`	Skip sORF detection & annotation
`--bakta_skip_gap`	boolean	`false`	Skip gap detection & annotation
`--bakta_skip_ori`	boolean	`false`	Skip oriC/oriT detection & annotation
`--bakta_opts`	string		Extra Bakta options in quotes. Example: '--gram +'

Prokka Parameters

Parameter	Type	Default	Description
`--prokka_proteins`	string	`${projectDir}/data/proteins.faa`	FASTA file of trusted proteins to first annotate from
`--prokka_prodigal_tf`	string		Training file to use for Prodigal
`--prokka_compliant`	boolean	`false`	Force GenBank/ENA/DDBJ compliance
`--prokka_centre`	string	`Bactopia`	Sequencing centre ID
`--prokka_coverage`	integer	`80`	Minimum coverage on query protein
`--prokka_evalue`	string	`1e-09`	Similarity e-value cut-off
`--prokka_opts`	string		Extra Prokka options in quotes.
`--prokka_debug`	boolean	`false`	Enable debug mode for Prokka

AgrVATE Parameters

Parameter	Type	Default	Description
`--agrvate_typing_only`	boolean	`false`	agr typing only. Skips agr operon extraction and frameshift detection

spaTyper Parameters

Parameter	Type	Default	Description
`--spatyper_repeats`	string		List of spa repeats
`--spatyper_repeat_order`	string		List spa types and order of repeats
`--spatyper_do_enrich`	boolean	`false`	Do PCR product enrichment

sccmec Parameters

Parameter	Type	Default	Description
`--sccmec_min_targets_pident`	integer	`90`	Minimum percent identity to count a target hit
`--sccmec_min_targets_coverage`	integer	`80`	Minimum percent coverage to count a target hit
`--sccmec_min_regions_pident`	integer	`85`	Minimum percent identity to count a region hit
`--sccmec_min_regions_coverage`	integer	`93`	Minimum percent coverage to count a region hit

StaphSCAN Parameters

Parameter	Type	Default	Description
`--staphscan_modules`	string		Comma-separated list of modules to run
`--staphscan_db_mlst`	string		Path or tarball to custom MLST database

Dataset Parameters

Define where the pipeline should find input data and save output data.

Parameter	Type	Default	Description
`--species`	string		Name of species for species-specific dataset to use
`--ask_merlin`	boolean		Ask Merlin to execute species specific Bactopia tools based on Mash distances
`--coverage`	integer	`100`	Reduce samples to a given coverage, requires a genome size
`--genome_size`	integer	`0`	Expected genome size (bp) for all samples, required for read error correction and read subsampling
`--use_bakta`	boolean		Use Bakta for annotation, instead of Prokka

Optional Parameters

These optional parameters can be useful in certain settings.

Parameter	Type	Default	Description
`--outdir`	string	`bactopia`	Base directory to write results to
`--skip_compression`	boolean	`false`	Output files will not be compressed
`--datasets`	string		The path to cache datasets to
`--keep_all_files`	boolean	`false`	Keeps all analysis files created

Max Job Request Parameters

Set the top limit for requested resources for any single job.

Parameter	Type	Default	Description
`--max_retry`	integer	`3`	Maximum times to retry a process before allowing it to fail.
`--max_cpus`	integer	`4`	Maximum number of CPUs that can be requested for any single job.
`--max_memory`	string	`128.GB`	Maximum amount of memory that can be requested for any single job.
`--max_time`	string	`240.h`	Maximum amount of time that can be requested for any single job.
`--max_downloads`	integer	`3`	Maximum number of samples to download at a time

Nextflow Configuration Parameters

Parameters to fine-tune your Nextflow setup.

Parameter	Type	Default	Description
`--nfconfig`	string		A Nextflow compatible config file for custom profiles, loaded last and will overwrite existing variables if set.
`--publish_dir_mode`	string	`copy`	Method used to save pipeline results to output directory. (choices: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`)
`--infodir`	string	`${params.outdir}/pipeline_info`	Directory to keep pipeline Nextflow logs and reports.
`--force`	boolean	`false`	Nextflow will overwrite existing output files.
`--cleanup_workdir`	boolean	`false`	After Bactopia is successfully executed, the `work` directory will be deleted.

Institutional config options

Parameters used to describe centralized config profiles. These should not be edited.

Parameter	Type	Default	Description
`--custom_config_version`	string	`master`	Git commit id for Institutional configs.
`--custom_config_base`	string	`https://raw.githubusercontent.com/nf-core/configs/master`	Base directory for Institutional configs.
`--config_profile_name`	string		Institutional config name.
`--config_profile_description`	string		Institutional config description.
`--config_profile_contact`	string		Institutional config contact information.
`--config_profile_url`	string		Institutional config URL link.

Nextflow Profile Parameters

Parameters to fine-tune your Nextflow setup.

Parameter	Type	Default	Description
`--condadir`	string		Directory Nextflow should use for Conda environments
`--registry`	string	`quay.io`	Registry to pull Docker containers from.
`--datasets_cache`	string	`<HOME>/.bactopia/datasets`	Directory where downloaded datasets should be stored.
`--singularity_cache`	string		Directory where remote Singularity images are stored.
`--singularity_pull_docker_container`	boolean		Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.
`--force_rebuild`	boolean	`false`	Force overwrite of existing pre-built environments.
`--queue`	string	`general,high-memory`	Comma-separated name of the queue(s) to be used by a job scheduler (e.g. AWS Batch or SLURM)
`--cluster_opts`	string		Additional options to pass to the executor. (e.g. SLURM: '--account=my_acct_name')
`--container_opts`	string		Additional options to pass to Apptainer, Docker, or Singularity. (e.g. Singularity: '-D `pwd`')
`--disable_scratch`	boolean	`false`	All intermediate files created on worker nodes will be transferred to the head node.

Helpful Parameters

Uncommonly used parameters that might be useful.

Parameter	Type	Default	Description
`--monochrome_logs`	boolean		Do not use coloured log outputs.
`--nfdir`	boolean		Print directory Nextflow has pulled Bactopia to
`--sleep_time`	integer	`5`	The amount of time (seconds) Nextflow will wait after setting up datasets before execution.
`--validate_params`	boolean	`true`	Boolean whether to validate parameters against the schema at runtime
`--help`	boolean		Display help text.
`--wf`	string	`bactopia`	Specify which workflow or Bactopia Tool to execute
`--list_wfs`	boolean		List the available workflows and Bactopia Tools to use with '--wf'
`--show_hidden_params`	boolean		Show all params when using `--help`
`--help_all`	boolean		An alias for --help --show_hidden_params
`--version`	boolean		Display version text.

Composition

This workflow uses the following subworkflows:

amrfinderplus - Find antimicrobial resistance genes and point mutations.
bactopia_assembler - Assemble bacterial genomes using automated assembler selection.
bactopia_datasets - Download and provide pre-compiled datasets required by Bactopia.
bactopia_gather - Search, validate, gather, and standardize input samples.
bactopia_qc - Perform comprehensive quality control on sequencing reads.
bactopia_sketcher - Create genomic sketches and perform rapid taxonomic classification.
bakta - Rapid bacterial genome annotation.
mlst - Determine multilocus sequence types (MLST) from bacterial assemblies.
prokka - Annotate bacterial genomes with functional information.
staphtyper - Determine the agr, spa, SCCmec types and perform genome-based surveillance for Staphylococcus aureus genomes.

Citations

If you use this in your analysis, please cite the following.

Bactopia
Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
Staphopia
Petit III RA, Read TD. Staphylococcus aureus viewed from the perspective of 40,000+ genomes. PeerJ 6, e5261 (2018)

Source

View source on GitHub
