funcscan: Parameters

Define where the pipeline should find input data and save output data.

Path to a comma-separated (CSV) samplesheet containing sample names, paths to the corresponding FASTA files, and optional annotation files.

required

type: string

pattern: ^\S+\.csv$
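The pattern above only constrains the file path: it must contain no whitespace and end in a lowercase `.csv`. A minimal Python sketch of the same check, which can be used to test a path before launching a run (the helper name `is_valid_input_path` is hypothetical and not part of the pipeline):

```python
import re

# Pattern copied verbatim from the parameter description above.
INPUT_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_input_path(path: str) -> bool:
    """Return True if `path` contains no whitespace and ends in '.csv'."""
    return INPUT_PATTERN.search(path) is not None


for candidate in ["samplesheet.csv", "/data/my samples.csv", "samplesheet.tsv"]:
    print(candidate, is_valid_input_path(candidate))
```

Paths containing spaces fail, and the extension check is case-sensitive, so `samplesheet.CSV` is rejected as well.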

The output directory where the results will be saved. Absolute paths must be used when saving to storage on cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
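The pattern accepts common addresses but is stricter than the email standard: a `+` tag in the local part or a top-level domain longer than five letters is rejected. The same pattern is used for the failure-only email address further down. A short Python sketch for checking an address against it (the helper name is hypothetical):

```python
import re

# Pattern copied verbatim from the parameter description above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address: str) -> bool:
    """Return True if `address` satisfies the schema's email pattern."""
    return EMAIL_PATTERN.search(address) is not None


for address in ["jane.doe@example.org", "jane+funcscan@example.org", "jane@example.museum"]:
    print(address, is_valid_email(address))
```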

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

These parameters determine which workflows (ARG, AMP and/or BGC) are activated.

Activate antimicrobial peptide (AMP) gene screening tools.

type: boolean

Activate antimicrobial resistance gene screening tools.

type: boolean

Activate biosynthetic gene cluster screening tools.

type: boolean

These options influence whether to activate the taxonomic classification of the input nucleotide sequences.

Activates the taxonomic classification of input nucleotide sequences.

type: boolean

Specifies the tool used for taxonomic classification.

type: string

If MMseqs2 is chosen as the taxonomic classification tool, specifies whether the output of all MMseqs2 subcommands should be compressed.

type: integer

These parameters influence the database used for taxonomic classification.

Specify a path to an MMseqs2-formatted database.

type: string

Specify the label of the database to be used.

type: string

default: Kalamari

Specify whether the temporary files should be saved.

type: boolean

These parameters influence the taxonomic classification step.

Specify whether to save the temporary files.

type: boolean

Specify the alignment type between database and query.

type: integer

default: 2

Specify the taxonomic levels to display in the result table.

type: string

default: kingdom,phylum,class,order,family,genus,species

Specify whether to include or remove the taxonomic lineage.

type: integer

default: 1

Specify the speed and sensitivity for taxonomy assignment.

type: number

default: 5

Specify the ORF search sensitivity in the prefilter step.

type: number

default: 2

Specify the mode to assign the taxonomy.

type: integer

default: 3

Specify the weights of the taxonomic assignment.

type: integer

default: 1

These options influence the generation of annotation files required for downstream steps in ARG, AMP, and BGC workflows.

Specify the annotation tool used to generate the files required by some downstream tools.

type: string

Specify whether to save gene annotations in the results directory.

type: boolean

Bakta is a tool developed to annotate bacterial genomes and plasmids from both isolates and metagenome-assembled genomes (MAGs). More info: https://github.com/oschwengers/bakta

Specify a path to a local copy of a Bakta database.

type: string

Download the full or light version of the Bakta database if not supplying your own database.

type: string

Use the default genome-length optimised mode (rather than the metagenome mode).

type: boolean

Specify the minimum contig size.

type: integer

default: 1

Specify the genetic code translation table.

type: integer

default: 11

Specify the Gram type of the bacteria to be annotated, used for signal peptide detection.

type: string

Specify that all contigs are complete replicons.

type: boolean

Changes the original contig headers.

type: boolean

Clean the result annotations to standardise them to Genbank/ENA conventions.

type: boolean

Activate tRNA detection & annotation.

type: boolean

Activate tmRNA detection & annotation.

type: boolean

Activate rRNA detection & annotation.

type: boolean

Activate ncRNA detection & annotation.

type: boolean

Activate ncRNA region detection & annotation.

type: boolean

Activate CRISPR array detection & annotation.

type: boolean

Skip CDS detection & annotation.

type: boolean

Activate pseudogene detection & annotation.

type: boolean

Skip sORF detection & annotation.

type: boolean

Activate gap detection & annotation.

type: boolean

Activate oriC/oriT detection & annotation.

type: boolean

Activate generation of circular genome plots.

type: boolean

Supply a path to an HMM file of trusted hidden Markov models in HMMER format for CDS annotation.

type: string

Prokka annotates genomic sequences belonging to bacterial, archaeal and viral genomes. More info: https://github.com/tseemann/prokka

Use the default genome-length optimised mode (rather than the metagenome mode).

type: boolean

Suppress the default clean-up of the gene annotations.

type: boolean

Specify the kingdom that the input represents.

type: string

Specify the translation table used to annotate the sequences.

type: integer

default: 11

Minimum contig size required for annotation (bp).

type: integer

default: 1

E-value cut-off.

type: number

default: 0.000001

Set the minimum coverage (%) on the query protein.

type: integer

default: 80

Allow transfer RNA (tRNA) and ribosomal RNA (rRNA) to overlap coding sequences (CDS).

type: boolean

Use RNAmmer for rRNA prediction.

type: boolean

Force contig names to follow GenBank/ENA/DDBJ naming rules.

type: boolean

default: true

Add the gene features for each CDS hit.

type: boolean

Retains contig names.

type: boolean

Prodigal is a protein-coding gene prediction tool developed to run on bacterial and archaeal genomes. More info: https://github.com/hyattpd/prodigal/wiki

Specify whether to use Prodigal’s single-genome mode for long sequences.

type: boolean

Does not allow partial genes on contig edges.

type: boolean

Specifies the translation table used for gene annotation.

type: integer

default: 11

Forces Prodigal to scan for motifs.

type: boolean

Pyrodigal is a resource-optimized wrapper around Prodigal, producing protein-coding gene predictions of bacterial and archaeal genomes. Read more at the Pyrodigal GitHub repository (https://github.com/althonos/pyrodigal) or its documentation (https://pyrodigal.readthedocs.io).

Specify whether to use Pyrodigal’s single-genome mode for long sequences.

type: boolean

Does not allow partial genes on contig edges.

type: boolean

Specifies the translation table used for gene annotation.

type: integer

default: 11

Forces Pyrodigal to scan for motifs.

type: boolean

Forces Pyrodigal to append asterisks (*) as stop codon indicators. Do not use when running the AMP workflow.

type: boolean

Functionally annotates all annotated coding regions.

Activates the functional annotation of annotated coding regions to provide more information about the classified coding regions.

type: boolean

Specifies the tool used for further protein annotation.

type: string

Change the database version used for annotation.

type: string

default: https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz

Path to pre-downloaded InterProScan database.

type: string

Assigns the database(s) to be used to annotate the coding regions.

type: string

default: PANTHER,ProSiteProfiles,ProSitePatterns,Pfam

pattern: ^\w+(,\w+)*
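The pattern asks for a comma-separated list of word characters with no spaces. Because it has no trailing `$`, a validator that applies it as an unanchored search (the JSON Schema convention for `pattern`) only checks the start of the value, so a typo such as a space after a comma may slip through validation. The Python sketch below contrasts the two readings; the helper names are hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
DB_PATTERN = r"^\w+(,\w+)*"


def schema_accepts(value: str) -> bool:
    """Unanchored search, as JSON Schema applies `pattern`; only `^` pins the start."""
    return re.search(DB_PATTERN, value) is not None


def strictly_valid(value: str) -> bool:
    """Require the whole value to be comma-separated word characters."""
    return re.fullmatch(DB_PATTERN, value) is not None


for value in ["PANTHER,ProSiteProfiles,ProSitePatterns,Pfam", "PANTHER, Pfam", ",Pfam"]:
    print(repr(value), schema_accepts(value), strictly_valid(value))
```

To be safe, write the database list without spaces, exactly like the default.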

Pre-calculates residue mutual matches.

type: boolean

General options for downloading databases.

Specify whether to save pipeline-downloaded databases in your results directory.

type: boolean

Antimicrobial peptide detection using a deep learning model. More info: https://github.com/bcgsc/AMPlify

Skip AMPlify during AMP screening.

type: boolean

Antimicrobial peptide detection using machine learning. ampir uses a supervised statistical machine learning approach to predict AMPs. It incorporates two support vector machine classification models, ‘precursor’ and ‘mature’, that have been trained on publicly available antimicrobial peptide data. More info: https://github.com/Legana/ampir

Skip ampir during AMP screening.

type: boolean

Specify which machine learning classification model to use.

type: string

Specify minimum protein length for prediction calculation.

type: integer

default: 10

Antimicrobial peptide detection based on predefined HMM models. This tool uses probabilistic models called profile hidden Markov models (profile HMMs) to search against a sequence database. More info: http://eddylab.org/software/hmmer/Userguide.pdf

Run hmmsearch during AMP screening.

type: boolean

Specify the path to the AMP HMM model file(s) to search against. The path must be quoted if a wildcard is used.

type: string
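The quoting requirement comes from the shell: an unquoted pattern such as models/*.hmm is expanded by the shell before Nextflow sees it, so the parameter no longer holds the pattern itself. When quoted ('models/*.hmm'), the literal pattern reaches the pipeline, which expands it. The Python sketch below only mimics that expansion step with `glob.glob` on a throwaway directory; the file names are hypothetical, and in the pipeline the expansion is done by Nextflow, not by this code.

```python
import glob
import os
import tempfile


def expand_model_pattern(pattern: str) -> list:
    """Expand a wildcard pattern into a sorted list of matching file paths."""
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical model files plus one file that should not match.
    for name in ["amp_model_a.hmm", "amp_model_b.hmm", "README.txt"]:
        open(os.path.join(tmp, name), "w").close()

    matches = expand_model_pattern(os.path.join(tmp, "*.hmm"))
    names = [os.path.basename(path) for path in matches]

print(names)  # ['amp_model_a.hmm', 'amp_model_b.hmm']
```

The same quoting rule applies to the BGC HMM model path further down.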

Saves a multiple alignment of all significant hits to a file.

type: boolean

Save a simple tabular file summarising the per-target output.

type: boolean

Save a simple tabular file summarising the per-domain output.

type: boolean

Antimicrobial peptide detection from metagenomes. More info: https://github.com/BigDataBiology/macrel

Skip Macrel during AMP screening.

type: boolean

The submodule of AMPcombi2 that parses, filters and annotates antimicrobial peptide hits. More info: https://github.com/Darcy220606/AMPcombi

The name of the database used to classify the AMPs.

type: string

The path to the folder containing the reference database files.

type: string

Specifies the prediction tools’ cut-offs.

type: number

default: 0.6

Filter out all amino acid fragments shorter than this number.

type: integer

default: 120

Remove all DRAMP annotations that have an e-value greater than this value.

type: number

default: 5

Retain HMM hits that have an e-value lower than this.

type: number

default: 0.06
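The three AMPcombi thresholds above act in different directions: the prediction cut-off is a lower bound on tool scores, while the two e-value settings are upper bounds. A hedged Python sketch of that logic on hypothetical scores; whether a value exactly at the prediction cut-off is kept is an assumption here, and AMPcombi's own implementation may differ.

```python
# Defaults from the parameter descriptions above.
AMP_CUTOFF = 0.6          # prediction tools' cut-off
DRAMP_EVALUE_MAX = 5.0    # DRAMP annotations with a larger e-value are removed
HMM_EVALUE_MAX = 0.06     # HMM hits are retained only below this e-value


def passes_prediction_cutoff(score: float) -> bool:
    # Assumption: a score exactly at the cut-off is kept.
    return score >= AMP_CUTOFF


def keep_dramp_annotation(evalue: float) -> bool:
    # "Remove ... an e-value greater than this value" -> keep values <= threshold.
    return evalue <= DRAMP_EVALUE_MAX


def keep_hmm_hit(evalue: float) -> bool:
    # "Retain HMM hits that have an e-value lower than this" -> strict comparison.
    return evalue < HMM_EVALUE_MAX


# Hypothetical prediction scores keyed by hypothetical protein IDs.
scores = {"contig1_5": 0.91, "contig2_3": 0.42, "contig7_1": 0.60}
kept = [protein for protein, score in scores.items() if passes_prediction_cutoff(score)]
print(kept)  # ['contig1_5', 'contig7_1']
```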

Assign the number of codons used to look for stop codons, upstream and downstream of the AMP hit.

type: integer

default: 60

Assign the number of CDSs upstream and downstream of the AMP to look for a transport protein.

type: integer

default: 11

Remove hits that have no stop codon upstream and downstream of the AMP.

type: boolean

Assigns the file extension used to identify AMPIR output.

type: string

default: .ampir.tsv

Assigns the file extension used to identify AMPLIFY output.

type: string

default: .amplify.tsv

Assigns the file extension used to identify MACREL output.

type: string

default: .macrel.prediction

Assigns the file extension used to identify HMMER/HMMSEARCH output.

type: string

default: .hmmer_hmmsearch.txt
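These suffixes let AMPcombi tell the outputs of the different AMP tools apart, so they only need changing if the output files carry different suffixes. A minimal Python sketch of suffix-based matching with the default values; the sample file names are hypothetical and AMPcombi's actual matching code may differ.

```python
from typing import Optional

# Default suffixes from the parameter descriptions above.
DEFAULT_SUFFIXES = {
    ".ampir.tsv": "ampir",
    ".amplify.tsv": "AMPlify",
    ".macrel.prediction": "Macrel",
    ".hmmer_hmmsearch.txt": "HMMER/hmmsearch",
}


def detect_tool(filename: str) -> Optional[str]:
    """Return the tool whose default suffix ends `filename`, or None."""
    for suffix, tool in DEFAULT_SUFFIXES.items():
        if filename.endswith(suffix):
            return tool
    return None


for name in ["sample1.ampir.tsv", "sample1.macrel.prediction", "sample1_renamed.tsv"]:
    print(name, detect_tool(name))
```

A file that matches no suffix, like `sample1_renamed.tsv`, would not be attributed to any tool.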

Clusters the AMP candidates identified with AMPcombi. More info: https://github.com/Darcy220606/AMPcombi

MMseqs2 coverage mode.

type: number

MMseqs2 sensitivity.

type: number

default: 4

Remove clusters that don’t have more AMP hits than this number.

type: integer

MMseqs2 clustering mode.

type: number

default: 1

MMseqs2 alignment coverage.

type: number

default: 0.8

MMseqs2 sequence identity.

type: number

default: 0.4

Remove any hits that form a single member cluster.

type: boolean
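The two cluster-removal options above combine as in the Python sketch below. It works on hypothetical hit-to-cluster assignments rather than real MMseqs2 output, and the threshold values are illustrative, since no default is listed for the minimum number of AMP hits per cluster. The helper name `filter_clusters` is hypothetical.

```python
from collections import Counter
from typing import Dict


def filter_clusters(
    assignments: Dict[str, int], min_members: int, remove_singletons: bool
) -> Dict[str, int]:
    """Drop hits whose cluster is too small.

    `assignments` maps each AMP hit to a cluster ID (e.g. as produced by MMseqs2).
    """
    sizes = Counter(assignments.values())
    kept = {}
    for hit, cluster in assignments.items():
        size = sizes[cluster]
        if remove_singletons and size == 1:
            continue  # hit forms a single-member cluster
        if size <= min_members:
            continue  # cluster does not have MORE hits than min_members
        kept[hit] = cluster
    return kept


# Hypothetical assignments: cluster 1 has 3 hits, cluster 2 has 2, cluster 3 has 1.
assignments = {"amp1": 1, "amp2": 1, "amp3": 1, "amp4": 2, "amp5": 2, "amp6": 3}
print(sorted(filter_clusters(assignments, min_members=0, remove_singletons=True)))
print(sorted(filter_clusters(assignments, min_members=2, remove_singletons=False)))
```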

Antimicrobial resistance gene detection based on NCBI’s curated Reference Gene Database and curated collection of Hidden Markov Models. It identifies AMR genes, resistance-associated point mutations, and select other classes of genes using protein annotations and/or assembled nucleotide sequences. More info: https://github.com/ncbi/amr/wiki

Skip AMRFinderPlus during the ARG screening.

type: boolean

Specify the path to a local version of the AMRFinderPlus database.

type: string

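Minimum identity to the reference sequence, given as a proportion (0 to 1). The default of -1 uses the curated threshold for each gene where one exists (0.9 otherwise).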

type: number

default: -1

Minimum coverage of the reference protein.

type: number

default: 0.5

Specify which NCBI genetic code to use for translated BLAST.

type: integer

default: 11

Add the plus genes to the report.

type: boolean

Add an identifier column to the AMRFinderPlus output.

type: boolean

Antimicrobial resistance gene detection using a deep learning model. DeepARG is composed of two models for two types of input: short sequence reads and gene-like sequences. In this pipeline we use the ls model, which is suited to annotating full-length genes and discovering novel antibiotic resistance genes in assembled samples. DIAMOND is used as the aligner. More info: https://bitbucket.org/gusphdproj/deeparg-ss/src/master

Skip DeepARG during the ARG screening.

type: boolean

Specify the path to the DeepARG database.

type: string

Specify the numeric version number of a user-supplied DeepARG database.

type: integer

default: 2

Specify which model to use (short or long sequences).

type: string

Specify minimum probability cutoff under which hits are discarded.

type: number

default: 0.8

Specify the E-value cutoff above which hits are discarded.

type: number

default: 1e-10

Specify percent identity cutoff for sequence alignment under which hits are discarded.

type: integer

default: 50

Specify alignment read overlap.

type: number

default: 0.8

Specify the minimum number of alignments per entry for the DIAMOND step of DeepARG.

type: integer

default: 1000

Antimicrobial resistance gene detection using hidden Markov models. The tool includes pre-built, optimised models for a number of resistance gene types, as well as functionality to create and optimise models for resistance genes of your choice. More info: https://github.com/fannyhb/fargene

Skip fARGene during the ARG screening.

type: boolean

Specify a comma-separated list of the pre-defined HMM models to screen against.

type: string

default: class_a,class_b_1_2,class_b_3,class_c,class_d_1,class_d_2,qnr,tet_efflux,tet_rpg,tet_enzyme
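To screen against only a subset of the pre-defined models, pass a shorter comma-separated list drawn from the default. The Python sketch below splits such a list and flags names outside the default set, treating the default as the complete set of valid model names (an assumption; the fARGene documentation is authoritative). The helper name `parse_models` is hypothetical.

```python
from typing import List

# Default model list from the parameter description above.
DEFAULT_MODELS = (
    "class_a,class_b_1_2,class_b_3,class_c,class_d_1,class_d_2,"
    "qnr,tet_efflux,tet_rpg,tet_enzyme"
)
KNOWN_MODELS = set(DEFAULT_MODELS.split(","))


def parse_models(value: str) -> List[str]:
    """Split a comma-separated model list and reject names not in KNOWN_MODELS."""
    models = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in models if name not in KNOWN_MODELS]
    if unknown:
        raise ValueError("Unknown fARGene model(s): " + ", ".join(unknown))
    return models


print(parse_models("class_a,qnr"))  # ['class_a', 'qnr']
print(len(parse_models(DEFAULT_MODELS)))  # 10
```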

Specify to save intermediate temporary files to results directory.

type: boolean

The threshold score for a sequence to be classified as an (almost) complete gene.

type: number

The minimum length of a predicted ORF retrieved from annotating the nucleotide sequences.

type: integer

default: 90

Defines which ORF finding algorithm to use.

type: boolean

The translation table/format to use for sequence annotation.

type: string

default: pearson

Antimicrobial resistance gene detection based on alignment to the CARD database, using homology and SNP models. More info: https://github.com/arpcard/rgi

Skip RGI during the ARG screening.

type: boolean

Path to user-defined local CARD database.

type: string

Save the RGI output .json file.

type: boolean

Specify to save intermediate temporary files in the results directory.

type: boolean

Specify the alignment tool to be used.

type: string

Include all of loose, strict and perfect hits (i.e. ≥ 95% identity) found by RGI.

type: boolean

Suppresses the default behaviour of RGI with --arg_rgi_includeloose.

type: boolean

Include screening of low quality contigs for partial genes.

type: boolean

Specify a more specific data-type of input (e.g. plasmid, chromosome).

type: string

Run multiple Prodigal jobs simultaneously for contigs in a FASTA file.

type: boolean

default: true

Antimicrobial resistance gene detection based on alignment to NCBI, CARD, ARG-ANNOT, ResFinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB. More info: https://github.com/tseemann/abricate

Skip ABRicate during the ARG screening.

type: boolean

Specify the name of the ABRicate database to use. Names of non-default databases can be supplied if --arg_abricate_db is provided.

type: string

default: ncbi

Path to user-defined local ABRicate database directory for using custom databases.

type: string

Minimum percent identity of alignment required for a hit to be considered.

type: integer

default: 80

Minimum percent coverage of alignment required for a hit to be considered.

type: integer

default: 80
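A hit has to clear both thresholds to be reported. The Python sketch below applies the two default cut-offs to hypothetical hits (not real ABRicate output); ABRicate applies these filters itself, so this only illustrates their combined effect, and keeping a hit exactly at a threshold is an assumption.

```python
from typing import List, NamedTuple


class AbricateHit(NamedTuple):
    gene: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference gene covered by the alignment


# Defaults from the parameter descriptions above.
MIN_IDENTITY = 80
MIN_COVERAGE = 80


def filter_hits(hits: List[AbricateHit]) -> List[AbricateHit]:
    # Assumption: a hit exactly at a threshold is kept.
    return [h for h in hits if h.identity >= MIN_IDENTITY and h.coverage >= MIN_COVERAGE]


# Hypothetical hits, not real ABRicate output.
hits = [
    AbricateHit("blaTEM-1", 99.9, 100.0),
    AbricateHit("tet(M)", 75.0, 100.0),
    AbricateHit("sul1", 92.3, 64.0),
]
print([h.gene for h in filter_hits(hits)])  # ['blaTEM-1']
```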

Influences parameters required for the ARG summary by hAMRonization.

Specifies summary output format.

type: string

Influences parameters required for the normalization of ARG annotations by argNorm. More info: https://github.com/BigDataBiology/argNorm

Skip argNorm during ARG screening.

type: boolean

These parameters influence general BGC settings like minimum input sequence length.

Specify the minimum length of contigs that go into BGC screening.

type: integer

default: 3000

Specify to save the length-filtered (unannotated) FASTAs used for BGC screening.

type: boolean
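The length filter is a simple pass over the input contigs. A minimal Python sketch of the idea, using a small hand-written FASTA reader so it has no dependencies; the contig names are hypothetical, keeping contigs exactly at the minimum is an assumption, and the pipeline performs this step with its own tooling.

```python
from typing import Iterator, List, Tuple

# Default from the parameter description above.
MIN_BGC_CONTIG_LENGTH = 3000


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_for_bgc_screening(text: str, min_length: int = MIN_BGC_CONTIG_LENGTH) -> List[str]:
    # Assumption: contigs exactly at the minimum length are kept.
    return [header for header, seq in read_fasta(text) if len(seq) >= min_length]


fasta = ">long_contig\n" + "A" * 3500 + "\n>short_contig\n" + "ACGT" * 100 + "\n"
print(contigs_for_bgc_screening(fasta))  # ['long_contig']
```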

Biosynthetic gene cluster detection. More info: https://docs.antismash.secondarymetabolites.org

Skip antiSMASH during the BGC screening.

type: boolean

Path to user-defined local antiSMASH database.

type: string

Path to user-defined local antiSMASH directory. Only required when running with Docker/Singularity.

type: string

Minimum length a contig must have to be screened with antiSMASH.

type: integer

default: 3000

Turn on ClusterBlast comparison against a database of antiSMASH-predicted clusters.

type: boolean

Turn on ClusterBlast comparison against known gene clusters from the MIBiG database.

type: boolean

Turn on ClusterBlast comparison against known subclusters responsible for synthesising precursors.

type: boolean

Turn on ClusterCompare comparison against known gene clusters from the MIBiG database.

type: boolean

Generate phylogenetic trees of secondary metabolite group orthologs.

type: boolean

Defines which level of strictness to use for HMM-based cluster detection.

type: string

Run Pfam to Gene Ontology mapping module.

type: boolean

Run RREFinder precision mode on all RiPP gene clusters.

type: boolean

Specify which taxonomic classification of input sequence to use.

type: string

Run TFBS finder on all gene clusters.

type: boolean

A deep learning genome-mining strategy for biosynthetic gene cluster prediction. More info: https://github.com/Merck/deepbgc/tree/master/deepbgc

Skip DeepBGC during the BGC screening.

type: boolean

Path to local DeepBGC database folder.

type: string

Average protein-wise DeepBGC score threshold for extracting BGC regions from Pfam sequences.

type: number

default: 0.5

Run DeepBGC’s internal Prodigal step in single mode to restrict gene detection to long contigs.

type: boolean

Merge detected BGCs within given number of proteins.

type: integer

Merge detected BGCs within given number of nucleotides.

type: integer

Minimum BGC nucleotide length.

type: integer

default: 1

Minimum number of proteins in a BGC.

type: integer

default: 1

Minimum number of protein domains in a BGC.

type: integer

default: 1

Minimum number of known biosynthetic (as defined by antiSMASH) protein domains in a BGC.

type: integer

DeepBGC classification score threshold for assigning classes to BGCs.

type: number

default: 0.5

Biosynthetic gene cluster detection using Conditional Random Fields (CRFs). More info: https://gecco.embl.de

Skip GECCO during the BGC screening.

type: boolean

Enable unknown region masking to prevent genes from stretching across unknown nucleotides.

type: boolean

The minimum number of coding sequences a valid cluster must contain.

type: integer

default: 3

The p-value cutoff for protein domains to be included.

type: number

default: 1e-9

The probability threshold for cluster detection.

type: number

default: 0.8

The minimum number of annotated genes that must separate a cluster from the edge.

type: integer

Biosynthetic gene cluster detection based on predefined HMM models. This tool uses probabilistic models called profile hidden Markov models (profile HMMs) to search against a sequence database. More info: http://eddylab.org/software/hmmer/Userguide.pdf

Run hmmsearch during BGC screening.

type: boolean

Specify the path to the BGC HMM model file(s) to search against. The path must be quoted if a wildcard is used.

type: string

Saves a multiple alignment of all significant hits to a file.

type: boolean

Save a simple tabular file summarising the per-target output.

type: boolean

Save a simple tabular file summarising the per-domain output.

type: boolean

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for the completion summary, sent only when the pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
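The optional dot before the unit lets the default be written in the Groovy-style `25.MB` notation used in Nextflow configs, while `25 MB` or `25MB` also pass; units are upper-case only, so `25 mb` is rejected. A Python sketch that validates and converts such a value to bytes; the helper name is hypothetical, the regex is the schema pattern with the number wrapped in a capture group, and 1024-based units are an assumption.

```python
import re

# Same language as the schema pattern above; the number is wrapped in a capture group.
SIZE_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\.?\s*(K|M|G|T)?B$")

# Assumption: 1024-based units.
MULTIPLIERS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(value: str) -> int:
    """Convert a size string such as '25.MB' or '1.5 GB' to bytes."""
    match = SIZE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"Not a valid size string: {value!r}")
    number, unit = match.groups()
    return int(float(number) * MULTIPLIERS[unit])


for text in ["25.MB", "25 MB", "1.5 GB", "500B"]:
    print(text, size_to_bytes(text))
```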

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for a messaging service.

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to the location of pipeline test dataset files.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

nf-core/funcscan