nf-core/funcscan
(Meta-)genome screening for functional and natural product gene sequences
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing sample names and paths to corresponding FASTA files, and optional annotation files.
string
^\S+\.csv$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
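The two patterns listed above for the samplesheet path and the email address can be checked before launching a run. The following is a minimal sketch in Python that uses only those two regular expressions; the function names and the example values are hypothetical and not part of the pipeline.

```python
import re

# Patterns copied verbatim from the input/output options above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in .csv."""
    return SAMPLESHEET_PATTERN.match(path) is not None


def is_valid_email(address: str) -> bool:
    """Return True if the address satisfies the listed email pattern."""
    return EMAIL_PATTERN.match(address) is not None


if __name__ == "__main__":
    # Hypothetical example values.
    print(is_valid_samplesheet_path("samplesheet.csv"))
    print(is_valid_email("user@example.com"))
```

Note that the email pattern limits the final domain label to 2-5 letters, so an address ending in a longer top-level domain would be rejected by this check.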
These parameters influence which workflow (ARG, AMP and/or BGC) to activate.
Activate antimicrobial peptide genes screening tools.
boolean
Activate antimicrobial resistance gene screening tools.
boolean
Activate biosynthetic gene cluster screening tools.
boolean
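As an illustration of how the three workflow switches combine with the input and output options, the sketch below assembles a parameter dictionary and serialises it to JSON, which could then be passed to Nextflow via `-params-file`. The parameter names used here (`input`, `outdir`, `run_amp_screening`, `run_arg_screening`, `run_bgc_screening`) are assumptions, since this listing gives only descriptions and types; check them against the pipeline documentation before use. The helper functions and paths are hypothetical.

```python
import json


def build_params(input_csv, outdir, amp=False, arg=False, bgc=False):
    """Collect input/output paths and workflow switches in one dictionary."""
    return {
        "input": input_csv,         # assumed parameter name
        "outdir": outdir,           # assumed parameter name
        "run_amp_screening": amp,   # assumed parameter name
        "run_arg_screening": arg,   # assumed parameter name
        "run_bgc_screening": bgc,   # assumed parameter name
    }


def active_workflows(params):
    """List the workflows (AMP, ARG, BGC) switched on in a params dict."""
    switches = {
        "AMP": "run_amp_screening",
        "ARG": "run_arg_screening",
        "BGC": "run_bgc_screening",
    }
    return [name for name, key in switches.items() if params.get(key)]


if __name__ == "__main__":
    # Hypothetical paths; activate the ARG and BGC workflows only.
    params = build_params("samplesheet.csv", "/data/results", arg=True, bgc=True)
    print(json.dumps(params, indent=2))
    print(active_workflows(params))
```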
These options influence whether to activate the taxonomic classification of the input nucleotide sequences.
Activates the taxonomic classification of input nucleotide sequences.
boolean
Specifies the tool used for taxonomic classification.
string
If MMseqs2 is chosen as taxonomic classification tool: Specifies if the output of all MMseqs2 subcommands shall be compressed.
integer
These parameters influence the database to be used in classifying the taxonomy.
Specify a path to MMseqs2-formatted database.
string
Specify the label of the database to be used.
string
Kalamari
Specify whether the temporary files should be saved.
boolean
These parameters influence the taxonomic classification step.
Specify whether to save the temporary files.
boolean
Specify the alignment type between database and query.
integer
2
Specify the taxonomic levels to display in the result table.
string
kingdom,phylum,class,order,family,genus,species
Specify whether to include or remove the taxonomic lineage.
integer
1
Specify the speed and sensitivity for taxonomy assignment.
number
5
Specify the ORF search sensitivity in the prefilter step.
number
2
Specify the mode to assign the taxonomy.
integer
3
Specify the weights of the taxonomic assignment.
integer
1
These options influence the generation of annotation files required for downstream steps in ARG, AMP, and BGC workflows.
Specify which annotation tool to use for some downstream tools.
string
Specify whether to save gene annotations in the results directory.
boolean
BAKTA is a tool developed to annotate bacterial genomes and plasmids from both isolates and MAGs. More info: https://github.com/oschwengers/bakta
Specify a path to a local copy of a BAKTA database.
string
Download full or light version of the Bakta database if not supplying own database.
string
Use the default genome-length optimised mode (rather than the metagenome mode).
boolean
Specify the minimum contig size.
integer
1
Specify the genetic code translation table.
integer
11
Specify the type of bacteria to be annotated to detect signal peptides.
string
Specify that all contigs are complete replicons.
boolean
Changes the original contig headers.
boolean
Clean the result annotations to standardise them to Genbank/ENA conventions.
boolean
Activate tRNA detection & annotation.
boolean
Activate tmRNA detection & annotation.
boolean
Activate rRNA detection & annotation.
boolean
Activate ncRNA detection & annotation.
boolean
Activate ncRNA region detection & annotation.
boolean
Activate CRISPR array detection & annotation.
boolean
Skip CDS detection & annotation.
boolean
Activate pseudogene detection & annotation.
boolean
Skip sORF detection & annotation.
boolean
Activate gap detection & annotation.
boolean
Activate oriC/oriT detection & annotation.
boolean
Activate generation of circular genome plots.
boolean
Supply the path to an HMM file of trusted hidden Markov models in HMMER format for CDS annotation.
string
Prokka annotates genomic sequences belonging to bacterial, archaeal and viral genomes. More info: https://github.com/tseemann/prokka
Use the default genome-length optimised mode (rather than the metagenome mode).
boolean
Suppress the default clean-up of the gene annotations.
boolean
Specify the kingdom that the input represents.
string
Specify the translation table used to annotate the sequences.
integer
11
Minimum contig size required for annotation (bp).
integer
1
E-value cut-off.
number
0.000001
Set the assigned minimum coverage.
integer
80
Allow transfer RNA (tRNA) to overlap coding sequences (CDS).
boolean
Use RNAmmer for rRNA prediction.
boolean
Force contig names to follow GenBank/ENA/DDBJ naming rules.
boolean
true
Add the gene features for each CDS hit.
boolean
Retains contig names.
boolean
Prodigal is a protein-coding gene prediction tool developed to run on bacterial and archaeal genomes. More info: https://github.com/hyattpd/prodigal/wiki
Specify whether to use Prodigal’s single-genome mode for long sequences.
boolean
Does not allow partial genes on contig edges.
boolean
Specifies the translation table used for gene annotation.
integer
11
Forces Prodigal to scan for motifs.
boolean
Pyrodigal is a resource-optimized wrapper around Prodigal, producing protein-coding gene predictions of bacterial and archaeal genomes. Read more at the Pyrodigal GitHub repository (https://github.com/althonos/pyrodigal) or its documentation (https://pyrodigal.readthedocs.io).
Specify whether to use Pyrodigal’s single-genome mode for long sequences.
boolean
Does not allow partial genes on contig edges.
boolean
Specifies the translation table used for gene annotation.
integer
11
Forces Pyrodigal to scan for motifs.
boolean
This forces Pyrodigal to append asterisks (*) as stop codon indicators. Do not use when running AMP workflow.
boolean
Functionally annotates all annotated coding regions.
Activates the functional annotation of annotated coding regions to provide more information about the coding regions classified.
boolean
Specifies the tool used for further protein annotation.
string
Change the database version used for annotation.
string
https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz
Path to pre-downloaded InterProScan database.
string
Assigns the database(s) to be used to annotate the coding regions.
string
PANTHER,ProSiteProfiles,ProSitePatterns,Pfam
^\w+(,\w+)*
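The pattern listed above for the comma-separated database list has a start anchor but no end anchor. The sketch below, in Python, shows one way to parse such a list and why a stricter full match may be preferable; the function name is hypothetical, and whether the pipeline itself validates more strictly is not stated in this listing.

```python
import re

# Pattern and default value copied from the option above.
APPLICATIONS_PATTERN = re.compile(r"^\w+(,\w+)*")
DEFAULT_APPLICATIONS = "PANTHER,ProSiteProfiles,ProSitePatterns,Pfam"


def parse_applications(value: str) -> list:
    """Split a comma-separated database list, requiring the whole string to match."""
    # fullmatch() rejects trailing text that an unanchored match would ignore.
    if APPLICATIONS_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a plain comma-separated list: {value!r}")
    return value.split(",")


if __name__ == "__main__":
    print(parse_applications(DEFAULT_APPLICATIONS))
    # A prefix-only match accepts this string because "Pfam" alone satisfies it.
    print(APPLICATIONS_PATTERN.match("Pfam, PANTHER") is not None)
```

In this sketch, a value with a space after the comma passes the prefix match but fails the full match, so stray whitespace is caught early.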
Pre-calculates residue mutual matches.
boolean
General options for database downloading
Specify whether to save pipeline-downloaded databases in your results directory.
boolean
Antimicrobial Peptide detection using a deep learning model. More info: https://github.com/bcgsc/AMPlify
Skip AMPlify during AMP screening.
boolean
Antimicrobial Peptide detection using machine learning. ampir uses a supervised statistical machine learning approach to predict AMPs. It incorporates two support vector machine classification models, ‘precursor’ and ‘mature’ that have been trained on publicly available antimicrobial peptide data. More info: https://github.com/Legana/ampir
Skip ampir during AMP screening.
boolean
Specify which machine learning classification model to use.
string
Specify minimum protein length for prediction calculation.
integer
10
Antimicrobial Peptide detection based on predefined HMM models. This tool implements methods using probabilistic models called profile hidden Markov models (profile HMMs) to search against a sequence database. More info: http://eddylab.org/software/hmmer/Userguide.pdf
Run hmmsearch during AMP screening.
boolean
Specify path to the AMP hmm model file(s) to search against. Must have quotes if wildcard used.
string
Saves a multiple alignment of all significant hits to a file.
boolean
Save a simple tabular file summarising the per-target output.
boolean
Save a simple tabular file summarising the per-domain output.
boolean
Antimicrobial peptide detection from metagenomes. More info: https://github.com/BigDataBiology/macrel
Skip Macrel during AMP screening.
boolean
Antimicrobial peptides parsing, filtering, and annotating submodule of AMPcombi2. More info: https://github.com/Darcy220606/AMPcombi
The name of the database used to classify the AMPs.
string
The path to the folder containing the reference database files.
string
Specifies the prediction tools’ cut-offs.
number
0.6
Filter out all amino acid fragments shorter than this number.
integer
120
Remove all DRAMP annotations that have an e-value greater than this value.
number
5
Retain HMM hits that have an e-value lower than this.
number
0.06
Assign the number of codons used to look for stop codons, upstream and downstream of the AMP hit.
integer
60
Assign the number of CDSs upstream and downstream of the AMP to look for a transport protein.
integer
11
Remove hits that have no stop codon upstream and downstream of the AMP.
boolean
Assigns the file extension used to identify AMPIR output.
string
.ampir.tsv
Assigns the file extension used to identify AMPLIFY output.
string
.amplify.tsv
Assigns the file extension used to identify MACREL output.
string
.macrel.prediction
Assigns the file extension used to identify HMMER/HMMSEARCH output.
string
.hmmer_hmmsearch.txt
Clusters the AMP candidates identified with AMPcombi. More info: https://github.com/Darcy220606/AMPcombi
MMseqs2 coverage mode.
number
MMseqs2 alignment sensitivity.
number
4
Remove clusters that don’t have more AMP hits than this number.
integer
MMseqs2 clustering mode.
number
1
MMseqs2 alignment coverage.
number
0.8
MMseqs2 sequence identity.
number
0.4
Remove any hits that form a single member cluster.
boolean
Antimicrobial resistance gene detection based on NCBI’s curated Reference Gene Database and curated collection of Hidden Markov Models. It identifies AMR genes, resistance-associated point mutations, and select other classes of genes using protein annotations and/or assembled nucleotide sequences. More info: https://github.com/ncbi/amr/wiki
Skip AMRFinderPlus during the ARG screening.
boolean
Specify the path to a local version of the AMRFinderPlus database.
string
Minimum percent identity to reference sequence.
number
-1
Minimum coverage of the reference protein.
number
0.5
Specify which NCBI genetic code to use for translated BLAST.
integer
11
Add the plus genes to the report.
boolean
Add identified column to AMRFinderPlus output.
boolean
Antimicrobial resistance gene detection using a deep learning model. DeepARG is composed of two models for two types of input: short sequence reads and gene-like sequences. In this pipeline we use the ls model, which is suitable for annotating full sequence genes and for discovering novel antibiotic resistance genes from assembled samples. The tool DIAMOND is used as an aligner. More info: https://bitbucket.org/gusphdproj/deeparg-ss/src/master
Skip DeepARG during the ARG screening.
boolean
Specify the path to the DeepARG database.
string
Specify the numeric version number of a user-supplied DeepARG database.
integer
2
Specify which model to use (short or long sequences).
string
Specify minimum probability cutoff under which hits are discarded.
number
0.8
Specify E-value cutoff under which hits are discarded.
number
1e-10
Specify percent identity cutoff for sequence alignment under which hits are discarded.
integer
50
Specify alignment read overlap.
number
0.8
Specify minimum number of alignments per entry for DIAMOND step of DeepARG.
integer
1000
Antimicrobial resistance gene detection using a deep learning model. The tool includes developed and optimised models for a number of resistance gene types, and the functionality to create and optimise models of your own choice of resistance genes. More info: https://github.com/fannyhb/fargene
Skip fARGene during the ARG screening.
boolean
Specify comma-separated list of which pre-defined HMM models to screen against
string
class_a,class_b_1_2,class_b_3,class_c,class_d_1,class_d_2,qnr,tet_efflux,tet_rpg,tet_enzyme
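A user-supplied model list can be checked against the default value shown above before a run is started. The sketch below is in Python and assumes that the default list names every pre-defined model, which this listing does not confirm; the function name is hypothetical.

```python
# Default value copied from the option above. Treating it as the complete set
# of pre-defined models is an assumption made for this sketch.
DEFAULT_HMM_MODELS = (
    "class_a", "class_b_1_2", "class_b_3", "class_c", "class_d_1",
    "class_d_2", "qnr", "tet_efflux", "tet_rpg", "tet_enzyme",
)


def parse_hmm_models(value: str) -> list:
    """Split a comma-separated model list and reject names not in the default set."""
    models = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in models if name not in DEFAULT_HMM_MODELS]
    if unknown:
        raise ValueError(f"unknown model name(s): {', '.join(unknown)}")
    return models


if __name__ == "__main__":
    print(parse_hmm_models("class_a,qnr,tet_efflux"))
```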
Specify to save intermediate temporary files to results directory.
boolean
The threshold score for a sequence to be classified as an (almost) complete gene.
number
The minimum length of a predicted ORF retrieved from annotating the nucleotide sequences.
integer
90
Defines which ORF finding algorithm to use.
boolean
The translation table/format to use for sequence annotation.
string
pearson
Antimicrobial resistance gene detection, based on alignment to the CARD database based on homology and SNP models. More info: https://github.com/arpcard/rgi
Skip RGI during the ARG screening.
boolean
Path to user-defined local CARD database.
string
Save RGI output .json file.
boolean
Specify to save intermediate temporary files in the results directory.
boolean
Specify the alignment tool to be used.
string
Include all of loose, strict and perfect hits (i.e. ≥ 95% identity) found by RGI.
boolean
Suppresses the default behaviour of RGI with --arg_rgi_includeloose.
boolean
Include screening of low quality contigs for partial genes.
boolean
Specify a more specific data-type of input (e.g. plasmid, chromosome).
string
Run multiple prodigal jobs simultaneously for contigs in a fasta file.
boolean
true
Antimicrobial resistance gene detection based on alignment to NCBI, CARD, ARG-ANNOT, ResFinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB. More info: https://github.com/tseemann/abricate
Skip ABRicate during the ARG screening.
boolean
Specify the name of the ABRicate database to use. Names of non-default databases can be supplied if --arg_abricate_db is provided.
string
ncbi
Path to user-defined local ABRicate database directory for using custom databases.
string
Minimum percent identity of alignment required for a hit to be considered.
integer
80
Minimum percent coverage of alignment required for a hit to be considered.
integer
80
Influences parameters required for the ARG summary by hAMRonization.
Specifies summary output format.
string
Influences parameters required for the normalization of ARG annotations by argNorm. More info: https://github.com/BigDataBiology/argNorm
Skip argNorm during ARG screening.
boolean
These parameters influence general BGC settings like minimum input sequence length.
Specify the minimum length of contigs that go into BGC screening.
integer
3000
Specify to save the length-filtered (unannotated) FASTAs used for BGC screening.
boolean
Biosynthetic gene cluster detection. More info: https://docs.antismash.secondarymetabolites.org
Skip antiSMASH during the BGC screening.
boolean
Path to user-defined local antiSMASH database.
string
Path to user-defined local antiSMASH directory. Only required when running with docker/singularity.
string
Minimum length a contig must have to be screened with antiSMASH.
integer
3000
Turn on clusterblast comparison against database of antiSMASH-predicted clusters.
boolean
Turn on clusterblast comparison against known gene clusters from the MIBiG database.
boolean
Turn on clusterblast comparison against known subclusters responsible for synthesising precursors.
boolean
Turn on ClusterCompare comparison against known gene clusters from the MIBiG database.
boolean
Generate phylogenetic trees of secondary metabolite group orthologs.
boolean
Defines which level of strictness to use for HMM-based cluster detection.
string
Run Pfam to Gene Ontology mapping module.
boolean
Run RREFinder precision mode on all RiPP gene clusters.
boolean
Specify which taxonomic classification of input sequence to use.
string
Run TFBS finder on all gene clusters.
boolean
A deep learning genome-mining strategy for biosynthetic gene cluster prediction. More info: https://github.com/Merck/deepbgc/tree/master/deepbgc
Skip DeepBGC during the BGC screening.
boolean
Path to local DeepBGC database folder.
string
Average protein-wise DeepBGC score threshold for extracting BGC regions from Pfam sequences.
number
0.5
Run DeepBGC’s internal Prodigal step in single mode to restrict detecting genes to long contigs.
boolean
Merge detected BGCs within given number of proteins.
integer
Merge detected BGCs within given number of nucleotides.
integer
Minimum BGC nucleotide length.
integer
1
Minimum number of proteins in a BGC.
integer
1
Minimum number of protein domains in a BGC.
integer
1
Minimum number of known biosynthetic (as defined by antiSMASH) protein domains in a BGC.
integer
DeepBGC classification score threshold for assigning classes to BGCs.
number
0.5
Biosynthetic gene cluster detection using Conditional Random Fields (CRFs). More info: https://gecco.embl.de
Skip GECCO during the BGC screening.
boolean
Enable unknown region masking to prevent genes from stretching across unknown nucleotides.
boolean
The minimum number of coding sequences a valid cluster must contain.
integer
3
The p-value cutoff for protein domains to be included.
number
1e-9
The probability threshold for cluster detection.
number
0.8
The minimum number of annotated genes that must separate a cluster from the edge.
integer
Biosynthetic Gene Cluster detection based on predefined HMM models. This tool implements methods using probabilistic models called profile hidden Markov models (profile HMMs) to search against a sequence database. More info: http://eddylab.org/software/hmmer/Userguide.pdf
Run hmmsearch during BGC screening.
boolean
Specify path to the BGC hmm model file(s) to search against. Must have quotes if wildcard used.
string
Saves a multiple alignment of all significant hits to a file.
boolean
Save a simple tabular file summarising the per-target output.
boolean
Save a simple tabular file summarising the per-domain output.
boolean
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
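The size pattern above accepts values such as the default `25.MB`. A minimal sketch in Python for checking candidate values against it follows; the function name and the sample values are hypothetical.

```python
import re

# Pattern copied verbatim from the option above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value satisfies the listed file size pattern."""
    return SIZE_PATTERN.match(value) is not None


if __name__ == "__main__":
    # Hypothetical candidate values, including the listed default.
    for candidate in ("25.MB", "10 GB", "1.5MB", "25", "25MiB"):
        print(candidate, is_valid_size(candidate))
```

A bare number without a trailing `B`, or a binary-style suffix such as `MiB`, does not satisfy the pattern in this check.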
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Boolean whether to validate parameters against the schema at runtime
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string