nf-core/variantprioritization
Analysis pipeline for the functional annotation and translation of somatic SNVs/InDels and copy number aberrations for precision cancer medicine.
Introduction
nf-core/variantprioritization is a bioinformatics analysis pipeline for the functional annotation and translation of somatic SNVs/InDels and copy number aberrations for precision cancer medicine using the Personal Cancer Genome Reporter (PCGR). It also offers interpretation and annotation of germline SNVs/InDels using the Cancer Predisposition Sequencing Reporter (CPSR).
The workflow has been designed to accept outputs generated by nf-core/sarek:
| Tool | Germline | Somatic tumor-normal | Somatic tumor-only |
|---|---|---|---|
| ASCAT | ✔️ | ✔️ | |
| DeepVariant | ✔️ | | |
| HaplotypeCaller | ✔️ | | |
| Mutect2 | ✔️ | ✔️ | |
| Strelka somatic indels | ✔️ | | |
| Strelka somatic snvs | ✔️ | | |
Usage
The workflow accepts as input a samplesheet.csv file containing the paths to SNV/InDel VCF files and ASCAT copy number aberration files. We have tried to mimic the samplesheet specification of nf-core/sarek for ease of use:
| Column | Description |
|---|---|
| patient | Designates the patient/subject; must be unique for each patient, but one patient can have multiple samples |
| status | Normal/tumor (0/1) status of sample |
| sample | Designates the sample ID; must be unique. A patient may have multiple samples, e.g. a paired tumor-normal or a tumor-only sample. |
| vcf | Full path to VCF file(s) |
| cna | Full path to segment file |
An example of a valid samplesheet is given below:
patient,status,sample,vcf,cna
HCC1395,1,HCC1395T,HCC1395T_vs_HCC1395N.mutect2.vcf.gz,HCC1395T.segments.txt
HCC1395,1,HCC1395T,HCC1395T_vs_HCC1395N.freebayes.vcf.gz,HCC1395T.segments.txt
HCC1395,1,HCC1395T,HCC1395T_vs_HCC1395N.strelka.somatic_snvs.vcf.gz,HCC1395T.segments.txt
HCC1395,1,HCC1395T,HCC1395T_vs_HCC1395N.strelka.somatic_indels.vcf.gz,HCC1395T.segments.txt
HCC1395,0,HCC1395N,HCC1395N.deepvariant.vcf.gz,
HCC1395,0,HCC1395N,HCC1395N.haplotypecaller.vcf.gz,
HCC1396,1,HCC1396T,HCC1396T_vs_HCC1396N.mutect2.vcf.gz,
HCC1396,1,HCC1396T,HCC1396T_vs_HCC1396N.strelka.somatic_snvs.vcf.gz,
HCC1396,1,HCC1396T,HCC1396T_vs_HCC1396N.strelka.somatic_indels.vcf.gz,
Copy number aberration files must be present for every sample entry when --cna_analysis true is set.
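The samplesheet rules above can be checked before launching a run. The following is a minimal sketch of such a pre-flight check, not the pipeline's own validation: the function name `check_samplesheet` is hypothetical, and it assumes that "every sample entry" means every row of the samplesheet. It only inspects the text of the CSV and does not verify that the listed files exist.

```python
import csv
import io

# Column names taken from the samplesheet description above.
REQUIRED_COLUMNS = ("patient", "status", "sample", "vcf", "cna")


def check_samplesheet(handle, cna_analysis=False):
    """Return a list of human-readable problems found in a samplesheet.

    `handle` is any open text file (or file-like object) holding the CSV.
    `cna_analysis` mirrors the --cna_analysis pipeline parameter.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    for row in reader:
        line_no = reader.line_num  # physical line number in the CSV
        # Short rows yield None for absent fields, so fall back to "".
        status = (row["status"] or "").strip()
        vcf = (row["vcf"] or "").strip()
        cna = (row["cna"] or "").strip()
        if status not in ("0", "1"):
            problems.append(f"line {line_no}: status must be 0 or 1, got {status!r}")
        if not vcf:
            problems.append(f"line {line_no}: vcf path is empty")
        # Assumption: with cna_analysis enabled, every row needs a cna path.
        if cna_analysis and not cna:
            problems.append(
                f"line {line_no}: cna path is empty although cna_analysis is enabled"
            )
    return problems


# Two rows borrowed from the example samplesheet above.
example = io.StringIO(
    "patient,status,sample,vcf,cna\n"
    "HCC1395,1,HCC1395T,HCC1395T_vs_HCC1395N.mutect2.vcf.gz,HCC1395T.segments.txt\n"
    "HCC1395,0,HCC1395N,HCC1395N.deepvariant.vcf.gz,\n"
)
for problem in check_samplesheet(example, cna_analysis=True):
    print(problem)
```

To check a file on disk, pass an open handle instead, for example `check_samplesheet(open("samplesheet.csv", newline=""), cna_analysis=True)`.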
Now, you can run the pipeline using:
nextflow run nf-core/variantprioritization \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the Nextflow -c option, can be used to provide any configuration except for parameters; see docs.
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
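As an illustration of the -params-file route, the sketch below renders a JSON parameters file and the matching command line. It is only an example under stated assumptions: the helper names `render_params` and `build_command` are hypothetical, `results` is a placeholder output directory, and `cna_analysis` is assumed to be a boolean parameter because the text above passes it as `--cna_analysis true`. Only the parameters named in this README (`input`, `outdir`, `cna_analysis`) are used.

```python
import json
import shlex


def render_params(input_csv, outdir, cna_analysis=False):
    """Return JSON text suitable for saving as a Nextflow -params-file."""
    params = {
        "input": input_csv,
        "outdir": outdir,
        "cna_analysis": cna_analysis,  # assumed to be a boolean parameter
    }
    return json.dumps(params, indent=2)


def build_command(params_file, profile):
    """Return the shell command that runs the pipeline with a params file."""
    return shlex.join(
        [
            "nextflow", "run", "nf-core/variantprioritization",
            "-profile", profile,
            "-params-file", params_file,
        ]
    )


# Print the file content to save as params.json, then the command to run.
print(render_params("samplesheet.csv", "results", cna_analysis=True))
print(build_command("params.json", "docker"))
```

Keeping parameters in a file like this makes a run easier to repeat, while executor and resource settings stay in config files passed with -c.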
Credits
nf-core/variantprioritization was originally written by @barrydigby, @yussab and @matbonfanti. @famosab joined to adapt the pipeline to nf-core standards towards a first release.
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support
Please open an issue or reach out to me (Youssef Abili) on the nf-core Slack channel.
I am interested in adding compatibility for additional variant calling tools and optimising the intake of large VCF files.
Citations
Cancer Predisposition Sequencing Reporter (CPSR): A flexible variant report engine for high-throughput germline screening in cancer.
Nakken S, Saveliev V, Hofmann O, Møller P, Myklebost O, Hovig E.
Int J Cancer. 2021 Dec 1;149(11):1955-1960. doi: 10.1002/ijc.33749
Personal Cancer Genome Reporter: variant interpretation report for precision oncology.
Nakken S, Fournous G, Vodák D, Aasheim LB, Myklebost O, Hovig E.
Bioinformatics. 2018 May 15;34(10):1778-1780. doi: 10.1093/bioinformatics/btx817
Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants.
Garcia M, Juhos S, Larsson M, Olason PI, Martin M, Eisfeldt J, DiLorenzo S, Sandgren J, Díaz De Ståhl T, Ewels P, Wirta V, Nistér M, Käller M, Nystedt B.
F1000Res. 2020 Jan 29;9:63. doi: 10.12688/f1000research.16665.2
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Aln