nf-core/epigenomesegmentation

An nf-core pipeline for epigenome segmentation using EpiSegMix/Meth — a hidden Markov model with flexible read count distributions and state duration modeling for histone, open chromatin, and methylation signals.

epigenome, multi-omics-integration, segmentation

This is the development version of the pipeline.

https://github.com/nf-core/epigenomesegmentation

Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

References - Genomic bins and chromosome sizes used for analysis.
IndexFiles - Processed and indexed BAM files.
Counts - Binned count matrices for histone marks and methylation.
EpiSegMix - Trained models, segmentation BED files, and diagnostic plots.
Pipeline information - Report metrics generated during the workflow execution.

References

Output files

References/
- [Genome]/
  - *_bins.bed: Genomic windows (e.g., 200 bp) used for signal aggregation.
  - *.chrom.sizes: The chromosome sizes file fetched for the reference genome.

This directory contains the structural files generated during Genome Preparation. These files ensure that all downstream counting and modeling are performed on a consistent genomic coordinate system.
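To make the relationship between the two files concrete, the following is a minimal sketch of how fixed-width windows can be derived from chromosome sizes. It is an illustration only, not the pipeline's own code: the function name `make_bins`, the toy chromosome sizes, and the choice to clip the last window at the chromosome end are all assumptions.

```python
from typing import Dict, Iterator, Tuple


def make_bins(
    chrom_sizes: Dict[str, int], bin_size: int = 200
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) windows in BED coordinates (0-based, half-open)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            # Assumption: the last window is clipped to the chromosome end
            # rather than dropped; the pipeline may handle this differently.
            yield chrom, start, min(start + bin_size, size)


# Toy chromosome sizes (hypothetical values, not a real genome).
toy_sizes = {"1": 500, "2": 200}
bins = list(make_bins(toy_sizes, bin_size=200))
for chrom, start, end in bins:
    print(f"{chrom}\t{start}\t{end}")
```

Because every sample is counted against the same window list, count matrices from different marks line up row by row.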

IndexFiles

Output files

IndexFiles/
- [SampleID]/
  - *.nochr.bam: Filtered BAM files used for the counting process.
  - *.nochr.bam.bai: Index files for the coordinate-sorted BAMs.

This directory contains processed alignment files that have been filtered (marked by the “nochr” suffix) and indexed to allow efficient count matrix generation.

Counts

Output files

Counts/
- [SampleID]/
  - [SampleID]_Histones/: Contains binned histone mark counts.
  - *_refined_counts.txt: The final count matrix used as input for the EpiSegMix model.

If --merge is enabled, these matrices will also include WGBS (Methylation) data intersected with the histone bins.

EpiSegMix

This is the core results directory, containing the output of the segmentation modeling. Files are organized by sample and by number of states (e.g., _s10).

1. Segmentation

Output files

EpiSegMix/[SampleID]/Segmentation/
- *.bed.gz: Compressed BED file containing the genomic coordinates and assigned chromatin states.
- *.txt: A tab-delimited text version of the segmentation results.
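As a sketch of how the compressed segmentation could be summarised downstream, the example below computes the fraction of covered bases assigned to each state. It assumes a plain BED layout with the state label in the fourth column, which should be checked against the actual files; the helper name `state_coverage` and the toy records are hypothetical.

```python
import gzip
import io
from collections import Counter
from typing import Dict, Iterable


def state_coverage(lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of covered bases assigned to each state."""
    bases: Counter = Counter()
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        # Assumption: columns are chrom, start, end, state.
        _chrom, start, end, state = line.rstrip("\n").split("\t")[:4]
        bases[state] += int(end) - int(start)
    total = sum(bases.values())
    return {state: n / total for state, n in bases.items()} if total else {}


# Toy segmentation, gzip-compressed in memory to mimic a *.bed.gz file.
toy_bed = "1\t0\t400\t1\n1\t400\t1000\t2\n2\t0\t600\t1\n"
buffer = io.BytesIO(gzip.compress(toy_bed.encode()))
with gzip.open(buffer, "rt") as handle:
    coverage = state_coverage(handle)
print(coverage)
```

For a real run, the in-memory buffer would be replaced by the path to a file in the Segmentation directory.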

2. Models

Output files

EpiSegMix/[SampleID]/Models/
- final-model-*.json: The trained HMM parameters.
- *.yaml: The configuration used for the modeling run.
- *.log: Log files tracking the training and decoding steps.
- *-train-counts.txt: The specific data matrix used during the training phase.

3. Plots

Output files

EpiSegMix/[SampleID]/Plots/
- *-correlation.png: Correlation matrix of input marks.
- *-histogram.png: Signal distribution for each mark.
- *-transitionMatrix.png: Probabilities of transitioning between chromatin states.
- *-meanEmission-viterbi.png / *-normEmission-viterbi.png: Heatmaps showing the signal signature for each state.
- *-stateDistribution-viterbi.png: Percentage of the genome occupied by each state.
- *-viterbi.html: Interactive HTML report for exploring the segmentation results.
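To illustrate what a transition probability between states means, here is a small sketch that estimates transition frequencies from a decoded state path by counting consecutive pairs and normalising each row. This is an empirical estimate from a toy path, not the trained model parameters shown in the plot, and it ignores the state duration modeling that EpiSegMix adds; the function name and state labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def empirical_transitions(path: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next state | current state) from consecutive bins."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(path, path[1:]):
        counts[current][nxt] += 1
    matrix: Dict[str, Dict[str, float]] = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Each row is normalised so its probabilities sum to 1.
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Toy per-bin state path (hypothetical labels).
toy_path = ["A", "A", "B", "A", "B", "B"]
matrix = empirical_transitions(toy_path)
for state, row in matrix.items():
    print(state, row)
```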

Pipeline information

Output files

pipeline_info/
- Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
- Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
- Parameters used by the pipeline run: `params.json`.

Nextflow can generate several reports on how the pipeline ran. These reports help you troubleshoot errors and provide other information such as launch commands, run times and resource usage.
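As a starting point for troubleshooting, the sketch below counts tasks by status in a Nextflow trace file, which is tab-separated with a header row. The toy trace content and the helper name `count_task_status` are hypothetical, and the example assumes the trace includes the `status` column.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_task_status(handle: TextIO) -> Counter:
    """Count trace rows per value of the 'status' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["status"] for row in reader)


# Toy trace with a subset of columns (hypothetical process names).
toy_trace = (
    "task_id\tname\tstatus\texit\n"
    "1\tPROCESS_A (sample1)\tCOMPLETED\t0\n"
    "2\tPROCESS_B (sample1)\tFAILED\t1\n"
    "3\tPROCESS_B (sample1)\tCOMPLETED\t0\n"
)
status_counts = count_task_status(io.StringIO(toy_trace))
print(dict(status_counts))
```

For a real run, the `io.StringIO` object would be replaced by an open handle on `execution_trace.txt`.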
