nf-core/configs: IRIS Configuration

All nf-core pipelines have been successfully configured for use on the IRIS cluster at Memorial Sloan Kettering Cancer Center (MSKCC).

To use, run the pipeline with -profile iris. This will download and launch the iris.config which has been pre-configured with a setup suitable for the IRIS cluster. Using this profile, Singularity images containing all required software will be pulled from our local library or downloaded and cached before execution of the pipeline.

Before running the pipeline

Before running a pipeline for the first time, you will need to ensure that the right versions of Java, Nextflow and Singularity are available on the cluster. The IRIS cluster uses the SLURM job scheduler, and Nextflow will automatically submit jobs via SLURM.

Load Java and Singularity

module load java/23.0.1

Singularity 4.1 should be available by default at /usr/bin/singularity
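
To confirm that the expected versions are available on a node, a quick check (using the module name and Singularity path from above) is:

module load java/23.0.1
java -version              # should report Java 23
which singularity          # expected: /usr/bin/singularity
singularity --version      # expected: 4.1.x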

Install Nextflow

You can install Nextflow by running:

curl -s https://get.nextflow.io | bash
chmod +x nextflow
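
This creates the nextflow launcher in your current directory. To make it available from any location, you can move it onto your PATH; the ~/bin location below is only an example:

mkdir -p ~/bin
mv nextflow ~/bin/
export PATH="$HOME/bin:$PATH"    # add this line to your ~/.bashrc to make it permanent
nextflow -version                # verify the installation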

Running the pipeline

A typical command to run an nf-core pipeline on IRIS would look like:

nextflow run nf-core/<PIPELINE> -profile iris [additional pipeline parameters]

Optional Parameters

The IRIS config provides several optional parameters to customize job submission and paths:

  • --group: Your IRIS group name (e.g., core006). When specified, sets the default working directory to /scratch/<YOUR_GROUP>/work. If not specified, the working directory defaults to ./work in your current directory.
  • --partition: Specify a SLURM partition (default: uses $NXF_SLURM_PARTITION environment variable or cpu)
  • --qos: Set Quality of Service specification for SLURM jobs (e.g., priority)
  • --preemptable: Set to true to use preemptable queues for faster job submission (default: false)
  • --isolated: Set to true to restrict jobs to only the specified partition (default: false)

Example Commands

Basic usage:

nextflow run nf-core/rnaseq -profile iris --input samplesheet.csv --genome GRCh38

Using the group parameter to set default working directory:

nextflow run nf-core/rnaseq -profile iris --group mygroup --input samplesheet.csv --genome GRCh38

Explicitly setting work and output directories:

nextflow run nf-core/rnaseq -profile iris \
  -work-dir /scratch/mygroup/work \
  --outdir /data1/mygroup/results \
  --input samplesheet.csv --genome GRCh38

Using preemptable queue for faster submission:

nextflow run nf-core/rnaseq -profile iris --preemptable true --input samplesheet.csv --genome GRCh38

Using a QoS for priority:

nextflow run nf-core/rnaseq -profile iris --partition cpu --qos priority --input samplesheet.csv --genome GRCh38

Cluster Details

Resource Limits

The IRIS config sets the following maximum resource limits:

  • CPUs: 52 cores per job
  • Memory: 550 GB per job
  • Time: 7 days per job

Queue Selection

The config automatically selects appropriate SLURM queues based on job requirements:

  • cpushort: Jobs with runtime ≤ 2 hours (CPU only)
  • gpushort: GPU jobs with runtime ≤ 2 hours
  • gpu: Regular GPU jobs
  • cpu_highmem: Jobs requiring ≥ 512 GB memory or ≥ 50 GB per CPU
  • preemptable: First attempt of standard CPU jobs when --preemptable true is set
  • cpu: Default queue for standard CPU jobs
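
To see which queue each job actually landed on, the standard SLURM tools work as usual; for example (field names are standard SLURM, adjust the time range as needed):

squeue -u $USER -o "%.10i %.12P %.30j %.8T %.10M"                                    # pending/running jobs and their partitions
sacct --starttime today --format=JobID,JobName%30,Partition,State,Elapsed,MaxRSS    # finished jobs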

GPU Support

The config includes support for GPU jobs. Processes labeled with process_gpu or process_gpu_low will automatically:

  • Request GPU resources via SLURM (--gres=gpu:1)
  • Use appropriate GPU queues (gpu or gpushort)
  • Enable GPU support in Singularity containers (--nv flag)
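
If you want to confirm GPU access inside a container, and interactive jobs are allowed on the GPU partitions, a quick check might look like this (the image path is only a placeholder; any CUDA-capable image will do):

srun --partition=gpushort --gres=gpu:1 --cpus-per-task=1 --mem=4G --time=00:10:00 --pty bash
singularity exec --nv /path/to/some_image.sif nvidia-smi    # should list the allocated GPU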

Proactive Resource Detection

The system also monitors resource usage patterns to prevent failures:

  • Near Out of Memory: When peak RSS reaches ≥80% of allocated memory
  • Near Out of Time: When realtime reaches ≥80% of allocated time
  • CPU Starved: When CPU usage reaches ≥80% of available CPU capacity

These conditions trigger proactive resource increases even if the job completes successfully, helping prevent failures in subsequent similar jobs.
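
These metrics (peak RSS, CPU usage, realtime) come from Nextflow's execution traces. The trace report that this profile enables by default can be extended to show the requested values next to the measured ones; a small custom config passed with -c would do it (the file name custom_trace.config is arbitrary):

cat > custom_trace.config <<'EOF'
trace {
    enabled   = true
    overwrite = true
    fields    = 'task_id,name,status,exit,attempt,cpus,memory,time,realtime,%cpu,peak_rss'
}
EOF
nextflow run nf-core/<PIPELINE> -profile iris -c custom_trace.config [additional pipeline parameters]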

Process Labels

The config defines several process labels with default resources that scale automatically:

Label                  CPUs   Memory   Time
process_single         1      1 GB     4 h
process_low            2      12 GB    2 h
process_medium         6      36 GB    8 h
process_high           12     72 GB    16 h
process_long           2      12 GB    20 h
process_high_memory    6      200 GB   8 h
process_gpu            6      25 GB    8 h
process_gpu_low        6      25 GB    2 h

These are starting values that will be automatically increased on retry if needed.
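
If a pipeline needs different starting values for a label, the usual nf-core approach of a small custom config passed with -c also works here; the numbers below are purely illustrative, and requests remain capped by the resource limits listed earlier:

cat > custom_resources.config <<'EOF'
process {
    withLabel: process_high {
        memory = 120.GB
        time   = 24.h
    }
}
EOF
nextflow run nf-core/<PIPELINE> -profile iris -c custom_resources.config [additional pipeline parameters]

Note that a static override like this replaces the profile's dynamic value, so the overridden directive is no longer increased automatically on retry.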

Singularity Configuration

The config uses Singularity for containerization with the following settings:

  • Cache Directory: Automatically set based on working directory or $NXF_SINGULARITY_CACHEDIR
  • Library Directory: Uses the shared library, /data1/core006/resources/singularity_image_library (or $NXF_SINGULARITY_LIBRARYDIR)
  • Auto-mounting: Enabled for seamless file access
  • Scratch Space: Uses /localscratch when available
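
Both locations can be redirected with the environment variables mentioned above, for example in your ~/.bashrc or job script; the paths below are placeholders:

export NXF_SINGULARITY_CACHEDIR=/scratch/<YOUR_GROUP>/singularity_cache    # where pulled images are cached
export NXF_SINGULARITY_LIBRARYDIR=/path/to/your/own_image_library          # only if you maintain a separate library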

Working Directory

  • If the work directory is not set explicitly, it is derived from your group (via --group <YOUR_GROUP>), i.e. /scratch/<YOUR_GROUP>/work
  • If no group is given either, the work directory defaults to ./work in your current directory
  • Automatic cleanup is enabled when using /scratch to save space

Automatic Resource Management

The IRIS config includes intelligent retry logic that automatically adjusts resources when jobs fail. The system monitors job execution and dynamically scales resources based on failure patterns and resource utilization.

Retry Strategy Overview

  • Jobs are automatically retried up to 3 times on failure
  • Resources are dynamically increased based on the failure type and attempt number
  • The system uses both multiplicative (scales with attempt) and additive (fixed increment) strategies

Resource Scaling Logic

Memory Scaling

Memory is increased based on the failure type:

Failure Condition                  Attempt 2           Attempt 3           Attempt 4+
Out of Memory (exit 125, 137)      Previous + 10 GB    Previous + 20 GB    Previous + 30 GB
Out of Time (exit 15, 140)         Previous + 4 GB     Previous + 8 GB     Previous + 12 GB
Near Out of Memory (≥80% used)     Previous + 4 GB     Previous + 8 GB     Previous + 12 GB
Other failures                     Previous + 2 GB     Previous + 4 GB     Previous + 10 GB

Formula: new_memory = previous_memory + (multiplier × attempt) + base_increment

CPU Scaling

CPUs are increased when jobs are time-constrained or CPU-starved:

Failure Condition                  Attempt 2        Attempt 3        Attempt 4+
Out of Time (exit 15, 140)         Previous + 1     Previous + 2     Previous + 3
Near Out of Time (≥80% used)       Previous + 1     Previous + 2     Previous + 3
CPU Starved (≥80% CPU usage)       Previous + 2     Previous + 4     Previous + 5
Other failures                     Previous         Previous         Previous + 1

Formula: new_cpus = previous_cpus + (multiplier × attempt) + base_increment

Time Scaling

Runtime limits are increased for time-related failures:

Failure Condition                  Attempt 2         Attempt 3         Attempt 4+
Out of Time (exit 15, 140)         Previous + 12 h   Previous + 24 h   Previous + 36 h
Near Out of Time (≥80% used)       Previous + 12 h   Previous + 24 h   Previous + 36 h
Other failures                     Previous + 2 h    Previous + 4 h    Previous + 1 d

Formula: new_time = previous_time + (multiplier × attempt) + base_increment

Example Retries

Out of Memory Failure

A process_medium job runs out of memory:

Attempt 1: 6 CPUs, 36 GB, 8h    → Out of Memory (exit 137)
Attempt 2: 6 CPUs, 46 GB, 8h    → Out of Memory (exit 137)
Attempt 3: 6 CPUs, 56 GB, 8h    → Success

Out of Time Failure

A process_low job exceeds the time limit:

Attempt 1: 2 CPUs, 12 GB, 2h     → Out of Time (exit 140)
Attempt 2: 3 CPUs, 16 GB, 14h    → Out of Time (exit 140)
Attempt 3: 5 CPUs, 24 GB, 38h    → Success

Complex Multi-Failure Path

A job experiences multiple failure types across retries:

Attempt 1: 6 CPUs, 36 GB, 8h     → Out of Memory (exit 137)
Attempt 2: 6 CPUs, 46 GB, 8h     → Out of Time (exit 140)
Attempt 3: 7 CPUs, 50 GB, 20h    → Success

Getting Help

If you have any questions or issues running nf-core pipelines on IRIS, please contact Nikhil Kumar (kumarn1@mskcc.org), or see the MSKCC omics workflows documentation at https://mskcc-omics-workflows.gitbook.io/omics-workflows.

Notes

Note: You will need an account on the IRIS cluster at MSKCC to use this profile.

Note: Nextflow should be run from a compute node (via srun or sbatch), not from the login node, to avoid overloading the login infrastructure.
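
A minimal sbatch wrapper for the Nextflow head job might look like the sketch below; the job name, partition, resource requests, and pipeline arguments are placeholders to adapt to your group and workload (the head job only coordinates, the heavy work runs in the jobs it submits):

#!/bin/bash
#SBATCH --job-name=nf-head
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=3-00:00:00        # long enough to cover the whole pipeline run

module load java/23.0.1
nextflow run nf-core/rnaseq -profile iris --group mygroup --input samplesheet.csv --genome GRCh38 --outdir results

Save this as, for example, run_nextflow.sh and submit it with sbatch run_nextflow.sh.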

Note: The config automatically enables trace reports to help monitor pipeline execution and resource usage.

Config file

See config file on GitHub

params {
    config_profile_description        = 'IRIS profile provided to run nextflow pipelines on the IRIS cluster at Memorial Sloan Kettering Cancer Center (MSKCC)'
    config_profile_contact            = 'Nikhil Kumar (kumarn1@mskcc.org)'
    config_profile_url                = 'https://mskcc-omics-workflows.gitbook.io/omics-workflows'

    // Resource Limits
    max_cpus                          = 52
    max_memory                        = 550.GB
    max_time                          = 7.d

    // Job Submission Options
    preemptable                       = false  // Use preemptable queue for faster submission
    isolated                          = false  // Set to true when you can only use the provided partition
    group                             = ''     // IRIS group for the job work default path (e.g. /scratch/my_group)
    qos                               = ''     // Set Quality of Service specification for SLURM jobs (e.g. priority)
    partition                         = ''     // SLURM partition (uses $NXF_SLURM_PARTITION or 'cpu' if not set)

    // Path config
    scratch_path                      = '/localscratch'
    work_path                         = '/scratch'
    singularity_library               = '/data1/core006/resources/singularity_image_library'

    // Validation Parameters
    ignore_params_list                = [
        'max_cpus', 'max_memory', 'max_time',
        'preemptable', 'scratch_path', 'work_path',
        'singularity_library', 'isolated', 'group',
        'qos', 'partition', 'scratch', 'ignore_params_list',
        'schema_ignore_params', 'validationSchemaIgnoreParams'
    ]
    schema_ignore_params              = params.ignore_params_list.join(',')
    validationSchemaIgnoreParams      = params.ignore_params_list.join(',')
}

validation {
    ignoreParams                      = params.ignore_params_list
}

// Set sensible defaults
def scratch_dir                       = new File(params.scratch_path)
def work_base                         = new File(params.work_path + "/${params.group}")
params.partition                      = params.partition                                        ?: System.getenv('NXF_SLURM_PARTITION')  ?: 'cpu'
params.scratch                        = scratch_dir.exists()                                    ?  scratch_dir.getPath()                  : "${PWD}/scratch"
workDir                               = work_base.exists() && work_base.getPath() != '/scratch' ?  work_base.getPath() + '/work'          : "${PWD}/work"
cleanup                               = workDir.startsWith('/scratch')                          ?  true                                   : false
def singularity_scratch               = System.getenv('NXF_SINGULARITY_CACHEDIR')               ?: workDir + '/singularity_scratch'
def singularity_library               = System.getenv('NXF_SINGULARITY_LIBRARYDIR')             ?: params.singularity_library

executor {
    name              = 'slurm'
    pollInterval      = 45.s
    queueSize         = 5000
    queueStatInterval = '1 min'
    submitRateLimit   = '95/1min'
    retry.delay       = '1s'
    retry.maxDelay    = '1 min'
}

singularity {
    enabled      = true
    autoMounts   = true
    cacheDir     = singularity_scratch
    libraryDir   = singularity_library
    pullTimeout  = 1.hour
}

// IRIS SLURM Exit Codes:
// 15, 140 = Wall time limit exceeded
// 125, 137 = Out of memory
process {
    arch                              = 'linux/x86_64'
    executor                          = 'slurm'
    resourceLimits                    = [
        cpus: params.max_cpus,
        memory: params.max_memory,
        time: params.max_time
    ]

    _out_of_memory      = { task -> task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) }
    _out_of_time        = { task -> task.previousTrace && (task.previousTrace.exit == 15  || task.previousTrace.exit == 140) }
    _cpu_starved        = { task -> task.previousTrace && task.previousTrace['%cpu']  && task.previousTrace['%cpu']  / task.previousTrace.cpus   >= .80 }
    _near_out_of_memory = { task -> task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 }
    _near_out_of_time   = { task -> task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time   >= .80 }
    _increase_memory    = { task, multiply, add -> task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (multiply * task.attempt) + add : task.memory + (multiply * task.attempt) + add }
    _increase_time      = { task, multiply, add -> task.previousTrace && task.previousTrace.time   ? (task.previousTrace.time as nextflow.util.Duration)     + (multiply * task.attempt) + add : task.time   + (multiply * task.attempt) + add }
    _increase_cpu       = { task, multiply, add -> task.previousTrace && task.previousTrace.cpus   ? task.previousTrace.cpus + (multiply * task.attempt) + add : task.cpus + (multiply * task.attempt) + add }

    _get_process_memory = { first_attempt, task ->
        {
            task.attempt == 1
                ? first_attempt
            : task.attempt > 1 && process._out_of_memory(task)
                ? process._increase_memory(task, 10.GB, 0.GB)
            : task.attempt > 1 && process._out_of_time(task)
                ? process._increase_memory(task, 4.GB, 0.GB)
            : task.attempt > 1 && process._near_out_of_memory(task)
                ? process._increase_memory(task, 4.GB, 0.GB)
            : task.attempt > 3
                ? process._increase_memory(task, 0.GB, 10.GB)
                : process._increase_memory(task, 0.GB, 2.GB)
        }
    }
    _get_process_cpus   = { first_attempt, task ->
        {
            task.attempt == 1
                ? first_attempt
            : task.attempt > 1 && process._out_of_time(task)
                ? process._increase_cpu(task, 0, 1)
            : task.attempt > 1 && process._near_out_of_time(task)
                ? process._increase_cpu(task, 0, 1)
            : task.attempt > 1 && process._cpu_starved(task)
                ? process._increase_cpu(task, 0, 2)
            : task.attempt > 3
                ? process._increase_cpu(task, 0, 1)
                : process._increase_cpu(task, 0, 0)
        }
    }
    _get_process_time  = { first_attempt, task ->
        {
            task.attempt == 1
                ? first_attempt
            : task.attempt > 1 && process._out_of_time(task)
                ? process._increase_time(task, 12.h, 0.h)
            : task.attempt > 1 && process._near_out_of_time(task)
                ? process._increase_time(task, 0.h, 12.h)
            : task.attempt > 3
                ? process._increase_time(task, 0.h, 1.d)
                : process._increase_time(task, 0.h, 2.h)
        }
    }

    withLabel: process_single {
        cpus                          = { process._get_process_cpus(1, task) }
        memory                        = { process._get_process_memory(1.GB, task) }
        time                          = { process._get_process_time(4.h, task) }
    }
    withLabel: process_low {
        cpus                          = { process._get_process_cpus(2, task) }
        memory                        = { process._get_process_memory(12.GB, task) }
        time                          = { process._get_process_time(2.h, task) }
    }
    withLabel: process_medium {
        cpus                          = { process._get_process_cpus(6, task) }
        memory                        = { process._get_process_memory(36.GB, task) }
        time                          = { process._get_process_time(8.h, task) }
    }
    withLabel: process_high {
        cpus                          = { process._get_process_cpus(12, task) }
        memory                        = { process._get_process_memory(72.GB, task) }
        time                          = { process._get_process_time(16.h, task) }
    }
    withLabel: process_long {
        cpus                          = { process._get_process_cpus(2, task) }
        memory                        = { process._get_process_memory(12.GB, task) }
        time                          = { process._get_process_time(20.h, task) }
    }
    withLabel: process_high_memory {
        cpus                          = { process._get_process_cpus(6, task) }
        memory                        = { process._get_process_memory(200.GB, task) }
        time                          = { process._get_process_time(8.h, task) }
    }
    withLabel: process_gpu {
        cpus                          = { process._get_process_cpus(6, task) }
        memory                        = { process._get_process_memory(25.GB, task) }
        time                          = { process._get_process_time(8.h, task) }
        accelerator                   = 1
    }
    withLabel: process_gpu_low {
        cpus                          = { process._get_process_cpus(6, task) }
        memory                        = { process._get_process_memory(25.GB, task) }
        time                          = { process._get_process_time(2.h, task) }
        accelerator                   = 1
    }

    queue = {
        if (params.isolated && params.preemptable) {
            return "preemptable,${params.partition}"
        }
        // Only use the set partition when isolated
        else if (params.isolated) {
            return params.partition
        }
        // Short CPU jobs
        else if (task.time <= 2.h && !task.accelerator) {
            return "cpushort,cpu,${params.partition}"
        }
        // Short GPU jobs
        else if (task.accelerator && task.time <= 2.h) {
            return 'gpushort,gpu'
        }
        // GPU jobs
        else if (task.accelerator) {
            return 'gpu'
        }
        // High memory jobs
        else if (task.memory >= 512.GB || task.memory / task.cpus >= 50.GB) {
            return "cpu_highmem,cpu,${params.partition}"
        }
        // Preemptable jobs
        else if (task.attempt < 2 && params.preemptable) {
            return "preemptable,cpu,${params.partition}"
        }
        else {
            return params.partition
        }
    }

    // Cluster Options for GPU and QoS
    clusterOptions = {
        if (task.accelerator && params.qos) {
            return "--qos=${params.qos} --gres=gpu:${task.accelerator.request}"
        }
        else if (task.accelerator) {
            return "--gres=gpu:${task.accelerator.request}"
        }
        else if (params.qos) {
            return "--qos=${params.qos}"
        }
        else {
            return ''
        }
    }

    // Container Options for GPU Support
    containerOptions = {
        if (task.accelerator && workflow.containerEngine == 'singularity') {
            return '--nv'
        }
        else if (task.accelerator && workflow.containerEngine == 'docker') {
            return '--gpus all'
        }
        else {
            return ''
        }
    }

    scratch                           = params.scratch
    cache                             = true  // Use 'lenient' if caches are not working
    beforeScript                      = 'unset R_LIBS; export SINGULARITYENV_TMPDIR=$NXF_SCRATCH; export SINGULARITYENV_TMP=$NXF_SCRATCH'
    maxRetries                        = 3
    errorStrategy                     = { task.attempt < 4 ? 'retry' : 'ignore' }

    publishDir.mode                   = 'copy'
    publishDir.enabled                = { publishDir.path ? true : false }
    input.mode                        = 'symlink'
    stageInMode                       = 'symlink'
    stageOutMode                      = 'copy'
}

workflow.output.mode                  = 'copy'

trace {
    enabled                           = true
}