
How to use Snakemake resources

Snakemake resources, a deep dive

Snakemake's "resources" determine how jobs are scheduled within a computational environment. Some resources are consistent across different execution environments, while others are specific to an executor.

In this deep dive, we'll look closer at Snakemake's "resources" functionality and usage. Because different executors have different options, we won't dive too deep into executor-specific options.

How resources work in Snakemake

  1. Define required computational resources like memory (mem_mb) or disk space (disk_mb).
    1. In a Snakefile, use the resources rule attribute.
    2. On the command line, use the --resources option.
    3. In configuration files, use the resources keyword.
  2. You can also define custom resources for specific needs, like API call limits.
  3. Snakemake distinguishes between two types of resources:
    1. Local resources are specific to an individual job submission (e.g. memory, disk space). If a job requests 16 GB of memory, 16 GB is reserved for that job alone.
    2. Global resources apply across all jobs, and are useful for workflow-wide restrictions like API rate limits.
  4. Snakemake does its best to schedule jobs within the requested resources, but it does not actually restrict a task that exceeds its allocation.
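For example, a custom global resource can throttle jobs that hit a rate-limited API. The rule, resource, and tool names below are illustrative, not part of any standard:

```python
rule query_api:
    output: "results/{sample}.json"
    resources:
        api_calls=1  # each job consumes one unit of the custom resource
    shell: "query_tool --sample {wildcards.sample} > {output}"
```

Running `snakemake --resources api_calls=5` then allows at most five of these jobs to run at once.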


Basic usage

rule align:
    input: "data/sequences.fasta"
    output: "results/aligned.fasta"
    resources:
        mem_mb=32000
    shell: "aligner --input {input} --output {output}"

This rule requests 32 GB of memory for a sequence alignment task.
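If one run needs a different allocation, a rule's resources can also be overridden from the command line without editing the Snakefile, using the `--set-resources` option (available in recent Snakemake versions):

```shell
# Override the align rule's memory request for this run only
snakemake --set-resources align:mem_mb=64000
```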

Global vs. local resources

When workflows submit jobs remotely, Snakemake manages resources in two different scopes:

  1. Global resources apply across all jobs, even those executing on different machines.
  2. Local resources apply only to a single job.

By default, only mem_mb, disk_mb, and threads are considered local resources. Any other resource, including custom resources, is global by default. You can change a resource's scope with the resource_scopes directive in the Snakefile, or with the --set-resource-scopes command-line option.



rule download_data:
    output: "data/raw_data.csv"
    resources:
        mem_mb=1000
    shell: "download_script {output}"

resource_scopes:
    mem_mb="global"

Snakemake run:

snakemake --resources mem_mb=5000

In this example, the scope of mem_mb is changed from a local to a global restriction. With a global limit of 5000 megabytes and each download job requesting 1000, only five download jobs will run simultaneously – even when they execute on different machines.

Executor-specific resource settings, like spot instances

rule align:
    input: "data/sequences.fasta"
    output: "results/aligned.fasta"
    resources:
        preemptible=True # <------ Added this
    shell: "aligner --input {input} --output {output}"

This updates the alignment job from earlier to use preemptible (spot) instances for a FlowDeploy execution.

Using functions as resource values

def set_preemptible(wildcards, attempt):
    # Use spot instances for the first two attempts, on-demand after that
    return attempt < 3

rule align:
    input: "data/sequences.fasta"
    output: "results/aligned.fasta"
    retries: 3
    resources:
        preemptible=set_preemptible
    shell: "aligner --input {input} --output {output}"

This runs alignment jobs on preemptible (spot) instances, but switches to non-preemptible (on-demand) instances after two failed attempts, using the custom set_preemptible function as the resource value.
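The same callable pattern is commonly used to grow resource requests on retry. A minimal, self-contained sketch (the 16 GB base value is illustrative):

```python
def get_mem_mb(wildcards, attempt):
    """Double the memory request on each retry: 16 GB, 32 GB, 64 GB, ..."""
    return 16000 * 2 ** (attempt - 1)

# In a rule, pass the function itself as the resource value:
# rule align:
#     ...
#     retries: 2
#     resources:
#         mem_mb=get_mem_mb
```

Snakemake calls the function with the job's wildcards and the current attempt number, so a job that failed for lack of memory is resubmitted with a larger request.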


Look at what resources your pipeline is actually consuming

Sometimes tasks are allocated more resources than they need, or spot instances terminate at a rate that makes your pipelines cheaper to run on on-demand instances. Without keeping track of resource usage, you won't know where to optimize!
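One built-in option is Snakemake's benchmark directive, which writes a per-job TSV with wall-clock time and memory usage (e.g. max_rss) that you can compare against what the rule requested:

```python
rule align:
    input: "data/sequences.fasta"
    output: "results/aligned.fasta"
    resources:
        mem_mb=32000
    benchmark: "benchmarks/align.tsv"  # records runtime and peak memory for each run
    shell: "aligner --input {input} --output {output}"
```

If the benchmark shows the aligner peaking well below 32 GB, the mem_mb request can be lowered accordingly.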

GPU support is still largely executor dependent

GPU support – and the ability to request specific GPUs – is still largely executor dependent.