How to use Snakemake checkpoints
Snakemake checkpoints, a deep dive
Checkpoints are a special kind of Snakemake rule that lets you create dynamic workflows: workflows whose structure is recomputed after the checkpoint executes successfully. They're often used at critical junctures where the downstream pipeline diverges depending on the outcome of some step, for example when later rules have to read files that a previous step generates dynamically.
In this deep dive, we'll look closer at Snakemake's checkpoint functionality and usage.
How checkpoints work
- You define a special "checkpoint" rule in the Snakefile. At their most basic, checkpoints can be structured identically to rules.
- After a checkpoint job succeeds, Snakemake re-evaluates the Directed Acyclic Graph (DAG). Because of this re-evaluation, the workflow structure can change based on the checkpoint's outputs.
- Other rules and functions can access the globally available checkpoints object. This is most often done in the input functions of downstream rules, which use the checkpoint's outputs to determine their return value.
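To make the re-evaluation step more concrete, here is a toy model in plain Python. It is a sketch of the idea only, not Snakemake's implementation: ToyCheckpoints, IncompleteCheckpoint, resolve_inputs, and fake_run are all hypothetical names invented for this illustration. The assumption it encodes is that asking for an unfinished checkpoint fails, the checkpoint then runs, and the input function is evaluated a second time.

```python
class IncompleteCheckpoint(Exception):
    """Raised when an input function asks for a checkpoint that has not run yet."""


class ToyCheckpoints:
    """Toy stand-in for the global checkpoints object (not Snakemake's real class)."""

    def __init__(self):
        self.finished = {}  # sample name -> list of output paths

    def get(self, sample):
        if sample not in self.finished:
            raise IncompleteCheckpoint(sample)
        return self.finished[sample]


def resolve_inputs(input_func, sample, checkpoints, run_checkpoint):
    """Evaluate an input function, running the checkpoint first if it is pending."""
    try:
        return input_func(sample, checkpoints)
    except IncompleteCheckpoint:
        # The inputs cannot be known yet, so the checkpoint has to run first.
        checkpoints.finished[sample] = run_checkpoint(sample)
        # Second evaluation: stands in for the DAG being recomputed.
        return input_func(sample, checkpoints)


def downstream_inputs(sample, checkpoints):
    """Input function of a downstream rule: it depends on the checkpoint's outputs."""
    return checkpoints.get(sample)


def fake_run(sample):
    """Pretend to execute the checkpoint and report the files it wrote."""
    return [f"results/{sample}.fasta"]


cp = ToyCheckpoints()
print(resolve_inputs(downstream_inputs, "A", cp, fake_run))  # ['results/A.fasta']
```

The point of the sketch is that the downstream input function is written once and evaluated twice: the first attempt only signals that the checkpoint is still pending.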
Examples
Basic checkpoint usage
Here's how to define a basic checkpoint and use its output:
checkpoint preprocess_sample:
    input:
        "data/{sample}.fastq"
    output:
        "results/{sample}.fasta"
    shell:
        "preprocess_sample {input} > {output}"

rule process_sample:
    input: lambda wildcards: checkpoints.preprocess_sample.get(sample=wildcards.sample).output
    output:
        "results/{sample}.txt"
    shell:
        "process_data {input} > {output}"
Checkpoint that dynamically changes the path
The results of a checkpoint can also be used to dynamically recompute the rest of the DAG.
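checkpoint preprocess_sample:
    input:
        "data/{sample}.fastq"
    output:
        "results/{sample}.fasta"
    shell:
        "preprocess_sample {input} > {output}"

# Input function: Snakemake evaluates it again once the checkpoint has finished.
def choose_project(wildcards):
    checkpoint_sample = checkpoints.preprocess_sample.get(sample=wildcards.sample)
    project = "project_one" if len(checkpoint_sample.output) == 1 else "project_two"
    return f"{project}/{wildcards.sample}.txt"

rule prep_project_one_metadata:
    input:
        "results/{sample}.fasta"
    output:
        "project_one/{sample}.txt"
    shell:
        "prep_metadata --project_one {input} > {output}"

# Hypothetical consumer (rule name, output path, and cp are illustrative): it must
# write to a path other than the one choose_project returns, otherwise the rule
# would depend on its own output. A project_two rule would mirror the one above.
rule collect_metadata:
    input: choose_project
    output:
        "metadata/{sample}.txt"
    shell:
        "cp {input} {output}"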
In this example, we arbitrarily check the length of the checkpoint's outputs to choose which path to use for the next rule, and thus potentially change which rules are executed. The input function could also read in the file, call an API, or do anything else your heart desires.
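As a sketch of the "read in the file" variant, the decision can live in a small plain-Python helper that the input function calls. The helper names count_fasta_records and project_for, and the rule of thumb "exactly one FASTA record means project_one", are assumptions made up for this illustration.

```python
def count_fasta_records(fasta_path):
    """Count sequence records in a FASTA file (header lines start with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def project_for(fasta_path, sample):
    """Pick the downstream target from the file's content (hypothetical criterion)."""
    project = "project_one" if count_fasta_records(fasta_path) == 1 else "project_two"
    return f"{project}/{sample}.txt"


# Inside a Snakefile, an input function could delegate to the helper like this:
#
# def choose_project(wildcards):
#     fasta = checkpoints.preprocess_sample.get(sample=wildcards.sample).output[0]
#     return project_for(fasta, wildcards.sample)
```

Keeping the file parsing outside the input function means the decision logic can be tested without running a workflow.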
Tips
Checkpoints reuse existing output files, just like the other rules. Use clean to avoid this.
Clear previous outputs if you want a checkpoint to re-evaluate with potentially different results.
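One way to clear previous outputs is a few lines of Python that delete the checkpoint's files before the next run. This is a minimal sketch: clear_outputs is a hypothetical helper, not part of Snakemake, and "sampleA" is a made-up sample name matching the results/{sample}.fasta pattern used above.

```python
from pathlib import Path


def clear_outputs(paths):
    """Delete the given output files if they exist; return the paths that were removed."""
    removed = []
    for path in map(Path, paths):
        if path.is_file():
            path.unlink()
            removed.append(str(path))
    return removed


# Remove the stale checkpoint output for one (hypothetical) sample.
clear_outputs(["results/sampleA.fasta"])
```

With the file gone, the checkpoint has to run again the next time its output is needed, so downstream input functions see fresh results.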