Snakemake dry runs, a deep dive
The --dry-run option in Snakemake is a fast and computationally cheap way to test a workflow. By running
snakemake --dry-run, or its shorthand
-n, you can preview a workflow without an expensive or lengthy full run.
In this deep dive, we'll look closer at Snakemake's dry run functionality and usage.
- When using
--dry-run, Snakemake parses the Snakefile and evaluates the rules without running the commands.
- It generates a Directed Acyclic Graph (DAG) of all the jobs that would be run.
- Snakemake lists the jobs that would be executed, providing a snapshot of what the workflow intends to do.
- The developer – or an automated test – checks the output to catch errors in the workflow logic, such as missing input files or rule misconfigurations.
Basic dry run
To perform a basic dry run of your Snakemake workflow, add the
--dry-run flag to your Snakemake command. This
prints the actions Snakemake would take without executing them.
snakemake --dry-run
Dry run with quiet mode
For large workflows, combine the
--dry-run option with
--quiet to avoid lengthy output.
snakemake --dry-run --quiet
Dry run after cleaning
To test the workflow from scratch, especially after making changes to the Snakefile, clean the workflow outputs and then perform a dry run. The command below assumes the workflow defines a rule named clean that removes its outputs.
snakemake clean && snakemake --dry-run
Dry run for specific rules
To test only a specific part of the workflow, combine
--dry-run with a particular rule or output file. This targets
the dry run to a specific segment of your pipeline.
snakemake --dry-run specific_output_file
snakemake --dry-run my_specific_rule
Dry run with detailed information
For a deeper understanding of each step, you can use the
--dry-run option with
--printshellcmds. This shows the
shell commands that would execute each step without actually running them.
snakemake --dry-run --printshellcmds
Catching file name mismatches
Some of the most common errors caught by
--dry-run are filename mismatches. Repeatedly re-running a dry run without a
snakemake clean in between can let those errors go uncaught as outputs from earlier runs accumulate.
Running a dry run after cleaning closes that gap.
Writing fast integration tests
Combining --dry-run with --printshellcmds is a quick way to check that the expected commands are produced
after making changes. Set up in a continuous integration process, this helps catch many issues before going live
with new pipeline changes, without the computational burden of actually running the pipeline.
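One way such a check might look is sketched below. It is an illustration under stated assumptions rather than a recommended setup: the function names check_expected_cmds and dry_run_check are hypothetical, snakemake is assumed to be on the PATH, and the script is assumed to run from the directory containing the Snakefile. It relies on Snakemake exiting with a non-zero status when the dry run itself fails, for example on a missing input file.

```shell
# Sketch of a dry-run check for CI. The function names are hypothetical.

# Succeeds only if every expected fragment (one per argument) appears
# somewhere in the text read from stdin.
check_expected_cmds() {
    _cec_output="$(cat)"
    _cec_status=0
    for _cec_fragment in "$@"; do
        # -F treats the fragment as a fixed string, not a regular expression.
        if ! printf '%s\n' "$_cec_output" | grep -qF -- "$_cec_fragment"; then
            echo "missing expected command: $_cec_fragment" >&2
            _cec_status=1
        fi
    done
    return "$_cec_status"
}

# Runs the dry run, passes on a failure of the dry run itself, and then
# checks the printed shell commands against the expected fragments.
dry_run_check() {
    if ! _drc_output="$(snakemake --dry-run --printshellcmds 2>&1)"; then
        printf '%s\n' "$_drc_output" >&2
        return 1
    fi
    printf '%s\n' "$_drc_output" | check_expected_cmds "$@"
}
```

A CI step could source this file and call dry_run_check with one argument per command fragment the pipeline is expected to produce, such as the name of a tool one of its rules invokes. The fragments to check for depend entirely on the workflow, so none are hard-coded here.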