Python package
How to use the Python package.
Installation
The flowdeploy Python package is hosted on PyPI. Install it with your favorite package manager, such as pip or Poetry.
pip install flowdeploy
Basic usage
- Nextflow
- Snakemake
- Transfer
import flowdeploy

flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    release="1.10.0",
    outdir="fd://shared/outdir",
    profiles=["docker", "test"],
)

flowdeploy.snakemake(
    pipeline='trytoolchest/snakemake-testing',
    release='v0.1.0',
    targets=["results/mapped/A.bam"],
    run_location='fd://shared/',
    cli_args='--use-conda',
)

flowdeploy.transfer(
    transfers=[{
        "source": "s3://flowdeploy-public-demo/ids.csv",
        "destination": "fd://shared/project_one/",
        "destination_name": "ids.csv",
    }, {
        "source": "https://osf.io/f92qd/download",
        "destination": "fd://shared/project_one/",
        "destination_name": "test.txt",
    }],
)
Once a FlowDeploy instance has been spawned, the Python package shows the current status of the run until it finishes. If you terminate the Python process, the FlowDeploy instance keeps running in the cloud, so there is no need to keep your computer running.
Terminating the Python process after spawning a run with FlowDeploy does not cancel the run; it keeps running until it finishes. You must cancel runs through the FlowDeploy app.
Use the FlowDeploy app to monitor, terminate, retry, and debug the run.
Background
Running a pipeline with FlowDeploy creates an automatically scaling cluster. Each time the FlowDeploy Python package runs a pipeline, it creates a new cluster. Once the pipeline is finished – or cancelled – the cluster scales down.
FlowDeploy clusters use a shared file system. To reference a file in the shared file system from the Python package, you need to use an fd:// prefix, e.g. fd://shared/file.txt.
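As a minimal sketch of the prefix convention, the helpers below convert between an absolute path on the shared file system and its fd:// form. Both function names are hypothetical and are not part of the flowdeploy package; the mapping itself is inferred from the examples on this page, where fd://shared/project_one corresponds to /shared/project_one.

```python
FD_PREFIX = "fd://"


def to_fd_uri(path: str) -> str:
    """Turn an absolute shared-file-system path into an fd:// reference."""
    if path.startswith(FD_PREFIX):
        return path  # already in fd:// form
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {path!r}")
    return FD_PREFIX + path.lstrip("/")


def from_fd_uri(uri: str) -> str:
    """Turn an fd:// reference back into an absolute path."""
    if not uri.startswith(FD_PREFIX):
        raise ValueError(f"expected an fd:// reference, got {uri!r}")
    return "/" + uri[len(FD_PREFIX):]


print(to_fd_uri("/shared/file.txt"))           # fd://shared/file.txt
print(from_fd_uri("fd://shared/project_one"))  # /shared/project_one
```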
You can access the shared file system, but sometimes it's easier to import and export files from somewhere else, like S3.
The inputs parameter is used to import files, and the export_location parameter is used to export files.
FlowDeploy automatically checkpoints your run with the workflow manager, and resumes from the last checkpoint if a run fails. The workflow manager (e.g. Nextflow) determines what can be resumed, usually by checking input files and parameters. You can set the run_location parameter to demarcate runs.
The run_location is also used as the starting point for any relative file paths used in the pipeline.
Raw arguments for the workflow manager are set with cli_args.
Authentication
Create a key in the FlowDeploy app developer settings.
Use the set_key function, or set the FLOWDEPLOY_KEY environment variable.
- Key string
- Key file
- Environment variable
flowdeploy.set_key("YOUR_KEY")
flowdeploy.set_key("~/your_key.txt")
export FLOWDEPLOY_KEY="YOUR_KEY"
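In scripts or CI jobs that rely on the environment variable, it can be convenient to fail early with a clear message when the key is missing. The helper below is a hypothetical sketch and not part of the flowdeploy package; it only inspects the FLOWDEPLOY_KEY variable named above and does not contact FlowDeploy or check that the key is valid.

```python
import os
from typing import Mapping


def require_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the FLOWDEPLOY_KEY value, or raise if it is unset or blank."""
    key = env.get("FLOWDEPLOY_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FLOWDEPLOY_KEY is not set; export it or call flowdeploy.set_key(...)"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_key({"FLOWDEPLOY_KEY": "YOUR_KEY"}))  # YOUR_KEY
```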
Nextflow and Snakemake
- Nextflow
- Snakemake
flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    release="1.10.0",
    outdir="fd://shared/outdir",
    profiles=["docker", "test"],
)

flowdeploy.snakemake(
    pipeline='trytoolchest/snakemake-testing',
    release='v0.1.0',
    targets=["results/mapped/A.bam"],
    run_location='fd://shared/',
    cli_args='--use-conda',
)
cli_args
Use cli_args to pass raw command line arguments to the workflow manager.
Avoid setting the following through cli_args, which produces undefined behavior:
- pipeline names (use pipeline)
- profiles (use profiles)
- input files (use inputs)
- output locations (use outdir)
- export locations (use export_location)
- working directory (use run_location)
- settings for logging, resumption, configuration files, or run naming (FlowDeploy handles these)
Anything passed to cli_args is passed as a raw argument to the workflow manager.
Example usage
flowdeploy.nextflow(
    pipeline="nf-core/rnaseq",
    cli_args="--pseudo_aligner salmon",
    ...
)
translates to:
nextflow run nf-core/rnaseq --pseudo_aligner salmon [..]
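Before launching, it can help to scan a cli_args string for flags that overlap with the settings FlowDeploy manages. The sketch below is a hypothetical pre-flight check, not part of the flowdeploy package. The flag table is an assumption: it lists a few real Nextflow options (-profile, -resume, -c, -name) plus the nf-core style --outdir parameter, and which flags FlowDeploy actually sets on your behalf is not documented here.

```python
import shlex
from typing import List

# Assumption: flags that duplicate a FlowDeploy-managed setting.
# Values name the flowdeploy parameter to use instead (None = handled for you).
MANAGED_NEXTFLOW_FLAGS = {
    "-profile": "profiles",
    "--outdir": "outdir",
    "-resume": None,
    "-c": None,
    "-name": None,
}


def conflicting_flags(cli_args: str) -> List[str]:
    """Return the managed flags that appear in a cli_args string."""
    found: List[str] = []
    for token in shlex.split(cli_args):
        flag = token.split("=", 1)[0]  # also catches --outdir=path
        if flag in MANAGED_NEXTFLOW_FLAGS and flag not in found:
            found.append(flag)
    return found


print(conflicting_flags("--pseudo_aligner salmon"))       # []
print(conflicting_flags("-profile docker --outdir out"))  # ['-profile', '--outdir']
```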
export_location
(recommended)
An S3 location to export results.
Example usage
flowdeploy.nextflow(
    export_location="s3://example/project_one",
    ...
)
export_location_source
(Snakemake only, recommended)
Exports the contents of this path to the export_location destination. Must be a shared file system path (e.g. fd://...).
The export_location_source is only used for Snakemake runs with the export_location destination set.
Example usage
flowdeploy.snakemake(
    export_location="s3://example/project_one",
    export_location_source="fd://shared/project_one/outputs",
    ...
)
inputs
FlowDeploy accepts a list of input objects, which are transferred to the cluster file system and passed to the workflow manager.
If a file already exists on the FlowDeploy file system, you don't need to include it in inputs, unless it's passed directly to the workflow manager with an argument. In other words, always include files that would normally be passed to the workflow manager on the command line. For example, you would always include "samplesheet.csv" in the FlowDeploy inputs if it is used as in:
nextflow [...] --input samplesheet.csv
The input object has three attributes: source, destination, and arg.
source is transferred to destination. If arg is set, the file path is passed to the workflow manager with this argument (e.g. --input in the example above).
In a bit more detail:
- source: where the input file is located (S3 or FlowDeploy)
- destination: if the input file is remote, the absolute path to download to on the FlowDeploy file system
- arg: optionally, a workflow manager command line argument associated with the file
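The attribute rules above can be checked locally before a run is launched. The function below is a hypothetical sketch, not part of the flowdeploy package. It assumes that S3 sources are written with an s3:// prefix, FlowDeploy sources with fd://, and that a remote source needs an fd:// destination; the package itself may accept more than this.

```python
from typing import Dict, List

REMOTE_PREFIXES = ("s3://",)
FD_PREFIX = "fd://"
ALLOWED_KEYS = {"source", "destination", "arg"}


def check_input(item: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one input object (empty if none)."""
    problems: List[str] = []
    unknown = set(item) - ALLOWED_KEYS
    if unknown:
        problems.append(f"unknown attributes: {sorted(unknown)}")
    source = item.get("source", "")
    if not source.startswith(REMOTE_PREFIXES + (FD_PREFIX,)):
        problems.append("source must start with s3:// or fd://")
    if source.startswith(REMOTE_PREFIXES):
        # Remote files are downloaded, so they need somewhere to land.
        if not item.get("destination", "").startswith(FD_PREFIX):
            problems.append("remote sources need an fd:// destination")
    return problems


print(check_input({"source": "fd://flowdeploy-shared/samplesheet.csv", "arg": "--input"}))  # []
print(check_input({"source": "s3://example/reference.fasta"}))
# ['remote sources need an fd:// destination']
```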
Example usage
flowdeploy.nextflow(
    inputs=inputs,
    ...
)
Example input objects
An S3 file that's passed as an argument
inputs = [{
    "source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "destination": "fd://flowdeploy-shared/Homo_sapiens_assembly19.fasta",
    "arg": "--fasta"
}]
s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta is transferred to /flowdeploy-shared/Homo_sapiens_assembly19.fasta in the shared file system.
A FlowDeploy file that's passed as an argument
inputs = [{
    "source": "fd://flowdeploy-shared/samplesheet.csv",
    "arg": "--input"
}]
Both of the above in one list
inputs = [{
    "source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "destination": "fd://flowdeploy-shared/Homo_sapiens_assembly19.fasta",
    "arg": "--fasta",
}, {
    "source": "fd://flowdeploy-shared/samplesheet.csv",
    "arg": "--input",
}]
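To see roughly how input objects might end up on the workflow manager command line, the sketch below builds a preview string from a list of inputs. It is a hypothetical helper, not part of the flowdeploy package, and it rests on an assumption: that the path passed with arg is the file's location on the shared file system (the destination for downloaded files, otherwise the source), written without the fd:// prefix. The real command line that FlowDeploy generates may differ.

```python
from typing import Dict, List

FD_PREFIX = "fd://"


def preview_args(inputs: List[Dict[str, str]]) -> str:
    """Build an illustrative argument string from input objects that set arg."""
    parts: List[str] = []
    for item in inputs:
        if "arg" not in item:
            continue  # transferred, but not passed on the command line
        location = item.get("destination", item["source"])
        if location.startswith(FD_PREFIX):
            # Assumption: fd://x/y maps to /x/y on the shared file system.
            location = "/" + location[len(FD_PREFIX):]
        parts.append(f"{item['arg']} {location}")
    return " ".join(parts)


example_inputs = [{
    "source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "destination": "fd://flowdeploy-shared/Homo_sapiens_assembly19.fasta",
    "arg": "--fasta",
}, {
    "source": "fd://flowdeploy-shared/samplesheet.csv",
    "arg": "--input",
}]
print(preview_args(example_inputs))
```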
is_async
If true, exits immediately after spawning a FlowDeploy instance. The run will continue, and can be monitored in the FlowDeploy app.
Default: False.
Example usage
flowdeploy.nextflow(
    is_async=True,
    ...
)
pipeline
(required)
The name of the pipeline.
Example usage
flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    ...
)
release
(required if branch is not set)
The git release to use for execution of the pipeline. Either release or branch must be set.
Example usage
flowdeploy.nextflow(
    release="1.1.0",
    ...
)
branch
(required if release is not set)
The git branch to use for execution of the pipeline. Either release or branch must be set.
Example usage
flowdeploy.snakemake(
    branch="main",
    ...
)
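When the release or branch comes from configuration, a small guard can enforce the "either release or branch" rule before a run is launched. The helper below is a hypothetical sketch, not part of the flowdeploy package. This page does not say what happens when both are set, so the sketch takes the cautious route and rejects that case; that choice is an assumption, not documented behavior.

```python
from typing import Dict, Optional


def git_ref_kwargs(release: Optional[str] = None,
                   branch: Optional[str] = None) -> Dict[str, str]:
    """Return {'release': ...} or {'branch': ...}, requiring exactly one."""
    if release and branch:
        # Assumption: treat an ambiguous request as an error.
        raise ValueError("set either release or branch, not both")
    if release:
        return {"release": release}
    if branch:
        return {"branch": branch}
    raise ValueError("either release or branch must be set")


print(git_ref_kwargs(release="1.1.0"))  # {'release': '1.1.0'}
print(git_ref_kwargs(branch="main"))    # {'branch': 'main'}
```

The returned dictionary can be unpacked into a call, e.g. flowdeploy.nextflow(pipeline="nf-core/fetchngs", **git_ref_kwargs(release="1.1.0"), ...).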
profiles
Workflow manager configuration profiles.
The FlowDeploy configuration profile is automatically added.
Example usage
flowdeploy.nextflow(
    profiles=["docker"],
    ...
)
run_location
(recommended)
A FlowDeploy path to use as the working directory for the pipeline.
The run location is used for workflow manager caching. Re-running the same command with the same run_location allows the workflow manager to resume from cached results.
Defaults to /work/${TASK_ID}.
Example usage
flowdeploy.nextflow(
    run_location="fd://shared/project_one",
    ...
)
translates to
cd /shared/project_one && nextflow run [..] -resume
outdir
(Nextflow only, required)
Where to place pipeline outputs. Must be a FlowDeploy file path.
Example usage
flowdeploy.nextflow(
    outdir="fd://shared/project_one",
    ...
)
snakemake_folder
(Snakemake only, recommended)
The name of the folder in which Snakemake will run.
FlowDeploy creates this folder if it does not exist, and clones the pipeline into this directory. The run location for Snakemake is computed as run_location joined with snakemake_folder.
Example usage
flowdeploy.snakemake(
    run_location="fd://shared/",
    snakemake_folder="project_one",
    ...
)
translates to:
<Create "/shared/project_one" and set up pipeline> && snakemake -d /shared/project_one [...]
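The join described above can be reproduced locally to check where Snakemake will run. The function below is a hypothetical sketch, not part of the flowdeploy package; it assumes that the fd:// prefix maps to the root of the shared file system, as in the translated command above.

```python
import posixpath

FD_PREFIX = "fd://"


def snakemake_run_dir(run_location: str, snakemake_folder: str) -> str:
    """Join run_location and snakemake_folder into one absolute directory."""
    base = run_location
    if base.startswith(FD_PREFIX):
        base = "/" + base[len(FD_PREFIX):]
    # Strip a leading slash so the folder is always joined under the base.
    joined = posixpath.join(base, snakemake_folder.lstrip("/"))
    return posixpath.normpath(joined)


print(snakemake_run_dir("fd://shared/", "project_one"))  # /shared/project_one
```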
snakefile_location
(Snakemake only, recommended)
The path to the Snakefile, relative to the snakemake_folder. Defaults to workflow/Snakefile.
If your Snakefile is at the base of your project, set snakefile_location="Snakefile".
Example usage
flowdeploy.snakemake(
    run_location="fd://shared/",
    snakemake_folder="project_one",
    snakefile_location="pipeline_a.snakefile",
    ...
)
translates to:
[...] cd project_one && snakemake -s "pipeline_a.snakefile" [...]
targets
Snakemake targets, as a list.
Example usage
flowdeploy.snakemake(
    targets=["results/mapped/A.bam", "results/mapped/B.bam"],
    ...
)
Transfer
transfers
(required)
A list of the files or folders to transfer, with each entry as a dictionary containing:
- source (required): where the file is currently located (s3://, fd://, or https://)
- destination (required): where to transfer the file (s3:// or fd://)
- destination_name (required for files): the name of the destination file, if applicable
Example usage
flowdeploy.transfer(
    transfers=[{
        "source": "s3://flowdeploy-public-demo/ids.csv",
        "destination": "fd://shared/project_one/",
        "destination_name": "ids.csv",
    }, {
        "source": "https://osf.io/f92qd/download",
        "destination": "fd://shared/project_one/",
        "destination_name": "test.txt",
    }],
)
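The entry rules for transfers can be checked locally before calling the package. The function below is a hypothetical sketch, not part of the flowdeploy package. Since a string alone cannot tell a file from a folder, it uses a heuristic that is purely an assumption: a source ending in "/" is treated as a folder, and anything else as a file that needs a destination_name.

```python
from typing import Dict, List

SOURCE_PREFIXES = ("s3://", "fd://", "https://")
DESTINATION_PREFIXES = ("s3://", "fd://")


def check_transfer(entry: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one transfer entry (empty if none)."""
    problems: List[str] = []
    source = entry.get("source", "")
    destination = entry.get("destination", "")
    if not source.startswith(SOURCE_PREFIXES):
        problems.append("source must start with s3://, fd://, or https://")
    if not destination.startswith(DESTINATION_PREFIXES):
        problems.append("destination must start with s3:// or fd://")
    # Assumption: a source ending in "/" is a folder; anything else is a file.
    if not source.endswith("/") and not entry.get("destination_name"):
        problems.append("file transfers need a destination_name")
    return problems


print(check_transfer({
    "source": "s3://flowdeploy-public-demo/ids.csv",
    "destination": "fd://shared/project_one/",
    "destination_name": "ids.csv",
}))  # []
```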
is_async
If true, exits immediately after spawning. The transfer will continue, and can be monitored in the FlowDeploy app.
Default: False.
Example usage
flowdeploy.transfer(
    is_async=True,
    ...
)
Restrictions
- Running more than one run concurrently with the same run_location has undefined behavior.
- Each pipeline is limited to 20 concurrent subtasks (can be increased on request).
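When a script launches several runs at once (for example with is_async=True), the first restriction can be guarded against by checking the planned run_location values for repeats. The function below is a hypothetical sketch, not part of the flowdeploy package; it assumes that locations differing only by a trailing slash refer to the same directory, and it cannot see runs started elsewhere.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_run_locations(run_locations: Iterable[str]) -> List[str]:
    """Return run_location values that occur more than once in a batch."""
    # Assumption: a trailing slash does not change which directory is meant.
    normalized = [location.rstrip("/") for location in run_locations]
    counts = Counter(normalized)
    return sorted(location for location, n in counts.items() if n > 1)


planned = ["fd://shared/project_one", "fd://shared/project_one/", "fd://shared/project_two"]
print(duplicate_run_locations(planned))  # ['fd://shared/project_one']
```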