Python package

How to use the Python package.

Installation

The flowdeploy Python package is hosted on PyPI. Install it with your favorite package manager, like pip or Poetry.

pip install flowdeploy
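
Or, with Poetry:

poetry add flowdeploy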

Basic usage

import flowdeploy

flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    release="1.10.0",
    outdir="fd://shared/outdir",
    profiles=["docker", "test"],
)

Once a FlowDeploy instance has spawned, the Python package shows the run's status until it finishes. If you terminate the Python process, the FlowDeploy instance continues running in the cloud – no need to keep your computer running.

info

Terminating the Python process after spawning a run with FlowDeploy does not cancel the run; it keeps running until it finishes. You must cancel runs through the FlowDeploy app.

Use the FlowDeploy app to monitor, terminate, retry, and debug the run.

Background

Running a pipeline with FlowDeploy creates an automatically scaling cluster. Each time the FlowDeploy Python package runs a pipeline, it creates a new cluster. Once the pipeline is finished – or cancelled – the cluster scales down.

FlowDeploy clusters use a shared file system. To reference a file in the shared file system from the Python package, use an fd:// prefix – e.g. fd://shared/file.txt.

You can access the shared file system, but sometimes it's easier to import and export files from somewhere else, like S3. The inputs parameter is used to import files, and the export_location parameter is used to export files.

FlowDeploy automatically checkpoints your run with the workflow manager, and resumes from the last checkpoint if a run fails. The workflow manager (e.g. Nextflow) determines what can be resumed – usually by checking input files and parameters. You can set the run_location parameter to demarcate runs.

The run_location is also used as the starting point for any relative file paths used in the pipeline.

Raw arguments for the workflow manager are set with cli_args.
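
Putting the pieces together, a single run might import an input from S3, pin a run_location for resumption, and export results when it finishes. Here's a minimal sketch using the parameters above – the bucket names, paths, and samplesheet are placeholders, not real project values:

import flowdeploy

# Illustrative only: the S3 bucket, destination paths, and samplesheet
# below are placeholders.
flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    release="1.10.0",
    outdir="fd://shared/outdir",
    profiles=["docker", "test"],
    # Import a samplesheet from S3 and pass it to the pipeline as --input.
    inputs=[{
        "source": "s3://example-bucket/samplesheet.csv",
        "destination": "fd://shared/samplesheet.csv",
        "arg": "--input",
    }],
    # Re-running with the same run_location resumes from the last checkpoint.
    run_location="fd://shared/project_one",
    # Export the results to S3 once the run completes.
    export_location="s3://example-bucket/results",
)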

Authentication

Create a key in the FlowDeploy app developer settings.

Use the set_key function, or set the FLOWDEPLOY_KEY environment variable.

flowdeploy.set_key("YOUR_KEY")
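
Or set the environment variable in your shell before running your script:

export FLOWDEPLOY_KEY="YOUR_KEY"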

Nextflow and Snakemake

flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    release="1.10.0",
    outdir="fd://shared/outdir",
    profiles=["docker", "test"],
)

cli_args

Use cli_args to pass raw command line arguments to the workflow manager.

Avoid setting the following with cli_args; doing so produces undefined behavior:

  • pipeline names (use pipeline)
  • profiles (use profiles)
  • input files (use inputs)
  • output locations (use outdir)
  • export locations (use export_location)
  • working directory (use run_location)
  • settings for logging, resumption, configuration files, or run naming (FlowDeploy handles these)

Anything passed to cli_args is passed as a raw argument to the workflow manager.

Example usage

flowdeploy.nextflow(
    pipeline="nf-core/rnaseq",
    cli_args="--pseudo_aligner salmon",
    ...
)

translates to:

nextflow run nf-core/rnaseq --pseudo_aligner salmon [..]

export_location

An S3 location to export results.

Example usage

flowdeploy.nextflow(
    export_location="s3://example/project_one",
    ...
)

export_location_source

Exports the contents of this path to the export_location destination. Must be a shared file system path (e.g. fd://...). Only used for Snakemake runs, and only when export_location is set.

Example usage

flowdeploy.snakemake(
    export_location="s3://example/project_one",
    export_location_source="fd://shared/project_one/outputs",
    ...
)

inputs

FlowDeploy accepts a list of input objects, which are transferred to the cluster file system and passed to the workflow manager.

If a file already exists on the FlowDeploy file system, you don't need to include it in inputs – unless it's passed directly to the workflow manager with an argument. In other words, always include files that would normally be passed to the workflow manager on the command line. For example, you would always include samplesheet.csv in the FlowDeploy inputs when it's used as:

nextflow [...] --input samplesheet.csv

The input object has three attributes: source, destination, and arg.

source is transferred to destination. If arg is set, the file path is passed to the workflow manager with this argument (e.g. --input in the example above).

In a bit more detail:

  • source: where the input file is located (S3 or FlowDeploy)
  • destination: if the input file is remote, the absolute path to download to on the FlowDeploy file system
  • arg: optionally, set a workflow manager command line argument associated with the file

Example usage

flowdeploy.nextflow(
    inputs=inputs,
    ...
)

Example input objects

An S3 file that's passed as an argument
inputs = [{
    "source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "destination": "fd://flowdeploy-shared/Homo_sapiens_assembly19.fasta",
    "arg": "--fasta",
}]

s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta is transferred to /flowdeploy-shared/Homo_sapiens_assembly19.fasta in the shared file system.

A FlowDeploy file that's passed as an argument
inputs = [{
    "source": "fd://flowdeploy-shared/samplesheet.csv",
    "arg": "--input",
}]

Both of the above in one list
inputs = [{
    "source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
    "destination": "fd://flowdeploy-shared/Homo_sapiens_assembly19.fasta",
    "arg": "--fasta",
}, {
    "source": "fd://flowdeploy-shared/samplesheet.csv",
    "arg": "--input",
}]

is_async

If true, exits immediately after spawning a FlowDeploy instance. The run will continue, and can be monitored in the FlowDeploy app.

Default: False.

Example usage

flowdeploy.nextflow(
    is_async=True,
    ...
)

pipeline (required)

The name of the pipeline.

Example usage

flowdeploy.nextflow(
    pipeline="nf-core/fetchngs",
    ...
)

release (required if branch is not set)

The git release to use for execution of the pipeline. Either release or branch must be set.

Example usage

flowdeploy.nextflow(
    release="1.1.0",
    ...
)

branch (required if release is not set)

The git branch to use for execution of the pipeline. Either release or branch must be set.

Example usage

flowdeploy.snakemake(
    branch="main",
    ...
)

profiles

Workflow manager configuration profiles.

The FlowDeploy configuration profile is automatically added.

Example usage

flowdeploy.nextflow(
    profiles=["docker"],
    ...
)

run_location

A FlowDeploy path to use as the working directory for the pipeline.

The run location is used for workflow manager caching. Re-running the same command with the same run_location allows the workflow manager to resume from cached results.

Defaults to /work/${TASK_ID}.

Example usage

flowdeploy.nextflow(
    run_location="fd://shared/project_one",
    ...
)

translates to:

cd /shared/project_one && nextflow run [..] -resume

outdir (Nextflow only, required)

Where to place pipeline outputs. Must be a FlowDeploy file path.

Example usage

flowdeploy.nextflow(
    outdir="fd://shared/project_one",
    ...
)

snakemake_folder (Snakemake only)

The name of the folder in which Snakemake will run.

FlowDeploy creates this folder if it does not exist, and clones the pipeline into this directory. The run location for Snakemake is computed as run_location joined with snakemake_folder.

Example usage

flowdeploy.snakemake(
    run_location="fd://shared/",
    snakemake_folder="project_one",
    ...
)

translates to:

<Create "/shared/project_one" and set up pipeline> && snakemake -d /shared/project_one [...]

snakefile_location (Snakemake only)

The path to the Snakefile, relative to snakemake_folder. Defaults to workflow/Snakefile.

If your Snakefile is at the base of your project, set snakefile_location="Snakefile".

Example usage

flowdeploy.snakemake(
    run_location="fd://shared/",
    snakemake_folder="project_one",
    snakefile_location="pipeline_a.snakefile",
    ...
)

translates to:

[...] cd project_one && snakemake -s "pipeline_a.snakefile" [...]

targets

Snakemake targets, as a list.

Example usage

flowdeploy.snakemake(
    targets=["results/mapped/A.bam", "results/mapped/B.bam"],
    ...
)

Transfer

transfers (required)

A list of the files or folders to transfer, with each entry as a dictionary containing:

  • source (required): where the file is currently located (s3://, fd://, or https://)
  • destination (required): where to transfer the file (s3:// or fd://)
  • destination_name (required for files): the name to give the file at the destination

Example usage

flowdeploy.transfer(
    transfers=[{
        "source": "s3://flowdeploy-public-demo/ids.csv",
        "destination": "fd://shared/project_one/",
        "destination_name": "ids.csv",
    }, {
        "source": "https://osf.io/f92qd/download",
        "destination": "fd://shared/project_one/",
        "destination_name": "test.txt",
    }],
)

is_async

If true, exits immediately after spawning. The transfer will continue, and can be monitored in the FlowDeploy app.

Default: False.

Example usage

flowdeploy.transfer(
    is_async=True,
    ...
)

Restrictions

  1. Starting more than one run concurrently with the same run_location results in undefined behavior.
  2. Each pipeline is limited to 20 concurrent subtasks (can be increased on request).