Command-line interface
How to use the FlowDeploy command-line interface.
Installation
Get started by installing the flowdeploy package via pip:
pip install flowdeploy
pip will automatically add the command-line executable to your environment's path. For a global install, use pipx.
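For example, assuming pipx itself is already installed, a global install would look like:
pipx install flowdeploy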
(Want another install method, like brew? Let us know!)
Basic usage
Nextflow:
flowdeploy run nextflow nf-core/fetchngs --release 1.11.0 --outdir fd://shared/outdir --profile docker --profile test --input-file ./input.json
where input.json contains:
[
  {
    "source": "s3://flowdeploy-public-demo/ids.csv",
    "destination": "fd://shared/ids.csv",
    "arg": "--input"
  }
]
Snakemake:
flowdeploy run snakemake trytoolchest/snakemake-testing --release v0.1.0 --target results/mapped/A.bam --run-location fd://shared/ --cli-args '--use-conda'
Transfer:
flowdeploy transfer --input-file ./transfers.json
where transfers.json contains:
[
  {
    "source": "s3://flowdeploy-public-demo/ids.csv",
    "destination": "fd://shared/",
    "destination_name": "ids.csv"
  }
]
Once a FlowDeploy instance is spawning, the command-line interface displays the status of the run until it's finished. If you terminate the command, the FlowDeploy instance continues running in the cloud – there's no need to keep your computer running. Note that terminating the command does not cancel the run; use the FlowDeploy app to monitor, cancel, retry, and debug runs.
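For example, a run can be spawned asynchronously and checked on later (YOUR_RUN_ID_HERE is a placeholder for the run's FlowDeploy ID):
flowdeploy run nextflow [..] --is-async
flowdeploy status YOUR_RUN_ID_HERE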
Background
Running a pipeline with FlowDeploy creates an automatically scaling cluster. Each time you use the FlowDeploy CLI to run a pipeline, it creates a new cluster. Once the pipeline is finished – or cancelled – the cluster scales down.
FlowDeploy clusters use a shared file system. To reference a file in the shared file system, use an fd:// prefix – e.g. fd://shared/file.txt.
You can access the shared file system directly, but sometimes it's easier to import and export files from somewhere else, like S3. Inputs (set with --input-file) are used to import files, and --export-location is used to export files.
FlowDeploy automatically checkpoints your run with the workflow manager, and resumes from the last checkpoint if a run fails. The workflow manager (e.g. Nextflow) determines what can be resumed – usually by checking input files and parameters. You can set the --run-location parameter to demarcate runs.
The run location is also used as the starting point for any relative file paths used in the pipeline.
Raw arguments for the workflow manager are set with --cli-args.
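For example (pipeline and paths are illustrative), pinning the working directory with --run-location makes resumption possible; if this run fails, re-running the identical command resumes from the workflow manager's last checkpoint:
flowdeploy run nextflow nf-core/fetchngs --release 1.11.0 --outdir fd://shared/outdir --run-location fd://shared/project_one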
Commands
flowdeploy set-key
Authorizes your environment by setting your FlowDeploy key in an environment configuration file.
flowdeploy_key
(required)
Your FlowDeploy key.
Example usage
flowdeploy set-key YOUR_FLOWDEPLOY_KEY_HERE
--config, -c
(optional)
Specify the path to the configuration file where the key should be set. Defaults to ~/.zshenv on macOS and ~/.bashrc on Linux.
Example usage
flowdeploy set-key YOUR_FLOWDEPLOY_KEY_HERE --config /path/to/config
flowdeploy run
Use flowdeploy run [parameters] to spawn a new pipeline.
workflow_manager
(required)
The workflow manager to run the pipeline. Supports snakemake and nextflow.
Example usage
flowdeploy run snakemake [..]
pipeline
(required)
The name of the pipeline.
Example usage
flowdeploy run nextflow nf-core/fetchngs [..]
--release, -r
(required if --branch is not set)
The git release to use for execution of the pipeline. Either --release or --branch must be set.
Example usage
flowdeploy run [..] --release 1.10.0
--branch
(required if --release is not set)
The git branch to use for execution of the pipeline. Either --release or --branch must be set.
Example usage
flowdeploy run [..] --branch main
--outdir, -o
(required, nextflow only)
Set the output directory to store the results from the pipeline run. Note that the path must be a FlowDeploy file path.
Not applicable for Snakemake.
Example usage
flowdeploy run nextflow [..] --outdir fd://shared/outdir
--input-file, -i
(optional)
The path to a JSON file that contains all inputs.
FlowDeploy accepts a list of input objects, which are transferred to the cluster file system and passed to the workflow manager.
If a file already exists on the FlowDeploy file system, you don't need to include it in the inputs – unless it's passed directly to the workflow manager with an argument. In other words, always include files that would normally be passed to the workflow manager on the command line. For example, you would always include "samplesheet.csv" as used in:
nextflow [...] --input samplesheet.csv
in the FlowDeploy inputs.
The input object has three attributes: source, destination, and arg.
source is transferred to destination. If arg is set, the file path is passed to the workflow manager with this argument (e.g. --input in the example above).
In a bit more detail:
- source: where the input file is located (S3 or FlowDeploy)
- destination: if the input file is remote, the absolute path to download to on the FlowDeploy file system
- arg: optionally, a workflow manager command line argument associated with the file
Example usage
flowdeploy run [..] --input-file ./input.json
Example input files
An S3 file that's passed as an argument
[{
"source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
"destination": "fd://shared/Homo_sapiens_assembly19.fasta",
"arg": "--fasta"
}]
s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta is transferred to /shared/Homo_sapiens_assembly19.fasta in the shared file system.
A FlowDeploy file that's passed as an argument
[{
"source": "fd://shared/samplesheet.csv",
"arg": "--input"
}]
Both of the above in one input file
[{
"source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
"destination": "fd://shared/Homo_sapiens_assembly19.fasta",
"arg": "--fasta"
}, {
"source": "fd://shared/samplesheet.csv",
"arg": "--input"
}]
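As an end-to-end sketch (the pipeline and samplesheet path are illustrative), you could write the input file and pass it to a run in one shell session:
cat > input.json <<'EOF'
[{
"source": "fd://shared/samplesheet.csv",
"arg": "--input"
}]
EOF
flowdeploy run nextflow [..] --input-file ./input.json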
--cli-args
(optional)
Use --cli-args to pass raw command line arguments to the workflow manager.
Avoid setting these, which produce undefined behavior:
- pipeline names (use the pipeline argument)
- profiles (use --profile)
- input files (use --input-file)
- output locations (use --outdir)
- export locations (use --export-location)
- working directory (use --run-location)
- settings for logging, resumption, configuration files, or run naming (FlowDeploy handles these)
Anything passed to --cli-args is passed verbatim to the workflow manager.
Example usage
flowdeploy run [..] --cli-args "--pseudo_aligner salmon"
--export-location
(optional)
An S3 location to export results.
Example usage
flowdeploy run [..] --export-location s3://example/project_one
--export-location-source
(optional, snakemake only)
Exports the contents of this path to the --export-location destination. Must be a shared file system path (e.g. fd://...).
--export-location-source is only used for Snakemake runs that also set --export-location.
Example usage
flowdeploy run snakemake [..] --export-location s3://example/project_one --export-location-source fd://shared/snakemake/outputs
--profile
(optional)
Workflow manager configuration profiles. To add multiple profiles, pass --profile multiple times.
The FlowDeploy configuration profile is automatically added.
Example usage
flowdeploy run [..] --profile "docker" --profile "test"
--run-location
(optional)
A FlowDeploy path to use as the working directory for the pipeline.
The run location is used for workflow manager caching: re-running the same command with the same --run-location allows the workflow manager to resume from cached results.
Defaults to fd://shared/work/${TASK_ID} for Nextflow; required for Snakemake.
Example usage
flowdeploy run [..] --run-location fd://shared/project_one
--snakemake-folder
(optional)
Snakemake's working directory, relative to the run location. Defaults to the pipeline name.
Example usage
flowdeploy run snakemake [..] --run-location fd://shared/project_one --snakemake-folder snakemake_pipeline
The command in the example runs from /shared/project_one/snakemake_pipeline/.
--snakefile-location
(optional)
Snakefile location, relative to the Snakemake folder name. Defaults to 'workflow/Snakefile'.
Example usage
flowdeploy run snakemake [..] --run-location fd://shared/project_one --snakemake-folder snakemake_pipeline --snakefile-location workflow/Snakefile
The command in the example uses the Snakefile at /shared/project_one/snakemake_pipeline/workflow/Snakefile.
--target
(optional, snakemake only)
Snakemake targets. To add multiple targets, pass --target multiple times.
Example usage
flowdeploy run snakemake [..] --target results/mapped/A.bam --target results/mapped/B.bam
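Putting the Snakemake-specific options together (repository, paths, and target are illustrative):
flowdeploy run snakemake trytoolchest/snakemake-testing --release v0.1.0 --run-location fd://shared/project_one --snakemake-folder snakemake_pipeline --snakefile-location workflow/Snakefile --target results/mapped/A.bam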
--is-async
(optional)
If set, exits immediately after spawning a FlowDeploy instance.
Default: False.
Example usage
flowdeploy run [..] --is-async
--flowdeploy-key
(optional)
Provide a FlowDeploy API key to authenticate the run.
Alternatively, you can run flowdeploy set-key [..] to authenticate your environment.
Example usage
flowdeploy run [..] --flowdeploy-key YOUR_API_KEY_HERE
flowdeploy transfer
Use flowdeploy transfer [..] to spawn a new transfer run.
--input-file, -i
(required)
The path to a JSON file that contains all transfers.
FlowDeploy accepts a list of transfer objects, which describe files and folders to move between the cluster file system and external sources.
The transfer object has three attributes: source, destination, and destination_name.
source is transferred to destination. If destination_name is set, it's used as the file name at the destination. destination_name is required when transferring individual files.
In a bit more detail:
- source: where the source file or folder is located (S3, HTTP(S), or FlowDeploy)
- destination: the absolute path to the folder where the transfer should end up (S3 or FlowDeploy)
- destination_name: the name of the destination file, if applicable
Example usage
flowdeploy transfer [..] --input-file ./transfers.json
Example input files
An individual S3 file
[{
"source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
"destination": "fd://shared/",
"destination_name": "Homo_sapiens_assembly19.fasta"
}]
s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta is transferred to /shared/Homo_sapiens_assembly19.fasta in the shared file system.
A FlowDeploy directory that's transferred to S3
[{
"source": "fd://shared/outputs/",
"destination": "s3://my-bucket/outputs/"
}]
The files inside /shared/outputs/ are transferred to s3://my-bucket/outputs/.
Both of the above in one input file
[{
"source": "s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta",
"destination": "fd://shared/",
"destination_name": "Homo_sapiens_assembly19.fasta"
}, {
"source": "fd://shared/outputs/",
"destination": "s3://my-bucket/outputs/"
}]
--is-async
(optional)
If set, exits immediately after spawning a FlowDeploy instance.
Default: False.
Example usage
flowdeploy transfer [..] --is-async
--flowdeploy-key
(optional)
Provide a FlowDeploy API key to authenticate the run.
Alternatively, you can run flowdeploy set-key [..] to authenticate your environment.
Example usage
flowdeploy transfer [..] --flowdeploy-key YOUR_API_KEY_HERE
flowdeploy status
Checks the state of a FlowDeploy pipeline or transfer run.
run_id
(required)
The FlowDeploy ID for the run.
Example usage
flowdeploy status YOUR_RUN_ID_HERE
--flowdeploy-key
(optional)
Provide a FlowDeploy API key to authenticate the run.
Alternatively, you can run flowdeploy set-key [..] to authenticate your environment.
Example usage
flowdeploy status YOUR_RUN_ID_HERE --flowdeploy-key YOUR_FLOWDEPLOY_KEY_HERE