
Start with a simple tech stack

Luckily, setting up the tech stack for bioinformatics isn't as hard as it used to be. The tooling for bioinformatics pipelines has evolved significantly in the past five years.

New bioinformatics teams struggle without version control or a workflow manager. A workflow manager, version control, and basic computing infrastructure are all essential. Here are the four least controversial stacks I see:

  • Nextflow or Snakemake launched from the command line, using executor plugins for cloud execution.
  • Nextflow or Snakemake on a single long-lived server, usually one from a low-cost provider like Hetzner or an existing lab server.
  • Nextflow with Seqera Platform (formerly Nextflow Tower).
  • Snakemake with FlowDeploy.
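
To make concrete what a workflow manager adds over a plain script, here is a toy sketch in Python. It is not how Nextflow or Snakemake are implemented; it only illustrates the core idea of declaring each step's inputs and output and rerunning a step only when its output is missing or out of date. The names `Step`, `is_stale`, `run_pipeline`, `to_upper`, and `count_lines`, as well as the file names, are hypothetical and chosen for illustration.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One pipeline step: reads `inputs`, writes `output` (hypothetical structure)."""
    name: str
    inputs: List[str]
    output: str
    action: Callable[[List[str], str], None]


def is_stale(step: Step) -> bool:
    """A step needs to run if its output is missing or older than any input."""
    if not os.path.exists(step.output):
        return True
    out_mtime = os.path.getmtime(step.output)
    return any(os.path.getmtime(path) > out_mtime for path in step.inputs)


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in the given order (assumed to already respect dependencies).

    Returns the names of the steps that actually ran.
    """
    ran = []
    for step in steps:
        if is_stale(step):
            step.action(step.inputs, step.output)
            ran.append(step.name)
    return ran


def to_upper(inputs: List[str], output: str) -> None:
    """Toy 'processing' step: copy the first input to the output in upper case."""
    with open(inputs[0]) as src, open(output, "w") as dst:
        dst.write(src.read().upper())


def count_lines(inputs: List[str], output: str) -> None:
    """Toy 'summary' step: write the line count of the first input."""
    with open(inputs[0]) as src, open(output, "w") as dst:
        dst.write(f"{sum(1 for _ in src)}\n")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    reads = os.path.join(workdir, "reads.txt")
    upper = os.path.join(workdir, "upper.txt")
    counts = os.path.join(workdir, "counts.txt")
    with open(reads, "w") as handle:
        handle.write("acgt\nttga\n")

    steps = [
        Step("upper", [reads], upper, to_upper),
        Step("count", [upper], counts, count_lines),
    ]
    print("first run:", run_pipeline(steps))   # both outputs are missing, so both steps run
    print("second run:", run_pipeline(steps))  # outputs are up to date, so nothing reruns
```

Real workflow managers build on this idea with dependency resolution, parallel execution, software environments, and cloud executors, which is why adopting one is usually a better investment than growing a script like this.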

Version control is almost always managed with Git and GitHub.

tip

In your first few months, it's hard to know which infrastructure investments will pay off. Your first hires and projects might have wildly different needs than you expect, so avoid optimizing prematurely for needs you have only guessed at.

Some common mistakes:

  • Buying servers in the first few months to optimize for cost.
  • Prematurely optimizing for scale.
  • Going all-in on a language or platform that will push away future hires. Some workflow languages, for example, are strongly disliked by a subset of candidates. Closed-source platforms that limit flexibility are also a turn-off for many bioinformaticians.