This tutorial will guide you through setting up Nextflow on Expanse using Seqera Platform. Nextflow is a powerful and flexible workflow management system that enables scalable and reproducible scientific workflows. It lets researchers define complex computational pipelines in a simple language, which makes it easy to manage data dependencies, parallelize tasks, and adapt workflows to different computing environments, from local machines to cloud platforms and HPC clusters like Expanse. Workflow systems like Nextflow help ensure reproducibility, simplify complex analyses, and make efficient use of computational resources by automating the execution of multi-step processes.
Seqera Platform enhances Nextflow by providing a centralized control plane for managing and monitoring Nextflow pipelines across diverse execution environments, including HPC clusters like Expanse. It offers features such as advanced logging, resource optimization, and collaborative tools, making it easier to deploy, track, and scale complex scientific workflows.
What is Seqera?
Seqera Labs is a bioinformatics company that provides a platform for collaborative and scalable scientific data analysis, closely associated with the Nextflow workflow management system. It helps scientists develop, debug, and execute bioinformatics pipelines, manage data, and supervise workflows, particularly in cloud environments. Key offerings include Seqera AI for pipeline code generation and analysis, and Seqera Containers for streamlining Docker and Singularity container building and access. Seqera’s technologies are used across various scientific disciplines to scale analytical work.
Seqera Academic Program
Seqera offers an Academic Program that provides free Pro-level access to the Seqera Cloud Platform for researchers, educators, and students at qualifying institutions. To be eligible, the applicant's organization must be a degree-granting educational institution, and the Seqera Platform must be used solely for academic research and/or teaching, not for commercial purposes. Applicants need a seqera.io user account registered with their institutional email address, and an organization created within seqera.io. More details and an application form are available on the Seqera website.
Before proceeding with Seqera-specific configuration, follow the initial setup steps in our previous tutorial, Running Nextflow on Expanse: installing Micromamba and Nextflow, verifying the installation, and running a toy example locally. Once you have completed these foundational steps, return to this tutorial to integrate Nextflow with Seqera Platform.
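Before you continue, you can quickly confirm that the tools from the previous tutorial are on your PATH. The sketch below assumes a POSIX shell. check_prereqs is a hypothetical helper and is not part of Nextflow or Micromamba. The tool list in the example, including java (which Nextflow needs), is an assumption about your setup.

```shell
# Hypothetical helper (not part of Nextflow or Micromamba): report which of the
# given commands are on PATH; the return status is the number of missing ones.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Example on an Expanse login node:
#   check_prereqs micromamba nextflow java && nextflow -version
```

A non-zero status means at least one tool is missing. In that case, revisit the previous tutorial before you set up Seqera.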
1. Create Seqera Account and Token
1.1. Create a Seqera Account and Workspace
If you don’t already have one, create an account on Seqera Platform. Once logged in, create a new workspace for your projects.
1.2. Link Nextflow to Seqera
To allow your local Nextflow installation to communicate with the Seqera Platform, you need to set the TOWER_ACCESS_TOKEN environment variable. You can generate an API token from your Seqera account settings.
export TOWER_ACCESS_TOKEN="YOUR_SEQERA_API_TOKEN"

Add this line to your ~/.bashrc or ~/.profile file on Expanse to persist the token across sessions.
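Instead of placing the export line directly in ~/.bashrc, you can keep the token in a separate file that only you can read, and have ~/.bashrc source that file. The sketch below shows this alternative. save_seqera_token and the default file name ~/.seqera_env are hypothetical choices for illustration; Nextflow itself only needs TOWER_ACCESS_TOKEN to be set in the environment.

```shell
# Hypothetical helper: keep the Seqera token in its own owner-only file and
# make ~/.bashrc source it, adding that source line only once.
save_seqera_token() {
    token="${1:-}"
    env_file="${2:-$HOME/.seqera_env}"   # assumed file name
    rc_file="${3:-$HOME/.bashrc}"
    if [ -z "$token" ]; then
        echo "usage: save_seqera_token TOKEN [ENV_FILE] [RC_FILE]" >&2
        return 1
    fi
    # Write the secret under a restrictive umask, then enforce mode 600.
    old_umask=$(umask)
    umask 077
    printf 'export TOWER_ACCESS_TOKEN="%s"\n' "$token" > "$env_file"
    umask "$old_umask"
    chmod 600 "$env_file"
    source_line="[ -f \"$env_file\" ] && . \"$env_file\""
    touch "$rc_file"
    # grep -qxF: exact whole-line, fixed-string match, so reruns add nothing.
    if ! grep -qxF "$source_line" "$rc_file"; then
        printf '%s\n' "$source_line" >> "$rc_file"
    fi
}

# Example:
#   save_seqera_token "YOUR_SEQERA_API_TOKEN" && . ~/.bashrc
```

If you type the token as a command-line argument, it ends up in your shell history. You may want to clear that history entry afterwards.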
2. Launch Workflow from Nextflow CLI and Monitor
2.1. Launching Workflows from the Nextflow CLI with -with-tower
Once TOWER_ACCESS_TOKEN is set, you can launch Nextflow workflows directly from the command line with the -with-tower flag. Nextflow then uses the token to authenticate and reports the run to your Seqera account, which gives you Seqera's monitoring and management features for workflows started from the CLI.
nextflow run hello-workflow-4.nf -with-tower -profile slurm_debug

This command will execute the workflow, and its progress will be visible in your Seqera Platform dashboard. For more details, refer to the Nextflow training materials on using Seqera Platform to capture and monitor Nextflow jobs launched from the CLI.
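If you launch runs often, you can wrap the command above in a function that checks for the token first. A missing token then fails immediately, with a pointer back to step 1.2. This is a minimal sketch. require_seqera_token and launch_with_seqera are hypothetical names, and passing extra arguments through (for example Nextflow's -resume) is a design choice rather than part of the original setup.

```shell
# Hypothetical guard: fail early, with a hint, if the Seqera token is missing.
require_seqera_token() {
    if [ -z "${TOWER_ACCESS_TOKEN:-}" ]; then
        echo "TOWER_ACCESS_TOKEN is not set (see step 1.2)" >&2
        return 1
    fi
}

# Hypothetical wrapper: run a workflow with Seqera monitoring on the
# slurm_debug profile; extra arguments are passed through to nextflow run.
launch_with_seqera() {
    require_seqera_token || return 1
    if [ $# -lt 1 ]; then
        echo "usage: launch_with_seqera SCRIPT [NEXTFLOW_ARGS...]" >&2
        return 1
    fi
    script="$1"
    shift
    nextflow run "$script" -with-tower -profile slurm_debug "$@"
}

# Example:
#   launch_with_seqera hello-workflow-4.nf -resume
```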
2.2. Monitor Progress in Seqera UI
After launching the workflow with -with-tower, navigate to your Seqera Platform dashboard in your web browser. You should see your workflow listed with its real-time status, logs, and resource utilization.
3. Create Compute Environment and Launch from Seqera UI
To fully leverage Seqera’s capabilities for managing and executing workflows on Expanse, you need to configure a compute environment and launch pipelines directly from the Seqera UI.
3.1. Configure a Compute Environment for Expanse
Within your Seqera workspace, navigate to the “Compute Environments” section and create a new one with the following details:
- Name: expanse-compute
- Credentials: Select Managed identity cluster. You will need to provide login.expanse.sdsc.edu as the host and configure an SSH key for authentication. This usually involves generating an SSH key pair and adding the public key to your ~/.ssh/authorized_keys file on Expanse. Important: Seqera must be able to SSH to Expanse without a Time-based One-Time Password (TOTP). To enable this, open a ticket through ACCESS (https://access-ci.atlassian.net/servicedesk/customer/portal/2/group/3/create/17) asking for the Seqera IP range to be exempted from TOTP. You can find the exact IP range at https://community.seqera.io/t/seqera-platform-ip-addresses/1120.
- Work directory: First, create a directory on Expanse:
  mkdir -p /expanse/lustre/scratch/$USER/temp_project/nextflow
  Then, in Seqera, specify the absolute path to this directory (e.g., /expanse/lustre/scratch/your_username/temp_project/nextflow), replacing your_username with your actual username.
- Launch directory: Leave this field empty.
- Queue names: Use debug for the head queue and compute for the compute queue.
- Head job submit options: In the advanced options, add:
  --account=YOUR_PROJECT_ACCOUNT --time=00:30:00 --nodes=1 --ntasks=1
  Remember to replace YOUR_PROJECT_ACCOUNT with your actual project account.
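Two of the Expanse-side steps in the list above can be scripted on an Expanse login node: authorizing the SSH public key and creating the work directory. This is a sketch under assumptions. Both function names are hypothetical. SCRATCH_BASE is an assumed override, included only so the functions can be tried outside Expanse. The ed25519 key type and the key file name in the example are also assumptions. Scripting these steps does not replace the TOTP exemption, which you still have to request through ACCESS.

```shell
# Hypothetical helper: append a public key to authorized_keys once, using the
# conventional permissions (700 on the directory, 600 on the file).
authorize_public_key() {
    pub="${1:-}"
    auth="${2:-$HOME/.ssh/authorized_keys}"
    [ -r "$pub" ] || { echo "cannot read public key: $pub" >&2; return 1; }
    auth_dir=$(dirname "$auth")
    mkdir -p "$auth_dir" && chmod 700 "$auth_dir"
    touch "$auth" && chmod 600 "$auth"
    key=$(cat "$pub")
    # Skip the append if this exact key line is already present.
    if ! grep -qxF "$key" "$auth"; then
        printf '%s\n' "$key" >> "$auth"
    fi
}

# Hypothetical helper: create the Nextflow work directory on Lustre scratch and
# print its absolute path for Seqera's "Work directory" field.
# SCRATCH_BASE is an assumed override; it defaults to the Expanse scratch path.
make_nextflow_workdir() {
    base="${SCRATCH_BASE:-/expanse/lustre/scratch}"
    dir="$base/${1:-$USER}/temp_project/nextflow"
    mkdir -p "$dir" || return 1
    (cd "$dir" && pwd -P)
}

# Example on an Expanse login node (key file name is an assumption):
#   ssh-keygen -t ed25519 -f ~/.ssh/seqera_expanse -C seqera-platform
#   authorize_public_key ~/.ssh/seqera_expanse.pub
#   make_nextflow_workdir    # prints the path to paste into Seqera
```

The private half of the key pair stays with you and is what Seqera uses to connect, as set up in the credentials step.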
Refer to the Seqera documentation for HPC setup for more detailed instructions on each of these steps.
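Before saving the compute environment, you may want to confirm that Slurm accepts your project account. One way is to submit a trivial job to the debug queue with the same head job options. The sketch below is hypothetical: head_job_options only assembles the option string used in this tutorial and rejects the YOUR_PROJECT_ACCOUNT placeholder. sbatch's --partition and --wrap are standard Slurm options. The hostname test command and the account name abc123 are assumptions for illustration.

```shell
# Hypothetical helper: print the head job submit options used in this tutorial,
# refusing an empty account or the literal placeholder.
head_job_options() {
    account="${1:-}"
    case "$account" in
        "" | YOUR_PROJECT_ACCOUNT)
            echo "set a real Expanse project account" >&2
            return 1
            ;;
    esac
    printf '%s\n' "--account=$account --time=00:30:00 --nodes=1 --ntasks=1"
}

# Example on an Expanse login node ($opts is left unquoted on purpose so it
# splits into separate sbatch arguments; abc123 is a made-up account):
#   opts=$(head_job_options abc123) && sbatch --partition=debug $opts --wrap="hostname"
```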
3.2. Configure Pipeline in Seqera Launchpad
Once your compute environment is configured, you can configure a pipeline in the Seqera Launchpad:
- Navigate to the “Launchpad” section in your Seqera workspace.
- Click “Add pipeline”.
- In “Pipeline to launch”, enter: https://github.com/zonca/expanse_nextflow
- Set “Revision” to main.
- Enable the “Pull latest” toggle.
- Set “Work directory” to the path you created earlier, e.g. /expanse/lustre/scratch/your_username/temp_project/nextflow, replacing your_username with your actual username.
- In “Config profiles”, select slurm_debug. This profile is read automatically from the nextflow.config file in the repository.
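Seqera can only offer a profile that the repository's nextflow.config defines. You can check this from a clone before launching. The sketch below assumes the profile is named slurm_debug, matching the -profile slurm_debug command in section 2.1. config_has_profile is a hypothetical helper that does a rough text match, not a real config parse. Profile names containing regex characters would need escaping.

```shell
# Hypothetical helper: a rough text check (not a real config parser) that a
# nextflow.config contains a block opening with the given profile name.
config_has_profile() {
    config="${1:-}"
    profile="${2:-}"
    [ -r "$config" ] || return 1
    grep -Eq "^[[:space:]]*${profile}[[:space:]]*\{" "$config"
}

# Example (repository URL from the Launchpad step above):
#   git clone https://github.com/zonca/expanse_nextflow
#   config_has_profile expanse_nextflow/nextflow.config slurm_debug && echo "profile found"
#   (cd expanse_nextflow && nextflow config -profile slurm_debug)
```

For an authoritative answer, use the last example line instead. nextflow config prints the configuration as Nextflow resolves it for that profile.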
3.3. Launch Pipeline
After configuring the pipeline, return to the Launchpad, find the expanse_nextflow pipeline, and click “Launch”. This submits a single job to the debug queue on Expanse that runs Nextflow itself. That Nextflow process then submits the actual workflow jobs to the queues defined in your pipeline. Real-time updates and logs for all jobs flow directly into the Seqera UI, so you can monitor the entire workflow execution.
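You can also see the two-level structure described above (one head job plus the task jobs it submits) from an Expanse login node with squeue. The sketch assumes you have no other jobs running. summarize_jobs is a hypothetical helper. squeue's -u, --noheader, and --format options, and the %P (partition) and %T (state) fields, are standard Slurm. Which partition the task jobs land on depends on the pipeline's profile.

```shell
# Hypothetical helper: count jobs per (partition, state) from squeue output
# given as "PARTITION STATE" lines on stdin.
summarize_jobs() {
    awk 'NF >= 2 { count[$1 " " $2]++ }
         END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}

# Example on an Expanse login node while a Seqera launch is running; typically
# one job on debug (the Nextflow head job) plus the task jobs it submitted:
#   squeue -u "$USER" --noheader --format="%P %T" | summarize_jobs
```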
Conclusion: Impressed by Seqera
I must say I am impressed by Seqera; it is so well-built and polished. You can configure and launch pipelines, view all tasks executing in real-time in a fancy web dashboard, then dig into the logs, check resource utilization for each stage of the pipeline, check execution time task by task, and much more. It truly streamlines the management and monitoring of complex Nextflow workflows on HPC systems like Expanse.