.. index:: Queue Systems .. _queues: Queue Systems ============= Bio_pype supports multiple queue/scheduling systems through a modular adapter architecture. This allows pipelines to run on different computational environments without changing pipeline definitions. Overview -------- Queue systems in Bio_pype: - Run jobs locally, on an HPC scheduler (SLURM), or in the cloud (Scaleway) - Handle job dependencies and execution order - Manage resource requirements (CPUs, memory, walltime) - Track job submission and completion in ``pipeline_runtime.yaml`` - Share a common base (``BaseQueue``) so new systems are easy to add Available Queue Systems ----------------------- Bio_pype ships four queue modules. Select one per run with ``--queue``:: pype pipelines --queue slurm my_pipeline --input data.txt If ``--queue`` is omitted, the default configured queue is used. To run on a system not listed here (PBS/Torque, SGE, LSF, …), you implement a small adapter — see :ref:`queue_adapters`. .. list-table:: :header-rows: 1 :widths: 16 84 * - ``--queue`` - Queue module * - ``parallel`` - **Local multiprocessing.** Runs jobs on the local machine with a worker pool, honouring CPU/memory limits (``PYPE_NCPU`` / ``PYPE_MEM``) and job dependencies. The everyday choice for a laptop or a single server. * - ``slurm`` - **SLURM workload manager.** Submits each job with ``sbatch`` and monitors it with ``squeue`` / ``sacct``. For HPC clusters. * - ``scaleway`` - **Scaleway cloud.** Provisions cloud instances and block-volume snapshots to run jobs; pairs with the Scaleway storage backend (see :ref:`storage_backends`). * - ``dry_run`` - **No execution.** Registers the full job plan into ``pipeline_runtime.yaml`` without running anything — useful for validating a pipeline's structure and dependencies. Local execution ^^^^^^^^^^^^^^^^ ``--queue parallel`` runs the pipeline on the local machine. Jobs are dispatched to a worker pool as their dependencies complete, bounded by ``PYPE_NCPU`` and ``PYPE_MEM``. This is the right choice for testing, small analyses, debugging, and single-machine workflows. SLURM ^^^^^ ``--queue slurm`` submits each job to the SLURM workload manager. Snippet :ref:`resource requirements ` are translated to ``sbatch`` flags: .. list-table:: :widths: 33 33 33 :header-rows: 1 * - Bio_pype - SLURM Flag - Example * - ncpu - --cpus-per-task - --cpus-per-task=8 * - mem - --mem - --mem=16G * - time - --time - --time=02:00:00 So a snippet declaring:: requirements: ncpu: 8 mem: 16gb time: '02:00:00' is submitted as:: sbatch --cpus-per-task=8 --mem=16G --time=02:00:00 job_script.sh Account and partition defaults can be set with ``PYPE_QUEUE_ACCOUNT`` and ``PYPE_QUEUE_PARTITION`` (see :ref:`configuration`). Scaleway (cloud) ^^^^^^^^^^^^^^^^ ``--queue scaleway`` runs jobs on Scaleway cloud instances, moving data via block-volume snapshots. It is designed to work together with the Scaleway storage backend; see :ref:`storage_backends` for how inputs, programs and outputs are staged. Dry run ^^^^^^^ ``--queue dry_run`` registers every job into ``pipeline_runtime.yaml`` and resolves the dependency graph **without executing anything**. Use it to check that a pipeline expands and wires up as expected before committing real compute. .. _resource_requirements: Resource Requirements --------------------- Defining Requirements in Snippets ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Markdown snippets**:: ## requirements ```yaml ncpu: 8 # Number of CPU cores mem: 16gb # Memory allocation time: '02:00:00' # Max runtime (HH:MM:SS) ``` **Python snippets**:: def requirements(): return { 'ncpu': 8, 'mem': '16gb', 'time': '02:00:00' } **Supported units:** - Memory: 'gb', 'GB', 'mb', 'MB', 'kb', 'KB' - Time: 'HH:MM:SS' format - CPUs: Integer number of cores Overriding Requirements in Pipelines ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can override snippet requirements in pipeline definitions:: steps: step_1_align: name: bwa_mem type: snippet requirements: ncpu: 16 # Override default 8 CPUs mem: 32gb # Override default 16GB time: '04:00:00' # Override default 2 hours arguments: -i: '%(input_bam)s' Job Dependencies ---------------- Automatic Dependency Management ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Bio_pype automatically handles job dependencies: 1. **Within pipelines**: Jobs wait for dependencies to complete 2. **Across steps**: Output from one step becomes input to the next 3. **Array jobs**: All jobs in group tracked together **Example**:: steps: step_1_prepare: name: prepare_reference type: snippet depends_on: [] step_2_align: name: align_reads type: snippet depends_on: [step_1_prepare] # Waits for step_1 step_3_sort: name: sort_bam type: snippet depends_on: [step_2_align] # Waits for step_2 Queue System Dependency Handling ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Bio_pype always resolves the dependency graph itself: a job is only released once all of its dependencies have completed. Depending on the queue, it either submits the next job at that point or hands the dependency to the scheduler. **SLURM** can enforce dependencies natively:: sbatch --dependency=afterok:12345 job_script.sh **Local (parallel)** dispatches a job to the worker pool only when its dependencies are done — no queue-level dependency flags are needed. Job Submission and Monitoring ------------------------------ Submitting Jobs ^^^^^^^^^^^^^^^ Bio_pype handles job submission automatically when you run a pipeline:: $ pype pipelines --queue slurm genomic_analysis --input sample.fq For each snippet in the pipeline, the queue: 1. Skips the job if its outputs already exist (see :ref:`progress`) 2. Prepares the job script and environment 3. Registers/submits the job to the queue system 4. Records the queue ID and status in ``pipeline_runtime.yaml`` 5. Sets up log files, then moves on as dependencies clear Monitoring Jobs ^^^^^^^^^^^^^^^ During a run, Bio_pype shows a live status table in the terminal (see :ref:`progress`). You can also inspect the underlying queue directly. **SLURM**:: squeue -u $USER # all your jobs scontrol show job 12345 # one job's details **Local (parallel)**:: ps aux | grep pype # running processes tail -f ~/.bio_pype/logs/*/jobs/*/stdout # live logs Checking Job Status ^^^^^^^^^^^^^^^^^^^ Every job's status is tracked in the pipeline runtime file:: # View the runtime file cat /path/to/logs/_/pipeline_runtime.yaml Each job entry records its status, queue ID and timestamps:: align_reads_abc123: status: completed queue_id: '12345' submitted_at: '2025-01-24T10:00:00' completed_at: '2025-01-24T10:30:00' Custom Queue Adapters --------------------- To run on a queue system that Bio_pype does not ship (PBS/Torque, SGE, LSF, another cloud), you write a small adapter module and point ``PYPE_QUEUES`` at it:: export PYPE_QUEUES=/path/to/my_queues pype pipelines --queue my_custom_queue my_pipeline --input data.txt Adapters subclass ``BaseQueue`` (and optionally provide a ``QueueCommandHandler``), which gives you the execution loop, progress display, compute.bio API integration, carbon tracking and resume support for free. The full guide — including the module-level ``register`` / ``execute`` interface and a worked example — is in :ref:`queue_adapters`. Best Practices -------------- 1. **Choose appropriate queue**: Use ``none`` for testing, cluster queues for production 2. **Set realistic resource requirements**: Don't over-request resources 3. **Monitor queue usage**: Check for failed or stuck jobs regularly 4. **Use resume functionality**: Let Bio_pype handle interrupted runs 5. **Test locally first**: Use ``--queue parallel`` (or ``--queue dry_run`` to only validate the plan) before submitting to a cluster 6. **Check queue limits**: Ensure requirements fit within queue limits 7. **Handle errors gracefully**: Check logs when jobs fail 8. **Track progress**: Monitor ``pipeline_runtime.yaml`` for long-running pipelines 9. **Clean up**: Remove completed jobs and old log files Troubleshooting --------------- Job Won't Submit ^^^^^^^^^^^^^^^^ **Symptoms:** - No queue ID returned - Job not appearing in queue - Submission errors in logs **Solutions:** 1. Check the queue system is available:: which sbatch # For SLURM 2. Verify queue module is installed:: ls $PYPE_QUEUES/ 3. Check resource requirements are valid:: # Look in logs for submission command cat ~/.bio_pype/logs/*/pipeline.log | grep "Submitting" 4. Test queue manually:: echo "#!/bin/bash\necho hello" | sbatch Job Fails Immediately ^^^^^^^^^^^^^^^^^^^^^ **Symptoms:** - Job completes with error code - Short runtime - Error messages in stderr **Solutions:** 1. Check stderr log:: cat ~/.bio_pype/logs/*/jobs/*/stderr 2. Verify environment:: # Check if modules/containers are accessible # Check if paths exist 3. Test command locally:: pype pipelines --queue parallel my_pipeline --input data.txt 4. Check resource allocation:: # Job may have run out of memory or time Job Stays Pending ^^^^^^^^^^^^^^^^^ **Symptoms:** - Job status: PENDING for extended period - Never starts running **Solutions:** 1. Check queue status:: squeue -u $USER # SLURM 2. Check resource availability:: sinfo # SLURM 3. Reduce resource requirements:: # Decrease ncpu, mem, or time in snippet 4. Check queue limits:: # Verify you haven't exceeded job or resource limits Dependencies Not Working ^^^^^^^^^^^^^^^^^^^^^^^^^ **Symptoms:** - Jobs run in wrong order - Jobs fail due to missing inputs **Solutions:** 1. Check pipeline dependencies:: # Review depends_on in pipeline YAML 2. Verify job IDs are tracked:: # Check runtime file for queue_ids grep "queue_id" pipeline_runtime.yaml 3. Validate the plan without running:: pype pipelines --queue dry_run my_pipeline ... Integration with Progress Tracking ----------------------------------- All queues share the same execution loop (via ``BaseQueue``), so they integrate uniformly with Bio_pype's progress tracking: - **Live display**: a status table updates in the terminal as jobs change state - **Status tracking**: each job's state (pending, running, completed, failed) is written to ``pipeline_runtime.yaml`` - **Resume**: failed or cancelled jobs can be rerun, and ``resume --sync`` can reconcile state without cancelling live jobs - **API integration**: when compute.bio is configured, progress is streamed and remote commands (cancel, resubmit, log requests) are handled See :ref:`progress` and :ref:`resume` for more details. See Also -------- - :ref:`progress` - Progress tracking and the live status display - :ref:`resume` - Pipeline resume and ``--sync`` reconciliation - :ref:`queue_adapters` - Writing a queue adapter with ``BaseQueue`` - :ref:`storage_backends` - How cloud queues move data - :ref:`profiles` - Profile configuration