.. index:: Queue Systems

.. _queues:

Queue Systems
=============

Bio_pype supports multiple queue/scheduling systems through a modular adapter architecture.
This allows pipelines to run on different computational environments without changing pipeline definitions.

Overview
--------

Queue systems in Bio_pype:

- Run jobs locally, on an HPC scheduler (SLURM), or in the cloud (Scaleway)
- Handle job dependencies and execution order
- Manage resource requirements (CPUs, memory, walltime)
- Track job submission and completion in ``pipeline_runtime.yaml``
- Share a common base (``BaseQueue``) so new systems are easy to add

Available Queue Systems
-----------------------

Bio_pype ships four queue modules. Select one per run with ``--queue``::

    pype pipelines --queue slurm my_pipeline --input data.txt

If ``--queue`` is omitted, the default configured queue is used. To run on a
system not listed here (PBS/Torque, SGE, LSF, …), you implement a small adapter —
see :ref:`queue_adapters`.

.. list-table::
   :header-rows: 1
   :widths: 16 84

   * - ``--queue``
     - Queue module
   * - ``parallel``
     - **Local multiprocessing.** Runs jobs on the local machine with a worker
       pool, honouring CPU/memory limits (``PYPE_NCPU`` / ``PYPE_MEM``) and job
       dependencies. The everyday choice for a laptop or a single server.
   * - ``slurm``
     - **SLURM workload manager.** Submits each job with ``sbatch`` and monitors
       it with ``squeue`` / ``sacct``. For HPC clusters.
   * - ``scaleway``
     - **Scaleway cloud.** Provisions cloud instances and block-volume snapshots
       to run jobs; pairs with the Scaleway storage backend (see
       :ref:`storage_backends`).
   * - ``dry_run``
     - **No execution.** Registers the full job plan into
       ``pipeline_runtime.yaml`` without running anything — useful for validating
       a pipeline's structure and dependencies.

Local execution
^^^^^^^^^^^^^^^^

``--queue parallel`` runs the pipeline on the local machine. Jobs are dispatched
to a worker pool as their dependencies complete, bounded by ``PYPE_NCPU`` and
``PYPE_MEM``. This is the right choice for testing, small analyses, debugging,
and single-machine workflows.

SLURM
^^^^^

``--queue slurm`` submits each job to the SLURM workload manager. Snippet
:ref:`resource requirements <resource_requirements>` are translated to ``sbatch``
flags:

.. list-table::
   :widths: 33 33 33
   :header-rows: 1

   * - Bio_pype
     - SLURM Flag
     - Example
   * - ncpu
     - --cpus-per-task
     - --cpus-per-task=8
   * - mem
     - --mem
     - --mem=16G
   * - time
     - --time
     - --time=02:00:00

So a snippet declaring::

    requirements:
      ncpu: 8
      mem: 16gb
      time: '02:00:00'

is submitted as::

    sbatch --cpus-per-task=8 --mem=16G --time=02:00:00 job_script.sh

Account and partition defaults can be set with ``PYPE_QUEUE_ACCOUNT`` and
``PYPE_QUEUE_PARTITION`` (see :ref:`configuration`).

Scaleway (cloud)
^^^^^^^^^^^^^^^^

``--queue scaleway`` runs jobs on Scaleway cloud instances, moving data via
block-volume snapshots. It is designed to work together with the Scaleway
storage backend; see :ref:`storage_backends` for how inputs, programs and outputs
are staged.

Dry run
^^^^^^^

``--queue dry_run`` registers every job into ``pipeline_runtime.yaml`` and resolves
the dependency graph **without executing anything**. Use it to check that a
pipeline expands and wires up as expected before committing real compute.

.. _resource_requirements:

Resource Requirements
---------------------

Defining Requirements in Snippets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Markdown snippets**::

    ## requirements

    ```yaml
    ncpu: 8          # Number of CPU cores
    mem: 16gb        # Memory allocation
    time: '02:00:00' # Max runtime (HH:MM:SS)
    ```

**Python snippets**::

    def requirements():
        return {
            'ncpu': 8,
            'mem': '16gb',
            'time': '02:00:00'
        }

**Supported units:**

- Memory: 'gb', 'GB', 'mb', 'MB', 'kb', 'KB'
- Time: 'HH:MM:SS' format
- CPUs: Integer number of cores

Overriding Requirements in Pipelines
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can override snippet requirements in pipeline definitions::

    steps:
      step_1_align:
        name: bwa_mem
        type: snippet
        requirements:
          ncpu: 16      # Override default 8 CPUs
          mem: 32gb     # Override default 16GB
          time: '04:00:00'  # Override default 2 hours
        arguments:
          -i: '%(input_bam)s'

Job Dependencies
----------------

Automatic Dependency Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bio_pype automatically handles job dependencies:

1. **Within pipelines**: Jobs wait for dependencies to complete
2. **Across steps**: Output from one step becomes input to the next
3. **Array jobs**: All jobs in group tracked together

**Example**::

    steps:
      step_1_prepare:
        name: prepare_reference
        type: snippet
        depends_on: []

      step_2_align:
        name: align_reads
        type: snippet
        depends_on: [step_1_prepare]  # Waits for step_1

      step_3_sort:
        name: sort_bam
        type: snippet
        depends_on: [step_2_align]  # Waits for step_2

Queue System Dependency Handling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bio_pype always resolves the dependency graph itself: a job is only released once
all of its dependencies have completed. Depending on the queue, it either submits
the next job at that point or hands the dependency to the scheduler.

**SLURM** can enforce dependencies natively::

    sbatch --dependency=afterok:12345 job_script.sh

**Local (parallel)** dispatches a job to the worker pool only when its
dependencies are done — no queue-level dependency flags are needed.

Job Submission and Monitoring
------------------------------

Submitting Jobs
^^^^^^^^^^^^^^^

Bio_pype handles job submission automatically when you run a pipeline::

    $ pype pipelines --queue slurm genomic_analysis --input sample.fq

For each snippet in the pipeline, the queue:

1. Skips the job if its outputs already exist (see :ref:`progress`)
2. Prepares the job script and environment
3. Registers/submits the job to the queue system
4. Records the queue ID and status in ``pipeline_runtime.yaml``
5. Sets up log files, then moves on as dependencies clear

Monitoring Jobs
^^^^^^^^^^^^^^^

During a run, Bio_pype shows a live status table in the terminal (see
:ref:`progress`). You can also inspect the underlying queue directly.

**SLURM**::

    squeue -u $USER          # all your jobs
    scontrol show job 12345  # one job's details

**Local (parallel)**::

    ps aux | grep pype                                   # running processes
    tail -f ~/.bio_pype/logs/*/jobs/*/stdout             # live logs

Checking Job Status
^^^^^^^^^^^^^^^^^^^

Every job's status is tracked in the pipeline runtime file::

    # View the runtime file
    cat /path/to/logs/<run_id>_<pipeline>/pipeline_runtime.yaml

Each job entry records its status, queue ID and timestamps::

    align_reads_abc123:
      status: completed
      queue_id: '12345'
      submitted_at: '2025-01-24T10:00:00'
      completed_at: '2025-01-24T10:30:00'

Custom Queue Adapters
---------------------

To run on a queue system that Bio_pype does not ship (PBS/Torque, SGE, LSF,
another cloud), you write a small adapter module and point ``PYPE_QUEUES`` at it::

    export PYPE_QUEUES=/path/to/my_queues
    pype pipelines --queue my_custom_queue my_pipeline --input data.txt

Adapters subclass ``BaseQueue`` (and optionally provide a ``QueueCommandHandler``),
which gives you the execution loop, progress display, compute.bio API integration,
carbon tracking and resume support for free. The full guide — including the
module-level ``register`` / ``execute`` interface and a worked example — is in
:ref:`queue_adapters`.

Best Practices
--------------

1. **Choose appropriate queue**: Use ``none`` for testing, cluster queues for production

2. **Set realistic resource requirements**: Don't over-request resources

3. **Monitor queue usage**: Check for failed or stuck jobs regularly

4. **Use resume functionality**: Let Bio_pype handle interrupted runs

5. **Test locally first**: Use ``--queue parallel`` (or ``--queue dry_run`` to
   only validate the plan) before submitting to a cluster

6. **Check queue limits**: Ensure requirements fit within queue limits

7. **Handle errors gracefully**: Check logs when jobs fail

8. **Track progress**: Monitor ``pipeline_runtime.yaml`` for long-running pipelines

9. **Clean up**: Remove completed jobs and old log files

Troubleshooting
---------------

Job Won't Submit
^^^^^^^^^^^^^^^^

**Symptoms:**

- No queue ID returned
- Job not appearing in queue
- Submission errors in logs

**Solutions:**

1. Check the queue system is available::

    which sbatch  # For SLURM

2. Verify queue module is installed::

    ls $PYPE_QUEUES/

3. Check resource requirements are valid::

    # Look in logs for submission command
    cat ~/.bio_pype/logs/*/pipeline.log | grep "Submitting"

4. Test queue manually::

    echo "#!/bin/bash\necho hello" | sbatch

Job Fails Immediately
^^^^^^^^^^^^^^^^^^^^^

**Symptoms:**

- Job completes with error code
- Short runtime
- Error messages in stderr

**Solutions:**

1. Check stderr log::

    cat ~/.bio_pype/logs/*/jobs/*/stderr

2. Verify environment::

    # Check if modules/containers are accessible
    # Check if paths exist

3. Test command locally::

    pype pipelines --queue parallel my_pipeline --input data.txt

4. Check resource allocation::

    # Job may have run out of memory or time

Job Stays Pending
^^^^^^^^^^^^^^^^^

**Symptoms:**

- Job status: PENDING for extended period
- Never starts running

**Solutions:**

1. Check queue status::

    squeue -u $USER  # SLURM

2. Check resource availability::

    sinfo  # SLURM

3. Reduce resource requirements::

    # Decrease ncpu, mem, or time in snippet

4. Check queue limits::

    # Verify you haven't exceeded job or resource limits

Dependencies Not Working
^^^^^^^^^^^^^^^^^^^^^^^^^

**Symptoms:**

- Jobs run in wrong order
- Jobs fail due to missing inputs

**Solutions:**

1. Check pipeline dependencies::

    # Review depends_on in pipeline YAML

2. Verify job IDs are tracked::

    # Check runtime file for queue_ids
    grep "queue_id" pipeline_runtime.yaml

3. Validate the plan without running::

    pype pipelines --queue dry_run my_pipeline ...

Integration with Progress Tracking
-----------------------------------

All queues share the same execution loop (via ``BaseQueue``), so they integrate
uniformly with Bio_pype's progress tracking:

- **Live display**: a status table updates in the terminal as jobs change state
- **Status tracking**: each job's state (pending, running, completed, failed) is
  written to ``pipeline_runtime.yaml``
- **Resume**: failed or cancelled jobs can be rerun, and ``resume --sync`` can
  reconcile state without cancelling live jobs
- **API integration**: when compute.bio is configured, progress is streamed and
  remote commands (cancel, resubmit, log requests) are handled

See :ref:`progress` and :ref:`resume` for more details.

See Also
--------

- :ref:`progress` - Progress tracking and the live status display
- :ref:`resume` - Pipeline resume and ``--sync`` reconciliation
- :ref:`queue_adapters` - Writing a queue adapter with ``BaseQueue``
- :ref:`storage_backends` - How cloud queues move data
- :ref:`profiles` - Profile configuration