.. index:: Run, Template-based Pipelines

.. _run:

Pipelines Run
=============

The ``pype run`` command provides a simplified interface for executing pipelines. It generates argument values, such as output file paths, from a few high-level template variables. Instead of specifying every file path by hand, users supply only the essential inputs, and ``pype run`` builds the full paths from templates defined in the pipeline YAML.

This feature is particularly useful for:

- **Simplifying user workflows**: Users don't need to remember all argument names and path structures
- **Standardizing project layouts**: Enforces consistent directory structures across runs
- **GUI/Form-based deployment**: Template-based workflows are easier to expose through web interfaces
- **Batch processing**: Reduces errors from manual path construction when processing multiple samples


Quick Start
-----------

First, define templates in your pipeline YAML:

.. code-block:: yaml

    info:
      description: My bioinformatics pipeline
      api: 2.1.0

      template_paths:
        output_bam:
          template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
          type: file
        output_dir:
          template: "%(project_dir)s/%(sample_name)s/qc"
          type: directory

      template_arguments:
        project_dir: Project root directory
        sample_name: Sample identifier or name

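Then run the pipeline, passing the template variables together with the other pipeline inputs (``--fastq1`` and ``--fastq2`` here, whose definitions are not shown in the YAML above):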

.. code-block:: bash

    pype run my_pipeline \
      --project_dir /scratch/project \
      --sample_name sample_1 \
      --fastq1 /data/sample_1_R1.fastq.gz \
      --fastq2 /data/sample_1_R2.fastq.gz


Concepts
--------

Template Paths
~~~~~~~~~~~~~~

Template paths define how to generate argument values from template variables.

**Structure:**

.. code-block:: yaml

    template_paths:
      argument_name:
        template: "path/with/%(variable)s/placeholders"
        type: file|directory

**Fields:**

- ``argument_name``: The pipeline argument this template generates (e.g., ``output_bam``, ``logs``)
- ``template``: Template string with ``%(variable)s`` placeholders
- ``type``: Either ``file`` (creates parent directories) or ``directory`` (creates the path itself)

**Example:**

.. code-block:: yaml

    template_paths:
      bam_output:
        template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam"
        type: file
      qc_directory:
        template: "%(project_dir)s/%(sample_name)s/qc"
        type: directory
      logs:
        template: "%(project_dir)s/%(sample_name)s/logs"
        type: directory
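
Because ``%(variable)s`` is Python's printf-style mapping syntax, resolving a template can be pictured as formatting the string with a dictionary of variable values. The sketch below illustrates that idea for the example above. It is not Bio-pype's internal code, and the ``resolve_templates`` helper and the sample values are hypothetical.

.. code-block:: python

    # Hypothetical sketch: resolve template_paths entries with Python's
    # printf-style mapping interpolation ("%(name)s" % {...}).
    template_paths = {
        "bam_output": {
            "template": "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam",
            "type": "file",
        },
        "qc_directory": {"template": "%(project_dir)s/%(sample_name)s/qc", "type": "directory"},
        "logs": {"template": "%(project_dir)s/%(sample_name)s/logs", "type": "directory"},
    }


    def resolve_templates(template_paths, variables):
        """Return {argument_name: resolved_path} for every template entry."""
        return {name: spec["template"] % variables for name, spec in template_paths.items()}


    resolved = resolve_templates(
        template_paths, {"project_dir": "/scratch/project", "sample_name": "sample1"}
    )
    for name, path in resolved.items():
        print(name, "->", path)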


Template Arguments
~~~~~~~~~~~~~~~~~~

Template arguments provide the user-facing descriptions of template variables that are shown in the command help.

**Structure:**

.. code-block:: yaml

    template_arguments:
      variable_name: "User-friendly description"

**Example:**

.. code-block:: yaml

    template_arguments:
      project_dir: "Project root directory (will create subdirectories)"
      sample_name: "Sample identifier or name"
      # No entry is needed for profile_name: it is auto-injected
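
To picture how these descriptions end up in the help output, here is an illustrative sketch built on Python's standard ``argparse`` module. It assumes each template variable becomes a required ``--<name>`` option inside a group titled like the "Template Variables" section of the help; Bio-pype's real parser may be organized differently.

.. code-block:: python

    import argparse

    # Hypothetical sketch: expose template_arguments as required CLI options,
    # using each description as the option's help text.
    template_arguments = {
        "project_dir": "Project root directory (will create subdirectories)",
        "sample_name": "Sample identifier or name",
    }

    parser = argparse.ArgumentParser(prog="pype run my_pipeline")
    group = parser.add_argument_group(
        "Template Variables", "Variables used to generate paths from templates"
    )
    for name, description in template_arguments.items():
        group.add_argument(f"--{name}", required=True, help=description)

    args = parser.parse_args(["--project_dir", "/scratch/project", "--sample_name", "s1"])
    print(vars(args))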


Template Variables
~~~~~~~~~~~~~~~~~~

Variables that can be used in template strings come from two sources:

1. **User-provided arguments**: any argument not in ``template_paths`` is treated as a template variable.

   - Example: ``--project_dir``, ``--sample_name``, ``--fastq1``, ``--fastq2``

2. **Auto-injected variables**: provided automatically by the system.

   - ``profile_name``: the profile currently in use (you do not need to provide it)

**Usage in templates:**

.. code-block:: yaml

    template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_aligned.bam"
    # Uses: project_dir, sample_name, profile_name
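
A minimal sketch of how the two sources might be combined into one mapping before a template is formatted. The ``build_variables`` helper is hypothetical, ``hg38`` stands in for whatever profile is active, and letting the injected ``profile_name`` override a user-supplied value is an assumption of the sketch.

.. code-block:: python

    # Hypothetical sketch: merge user-provided arguments with auto-injected
    # variables, then format a template with the combined mapping.
    def build_variables(user_args, profile_name):
        variables = dict(user_args)
        # profile_name comes from the system, not from the command line.
        variables["profile_name"] = profile_name
        return variables


    template = "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_aligned.bam"
    variables = build_variables(
        {"project_dir": "/scratch/project", "sample_name": "sample1"}, "hg38"
    )
    path = template % variables
    print(path)  # /scratch/project/sample1/bam/sample1_hg38_aligned.bam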


Directory Creation
~~~~~~~~~~~~~~~~~~

Before pipeline execution, ``pype run`` automatically creates all required directories:

- For ``type: directory``: creates the path itself with ``mkdir -p``
- For ``type: file``: creates the parent directories with ``mkdir -p $(dirname path)``

Example:

.. code-block:: yaml

    template_paths:
      bam_output:
        template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
        type: file
      logs:
        template: "%(project_dir)s/%(sample_name)s/logs"
        type: directory

Running with:

.. code-block:: bash

    pype run pipeline \
      --project_dir /scratch/project \
      --sample_name sample1

Results in directories:

.. code-block:: text

    /scratch/project/sample1/bam/         (created for file type)
    /scratch/project/sample1/logs/        (created for directory type)
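
The rule above can be sketched in Python with ``os.makedirs(..., exist_ok=True)``, which behaves like ``mkdir -p``. This is an illustration, not Bio-pype's code: the ``create_directories`` helper is hypothetical, and a temporary directory replaces ``/scratch/project`` so the example is safe to run. For ``type: file`` only the parent directory is created, never the file itself.

.. code-block:: python

    import os
    import tempfile

    # Hypothetical sketch of the directory-creation rule:
    #   type: directory -> create the resolved path itself  (like mkdir -p path)
    #   type: file      -> create only its parent directory (like mkdir -p $(dirname path))
    template_paths = {
        "bam_output": {"template": "%(project_dir)s/%(sample_name)s/bam/output.bam", "type": "file"},
        "logs": {"template": "%(project_dir)s/%(sample_name)s/logs", "type": "directory"},
    }


    def create_directories(template_paths, resolved):
        """Create the directory each resolved path needs; return them in order."""
        created = []
        for name, spec in template_paths.items():
            path = resolved[name]
            target = path if spec["type"] == "directory" else os.path.dirname(path)
            os.makedirs(target, exist_ok=True)  # no error if it already exists
            created.append(target)
        return created


    # A temporary directory stands in for /scratch/project.
    with tempfile.TemporaryDirectory() as project_dir:
        variables = {"project_dir": project_dir, "sample_name": "sample1"}
        resolved = {name: spec["template"] % variables for name, spec in template_paths.items()}
        created = create_directories(template_paths, resolved)
        summary = [(os.path.relpath(d, project_dir), os.path.isdir(d)) for d in created]
        bam_exists = os.path.exists(resolved["bam_output"])  # the file is not created

    print(summary)
    print(bam_exists)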


Complete Example
----------------

**Pipeline YAML:**

.. code-block:: yaml

    info:
      description: GATK data processing pipeline with alignment
      api: 2.1.0

      arguments:
        fastq1: First mate fastQ file
        fastq2: Second mate fastQ file
        tmp_dir: Temporary directory

      defaults:
        tmp_dir: /scratch

      template_paths:
        bam_markdups:
          template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups.bam"
          type: file
        bam_recalibrated:
          template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups_recal.bam"
          type: file
        recalibration_table:
          template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_recalibration_table.txt"
          type: file
        dup_metrics:
          template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_dup_metrics.txt"
          type: file
        out_qc:
          template: "%(project_dir)s/%(sample_name)s/qc"
          type: directory
        logs:
          template: "%(project_dir)s/%(sample_name)s/logs"
          type: directory

      template_arguments:
        project_dir: "Project root directory"
        sample_name: "Sample identifier or name"

    steps:
      step_align:
        name: gatk_bwa_mem_gpu
        type: snippet
        depends_on: []
        arguments:
          --out: "%(bam_markdups)s"
          --sample-name: "%(sample_name)s"
          --f1: "%(fastq1)s"
          --f2: "%(fastq2)s"
          --dup-metrics: "%(dup_metrics)s"
          --base-recal: "%(recalibration_table)s"
          --qc: "%(out_qc)s"
          --tmp: "%(tmp_dir)s"

      step_recalibrate:
        name: gatk_apply_BSQR
        type: snippet
        depends_on: [step_align]
        arguments:
          -i: "%(bam_markdups)s"
          -o: "%(bam_recalibrated)s"
          -r: "%(recalibration_table)s"


**Show help:**

.. code-block:: bash

    $ pype run my_pipeline

    usage: pype run my_pipeline --fastq1 FASTQ1 --fastq2 FASTQ2
                               --project_dir PROJECT_DIR --sample_name SAMPLE_NAME
                               [--tmp_dir TMP_DIR]

    GATK data processing pipeline with alignment

    Template Variables:
      Variables used to generate paths from templates

      --fastq1 FASTQ1           First mate fastQ file
      --fastq2 FASTQ2           Second mate fastQ file
      --project_dir PROJECT_DIR
                                Project root directory
      --sample_name SAMPLE_NAME
                                Sample identifier or name

    Optional:
      Optional arguments with defaults

      --tmp_dir TMP_DIR         Temporary directory. Default: /scratch


**Execute pipeline:**

.. code-block:: bash

    pype run --queue slurm --run-name "Align sample1" my_pipeline \
      --project_dir /scratch/project \
      --sample_name sample1 \
      --fastq1 /data/sample1_R1.fastq.gz \
      --fastq2 /data/sample1_R2.fastq.gz

**What happens:**

1. Templates are resolved (assuming the active profile is ``hg38``):

   - ``bam_markdups`` → ``/scratch/project/sample1/bam/sample1_hg38_markdups.bam``
   - ``out_qc`` → ``/scratch/project/sample1/qc``
   - ``logs`` → ``/scratch/project/sample1/logs``

2. Directories are created:

   - ``/scratch/project/sample1/bam/`` (for file type)
   - ``/scratch/project/sample1/tables/`` (for file type)
   - ``/scratch/project/sample1/qc/`` (for directory type)
   - ``/scratch/project/sample1/logs/`` (for directory type)

3. Pipeline executes with resolved arguments:

   - ``--out /scratch/project/sample1/bam/sample1_hg38_markdups.bam``
   - ``--qc /scratch/project/sample1/qc``
   - etc.
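
Step argument values such as ``"%(bam_markdups)s"`` use the same placeholder syntax, so stages 1 and 3 can be sketched as formatting with one combined mapping: user variables and defaults first, then the resolved template paths. This hypothetical sketch covers only two templates and three step arguments from the example, and assumes ``hg38`` is the active profile.

.. code-block:: python

    # Hypothetical sketch: resolve templates, then fill step arguments from
    # a single mapping of variables plus resolved paths.
    variables = {
        "project_dir": "/scratch/project",
        "sample_name": "sample1",
        "fastq1": "/data/sample1_R1.fastq.gz",
        "fastq2": "/data/sample1_R2.fastq.gz",
        "tmp_dir": "/scratch",      # default from the YAML
        "profile_name": "hg38",     # assumed active profile
    }

    template_paths = {
        "bam_markdups": "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups.bam",
        "out_qc": "%(project_dir)s/%(sample_name)s/qc",
    }

    # Stage 1: resolve template paths and add them to the mapping.
    resolved = {name: tpl % variables for name, tpl in template_paths.items()}
    mapping = {**variables, **resolved}

    # Stage 3: fill the step's argument values from the combined mapping.
    step_arguments = {
        "--out": "%(bam_markdups)s",
        "--sample-name": "%(sample_name)s",
        "--qc": "%(out_qc)s",
    }
    command_args = {flag: value % mapping for flag, value in step_arguments.items()}
    for flag, value in command_args.items():
        print(flag, value)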


Command-line Options
--------------------

``pype run`` supports the same global options as ``pype pipelines``:

.. code-block:: bash

    pype [--profile PROFILE] run [--queue QUEUE] [--log LOG] [--run-name RUN_NAME] \
         PIPELINE [ARGUMENTS...]

**Common options:**

- ``--profile PROFILE``: Profile to use (default: configured default)
- ``--queue QUEUE``: Queue system (default: configured default)
- ``--log LOG``: Log directory (can use template if defined in ``template_paths``)
- ``--run-name RUN_NAME``: Display name for this run (useful for tracking)

**Examples:**

.. code-block:: bash

    # Use SLURM queue
    pype run --queue slurm my_pipeline --project_dir /proj --sample_name s1 ...

    # Use dry_run to preview without execution
    pype run --queue dry_run my_pipeline --project_dir /proj --sample_name s1 ...

    # Specify custom log directory
    pype run --log /custom/logs my_pipeline --project_dir /proj --sample_name s1 ...

    # Add descriptive run name
    pype run --run-name "Align batch_1" my_pipeline --project_dir /proj --sample_name s1 ...


Comparison with ``pype pipelines``
----------------------------------

**Before (pype pipelines):**

.. code-block:: bash

    pype pipelines my_pipeline \
      --bam_markdups /scratch/project/sample1/bam/sample1_hg38_markdups.bam \
      --bam_recalibrated /scratch/project/sample1/bam/sample1_hg38_markdups_recal.bam \
      --recalibration_table /scratch/project/sample1/tables/sample1_hg38_recalibration_table.txt \
      --dup_metrics /scratch/project/sample1/tables/sample1_hg38_dup_metrics.txt \
      --out_qc /scratch/project/sample1/qc \
      --fastq1 /data/sample1_R1.fastq.gz \
      --fastq2 /data/sample1_R2.fastq.gz

**After (pype run):**

.. code-block:: bash

    pype run my_pipeline \
      --project_dir /scratch/project \
      --sample_name sample1 \
      --fastq1 /data/sample1_R1.fastq.gz \
      --fastq2 /data/sample1_R2.fastq.gz

Benefits of ``pype run``:

- ✅ Fewer arguments to specify
- ✅ Less error-prone path construction
- ✅ Automatic directory creation
- ✅ Consistent directory structure across runs
- ✅ Easier to expose through web interfaces
- ✅ Clearer help showing only relevant inputs


Advanced Features
-----------------

Argument Types
~~~~~~~~~~~~~~

Non-template arguments preserve their full type information:

- **Boolean flags** with ``action: store_true`` work as expected
- **Multiple values** (``nargs``) are supported
- **Type conversions** (int, float, etc.) are handled by the pipeline
- **Custom argument handlers** (composite_arg, batch_arg, etc.) work normally

Example:

.. code-block:: yaml

    arguments:
      --exact_match:
        value: "%(exact_match)s"
        action: store_true
      --threads:
        value: "%(threads)s"
        type: int

Command line:

.. code-block:: bash

    pype run pipeline --exact_match --threads 8 ...
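
The ``action: store_true`` and ``type: int`` keys mirror options of Python's standard ``argparse`` module. Assuming they are passed through to it unchanged (an assumption, not something this page confirms), the command line above would parse like this:

.. code-block:: python

    import argparse

    # Hypothetical sketch: non-template arguments keep argparse semantics.
    parser = argparse.ArgumentParser(prog="pype run pipeline")
    parser.add_argument("--exact_match", action="store_true")  # flag; False when omitted
    parser.add_argument("--threads", type=int)                 # "8" is converted to 8

    args = parser.parse_args(["--exact_match", "--threads", "8"])
    print(args.exact_match, args.threads)  # True 8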


Nested Pipelines
~~~~~~~~~~~~~~~~

``pype run`` works with nested pipelines and complex dependencies exactly like ``pype pipelines``:

.. code-block:: yaml

    steps:
      step_1_align:
        name: alignment_snippet
        type: snippet
        depends_on: []
        arguments:
          --bam: "%(bam_output)s"
          ...

      step_2_qc:
        name: qc_pipeline
        type: pipeline
        depends_on: [step_1_align]
        arguments:
          --bam: "%(bam_output)s"
          --qc_dir: "%(qc_output)s"


Troubleshooting
---------------

**Pipeline not shown in help**

Only pipelines with ``template_paths`` defined are shown by ``pype run``.
If a pipeline doesn't appear, add the ``template_paths`` section to its YAML.

**Missing template variable error**

.. code-block:: text

    KeyError: Template for 'bam_output' uses undefined variable: project_dir

A template references a variable that was not provided. Pass every variable the template uses (here, ``--project_dir``) on the command line.
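
The check behind this error can be sketched as follows: collect every ``%(name)s`` placeholder in the template and compare the names against the provided variables before formatting. The ``check_template`` helper is hypothetical; it only recognizes the ``s`` conversion used throughout this page and reproduces the message format shown above.

.. code-block:: python

    import re

    # Hypothetical pre-check: report placeholders missing from the variables.
    PLACEHOLDER = re.compile(r"%\((\w+)\)s")


    def check_template(name, template, variables):
        """Format `template`, raising KeyError that names any undefined variables."""
        used = dict.fromkeys(PLACEHOLDER.findall(template))  # ordered, de-duplicated
        missing = [var for var in used if var not in variables]
        if missing:
            raise KeyError(
                f"Template for '{name}' uses undefined variable: {', '.join(missing)}"
            )
        return template % variables


    message = None
    try:
        check_template(
            "bam_output",
            "%(project_dir)s/%(sample_name)s/bam/output.bam",
            {"sample_name": "sample1"},  # --project_dir was not provided
        )
    except KeyError as err:
        message = err.args[0]  # str(err) would wrap the text in extra quotes

    print(message)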

**Directories not created**

Verify that ``template_paths`` has the correct ``type`` field (``file`` or ``directory``).
Check file permissions in the parent directory.

**Wrong paths generated**

Double-check the template strings in ``template_paths``.
Use ``pype run pipeline --help`` to see which variables are being collected.


See Also
--------

- :ref:`pipelines`: Full pipeline configuration reference
- :ref:`getting_started`: Getting started with Bio-pype
- :ref:`profiles`: Profile configuration and management