.. index:: Run, Template-based Pipelines

.. _run:

Pipelines Run
=============

The ``pype run`` command provides a simplified interface for executing
pipelines by automatically generating argument values from high-level
template variables. Instead of manually specifying all file paths, users
provide just the essential inputs and the system uses templates to construct
the full paths.

This feature is particularly useful for:

- **Simplifying user workflows**: Users don't need to remember all argument
  names and path structures
- **Standardizing project layouts**: Enforces consistent directory structures
  across runs
- **GUI/Form-based deployment**: Template-based workflows are easier to expose
  through web interfaces
- **Batch processing**: Reduces errors from manual path construction when
  processing multiple samples

Quick Start
-----------

First, define templates in your pipeline YAML:

.. code-block:: yaml

   info:
     description: My bioinformatics pipeline
     api: 2.1.0
     template_paths:
       output_bam:
         template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
         type: file
       output_dir:
         template: "%(project_dir)s/%(sample_name)s/qc"
         type: directory
     template_arguments:
       project_dir: Project root directory
       sample_name: Sample identifier or name

Then run the pipeline:

.. code-block:: bash

   pype run my_pipeline \
       --project_dir /scratch/project \
       --sample_name sample_1 \
       --fastq1 /data/sample_1_R1.fastq.gz \
       --fastq2 /data/sample_1_R2.fastq.gz

Concepts
--------

Template Paths
~~~~~~~~~~~~~~

Template paths define how to generate argument values from template variables.

**Structure:**

.. code-block:: yaml

   template_paths:
     argument_name:
       template: "path/with/%(variable)s/placeholders"
       type: file|directory

**Fields:**

- ``argument_name``: The pipeline argument this template generates
  (e.g., ``output_bam``, ``logs``)
- ``template``: Template string with ``%(variable)s`` placeholders
- ``type``: Either ``file`` (creates parent directories) or ``directory``
  (creates the path itself)

**Example:**

.. code-block:: yaml

   template_paths:
     bam_output:
       template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam"
       type: file
     qc_directory:
       template: "%(project_dir)s/%(sample_name)s/qc"
       type: directory
     logs:
       template: "%(project_dir)s/%(sample_name)s/logs"
       type: directory

Template Arguments
~~~~~~~~~~~~~~~~~~

Template arguments provide user-friendly descriptions for template variables
shown in help.

**Structure:**

.. code-block:: yaml

   template_arguments:
     variable_name: "User-friendly description"

**Example:**

.. code-block:: yaml

   template_arguments:
     project_dir: "Project root directory (will create subdirectories)"
     sample_name: "Sample identifier or name"
     profile_name: "Pipeline profile name"  # Not needed - auto-injected

Template Variables
~~~~~~~~~~~~~~~~~~

Variables that can be used in template strings come from two sources:

1. **User-provided arguments**: Any argument not in ``template_paths`` is
   treated as a template variable

   - Example: ``--project_dir``, ``--sample_name``, ``--fastq1``, ``--fastq2``

2. **Auto-injected variables**: Automatically provided by the system

   - ``profile_name``: The current profile being used (you don't need to
     provide this)

**Usage in templates:**

.. code-block:: yaml

   template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_aligned.bam"
   # Uses: project_dir, sample_name, profile_name

Directory Creation
~~~~~~~~~~~~~~~~~~

Before pipeline execution, ``pype run`` automatically creates all required
directories:

- For ``type: directory``: Creates the path directly with ``mkdir -p``
- For ``type: file``: Creates parent directories with
  ``mkdir -p $(dirname path)``

Example:

.. code-block:: yaml

   template_paths:
     bam_output:
       template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
       type: file
     logs:
       template: "%(project_dir)s/%(sample_name)s/logs"
       type: directory

Running with:

.. code-block:: bash

   pype run pipeline \
       --project_dir /scratch/project \
       --sample_name sample1

Results in directories:

.. code-block:: text

   /scratch/project/sample1/bam/    (created for file type)
   /scratch/project/sample1/logs/   (created for directory type)

Complete Example
----------------

**Pipeline YAML:**

.. code-block:: yaml

   info:
     description: GATK data processing pipeline with alignment
     api: 2.1.0
     arguments:
       fastq1: First mate fastQ file
       fastq2: Second mate fastQ file
       tmp_dir: Temporary directory
     defaults:
       tmp_dir: /scratch
     template_paths:
       bam_markdups:
         template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups.bam"
         type: file
       bam_recalibrated:
         template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups_recal.bam"
         type: file
       recalibration_table:
         template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_recalibration_table.txt"
         type: file
       dup_metrics:
         template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_dup_metrics.txt"
         type: file
       out_qc:
         template: "%(project_dir)s/%(sample_name)s/qc"
         type: directory
       logs:
         template: "%(project_dir)s/%(sample_name)s/logs"
         type: directory
     template_arguments:
       project_dir: "Project root directory"
       sample_name: "Sample identifier or name"

   steps:
     step_align:
       name: gatk_bwa_mem_gpu
       type: snippet
       depends_on: []
       arguments:
         --out: "%(bam_markdups)s"
         --sample-name: "%(sample_name)s"
         --f1: "%(fastq1)s"
         --f2: "%(fastq2)s"
         --dup-metrics: "%(dup_metrics)s"
         --base-recal: "%(recalibration_table)s"
         --qc: "%(out_qc)s"
         --tmp: "%(tmp_dir)s"
     step_recalibrate:
       name: gatk_apply_BSQR
       type: snippet
       depends_on: [step_align]
       arguments:
         -i: "%(bam_markdups)s"
         -o: "%(bam_recalibrated)s"
         -r: "%(recalibration_table)s"

**Show help:**

.. code-block:: bash

   $ pype run my_pipeline
   usage: pype run my_pipeline --fastq1 FASTQ1 --fastq2 FASTQ2
                               --project_dir PROJECT_DIR
                               --sample_name SAMPLE_NAME [--tmp_dir TMP_DIR]

   GATK data processing pipeline with alignment

   Template Variables:
     Variables used to generate paths from templates

     --fastq1 FASTQ1       First mate fastQ file
     --fastq2 FASTQ2       Second mate fastQ file
     --project_dir PROJECT_DIR
                           Project root directory
     --sample_name SAMPLE_NAME
                           Sample identifier or name

   Optional:
     Optional arguments with defaults

     --tmp_dir TMP_DIR     Temporary directory. Default: /scratch

**Execute pipeline:**

.. code-block:: bash

   pype run --queue slurm --run-name "Align sample1" my_pipeline \
       --project_dir /scratch/project \
       --sample_name sample1 \
       --fastq1 /data/sample1_R1.fastq.gz \
       --fastq2 /data/sample1_R2.fastq.gz

**What happens** (the resolved paths below assume the active profile is
``hg38``):

1. Templates are resolved:

   - ``bam_markdups`` → ``/scratch/project/sample1/bam/sample1_hg38_markdups.bam``
   - ``out_qc`` → ``/scratch/project/sample1/qc``
   - ``logs`` → ``/scratch/project/sample1/logs``

2. Directories are created:

   - ``/scratch/project/sample1/bam/`` (for file type)
   - ``/scratch/project/sample1/qc/`` (for directory type)
   - ``/scratch/project/sample1/logs/`` (for directory type)

3. Pipeline executes with resolved arguments:

   - ``--out /scratch/project/sample1/bam/sample1_hg38_markdups.bam``
   - ``--qc /scratch/project/sample1/qc``
   - etc.

Command-line Options
--------------------

``pype run`` supports the same global options as ``pype pipelines``:

.. code-block:: bash

   pype [--profile PROFILE] run [--queue QUEUE] [--log LOG] [--run-name RUN_NAME] \
       PIPELINE [ARGUMENTS...]

**Common options:**

- ``--profile PROFILE``: Profile to use (default: configured default)
- ``--queue QUEUE``: Queue system (default: configured default)
- ``--log LOG``: Log directory (can use template if defined in
  ``template_paths``)
- ``--run-name RUN_NAME``: Display name for this run (useful for tracking)

**Examples:**

.. code-block:: bash

   # Use SLURM queue
   pype run --queue slurm my_pipeline --project_dir /proj --sample_name s1 ...

   # Use dry_run to preview without execution
   pype run --queue dry_run my_pipeline --project_dir /proj --sample_name s1 ...

   # Specify custom log directory
   pype run --log /custom/logs my_pipeline --project_dir /proj --sample_name s1 ...

   # Add descriptive run name
   pype run --run-name "Align batch_1" my_pipeline --project_dir /proj --sample_name s1 ...

Comparison with ``pype pipelines``
----------------------------------

**Before (pype pipelines):**

.. code-block:: bash

   pype pipelines my_pipeline \
       --bam_markdups /scratch/project/sample1/bam/sample1_hg38_markdups.bam \
       --bam_recalibrated /scratch/project/sample1/bam/sample1_hg38_markdups_recal.bam \
       --recalibration_table /scratch/project/sample1/tables/sample1_hg38_recalibration_table.txt \
       --dup_metrics /scratch/project/sample1/tables/sample1_hg38_dup_metrics.txt \
       --out_qc /scratch/project/sample1/qc \
       --fastq1 /data/sample1_R1.fastq.gz \
       --fastq2 /data/sample1_R2.fastq.gz

**After (pype run):**

.. code-block:: bash

   pype run my_pipeline \
       --project_dir /scratch/project \
       --sample_name sample1 \
       --fastq1 /data/sample1_R1.fastq.gz \
       --fastq2 /data/sample1_R2.fastq.gz

Benefits of ``pype run``:

- ✅ Fewer arguments to specify
- ✅ Less error-prone path construction
- ✅ Automatic directory creation
- ✅ Consistent directory structure across runs
- ✅ Easier to expose through web interfaces
- ✅ Clearer help showing only relevant inputs

Advanced Features
-----------------

Argument Types
~~~~~~~~~~~~~~

Non-template arguments preserve their full type information:

- **Boolean flags** with ``action: store_true`` work as expected
- **Multiple values** (``nargs``) are supported
- **Type conversions** (int, float, etc.) are handled by the pipeline
- **Custom argument handlers** (composite_arg, batch_arg, etc.) work normally

Example:

.. code-block:: yaml

   arguments:
     --exact_match:
       value: "%(exact_match)s"
       action: store_true
     --threads:
       value: "%(threads)s"
       type: int

Command line:

.. code-block:: bash

   pype run pipeline --exact_match --threads 8 ...

Nested Pipelines
~~~~~~~~~~~~~~~~

``pype run`` works with nested pipelines and complex dependencies exactly like
``pype pipelines``:

.. code-block:: yaml

   steps:
     step_1_align:
       name: alignment_snippet
       type: snippet
       depends_on: []
       arguments:
         --bam: "%(bam_output)s"
         ...
     step_2_qc:
       name: qc_pipeline
       type: pipeline
       depends_on: [step_1_align]
       arguments:
         --bam: "%(bam_output)s"
         --qc_dir: "%(qc_output)s"

Troubleshooting
---------------

**Pipeline not shown in help**

Only pipelines with ``template_paths`` defined are shown by ``pype run``. If a
pipeline doesn't appear, add the ``template_paths`` section to its YAML.

**Missing template variable error**

.. code-block:: text

   KeyError: Template for 'bam_output' uses undefined variable: project_dir

This means a template uses a variable that wasn't provided. Make sure to
include all required template variables on the command line.

**Directories not created**

Verify that ``template_paths`` has the correct ``type`` field (``file`` or
``directory``). Check file permissions in the parent directory.

**Wrong paths generated**

Double-check the template strings in ``template_paths``. Use
``pype run pipeline --help`` to see which variables are being collected.

See Also
--------

- :ref:`pipelines`: Full pipeline configuration reference
- :ref:`getting_started`: Getting started with Bio-pype
- :ref:`profiles`: Profile configuration and management
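
Appendix: Template Resolution Sketch
------------------------------------

The resolution and directory-creation steps described under Concepts can be
illustrated with a few lines of Python. This is only a minimal sketch of the
documented behaviour, not Bio-pype's actual implementation: the function names
``resolve_templates`` and ``create_directories`` are hypothetical, and the
sketch assumes that templates are expanded with Python's ``%``-style mapping
formatting, which is what the ``%(variable)s`` placeholder syntax suggests.

.. code-block:: python

   from pathlib import Path


   def resolve_templates(template_paths, variables, profile_name):
       """Expand every template with the given variables (hypothetical helper)."""
       # profile_name is added to the user-provided variables, mirroring the
       # auto-injected variable described in "Template Variables".
       values = dict(variables, profile_name=profile_name)
       resolved = {}
       for arg_name, spec in template_paths.items():
           try:
               resolved[arg_name] = spec["template"] % values
           except KeyError as err:
               # Same wording as the error shown under "Troubleshooting".
               raise KeyError(
                   "Template for '%s' uses undefined variable: %s"
                   % (arg_name, err.args[0])
               ) from None
       return resolved


   def create_directories(template_paths, resolved):
       """Create the directories each resolved path needs (hypothetical helper)."""
       for arg_name, path in resolved.items():
           target = Path(path)
           # For "file" templates only the parent directory is created,
           # for "directory" templates the path itself.
           if template_paths[arg_name]["type"] == "file":
               target = target.parent
           target.mkdir(parents=True, exist_ok=True)  # like mkdir -p


   if __name__ == "__main__":
       import tempfile

       paths = {
           "bam_output": {
               "template": "%(project_dir)s/%(sample_name)s/bam/output.bam",
               "type": "file",
           },
           "logs": {
               "template": "%(project_dir)s/%(sample_name)s/logs",
               "type": "directory",
           },
       }
       with tempfile.TemporaryDirectory() as project_dir:
           user_vars = {"project_dir": project_dir, "sample_name": "sample1"}
           resolved = resolve_templates(paths, user_vars, profile_name="hg38")
           create_directories(paths, resolved)
           for name, value in resolved.items():
               print(name, "->", value)

Variables that a template does not reference are simply ignored by the
``%`` operator, which is why extra inputs such as ``fastq1`` can share the
same variable dictionary without affecting the generated paths.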