Pipelines Run

The pype run command provides a simplified interface for executing pipelines by automatically generating argument values from high-level template variables. Instead of manually specifying all file paths, users provide just the essential inputs and the system uses templates to construct the full paths.

This feature is particularly useful for:

  • Simplifying user workflows: Users don’t need to remember all argument names and path structures

  • Standardizing project layouts: Enforces consistent directory structures across runs

  • GUI/Form-based deployment: Template-based workflows are easier to expose through web interfaces

  • Batch processing: Reduces errors from manual path construction when processing multiple samples

Quick Start

First, define templates in your pipeline YAML:

info:
  description: My bioinformatics pipeline
  api: 2.1.0

  template_paths:
    output_bam:
      template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
      type: file
    output_dir:
      template: "%(project_dir)s/%(sample_name)s/qc"
      type: directory

  template_arguments:
    project_dir: Project root directory
    sample_name: Sample identifier or name

Then run the pipeline:

pype run my_pipeline \
  --project_dir /scratch/project \
  --sample_name sample_1 \
  --fastq1 /data/sample_1_R1.fastq.gz \
  --fastq2 /data/sample_1_R2.fastq.gz

Concepts

Template Paths

Template paths define how to generate argument values from template variables.

Structure:

template_paths:
  argument_name:
    template: "path/with/%(variable)s/placeholders"
    type: file|directory

Fields:

  • argument_name: The pipeline argument this template generates (e.g., output_bam, logs)

  • template: Template string with %(variable)s placeholders

  • type: Either file (creates parent directories) or directory (creates the path itself)

Example:

template_paths:
  bam_output:
    template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam"
    type: file
  qc_directory:
    template: "%(project_dir)s/%(sample_name)s/qc"
    type: directory
  logs:
    template: "%(project_dir)s/%(sample_name)s/logs"
    type: directory
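
The %(variable)s placeholders use the same syntax as Python's printf-style string formatting with a mapping. The sketch below shows how such templates could be resolved; the resolve_templates helper is hypothetical, and whether pype uses the % operator internally is an assumption.

```python
# Hypothetical illustration: resolve a template_paths mapping with
# Python's printf-style formatting. Not pype's actual implementation.
template_paths = {
    "bam_output": {
        "template": "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam",
        "type": "file",
    },
    "qc_directory": {
        "template": "%(project_dir)s/%(sample_name)s/qc",
        "type": "directory",
    },
    "logs": {
        "template": "%(project_dir)s/%(sample_name)s/logs",
        "type": "directory",
    },
}


def resolve_templates(template_paths, variables):
    """Return {argument_name: resolved_path} for every template."""
    return {
        name: spec["template"] % variables
        for name, spec in template_paths.items()
    }


resolved = resolve_templates(
    template_paths,
    {"project_dir": "/scratch/project", "sample_name": "sample1"},
)
for name, path in resolved.items():
    print(f"{name}: {path}")
```

Each key of the result corresponds to a pipeline argument, so the steps can refer to %(bam_output)s without the user ever typing the full path.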

Template Arguments

Template arguments provide user-friendly descriptions of template variables; these descriptions appear in the help output.

Structure:

template_arguments:
  variable_name: "User-friendly description"

Example:

template_arguments:
  project_dir: "Project root directory (will create subdirectories)"
  sample_name: "Sample identifier or name"
  profile_name: "Pipeline profile name"  # Optional: profile_name is auto-injected, so this entry can be omitted

Template Variables

Variables that can be used in template strings come from two sources:

  1. User-provided arguments: Any argument not in template_paths is treated as a template variable.

       • Example: --project_dir, --sample_name, --fastq1, --fastq2

  2. Auto-injected variables: Automatically provided by the system.

       • profile_name: The current profile being used (you don’t need to provide this)

Usage in templates:

template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_aligned.bam"
# Uses: project_dir, sample_name, profile_name
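
As a rough sketch of how the two variable sources combine, the snippet below merges user-provided values with an injected profile_name before formatting the template. The build_variables helper, the profile value hg38, and the rule that the injected value overrides a user value are all assumptions made for illustration.

```python
def build_variables(user_args, profile_name):
    """Combine user-provided values with the auto-injected profile_name."""
    variables = dict(user_args)
    # Assumption: the auto-injected value wins over any user-supplied one.
    variables["profile_name"] = profile_name
    return variables


template = (
    "%(project_dir)s/%(sample_name)s/bam/"
    "%(sample_name)s_%(profile_name)s_aligned.bam"
)
variables = build_variables(
    {"project_dir": "/scratch/project", "sample_name": "sample1"},
    profile_name="hg38",  # hypothetical profile name
)
path = template % variables
print(path)
```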

Directory Creation

Before pipeline execution, pype run automatically creates all required directories:

  • For type: directory, the path itself is created with mkdir -p

  • For type: file, the parent directory is created with mkdir -p $(dirname path)

Example:

template_paths:
  bam_output:
    template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
    type: file
  logs:
    template: "%(project_dir)s/%(sample_name)s/logs"
    type: directory

Running with:

pype run pipeline \
  --project_dir /scratch/project \
  --sample_name sample1

Results in directories:

/scratch/project/sample1/bam/         (created for file type)
/scratch/project/sample1/logs/        (created for directory type)
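
The two rules can be sketched in Python with pathlib, where mkdir(parents=True, exist_ok=True) behaves like mkdir -p. The create_directories helper is hypothetical, and the example runs inside a temporary directory rather than /scratch/project so it is safe to execute anywhere.

```python
import tempfile
from pathlib import Path


def create_directories(resolved, types):
    """Create the directory for each resolved path (mkdir -p semantics)."""
    created = []
    for name, path in resolved.items():
        # directory: create the path itself; file: create only its parent.
        target = Path(path) if types[name] == "directory" else Path(path).parent
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


with tempfile.TemporaryDirectory() as project_dir:
    resolved = {
        "bam_output": f"{project_dir}/sample1/bam/output.bam",
        "logs": f"{project_dir}/sample1/logs",
    }
    types = {"bam_output": "file", "logs": "directory"}
    created = create_directories(resolved, types)

    bam_dir_exists = Path(project_dir, "sample1", "bam").is_dir()
    logs_dir_exists = Path(project_dir, "sample1", "logs").is_dir()
    # The output file itself is never created, only its parent directory.
    bam_file_exists = Path(resolved["bam_output"]).exists()
```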

Complete Example

Pipeline YAML:

info:
  description: GATK data processing pipeline with alignment
  api: 2.1.0

  arguments:
    fastq1: First mate fastQ file
    fastq2: Second mate fastQ file
    tmp_dir: Temporary directory

  defaults:
    tmp_dir: /scratch

  template_paths:
    bam_markdups:
      template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups.bam"
      type: file
    bam_recalibrated:
      template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups_recal.bam"
      type: file
    recalibration_table:
      template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_recalibration_table.txt"
      type: file
    dup_metrics:
      template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_dup_metrics.txt"
      type: file
    out_qc:
      template: "%(project_dir)s/%(sample_name)s/qc"
      type: directory
    logs:
      template: "%(project_dir)s/%(sample_name)s/logs"
      type: directory

  template_arguments:
    project_dir: "Project root directory"
    sample_name: "Sample identifier or name"

steps:
  step_align:
    name: gatk_bwa_mem_gpu
    type: snippet
    depends_on: []
    arguments:
      --out: "%(bam_markdups)s"
      --sample-name: "%(sample_name)s"
      --f1: "%(fastq1)s"
      --f2: "%(fastq2)s"
      --dup-metrics: "%(dup_metrics)s"
      --base-recal: "%(recalibration_table)s"
      --qc: "%(out_qc)s"
      --tmp: "%(tmp_dir)s"

  step_recalibrate:
    name: gatk_apply_BSQR
    type: snippet
    depends_on: [step_align]
    arguments:
      -i: "%(bam_markdups)s"
      -o: "%(bam_recalibrated)s"
      -r: "%(recalibration_table)s"

Show help:

$ pype run my_pipeline

usage: pype run my_pipeline --fastq1 FASTQ1 --fastq2 FASTQ2
                           --project_dir PROJECT_DIR --sample_name SAMPLE_NAME
                           [--tmp_dir TMP_DIR]

GATK data processing pipeline with alignment

Template Variables:
  Variables used to generate paths from templates

  --fastq1 FASTQ1           First mate fastQ file
  --fastq2 FASTQ2           Second mate fastQ file
  --project_dir PROJECT_DIR
                            Project root directory
  --sample_name SAMPLE_NAME
                            Sample identifier or name

Optional:
  Optional arguments with defaults

  --tmp_dir TMP_DIR         Temporary directory. Default: /scratch

Execute pipeline:

pype run --queue slurm --run-name "Align sample1" my_pipeline \
  --project_dir /scratch/project \
  --sample_name sample1 \
  --fastq1 /data/sample1_R1.fastq.gz \
  --fastq2 /data/sample1_R2.fastq.gz

What happens:

  1. Templates are resolved (in this example the active profile is hg38):

       • bam_markdups → /scratch/project/sample1/bam/sample1_hg38_markdups.bam
       • out_qc → /scratch/project/sample1/qc
       • logs → /scratch/project/sample1/logs

  2. Directories are created:

       • /scratch/project/sample1/bam/ (for file type)
       • /scratch/project/sample1/tables/ (for file type)
       • /scratch/project/sample1/qc/ (for directory type)
       • /scratch/project/sample1/logs/ (for directory type)

  3. Pipeline executes with resolved arguments:

       • --out /scratch/project/sample1/bam/sample1_hg38_markdups.bam
       • --qc /scratch/project/sample1/qc
       • etc.
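
The flow above amounts to two substitution passes: template variables fill the template_paths entries, and the resolved paths then fill the step arguments. The sketch below illustrates this with a subset of the example pipeline; the two-pass structure and the profile name hg38 are assumptions about how pype behaves, not a description of its code.

```python
# Pass 1 inputs: user-provided variables plus the auto-injected profile_name.
variables = {
    "project_dir": "/scratch/project",
    "sample_name": "sample1",
    "profile_name": "hg38",  # hypothetical active profile
    "fastq1": "/data/sample1_R1.fastq.gz",
}

template_paths = {
    "bam_markdups": "%(project_dir)s/%(sample_name)s/bam/"
                    "%(sample_name)s_%(profile_name)s_markdups.bam",
    "out_qc": "%(project_dir)s/%(sample_name)s/qc",
}

# Pass 1: template variables -> resolved paths.
resolved = {name: tpl % variables for name, tpl in template_paths.items()}

# Pass 2: step arguments can reference both variables and resolved paths.
namespace = {**variables, **resolved}
step_arguments = {
    "--out": "%(bam_markdups)s",
    "--sample-name": "%(sample_name)s",
    "--f1": "%(fastq1)s",
    "--qc": "%(out_qc)s",
}
step_args = {flag: value % namespace for flag, value in step_arguments.items()}

for flag, value in step_args.items():
    print(flag, value)
```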

Command-line Options

pype run supports the same global options as pype pipelines:

pype [--profile PROFILE] run [--queue QUEUE] [--log LOG] [--run-name RUN_NAME] \
     PIPELINE [ARGUMENTS...]

Common options:

  • --profile PROFILE: Profile to use (default: configured default)

  • --queue QUEUE: Queue system (default: configured default)

  • --log LOG: Log directory (can use template if defined in template_paths)

  • --run-name RUN_NAME: Display name for this run (useful for tracking)

Examples:

# Use SLURM queue
pype run --queue slurm my_pipeline --project_dir /proj --sample_name s1 ...

# Use dry_run to preview without execution
pype run --queue dry_run my_pipeline --project_dir /proj --sample_name s1 ...

# Specify custom log directory
pype run --log /custom/logs my_pipeline --project_dir /proj --sample_name s1 ...

# Add descriptive run name
pype run --run-name "Align batch_1" my_pipeline --project_dir /proj --sample_name s1 ...
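
For batch processing, the same options can be assembled programmatically. The sketch below only builds and prints the command lines, using the flags documented above with the dry_run queue; the build_command helper, the sample list, and the _R1/_R2 FASTQ naming convention are assumptions for illustration.

```python
import shlex


def build_command(pipeline, project_dir, sample, fastq_dir, queue="dry_run"):
    """Build the argv list for one `pype run` invocation."""
    return [
        "pype", "run",
        "--queue", queue,
        "--run-name", f"Align {sample}",
        pipeline,
        "--project_dir", project_dir,
        "--sample_name", sample,
        "--fastq1", f"{fastq_dir}/{sample}_R1.fastq.gz",
        "--fastq2", f"{fastq_dir}/{sample}_R2.fastq.gz",
    ]


samples = ["sample1", "sample2"]  # hypothetical sample identifiers
commands = [
    build_command("my_pipeline", "/scratch/project", sample, "/data")
    for sample in samples
]
for argv in commands:
    # shlex.join quotes arguments containing spaces, such as the run name.
    print(shlex.join(argv))
```

Passing each argv list to subprocess.run would execute the commands; printing them first makes it easy to review the batch before submitting it.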

Comparison with pype pipelines

Before (pype pipelines):

pype pipelines my_pipeline \
  --bam_markdups /scratch/project/sample1/bam/sample1_hg38_markdups.bam \
  --bam_recalibrated /scratch/project/sample1/bam/sample1_hg38_markdups_recal.bam \
  --recalibration_table /scratch/project/sample1/tables/sample1_hg38_recalibration_table.txt \
  --dup_metrics /scratch/project/sample1/tables/sample1_hg38_dup_metrics.txt \
  --out_qc /scratch/project/sample1/qc \
  --fastq1 /data/sample1_R1.fastq.gz \
  --fastq2 /data/sample1_R2.fastq.gz

After (pype run):

pype run my_pipeline \
  --project_dir /scratch/project \
  --sample_name sample1 \
  --fastq1 /data/sample1_R1.fastq.gz \
  --fastq2 /data/sample1_R2.fastq.gz

Benefits of pype run:

  • ✅ Fewer arguments to specify

  • ✅ Less error-prone path construction

  • ✅ Automatic directory creation

  • ✅ Consistent directory structure across runs

  • ✅ Easier to expose through web interfaces

  • ✅ Clearer help showing only relevant inputs

Advanced Features

Argument Types

Non-template arguments preserve their full type information:

  • Boolean flags with action: store_true work as expected

  • Multiple values (nargs) are supported

  • Type conversions (int, float, etc.) are handled by the pipeline

  • Custom argument handlers (composite_arg, batch_arg, etc.) work normally

Example:

arguments:
  --exact_match:
    value: "%(exact_match)s"
    action: store_true
  --threads:
    value: "%(threads)s"
    type: int

Command line:

pype run pipeline --exact_match --threads 8 ...
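
The action, nargs, and type keys share their names with options of Python's argparse module. As an analogy only, the sketch below shows how the two arguments above would parse under plain argparse; how pype itself builds its parser is an assumption and may differ.

```python
import argparse

# Stand-in parser for illustration; this is not pype's own parser.
parser = argparse.ArgumentParser(prog="pype run pipeline")
parser.add_argument("--exact_match", action="store_true")  # boolean flag
parser.add_argument("--threads", type=int)                 # converted to int

args = parser.parse_args(["--exact_match", "--threads", "8"])
print(args.exact_match, args.threads)
```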

Nested Pipelines

pype run works with nested pipelines and complex dependencies exactly like pype pipelines:

steps:
  step_1_align:
    name: alignment_snippet
    type: snippet
    depends_on: []
    arguments:
      --bam: "%(bam_output)s"
      ...

  step_2_qc:
    name: qc_pipeline
    type: pipeline
    depends_on: [step_1_align]
    arguments:
      --bam: "%(bam_output)s"
      --qc_dir: "%(qc_output)s"

Troubleshooting

Pipeline not shown in help

Only pipelines with template_paths defined are shown by pype run. If a pipeline doesn’t appear, add the template_paths section to its YAML.

Missing template variable error

KeyError: Template for 'bam_output' uses undefined variable: project_dir

This means a template uses a variable that wasn’t provided. Make sure to include all required template variables on the command line.
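
To find out which variables a template needs before running, the placeholders can be extracted with a regular expression. The sketch below is a hypothetical pre-flight check, not part of pype; it also shows that Python's % formatting raises KeyError for a missing key, which is consistent with the error shown above.

```python
import re

# Matches %(name)s placeholders and captures the variable name.
PLACEHOLDER = re.compile(r"%\((\w+)\)s")


def missing_variables(template, variables):
    """Return placeholder names in the template that have no value."""
    names = dict.fromkeys(PLACEHOLDER.findall(template))  # de-duplicate, keep order
    return [name for name in names if name not in variables]


template = "%(project_dir)s/%(sample_name)s/bam/output.bam"
provided = {"sample_name": "sample1"}

missing = missing_variables(template, provided)
print("missing:", missing)

# Formatting directly fails on the first undefined variable.
try:
    template % provided
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for:", missing_key)
```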

Directories not created

Verify that each entry in template_paths has the correct type field (file or directory), and check that you have write permission on the parent directory.

Wrong paths generated

Double-check the template strings in template_paths. Use pype run pipeline --help to see which variables are being collected.

See Also