Pipelines Run
The pype run command provides a simplified interface for executing pipelines by automatically generating argument values from high-level template variables. Instead of manually specifying all file paths, users provide just the essential inputs and the system uses templates to construct the full paths.
This feature is particularly useful for:
Simplifying user workflows: Users don’t need to remember all argument names and path structures
Standardizing project layouts: Enforces consistent directory structures across runs
GUI/Form-based deployment: Template-based workflows are easier to expose through web interfaces
Batch processing: Reduces errors from manual path construction when processing multiple samples
Quick Start
First, define templates in your pipeline YAML:
info:
description: My bioinformatics pipeline
api: 2.1.0
template_paths:
  output_bam:
    template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
    type: file
  output_dir:
    template: "%(project_dir)s/%(sample_name)s/qc"
    type: directory
template_arguments:
  project_dir: Project root directory
  sample_name: Sample identifier or name
Then run the pipeline:
pype run my_pipeline \
--project_dir /scratch/project \
--sample_name sample_1 \
--fastq1 /data/sample_1_R1.fastq.gz \
--fastq2 /data/sample_1_R2.fastq.gz
Concepts
Template Paths
Template paths define how to generate argument values from template variables.
Structure:
template_paths:
  argument_name:
    template: "path/with/%(variable)s/placeholders"
    type: file|directory
Fields:
argument_name: The pipeline argument this template generates (e.g., output_bam, logs)
template: Template string with %(variable)s placeholders
type: Either file (creates parent directories) or directory (creates the path itself)
Example:
template_paths:
  bam_output:
    template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam"
    type: file
  qc_directory:
    template: "%(project_dir)s/%(sample_name)s/qc"
    type: directory
  logs:
    template: "%(project_dir)s/%(sample_name)s/logs"
    type: directory
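The %(variable)s placeholders follow Python's mapping-style string formatting, so the resolution step can be sketched in a few lines. This is an illustration only: resolve_templates is a hypothetical helper, not part of pype's API, and the real implementation may differ.

```python
# Hypothetical helper for illustration; not pype's actual implementation.
def resolve_templates(template_paths, variables):
    """Fill each template's %(name)s placeholders from `variables`."""
    resolved = {}
    for argument_name, spec in template_paths.items():
        # Python's %-formatting accepts a mapping for named placeholders.
        resolved[argument_name] = spec["template"] % variables
    return resolved


template_paths = {
    "bam_output": {
        "template": "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_aligned.bam",
        "type": "file",
    },
    "qc_directory": {
        "template": "%(project_dir)s/%(sample_name)s/qc",
        "type": "directory",
    },
}
variables = {"project_dir": "/scratch/project", "sample_name": "sample_1"}

resolved = resolve_templates(template_paths, variables)
print(resolved["bam_output"])    # /scratch/project/sample_1/bam/sample_1_aligned.bam
print(resolved["qc_directory"])  # /scratch/project/sample_1/qc
```

Each key of the result is a pipeline argument name, so the resolved values can be passed to the steps like any manually supplied argument.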
Template Arguments
Template arguments provide user-friendly descriptions for template variables shown in help.
Structure:
template_arguments:
  variable_name: "User-friendly description"
Example:
template_arguments:
  project_dir: "Project root directory (will create subdirectories)"
  sample_name: "Sample identifier or name"
  profile_name: "Pipeline profile name" # Not needed - auto-injected
Template Variables
Variables that can be used in template strings come from two sources:
User-provided arguments: Any argument not in template_paths is treated as a template variable
- Example: --project_dir, --sample_name, --fastq1, --fastq2
Auto-injected variables: Automatically provided by the system
- profile_name: The current profile being used (you don’t need to provide this)
Usage in templates:
template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_aligned.bam"
# Uses: project_dir, sample_name, profile_name
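A minimal sketch of how the two variable sources could be combined before formatting. The function name collect_variables is hypothetical, and letting the injected profile_name override a user-supplied value is an assumption, not documented behavior.

```python
# Hypothetical helper for illustration; not pype's actual implementation.
def collect_variables(cli_args, template_paths, profile_name):
    """Merge user-provided arguments with the auto-injected profile name."""
    # Arguments that are not generated from a template act as template variables.
    variables = {
        name: value
        for name, value in cli_args.items()
        if name not in template_paths
    }
    # Assumption: the injected value wins over anything the user passed.
    variables["profile_name"] = profile_name
    return variables


template_paths = {
    "bam_output": {
        "template": "%(project_dir)s/%(sample_name)s/bam/"
                    "%(sample_name)s_%(profile_name)s_aligned.bam",
        "type": "file",
    },
}
cli_args = {
    "project_dir": "/scratch/project",
    "sample_name": "sample1",
    "fastq1": "/data/sample1_R1.fastq.gz",
}

variables = collect_variables(cli_args, template_paths, profile_name="hg38")
# Unused keys such as fastq1 are simply ignored by %-formatting.
bam_path = template_paths["bam_output"]["template"] % variables
print(bam_path)  # /scratch/project/sample1/bam/sample1_hg38_aligned.bam
```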
Directory Creation
Before pipeline execution, pype run automatically creates all required directories:
For type: directory: Creates the path directly with mkdir -p
For type: file: Creates parent directories with mkdir -p $(dirname path)
Example:
template_paths:
  bam_output:
    template: "%(project_dir)s/%(sample_name)s/bam/output.bam"
    type: file
  logs:
    template: "%(project_dir)s/%(sample_name)s/logs"
    type: directory
Running with:
pype run pipeline \
--project_dir /scratch/project \
--sample_name sample1
Results in directories:
/scratch/project/sample1/bam/ (created for file type)
/scratch/project/sample1/logs/ (created for directory type)
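The two rules can be sketched with pathlib, whose mkdir(parents=True, exist_ok=True) behaves like mkdir -p. The helper create_directories is hypothetical, and a temporary directory stands in for /scratch/project so the example can run anywhere.

```python
import tempfile
from pathlib import Path


# Hypothetical helper for illustration; not pype's actual implementation.
def create_directories(resolved_paths, template_paths):
    """Create the directory each resolved path needs, based on its type."""
    created = []
    for name, path in resolved_paths.items():
        target = Path(path)
        if template_paths[name]["type"] == "file":
            # For files, only the parent directory is created.
            target = target.parent
        target.mkdir(parents=True, exist_ok=True)  # equivalent of mkdir -p
        created.append(target)
    return created


template_paths = {
    "bam_output": {
        "template": "%(project_dir)s/%(sample_name)s/bam/output.bam",
        "type": "file",
    },
    "logs": {
        "template": "%(project_dir)s/%(sample_name)s/logs",
        "type": "directory",
    },
}

with tempfile.TemporaryDirectory() as project_dir:
    variables = {"project_dir": project_dir, "sample_name": "sample1"}
    resolved = {
        name: spec["template"] % variables
        for name, spec in template_paths.items()
    }
    create_directories(resolved, template_paths)

    sample_dir = Path(project_dir) / "sample1"
    bam_dir_exists = (sample_dir / "bam").is_dir()
    logs_dir_exists = (sample_dir / "logs").is_dir()
    bam_file_exists = (sample_dir / "bam" / "output.bam").exists()

print(bam_dir_exists, logs_dir_exists, bam_file_exists)  # True True False
```

The output file itself is not created; only its parent directory is, which leaves writing the file to the pipeline step.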
Complete Example
Pipeline YAML:
info:
description: GATK data processing pipeline with alignment
api: 2.1.0
arguments:
  fastq1: First mate fastQ file
  fastq2: Second mate fastQ file
  tmp_dir: Temporary directory
defaults:
  tmp_dir: /scratch
template_paths:
  bam_markdups:
    template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups.bam"
    type: file
  bam_recalibrated:
    template: "%(project_dir)s/%(sample_name)s/bam/%(sample_name)s_%(profile_name)s_markdups_recal.bam"
    type: file
  recalibration_table:
    template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_recalibration_table.txt"
    type: file
  dup_metrics:
    template: "%(project_dir)s/%(sample_name)s/tables/%(sample_name)s_%(profile_name)s_dup_metrics.txt"
    type: file
  out_qc:
    template: "%(project_dir)s/%(sample_name)s/qc"
    type: directory
  logs:
    template: "%(project_dir)s/%(sample_name)s/logs"
    type: directory
template_arguments:
  project_dir: "Project root directory"
  sample_name: "Sample identifier or name"
steps:
  step_align:
    name: gatk_bwa_mem_gpu
    type: snippet
    depends_on: []
    arguments:
      --out: "%(bam_markdups)s"
      --sample-name: "%(sample_name)s"
      --f1: "%(fastq1)s"
      --f2: "%(fastq2)s"
      --dup-metrics: "%(dup_metrics)s"
      --base-recal: "%(recalibration_table)s"
      --qc: "%(out_qc)s"
      --tmp: "%(tmp_dir)s"
  step_recalibrate:
    name: gatk_apply_BSQR
    type: snippet
    depends_on: [step_align]
    arguments:
      -i: "%(bam_markdups)s"
      -o: "%(bam_recalibrated)s"
      -r: "%(recalibration_table)s"
Show help:
$ pype run my_pipeline
usage: pype run my_pipeline --fastq1 FASTQ1 --fastq2 FASTQ2
                            --project_dir PROJECT_DIR --sample_name SAMPLE_NAME
                            [--tmp_dir TMP_DIR]

GATK data processing pipeline with alignment

Template Variables:
  Variables used to generate paths from templates

  --fastq1 FASTQ1       First mate fastQ file
  --fastq2 FASTQ2       Second mate fastQ file
  --project_dir PROJECT_DIR
                        Project root directory
  --sample_name SAMPLE_NAME
                        Sample identifier or name

Optional:
  Optional arguments with defaults

  --tmp_dir TMP_DIR     Temporary directory. Default: /scratch
Execute pipeline:
pype run --queue slurm --run-name "Align sample1" my_pipeline \
--project_dir /scratch/project \
--sample_name sample1 \
--fastq1 /data/sample1_R1.fastq.gz \
--fastq2 /data/sample1_R2.fastq.gz
What happens:
1. Templates are resolved (the paths below assume the active profile is hg38):
- bam_markdups → /scratch/project/sample1/bam/sample1_hg38_markdups.bam
- out_qc → /scratch/project/sample1/qc
- logs → /scratch/project/sample1/logs
2. Directories are created:
- /scratch/project/sample1/bam/ (for file type)
- /scratch/project/sample1/qc/ (for directory type)
- /scratch/project/sample1/logs/ (for directory type)
3. Pipeline executes with resolved arguments:
- --out /scratch/project/sample1/bam/sample1_hg38_markdups.bam
- --qc /scratch/project/sample1/qc
- etc.
Command-line Options
pype run supports the same global options as pype pipelines:
pype [--profile PROFILE] run [--queue QUEUE] [--log LOG] [--run-name RUN_NAME] \
PIPELINE [ARGUMENTS...]
Common options:
--profile PROFILE: Profile to use (default: configured default)
--queue QUEUE: Queue system (default: configured default)
--log LOG: Log directory (can use template if defined in template_paths)
--run-name RUN_NAME: Display name for this run (useful for tracking)
Examples:
# Use SLURM queue
pype run --queue slurm my_pipeline --project_dir /proj --sample_name s1 ...
# Use dry_run to preview without execution
pype run --queue dry_run my_pipeline --project_dir /proj --sample_name s1 ...
# Specify custom log directory
pype run --log /custom/logs my_pipeline --project_dir /proj --sample_name s1 ...
# Add descriptive run name
pype run --run-name "Align batch_1" my_pipeline --project_dir /proj --sample_name s1 ...
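For batch processing, the same command line can be generated per sample. The sketch below only builds and prints the commands; it uses the options shown above, while build_run_command and the <sample>_R1/_R2.fastq.gz naming convention are assumptions made for illustration.

```python
import shlex


# Hypothetical helper for illustration; the FASTQ naming scheme is an assumption.
def build_run_command(pipeline, project_dir, sample_name, fastq_dir, queue=None):
    """Build the argv list for one `pype run` invocation."""
    command = ["pype", "run"]
    if queue is not None:
        command += ["--queue", queue]
    command += [
        pipeline,
        "--project_dir", project_dir,
        "--sample_name", sample_name,
        "--fastq1", f"{fastq_dir}/{sample_name}_R1.fastq.gz",
        "--fastq2", f"{fastq_dir}/{sample_name}_R2.fastq.gz",
    ]
    return command


samples = ["sample1", "sample2"]
commands = [
    build_run_command("my_pipeline", "/scratch/project", sample, "/data",
                      queue="dry_run")
    for sample in samples
]

for command in commands:
    # shlex.join quotes arguments where needed, so the line is copy-paste safe.
    print(shlex.join(command))
```

Starting with the dry_run queue lets the generated commands be previewed before switching to a real queue. Each list could then be passed to subprocess.run(command, check=True).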
Comparison with pype pipelines
Before (pype pipelines):
pype pipelines my_pipeline \
--bam_markdups /scratch/project/sample1/bam/sample1_hg38_markdups.bam \
--bam_recalibrated /scratch/project/sample1/bam/sample1_hg38_markdups_recal.bam \
--recalibration_table /scratch/project/sample1/tables/sample1_hg38_recalibration_table.txt \
--dup_metrics /scratch/project/sample1/tables/sample1_hg38_dup_metrics.txt \
--out_qc /scratch/project/sample1/qc \
--fastq1 /data/sample1_R1.fastq.gz \
--fastq2 /data/sample1_R2.fastq.gz
After (pype run):
pype run my_pipeline \
--project_dir /scratch/project \
--sample_name sample1 \
--fastq1 /data/sample1_R1.fastq.gz \
--fastq2 /data/sample1_R2.fastq.gz
Benefits of pype run:
✅ Fewer arguments to specify
✅ Less error-prone path construction
✅ Automatic directory creation
✅ Consistent directory structure across runs
✅ Easier to expose through web interfaces
✅ Clearer help showing only relevant inputs
Advanced Features
Argument Types
Non-template arguments preserve their full type information:
Boolean flags with action: store_true work as expected
Multiple values (nargs) are supported
Type conversions (int, float, etc.) are handled by the pipeline
Custom argument handlers (composite_arg, batch_arg, etc.) work normally
Example:
arguments:
  --exact_match:
    value: "%(exact_match)s"
    action: store_true
  --threads:
    value: "%(threads)s"
    type: int
Command line:
pype run pipeline --exact_match --threads 8 ...
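The action and type fields mirror the options of Python's argparse module. Assuming pype's parsing behaves like argparse (an assumption suggested by the help output format, not confirmed here), the command line above would be interpreted roughly as follows.

```python
import argparse

# Sketch only: assumes argparse-like semantics for `action` and `type`.
parser = argparse.ArgumentParser(prog="pype run pipeline")
parser.add_argument("--exact_match", action="store_true")
parser.add_argument("--threads", type=int)

args = parser.parse_args(["--exact_match", "--threads", "8"])
values = vars(args)

print(values["exact_match"])  # True (the flag was given)
print(values["threads"])      # 8, converted to int

# The parsed values can then fill the %(name)s placeholders of a step argument.
threads_value = "%(threads)s" % values
print(threads_value)          # 8
```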
Nested Pipelines
pype run works with nested pipelines and complex dependencies exactly like pype pipelines:
steps:
  step_1_align:
    name: alignment_snippet
    type: snippet
    depends_on: []
    arguments:
      --bam: "%(bam_output)s"
      ...
  step_2_qc:
    name: qc_pipeline
    type: pipeline
    depends_on: [step_1_align]
    arguments:
      --bam: "%(bam_output)s"
      --qc_dir: "%(qc_output)s"
Troubleshooting
Pipeline not shown in help
Only pipelines with template_paths defined are shown by pype run.
If a pipeline doesn’t appear, add the template_paths section to its YAML.
Missing template variable error
KeyError: Template for 'bam_output' uses undefined variable: project_dir
This means a template uses a variable that wasn’t provided. Make sure to include all required template variables on the command line.
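To find out which variables a set of templates needs before running, the placeholders can be extracted with a regular expression. This is a standalone diagnostic sketch; missing_variables is a hypothetical helper and not a pype command or API.

```python
import re

# Matches %(name)s placeholders and captures the variable name.
PLACEHOLDER = re.compile(r"%\((\w+)\)s")


# Hypothetical helper for illustration; not part of pype.
def missing_variables(template_paths, variables):
    """Map each template name to the variables it uses but were not provided."""
    missing = {}
    for name, spec in template_paths.items():
        needed = set(PLACEHOLDER.findall(spec["template"]))
        absent = sorted(needed - set(variables))
        if absent:
            missing[name] = absent
    return missing


template_paths = {
    "bam_output": {
        "template": "%(project_dir)s/%(sample_name)s/bam/output.bam",
        "type": "file",
    },
}
variables = {"sample_name": "sample1"}  # project_dir was forgotten

report = missing_variables(template_paths, variables)
print(report)  # {'bam_output': ['project_dir']}

# Plain %-formatting fails on the first missing name with a KeyError.
try:
    template_paths["bam_output"]["template"] % variables
    missing_key = None
except KeyError as error:
    missing_key = error.args[0]
print(missing_key)  # project_dir
```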
Directories not created
Verify that template_paths has the correct type field (file or directory).
Check file permissions in the parent directory.
Wrong paths generated
Double-check the template strings in template_paths.
Use pype run pipeline --help to see which variables are being collected.
See Also
Pipelines: Full pipeline configuration reference
Getting Started: Getting started with Bio-pype
Profiles: Profile configuration and management