Pipeline Resume#
Bio_pype provides a dedicated resume command to continue previously-started pipelines from
their runtime YAML files. The resume command automatically restores the pipeline environment
and continues execution from where it left off.
Overview#
The resume functionality enables:
Automatic continuation: Resume interrupted pipelines with a single command
Environment restoration: Automatically restores all PYPE_* environment variables
Status inspection: Check pipeline status without executing
Selective re-execution: Re-run failed jobs or force re-run all jobs
Queue override: Change queue system when resuming
How Resume Works#
Pipeline Runtime Tracking#
Each pipeline run creates a pipeline_runtime.yaml file in its log directory that tracks:
Job status: Current state of each job (pending, running, completed, failed)
Pipeline metadata: Run name, pipeline name, submission time, run ID
Environment variables: All PYPE_* configuration used for the run
Job details: Commands, queue IDs, timestamps, log paths
Example runtime YAML location:
/path/to/logs/251112224941_genomic_analysis/
├── pipeline_runtime.yaml ← Resume from this file
├── genomic_analysis.log
├── align_reads.out
└── sort_bam.err
Resume Process#
When you resume a pipeline:
Runtime YAML is read to extract environment and metadata
All PYPE_* environment variables are restored
Job statuses are checked to determine what needs to run
The queue system's post_run method continues execution
Only incomplete jobs are executed (completed jobs are skipped)
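The job-selection step above can be sketched in a few lines of Python. This is a minimal illustration, not Bio_pype's implementation: the job names, the dictionary layout, and the `jobs_to_execute` helper are hypothetical, and the set of statuses treated as resumable is an assumption (failed jobs are left out because the documentation describes --force-errors as the way to re-run them).

```python
# Assumption: statuses that a plain resume treats as "still needs to run".
# Failed jobs are excluded because --force-errors is documented as the way
# to re-run them.
RESUMABLE = {"pending", "submitted", "running"}


def jobs_to_execute(jobs):
    """Return the names of unfinished jobs, preserving their original order."""
    return [name for name, job in jobs.items() if job["status"] in RESUMABLE]


# Hypothetical job table: names and layout are illustrative only.
jobs = {
    "align_reads": {"status": "completed"},
    "sort_bam": {"status": "completed"},
    "call_variants": {"status": "pending"},
    "annotate": {"status": "pending"},
    "report": {"status": "pending"},
}

if __name__ == "__main__":
    print(jobs_to_execute(jobs))
```

Completed jobs never appear in the returned list, which mirrors the "completed jobs are skipped" behaviour described above.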
Basic Resume Workflow#
# Start pipeline
$ pype pipeline --queue slurm genomic_analysis --input sample1.fq
# Pipeline runs, creates: logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# Job 1/5 completed
# Job 2/5 completed
# [Interrupted by Ctrl+C, system crash, cluster maintenance, etc.]
# Resume from the runtime YAML
$ pype resume logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# Automatically restores environment
# Continues from job 3 (jobs 1-2 already completed)
# Job 3/5 running...
# Job 4/5 running...
# Job 5/5 running...
Command Line Usage#
Basic Syntax#
pype resume <runtime_yaml> [options]
Required Arguments:
runtime_yaml: Path to the pipeline_runtime.yaml file from a previous run
Optional Arguments:
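--queue QUEUE: Override the original queue system
--status: Print pipeline status and exit (no execution)
--force-errors: Re-run failed jobs
--force-all: Re-run all jobs regardless of status
--sync: Reconcile the runtime YAML with the actual queue and log state without cancelling jobs, then continue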
Command Line Options#
--status: Check Pipeline Status#
Print a summary of the pipeline status without executing:
$ pype resume --status logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Example output:
================================================================================
Pipeline Status Summary
================================================================================
Run Name: sample1_analysis
Pipeline: genomic_analysis
Submitted: 2025-01-15 10:00:00
Queue: slurm
Run ID: 251112224941
Log: /path/to/logs/251112224941_genomic_analysis
--------------------------------------------------------------------------------
Total jobs: 10
Completed : 7 ( 70.0%)
Running : 1 ( 10.0%)
Pending : 2 ( 20.0%)
Failed : 0 ( 0.0%)
================================================================================
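The counts and percentages in a summary like this one can be derived from the per-job statuses alone. The following is a rough sketch under stated assumptions: the `summarize` helper and the job dictionary layout are hypothetical, and the column widths only approximate the output shown above.

```python
from collections import Counter

# Order in which statuses are reported, following the example output.
REPORTED = ("completed", "running", "pending", "failed")


def summarize(jobs):
    """Build the count/percentage lines of a status summary."""
    counts = Counter(job["status"] for job in jobs.values())
    total = len(jobs)
    lines = [f"Total jobs: {total}"]
    for status in REPORTED:
        n = counts.get(status, 0)
        pct = 100.0 * n / total if total else 0.0
        lines.append(f"{status.capitalize():<10}: {n} ({pct:5.1f}%)")
    return lines


# Hypothetical run: 7 completed, 1 running, 2 pending, 0 failed.
sample = {f"job{i}": {"status": "completed"} for i in range(7)}
sample["job7"] = {"status": "running"}
sample["job8"] = {"status": "pending"}
sample["job9"] = {"status": "pending"}

if __name__ == "__main__":
    print("\n".join(summarize(sample)))
```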
--queue: Override Queue System#
Change the queue system when resuming:
$ pype resume --queue local logs/251112224941_genomic_analysis/pipeline_runtime.yaml
When to use:
Debug locally after cluster interruption
Switch from SLURM to PBS
Run remaining jobs without queue system
Default: Uses the original queue system from pipeline metadata
--force-errors: Re-run Failed Jobs#
Reset failed jobs to pending and re-execute them:
$ pype resume --force-errors logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Effect:
All jobs with status: failed are reset to status: pending
Completed and running jobs are untouched
Pipeline resumes and re-executes the failed jobs
When to use:
Transient failures (network issues, temp files)
After fixing input data or configuration
Cluster node failures
--force-all: Re-run All Jobs#
Reset all jobs to pending and re-execute the entire pipeline:
$ pype resume --force-all logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Effect:
All jobs are reset to status: pending
Everything runs again from scratch
Original environment is preserved
When to use:
Complete pipeline re-execution needed
Testing after significant changes
Regenerating all outputs
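The difference between --force-errors and --force-all comes down to which statuses are reset to pending before the resume. A minimal sketch, assuming jobs are held as a dictionary with a `status` field (the `reset_jobs` helper and the layout are hypothetical, not Bio_pype's internals):

```python
def reset_jobs(jobs, force_errors=False, force_all=False):
    """Reset job statuses in place and return how many jobs were reset.

    force_all resets every job; force_errors resets only failed jobs.
    """
    reset = 0
    for job in jobs.values():
        if force_all or (force_errors and job["status"] == "failed"):
            job["status"] = "pending"
            reset += 1
    return reset


def make_jobs():
    """Hypothetical run with 8 completed and 2 failed jobs."""
    jobs = {f"job{i}": {"status": "completed"} for i in range(8)}
    jobs["job8"] = {"status": "failed"}
    jobs["job9"] = {"status": "failed"}
    return jobs


if __name__ == "__main__":
    print(f"Reset {reset_jobs(make_jobs(), force_errors=True)} job(s) to pending status")
    print(f"Reset {reset_jobs(make_jobs(), force_all=True)} job(s) to pending status")
```

With neither flag set, nothing is reset and the resume proceeds from the statuses already recorded.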
--sync: Reconcile Without Cancelling#
Reconcile the runtime YAML with the actual queue and log state, without cancelling any jobs, then continue:
$ pype resume --sync logs/251112224941_genomic_analysis/pipeline_runtime.yaml
This is the key difference from a normal resume. A plain pype resume assumes
the previously running jobs are stale and cancels any running /
submitted jobs before resubmitting them, to avoid duplicate execution. That
is wrong when the jobs are in fact still alive — for example when only the
coordinator process died (a wall-time kill, a dropped SSH session, a crashed
login node) while the queued jobs kept running normally.
--sync handles exactly that case. It:
Bulk-queries the queue handler for the true state of every non-completed job (get_all_job_states), falling back to per-job log inspection when the handler cannot answer.
Updates each job’s status in the YAML (completed, failed, running or submitted), reconstructing started_at / completed_at from the snippet logs where the metadata is missing.
Writes the reconciled YAML and resumes, picking up genuinely pending work while leaving the still-running jobs untouched.
Effect:
No job is cancelled; in-flight jobs continue running in the scheduler
Already-completed jobs and their resource timelines are preserved as-is
Only the coordinator’s view of the world is rebuilt from ground truth
When to use:
The coordinator hit its wall-time limit but the worker jobs were still running
A pipeline was interrupted at the driver level (network/login-node loss)
The runtime YAML has drifted from the real queue state and you want it re-synced before continuing
--sync can be combined with --queue and works with any queue handler;
handlers that implement get_all_job_states give the fastest, most accurate
reconciliation.
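The reconciliation logic can be pictured roughly as follows. This is only a sketch under loud assumptions: the real signature and return shape of get_all_job_states are not documented here, so `queue_states` (a mapping from queue ID to state) and the `log_state` fallback callable are hypothetical stand-ins, as is the `queue_id` field name.

```python
def reconcile(jobs, queue_states, log_state):
    """Update non-completed jobs in place from queue state, then from logs.

    queue_states: hypothetical bulk-query result, {queue_id: state}.
    log_state: hypothetical fallback, called with a job name, returning a
               state string or None when the logs are inconclusive.
    """
    for name, job in jobs.items():
        if job["status"] == "completed":
            continue  # completed jobs are preserved as-is
        state = queue_states.get(job.get("queue_id"))
        if state is None:
            state = log_state(name)  # handler could not answer
        if state is not None:
            job["status"] = state  # no job is cancelled, only relabelled
    return jobs


# Hypothetical situation after the coordinator died.
jobs = {
    "align": {"status": "running", "queue_id": 101},
    "sort": {"status": "submitted", "queue_id": 102},
    "call": {"status": "pending"},
    "index": {"status": "completed", "queue_id": 100},
}
queue_states = {101: "completed"}  # the handler only knows about job 101


def log_state(name):
    """Pretend the logs show that 'sort' is still running."""
    return "running" if name == "sort" else None


if __name__ == "__main__":
    for name, job in reconcile(jobs, queue_states, log_state).items():
        print(name, job["status"])
```

Jobs for which neither source gives an answer keep their recorded status, so genuinely pending work stays pending.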
Environment Restoration#
Automatic Variable Restoration#
The resume command automatically restores all PYPE_* environment variables from the
__pipeline_environment__ section of the runtime YAML. This ensures the resumed pipeline
uses the exact same configuration as the original run.
Restored variables include:
PYPE_MODULES: Module path (snippets, pipelines, profiles, queues)
PYPE_LOGDIR: Log directory location
PYPE_TMP: Temporary directory
PYPE_NCPU: CPU limits
PYPE_MEM: Memory limits
And any other PYPE_* variables
Example:
If the original pipeline was run with PYPE_MODULES=custom_modules, the resume command
automatically sets this environment variable before continuing execution.
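In outline, the restoration step copies every PYPE_* entry from the `__pipeline_environment__` section into the process environment. A minimal sketch, assuming the runtime YAML has already been loaded into a dictionary (the `restore_environment` helper is hypothetical and only illustrates the idea):

```python
import os


def restore_environment(runtime, environ=os.environ):
    """Copy PYPE_* variables from the runtime data into environ.

    Returns the number of variables restored. A missing or empty
    __pipeline_environment__ section restores nothing.
    """
    saved = runtime.get("__pipeline_environment__") or {}
    restored = 0
    for key, value in saved.items():
        if key.startswith("PYPE_"):
            environ[key] = str(value)  # environment values must be strings
            restored += 1
    return restored


# Hypothetical runtime content, loaded from pipeline_runtime.yaml.
runtime = {
    "__pipeline_environment__": {
        "PYPE_MODULES": "custom_modules",
        "PYPE_NCPU": 8,
    }
}

if __name__ == "__main__":
    target = {}  # use a plain dict here to avoid touching the real environment
    count = restore_environment(runtime, target)
    print(f"Restored {count} environment variable(s) from pipeline runtime")
```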
Why This Matters#
Environment restoration is critical for:
Module consistency: Ensures the same snippets/queues are used
Path consistency: Finds resources in the same locations
Configuration consistency: Uses the same limits and settings
Reproducibility: Guarantees identical execution environment
Usage Examples#
Example 1: Basic Resume After Interruption#
# Start pipeline
$ pype pipeline --queue slurm genomic_analysis --input sample1.fq
# Creates: logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# Job 1/5 completed
# Job 2/5 completed
# [Interrupted - Ctrl+C, system crash, cluster downtime]
# Resume the pipeline
$ pype resume logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# Restored 5 environment variable(s) from pipeline runtime
# INFO: Resuming from: logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# INFO: Using queue: slurm
# Continues from job 3...
Example 2: Check Status Before Resuming#
# Check what's been completed
$ pype resume --status logs/251112224941_genomic_analysis/pipeline_runtime.yaml
================================================================================
Pipeline Status Summary
================================================================================
Run Name: sample1_genomic_analysis
Pipeline: genomic_analysis
Total jobs: 10
Completed : 7 ( 70.0%)
Pending : 3 ( 30.0%)
================================================================================
# Now resume if needed
$ pype resume logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Example 3: Re-run Failed Jobs#
# Check status to see failures
$ pype resume --status logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Total jobs: 10
Completed : 8 ( 80.0%)
Failed : 2 ( 20.0%)
# Re-run only the failed jobs
$ pype resume --force-errors logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# INFO: Reset 2 job(s) to pending status
# Executes only the 2 failed jobs
Example 4: Switch Queue System#
# Original run was on SLURM, but cluster is down
$ pype resume --queue local logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# INFO: Using queue: local
# Runs remaining jobs locally instead of on SLURM
Example 5: Complete Re-execution#
# Need to regenerate all outputs after fixing an issue
$ pype resume --force-all logs/251112224941_genomic_analysis/pipeline_runtime.yaml
# INFO: Reset 10 job(s) to pending status
# Re-executes the entire pipeline from start to finish
Queue System Integration#
The resume command works with all queue systems by calling their post_run method:
SLURM (--queue slurm): Monitors job queue and continues execution
PBS/Torque (--queue pbs): Monitors job queue and continues execution
Local (--queue local): Runs remaining jobs locally without queueing
None (--queue none): Direct execution without queue system
The queue system can be overridden using --queue to switch between systems when resuming.
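The queue selection and the hand-off to post_run can be sketched as below. This is an illustration only: the `select_queue` and `continue_with` helpers are hypothetical, the `queue_system` key is taken from the metadata field mentioned in the troubleshooting section, and the single-argument call to post_run is an assumption, since its real signature is not documented here.

```python
from types import SimpleNamespace


def select_queue(metadata, override=None):
    """Prefer an explicit --queue value, else the queue recorded in metadata."""
    queue = override or metadata.get("queue_system")
    if not queue:
        raise ValueError("Queue system not found in metadata; pass --queue")
    return queue


def continue_with(queue_module, runtime_path):
    """Call the queue module's post_run, failing clearly if it is missing."""
    post_run = getattr(queue_module, "post_run", None)
    if not callable(post_run):
        raise AttributeError("Queue module does not have post_run method")
    return post_run(runtime_path)  # assumed call shape


# Hypothetical stand-in for a queue module.
fake_local = SimpleNamespace(post_run=lambda path: f"resumed {path}")

if __name__ == "__main__":
    metadata = {"queue_system": "slurm"}
    print(select_queue(metadata))           # original queue
    print(select_queue(metadata, "local"))  # overridden queue
    print(continue_with(fake_local, "pipeline_runtime.yaml"))
```

The two error messages correspond to the "Queue System Mismatch" and "Post-run Method Not Found" symptoms listed under Troubleshooting.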
Inspecting Runtime Files#
Runtime YAML files can be inspected directly:
$ cat logs/251112224941_genomic_analysis/pipeline_runtime.yaml
The file contains job statuses, pipeline metadata (__pipeline_metadata__),
and environment variables (__pipeline_environment__).
See Understanding Bio_pype Logs for the complete runtime YAML structure and examples.
Troubleshooting#
Runtime YAML Not Found#
Symptom: FileNotFoundError: Runtime YAML not found
Solutions:
Verify the file path is correct:
$ ls logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Check you’re in the correct directory
Use absolute path if relative path doesn’t work:
$ pype resume /full/path/to/logs/251112224941_genomic_analysis/pipeline_runtime.yaml
Environment Variables Not Restored#
Symptom: Pipeline behaves differently than original run
Cause: Missing __pipeline_environment__ section in runtime YAML
Solutions:
Check runtime YAML contains environment section:
$ grep -A 5 "__pipeline_environment__" pipeline_runtime.yaml
Manually set environment variables before resuming:
$ export PYPE_MODULES=/path/to/modules
$ pype resume pipeline_runtime.yaml
Queue System Mismatch#
Symptom: Queue system not found in metadata
Solutions:
Specify queue explicitly with --queue:
$ pype resume --queue slurm pipeline_runtime.yaml
Check metadata in runtime YAML:
$ grep "queue_system" pipeline_runtime.yaml
Jobs Still Show as Running#
Symptom: Jobs stuck in “running” status but actually completed/failed
Solutions:
Check actual job status in queue system:
$ squeue -u $USER # SLURM
$ qstat -u $USER # PBS/Torque
Manually update status in YAML if jobs are dead:
# Edit pipeline_runtime.yaml
# Change: status: running
# To: status: failed (or pending to retry)
Use --force-errors or --force-all to reset statuses, or --sync to reconcile the YAML with the real queue state before continuing
YAML Parsing Errors#
Symptom: Failed to parse runtime YAML
Solutions:
Validate YAML syntax:
$ python -c "import yaml; yaml.safe_load(open('pipeline_runtime.yaml'))"
Check for special characters in job commands that need quoting
Restore from backup if available
Post-run Method Not Found#
Symptom: Queue module does not have post_run method
Cause: Custom queue module missing required method
Solutions:
Verify queue module exists:
$ ls $PYPE_MODULES/queues/
Check queue module has post_run function
Use different queue system:
$ pype resume --queue local pipeline_runtime.yaml
Best Practices#
Use --status first: Check pipeline status before resuming to understand what needs to run:
$ pype resume --status pipeline_runtime.yaml
Keep runtime YAML files: Don’t delete pipeline_runtime.yaml until you’re certain the run is complete and you won’t need to resume.
Backup long-running pipelines: For critical or long-running pipelines, periodically backup the runtime YAML file:
$ cp logs/251112224941_genomic_analysis/pipeline_runtime.yaml backups/
Environment consistency: The resume command automatically restores environment variables, ensuring consistent execution. Don’t manually override unless necessary.
Use --force-errors for transient failures: If jobs failed due to temporary issues (network, disk), use --force-errors to retry only the failed jobs.
Use --force-all sparingly: Only use --force-all when you truly need to regenerate all outputs. It will re-execute everything, wasting time on already-completed work.
Archive completed runs: Once a pipeline completes successfully, move the entire log directory to an archive location:
$ mv logs/251112224941_genomic_analysis /archive/completed_runs/
Check queue status manually: If resume seems stuck, check the queue system directly to see if jobs are actually running:
$ squeue -u $USER # SLURM
$ qstat -u $USER # PBS/Torque
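Avoid manually editing the runtime YAML: Manual edits can cause inconsistencies. Prefer the command-line flags (--force-errors, --force-all, --sync), and edit the file by hand only as a last resort, for example to mark a dead job as failed.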
See Also#
Progress Tracking - Progress tracking API and internals
Pipelines - Pipeline definition and execution
Understanding Bio_pype Logs - Understanding Bio_pype logs
Queue Systems - Queue system integration