Progress Tracking
Bio_pype includes a comprehensive progress tracking and display system that monitors pipeline execution and provides real-time visibility into job status.
Overview
The Progress system provides:
Real-time progress display: Rich formatted tables showing job status
Job state tracking: Track jobs from pending through completion or failure
Resource monitoring: Display CPU, memory, and task statistics
Progress statistics: Track completion percentage and timing
YAML-based persistence: Save/load progress to pipeline_runtime.yaml
Optional API integration: Send progress to compute.bio for web monitoring
Core Classes
ProgressDisplay
The ProgressDisplay class handles visual formatting and display of pipeline progress.
It works alongside the ResourceManager and TaskScheduler to show real-time job status.
Basic usage:
from pype.utils.progress import ProgressDisplay
progress = ProgressDisplay()
# `scheduler` (a TaskScheduler) and `resource_manager` (a ResourceManager)
# are assumed to be provided by the running pipeline.
# Collect (run_id, job_data) pairs for every job known to the scheduler
jobs = list(scheduler.runtime.items())
stats = scheduler.get_stats()
resource_stats = resource_manager.get_stats()
# Display pipeline status with job table and statistics
progress.show_pipeline_status(jobs, stats, resource_stats)
Key Features:
Rich formatted tables with box-drawing borders
Color-coded job status (pending, submitted, running, completed, failed)
Progress percentage and job counts
Resource usage statistics (CPU, memory)
Automatic deduplication (only displays when state changes)
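The deduplication feature listed above can be illustrated with a small, self-contained sketch. This is not the ProgressDisplay implementation. The class name `DedupRenderer` and the approach of comparing a sorted snapshot of (run_id, status) pairs are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

Snapshot = Tuple[Tuple[str, str], ...]


class DedupRenderer:
    """Hypothetical helper: call `render` only when job states change."""

    def __init__(self, render: Callable[[Snapshot], None]) -> None:
        self._render = render
        self._last: Optional[Snapshot] = None

    def update(self, jobs: Iterable[Tuple[str, str]]) -> bool:
        # Sort so that a different job ordering alone does not trigger a redraw.
        snapshot = tuple(sorted(jobs))
        if snapshot == self._last:
            return False  # nothing changed, skip output
        self._last = snapshot
        self._render(snapshot)
        return True
```

Calling `update` twice with the same states in a different order renders once. A status change renders again.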
PipelineProgress
Dataclass for tracking overall pipeline statistics:
from pype.utils.progress import PipelineProgress
progress = PipelineProgress(
total_jobs=10,
completed=7,
running=2,
submitted=0,
pending=1,
failed=0
)
print(progress.percent_complete) # 70.0
print(progress.summary)
# [7/10] 70.0% complete | 2 running | 0 submitted | 1 pending | 0 failed
Properties:
percent_complete: Percentage of completed jobs
summary: One-line text summary of progress
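To make the two properties concrete, here is a minimal stand-in dataclass that reproduces the example output above. It assumes `percent_complete` is simply completed jobs divided by total jobs, times 100. The class name `PipelineProgressSketch` and the `from_statuses` helper are hypothetical and are not part of `pype.utils.progress`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PipelineProgressSketch:
    """Illustrative stand-in for PipelineProgress (not the bio_pype class)."""

    total_jobs: int
    completed: int = 0
    running: int = 0
    submitted: int = 0
    pending: int = 0
    failed: int = 0

    @property
    def percent_complete(self) -> float:
        # Assumed formula: completed jobs as a percentage of all jobs.
        if self.total_jobs == 0:
            return 0.0
        return 100.0 * self.completed / self.total_jobs

    @property
    def summary(self) -> str:
        return (
            f"[{self.completed}/{self.total_jobs}] {self.percent_complete:.1f}% complete"
            f" | {self.running} running | {self.submitted} submitted"
            f" | {self.pending} pending | {self.failed} failed"
        )

    @classmethod
    def from_statuses(cls, statuses: Iterable[str]) -> "PipelineProgressSketch":
        # One status string per job; unknown statuses count toward the total only.
        counts = Counter(statuses)
        return cls(
            total_jobs=sum(counts.values()),
            completed=counts["completed"],
            running=counts["running"],
            submitted=counts["submitted"],
            pending=counts["pending"],
            failed=counts["failed"],
        )
```

For example, `PipelineProgressSketch.from_statuses(["completed"] * 5 + ["running"] * 2 + ["pending"] * 3).summary` yields the same counts as the "[5/10] 50.0%" line shown under Summary Statistics.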
JobProgress
Dataclass representing a single job for display:
from pype.utils.progress import JobProgress
job = JobProgress(
run_id="align_reads_abc123",
friendly_name="Align sample 1",
status="running",
queue_id="12345"
)
Fields:
run_id: Unique job identifier
friendly_name: Human-readable job name
status: Current status (pending, submitted, running, completed, failed)
queue_id: Optional queue system job ID
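The record below mirrors the four JobProgress fields in a self-contained way. The status check against the five documented states and the `queue_label` property (which returns "-" when no queue ID exists, as in the status table) are assumptions for illustration. The real JobProgress may not validate its input.

```python
from dataclasses import dataclass
from typing import Optional

# The job states listed in this documentation.
VALID_STATUSES = ("pending", "submitted", "running", "completed", "failed")


@dataclass(frozen=True)
class JobProgressSketch:
    """Illustrative record mirroring the JobProgress fields (not the bio_pype class)."""

    run_id: str
    friendly_name: str
    status: str
    queue_id: Optional[str] = None  # e.g. a SLURM job ID once submitted

    def __post_init__(self) -> None:
        # Hypothetical validation: reject states not listed in the docs.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status {self.status!r} for job {self.run_id}")

    @property
    def queue_label(self) -> str:
        # The status table shows "-" for jobs without a queue ID.
        return self.queue_id if self.queue_id else "-"
```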
Display Modes
Progress display is automatically integrated into pipeline execution. The ProgressDisplay class
provides two main display modes:
Pipeline Status Table
Shows all jobs with detailed status information:
┌─────────────────────────────────────────────────┐
│ Pipeline Job Status                             │
├──────────────┬─────────────┬──────────┬─────────┤
│ Job ID       │ Name        │ Status   │ Queue ID│
├──────────────┼─────────────┼──────────┼─────────┤
│ align_abc123 │ Align reads │ running  │ 12345   │
│ sort_def456  │ Sort BAM    │ pending  │ -       │
│ index_ghi789 │ Index BAM   │ pending  │ -       │
└──────────────┴─────────────┴──────────┴─────────┘
Progress: [0/3] 0.0% | 1 running | 0 submitted | 2 pending | 0 failed
Status Colors:
pending: Dimmed (waiting to start)
submitted: Yellow (submitted to queue)
running: Blue (currently executing)
completed: Green (successfully finished)
failed: Red (execution error)
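The colour scheme above maps naturally onto Rich console markup. This sketch assumes a plain dictionary lookup. `STATUS_STYLES` and `styled_status` are hypothetical names, but "dim", "yellow", "blue", "green" and "red" are standard Rich style names. A string such as "[green]completed[/green]" renders in colour when passed to `rich.console.Console().print`.

```python
# Hypothetical mapping from job status to a Rich style name,
# following the colours listed above.
STATUS_STYLES = {
    "pending": "dim",
    "submitted": "yellow",
    "running": "blue",
    "completed": "green",
    "failed": "red",
}


def styled_status(status: str) -> str:
    """Wrap a status in Rich console markup, e.g. "[green]completed[/green]"."""
    style = STATUS_STYLES.get(status)
    if style is None:
        return status  # unknown statuses are shown unstyled
    return f"[{style}]{status}[/{style}]"
```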
Summary Statistics
Displays overall progress and resource usage:
Progress: [5/10] 50.0% | 2 running | 0 submitted | 3 pending | 0 failed
RESOURCE USAGE:
used_mem: 8.25 GB
max_mem: 32.00 GB
cpu_percent: 45.2%
active_tasks: 2
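The RESOURCE USAGE block can be reproduced from a plain dictionary. This is a sketch under two assumptions: memory values arrive already expressed in GB, and keys are recognised by their suffix (`_mem`, `_percent`). Neither assumption comes from the ResourceManager API, and `format_resource_usage` is a hypothetical name.

```python
from typing import Mapping, Union


def format_resource_usage(stats: Mapping[str, Union[int, float]]) -> str:
    """Render a stats mapping in the RESOURCE USAGE layout shown above."""
    lines = ["RESOURCE USAGE:"]
    for key, value in stats.items():
        if key.endswith("_mem"):
            lines.append(f"  {key}: {value:.2f} GB")  # assumed to be GB already
        elif key.endswith("_percent"):
            lines.append(f"  {key}: {value:.1f}%")
        else:
            lines.append(f"  {key}: {value}")
    return "\n".join(lines)
```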
Pipeline Runtime Tracking
Progress information is persisted in pipeline_runtime.yaml for each pipeline run.
This YAML file stores job metadata, status, and execution details.
YAML Structure
Each job in the pipeline is tracked with the following information:
job_run_id:
  name: "Human-readable job name"
  status: "pending|submitted|running|completed|failed"
  command: "Full command being executed"
  queue_id: "12345"  # Optional queue system job ID
  log_dir: "/path/to/logs"
  started_at: "2025-01-15T10:30:00"
  completed_at: "2025-01-15T10:45:00"
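Because `started_at` and `completed_at` are ISO 8601 strings, a job's wall-clock duration can be computed with the standard library alone. The helper name `job_duration_seconds` is hypothetical. It assumes the timestamps stay quoted, as in the structure above, so that `yaml.safe_load` returns them as strings rather than datetime objects.

```python
from datetime import datetime
from typing import Mapping, Optional


def job_duration_seconds(job: Mapping[str, str]) -> Optional[float]:
    """Return the wall-clock duration of one job entry, or None if unfinished."""
    started = job.get("started_at")
    completed = job.get("completed_at")
    if not started or not completed:
        return None  # jobs without both timestamps (e.g. still running)
    delta = datetime.fromisoformat(completed) - datetime.fromisoformat(started)
    return delta.total_seconds()
```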
Example pipeline_runtime.yaml:
__pipeline_metadata__:
  pipeline_name: "genomic_analysis"
  run_name: "sample1_analysis"
  run_hash: "abc123def456"
  created_at: "2025-01-15T10:00:00"
__pipeline_environment__:
  profile: "hg38"
  queue_system: "slurm"
align_reads_abc123:
  name: "Align reads"
  status: "completed"
  command: "bwa mem ref.fa sample1.fq"
  queue_id: "12345"
  started_at: "2025-01-15T10:05:00"
  completed_at: "2025-01-15T10:30:00"
sort_bam_def456:
  name: "Sort BAM"
  status: "running"
  command: "samtools sort aligned.bam"
  queue_id: "12346"
  started_at: "2025-01-15T10:31:00"
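Job states can be tallied from the parsed file with a short sketch. It assumes, based only on the example above, that every top-level key not wrapped in double underscores is a job entry. `count_job_statuses` is a hypothetical helper. In practice the mapping would come from PyYAML, e.g. `with open("pipeline_runtime.yaml") as fh: runtime = yaml.safe_load(fh)`.

```python
from collections import Counter
from typing import Any, Dict


def count_job_statuses(runtime: Dict[str, Any]) -> Counter:
    """Count job statuses in a parsed pipeline_runtime.yaml mapping."""
    counts: Counter = Counter()
    for key, entry in runtime.items():
        # __pipeline_metadata__ / __pipeline_environment__ hold run-level data.
        if key.startswith("__") and key.endswith("__"):
            continue
        if isinstance(entry, dict):
            counts[entry.get("status", "unknown")] += 1
    return counts
```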
Resume Integration
The progress system integrates with the resume functionality to enable continuing interrupted pipelines. See Pipeline Resume for details on using the resume command.
Compute.bio API Integration
Bio_pype can optionally integrate with the compute.bio API for web-based monitoring and remote control of pipeline execution. When the integration is configured, progress updates are sent to the API automatically.
Configuration
Enable API integration by setting these environment variables or adding them to ~/.bio_pype/config:
COMPUTE_BIO_API_URL=https://api.compute.bio
COMPUTE_BIO_TOKEN=your_api_token_here
See Configuration for details on compute.bio API configuration.
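A script can check whether the two variables are present before relying on the integration. This sketch assumes that the integration counts as enabled only when both `COMPUTE_BIO_API_URL` and `COMPUTE_BIO_TOKEN` are set. It does not read ~/.bio_pype/config, and `compute_bio_settings` is a hypothetical helper, not a bio_pype function.

```python
import os
from typing import Mapping, Optional, Tuple


def compute_bio_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (api_url, token) if both variables are set, else None."""
    env = os.environ if env is None else env
    url = env.get("COMPUTE_BIO_API_URL", "").strip()
    token = env.get("COMPUTE_BIO_TOKEN", "").strip()
    if not url or not token:
        return None  # assumed: integration disabled without both values
    return url.rstrip("/"), token
```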
Features
When API integration is enabled:
Automatic progress updates: Pipeline status sent every 30 seconds (configurable)
Worker registration: Each pipeline run registers with a unique worker ID
Real-time job tracking: Job status, queue IDs, and timestamps are synced to the API
Log streaming: Support for real-time log viewing through the web interface
Remote commands: Receive commands from the API (e.g., log requests)
The API integration runs in background threads and does not block pipeline execution.
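Periodic, non-blocking updates like these are commonly built with a daemon thread that sends a snapshot and then waits on a `threading.Event`. The sketch below shows that general pattern only. It makes no HTTP calls. `PeriodicReporter` and its `get_status`/`send` callables are hypothetical and are not bio_pype's API client.

```python
import logging
import threading
from typing import Any, Callable

log = logging.getLogger(__name__)


class PeriodicReporter:
    """Send a status snapshot on a background thread every `interval` seconds."""

    def __init__(
        self,
        get_status: Callable[[], Any],
        send: Callable[[Any], None],  # would post to the API in a real setup
        interval: float = 30.0,
    ) -> None:
        self._get_status = get_status
        self._send = send
        self._interval = interval
        self._stop = threading.Event()
        # Daemon thread: it never keeps the interpreter alive on its own.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while True:
            try:
                self._send(self._get_status())
            except Exception:
                # A failed update is logged, never propagated to the pipeline.
                log.exception("progress update failed")
            # wait() returns True as soon as stop() is called, so shutdown
            # does not have to sit out the full interval.
            if self._stop.wait(self._interval):
                break
```

Waiting on the event instead of calling `time.sleep` lets `stop()` end the loop immediately rather than after up to one full interval.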
Integration with Pipelines
Progress display is automatically integrated into pipeline execution when using queue systems.
The ProgressDisplay shows real-time updates as jobs are submitted, executed, and completed.
To use the progress display manually in custom scripts:
from pype.utils.progress import ProgressDisplay
display = ProgressDisplay()
# Show pipeline status (jobs, stats and resource_stats are built
# as in the basic usage example above)
display.show_pipeline_status(jobs, stats, resource_stats)
# Show final summary; elapsed_time is presumably given in seconds
display.show_summary(
    total_jobs=10,
    completed=9,
    failed=1,
    elapsed_time=3600.5,
)
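A final summary usually needs a human-readable elapsed time. This sketch assumes `elapsed_time` is in seconds, which the example suggests but does not state. The "1h 00m 00s" style and the `summary_line` wording are invented for illustration and are not what `show_summary` prints.

```python
def format_elapsed(seconds: float) -> str:
    """Format a duration in seconds as e.g. "1h 00m 00s" (fractions dropped)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


def summary_line(total_jobs: int, completed: int, failed: int, elapsed_time: float) -> str:
    """Hypothetical one-line run summary built from show_summary's arguments."""
    status = "FAILED" if failed else "OK"
    return (
        f"{status}: {completed}/{total_jobs} jobs completed, "
        f"{failed} failed in {format_elapsed(elapsed_time)}"
    )
```

With the arguments from the example above, `summary_line(10, 9, 1, 3600.5)` returns "FAILED: 9/10 jobs completed, 1 failed in 1h 00m 00s".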
See Also
Pipeline Resume - Using resume functionality from CLI
Pipelines - Pipeline definition and execution
Queue Systems - Queue system integration
Understanding Bio_pype Logs - Reading and interpreting pipeline logs