Progress Tracking#

Bio_pype includes a progress tracking and display system that monitors pipeline execution and provides real-time visibility into job status.

Overview#

The Progress system provides:

Real-time progress display: Rich formatted tables showing job status
Job state tracking: Track jobs from pending through completion or failure
Resource monitoring: Display CPU, memory, and task statistics
Progress statistics: Track completion percentage and timing
YAML-based persistence: Save/load progress to pipeline_runtime.yaml
Optional API integration: Send progress to compute.bio for web monitoring

Core Classes#

ProgressDisplay#

The ProgressDisplay class handles the visual formatting and display of pipeline progress. It is designed to work with the ResourceManager and TaskScheduler classes, rendering the job and resource statistics they report as real-time job status.

Basic usage:

from pype.utils.progress import ProgressDisplay

progress = ProgressDisplay()

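# `scheduler` and `resource_manager` are assumed to be existing TaskScheduler
# and ResourceManager instances created by the calling code.
# Display pipeline status with job table and statistics
jobs = list(scheduler.runtime.items())  # (run_id, job_data) pairs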
stats = scheduler.get_stats()
resource_stats = resource_manager.get_stats()

progress.show_pipeline_status(jobs, stats, resource_stats)

Key Features:

Rich formatted tables drawn with box-drawing characters
Color-coded job status (pending, submitted, running, completed, failed)
Progress percentage and job counts
Resource usage statistics (CPU, memory)
Automatic deduplication (only displays when state changes)
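
The deduplication behaviour can be illustrated with a minimal sketch. This is not the ProgressDisplay implementation; the class name ChangeOnlyDisplay and the choice of fields in the snapshot are assumptions made for illustration. The idea is to reduce the job list to a comparable snapshot and redraw only when that snapshot differs from the previous one:

```python
from typing import Any, Optional


class ChangeOnlyDisplay:
    """Hypothetical helper: emit output only when the job snapshot changes."""

    def __init__(self) -> None:
        self._last_snapshot: Optional[tuple] = None
        self.render_count = 0

    def show(self, jobs: list[tuple[str, dict[str, Any]]]) -> bool:
        # Reduce each job to the fields that affect what is drawn
        # (assumed here to be the status and the queue ID).
        snapshot = tuple(
            sorted(
                ((run_id, data.get("status"), data.get("queue_id")) for run_id, data in jobs),
                key=lambda item: item[0],
            )
        )
        if snapshot == self._last_snapshot:
            return False  # nothing changed, skip redrawing
        self._last_snapshot = snapshot
        self.render_count += 1
        for run_id, status, queue_id in snapshot:
            print(f"{run_id}: {status} (queue {queue_id or '-'})")
        return True
```

Calling show() twice with the same jobs prints once; a status change triggers a new render.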

PipelineProgress#

Dataclass for tracking overall pipeline statistics:

from pype.utils.progress import PipelineProgress

progress = PipelineProgress(
    total_jobs=10,
    completed=7,
    running=2,
    submitted=0,
    pending=1,
    failed=0
)

print(progress.percent_complete)  # 70.0
print(progress.summary)
# [7/10] 70.0% complete | 2 running | 0 submitted | 1 pending | 0 failed

Properties:

percent_complete: Percentage of completed jobs
summary: One-line text summary of progress
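
As a rough illustration of how these two properties relate to the counters, the following sketch reproduces the output shown in the example. It is an assumption about the behaviour, not the actual PipelineProgress source; the name PipelineProgressSketch and the zero-job guard are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PipelineProgressSketch:
    """Hypothetical stand-in for PipelineProgress, for illustration only."""

    total_jobs: int
    completed: int = 0
    running: int = 0
    submitted: int = 0
    pending: int = 0
    failed: int = 0

    @property
    def percent_complete(self) -> float:
        # Guard against an empty pipeline to avoid division by zero.
        if self.total_jobs == 0:
            return 0.0
        return 100.0 * self.completed / self.total_jobs

    @property
    def summary(self) -> str:
        return (
            f"[{self.completed}/{self.total_jobs}] {self.percent_complete:.1f}% complete"
            f" | {self.running} running | {self.submitted} submitted"
            f" | {self.pending} pending | {self.failed} failed"
        )
```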

JobProgress#

Dataclass representing a single job for display:

from pype.utils.progress import JobProgress

job = JobProgress(
    run_id="align_reads_abc123",
    friendly_name="Align sample 1",
    status="running",
    queue_id="12345"
)

Fields:

run_id: Unique job identifier
friendly_name: Human-readable job name
status: Current status (pending, submitted, running, completed, failed)
queue_id: Optional queue system job ID

Display Modes#

Progress display is automatically integrated into pipeline execution. The ProgressDisplay class provides two main display modes:

Pipeline Status Table#

Shows all jobs with detailed status information:

┌─────────────────────────────────────────────────┐
│         Pipeline Job Status                     │
├──────────────┬─────────────┬──────────┬─────────┤
│ Job ID       │ Name        │ Status   │Queue ID │
├──────────────┼─────────────┼──────────┼─────────┤
│ align_abc123 │ Align read  │ running  │ 12345   │
│ sort_def456  │ Sort BAM    │ pending  │ -       │
│ index_ghi789 │ Index BAM   │ pending  │ -       │
└──────────────┴─────────────┴──────────┴─────────┘

Progress: [0/3] 0.0% | 1 running | 0 submitted | 2 pending

Status Colors:

pending: Dimmed (waiting to start)
submitted: Yellow (submitted to queue)
running: Blue (currently executing)
completed: Green (successfully finished)
failed: Red (execution error)
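
For custom scripts that want the same color scheme in plain terminal output, the mapping could be sketched with standard ANSI escape codes. The names STATUS_STYLES and colorize_status are hypothetical, and this does not reflect how ProgressDisplay applies its styles internally:

```python
# Hypothetical ANSI styles chosen to mirror the colors listed above.
STATUS_STYLES = {
    "pending": "\033[2m",     # dim
    "submitted": "\033[33m",  # yellow
    "running": "\033[34m",    # blue
    "completed": "\033[32m",  # green
    "failed": "\033[31m",     # red
}
RESET = "\033[0m"


def colorize_status(status: str) -> str:
    """Wrap a status label in its ANSI style; unknown statuses stay unstyled."""
    style = STATUS_STYLES.get(status)
    if style is None:
        return status
    return f"{style}{status}{RESET}"


print(colorize_status("running"))
```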

Summary Statistics#

Displays overall progress and resource usage:

Progress: [5/10] 50.0% | 2 running | 0 submitted | 3 pending | 0 failed

RESOURCE USAGE:
  used_mem: 8.25 GB
  max_mem: 32.00 GB
  cpu_percent: 45.2%
  active_tasks: 2

Pipeline Runtime Tracking#

Progress information is persisted in pipeline_runtime.yaml for each pipeline run. This YAML file stores job metadata, status, and execution details.

YAML Structure#

Each job in the pipeline is tracked with the following information:

job_run_id:
  name: "Human-readable job name"
  status: "pending|submitted|running|completed|failed"
  command: "Full command being executed"
  queue_id: "12345"  # Optional queue system job ID
  log_dir: "/path/to/logs"
  started_at: "2025-01-15T10:30:00"
  completed_at: "2025-01-15T10:45:00"
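
Because started_at and completed_at are ISO 8601 timestamps, a job's wall-clock duration can be derived from an entry once the file has been loaded. The helper below is a sketch under two assumptions: the entry is already a plain dict, and the timestamps are strings (as they are when quoted in the YAML). The function name job_duration_seconds is hypothetical:

```python
from datetime import datetime
from typing import Optional


def job_duration_seconds(job: dict) -> Optional[float]:
    """Return the elapsed seconds for a job entry, or None if it has not finished."""
    started = job.get("started_at")
    completed = job.get("completed_at")
    if not started or not completed:
        return None  # pending or still running
    delta = datetime.fromisoformat(completed) - datetime.fromisoformat(started)
    return delta.total_seconds()


entry = {
    "status": "completed",
    "started_at": "2025-01-15T10:30:00",
    "completed_at": "2025-01-15T10:45:00",
}
print(job_duration_seconds(entry))  # 900.0
```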

Example pipeline_runtime.yaml:

__pipeline_metadata__:
  pipeline_name: "genomic_analysis"
  run_name: "sample1_analysis"
  run_hash: "abc123def456"
  created_at: "2025-01-15T10:00:00"

__pipeline_environment__:
  profile: "hg38"
  queue_system: "slurm"

align_reads_abc123:
  name: "Align reads"
  status: "completed"
  command: "bwa mem ref.fa sample1.fq"
  queue_id: "12345"
  started_at: "2025-01-15T10:05:00"
  completed_at: "2025-01-15T10:30:00"

sort_bam_def456:
  name: "Sort BAM"
  status: "running"
  command: "samtools sort aligned.bam"
  queue_id: "12346"
  started_at: "2025-01-15T10:31:00"
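
A script that inspects this file outside of Bio_pype might tally job statuses as follows. This is a sketch, not part of the Bio_pype API: it assumes the file has already been parsed into a dict (for example with a YAML loader), and it assumes that top-level keys wrapped in double underscores are bookkeeping sections rather than jobs, as in the example above. The function name count_job_statuses is hypothetical:

```python
from collections import Counter


def count_job_statuses(runtime: dict) -> Counter:
    """Count jobs per status, ignoring __dunder__ bookkeeping sections."""
    counts = Counter()
    for key, entry in runtime.items():
        # Skip sections such as __pipeline_metadata__ (assumed non-job keys).
        if key.startswith("__") and key.endswith("__"):
            continue
        counts[entry.get("status", "unknown")] += 1
    return counts


# Dict equivalent of the example file, trimmed to the relevant fields.
runtime = {
    "__pipeline_metadata__": {"pipeline_name": "genomic_analysis"},
    "__pipeline_environment__": {"profile": "hg38"},
    "align_reads_abc123": {"status": "completed"},
    "sort_bam_def456": {"status": "running"},
}
print(count_job_statuses(runtime))
```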

Resume Integration#

The progress system integrates with the resume functionality to enable continuing interrupted pipelines. See Pipeline Resume for details on using the resume command.

Compute.bio API Integration#

Bio_pype can optionally integrate with the compute.bio API for web-based monitoring and remote control of pipeline execution. When configured, progress updates are sent to the API automatically.

Configuration#

Enable API integration by setting the following environment variables or adding them to ~/.bio_pype/config:

COMPUTE_BIO_API_URL=https://api.compute.bio
COMPUTE_BIO_TOKEN=your_api_token_here

See Configuration for details on compute.bio API configuration.
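
A custom script can check whether both variables are present before relying on the integration. The sketch below only reads the two environment variables named above; how Bio_pype itself resolves them (including the ~/.bio_pype/config fallback) is not shown here, and the function name load_api_settings is hypothetical:

```python
import os
from typing import Mapping, Optional


def load_api_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return the API URL and token, or None when either one is missing."""
    url = env.get("COMPUTE_BIO_API_URL", "").strip()
    token = env.get("COMPUTE_BIO_TOKEN", "").strip()
    if not url or not token:
        return None  # treat the integration as disabled
    return {"url": url.rstrip("/"), "token": token}


settings = load_api_settings()
print("API integration enabled" if settings else "API integration disabled")
```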

Features#

When API integration is enabled:

Automatic progress updates: Pipeline status is sent every 30 seconds (configurable)
Worker registration: Each pipeline run registers with a unique worker ID
Real-time job tracking: Job status, queue IDs, and timestamps are synced to the API
Log streaming: Support for real-time log viewing through the web interface
Remote commands: Receive commands from the API (e.g., log requests)

The API integration runs in background threads and does not block pipeline execution.
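
The non-blocking pattern described here can be sketched with a daemon thread and a stop event. This is a generic illustration, not the Bio_pype implementation: the class name PeriodicReporter and the send callback are hypothetical, and the 30-second default simply mirrors the interval mentioned above:

```python
import threading
from typing import Callable


class PeriodicReporter:
    """Hypothetical helper: call `send` on a background thread at a fixed interval."""

    def __init__(self, send: Callable[[], None], interval: float = 30.0) -> None:
        self._send = send
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self._send()
            except Exception:
                # A failed update should never take the pipeline down.
                pass
            # wait() returns early as soon as stop() sets the event.
            self._stop.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

The main thread keeps scheduling jobs after start() and calls stop() once the pipeline finishes; waiting on the event rather than sleeping lets shutdown happen immediately instead of after a full interval.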

Integration with Pipelines#

Progress display is automatically integrated into pipeline execution when using queue systems. The ProgressDisplay shows real-time updates as jobs are submitted, executed, and completed.

To use the progress display manually in custom scripts:

from pype.utils.progress import ProgressDisplay

display = ProgressDisplay()

# Show pipeline status (jobs, stats, and resource_stats are built as in the
# basic usage example for ProgressDisplay)
display.show_pipeline_status(jobs, stats, resource_stats)

# Show final summary
display.show_summary(
    total_jobs=10,
    completed=9,
    failed=1,
    elapsed_time=3600.5
)