.. index:: Progress Tracking

.. _progress:

Progress Tracking
==================

Bio_pype includes a comprehensive progress tracking and display system that monitors pipeline execution and provides real-time visibility into job status.

Overview
--------

The Progress system provides:

- **Real-time progress display**: Rich formatted tables showing job status
- **Job state tracking**: Track jobs from pending through completion or failure
- **Resource monitoring**: Display CPU, memory, and task statistics
- **Progress statistics**: Track completion percentage and timing
- **YAML-based persistence**: Save/load progress to ``pipeline_runtime.yaml``
- **Optional API integration**: Send progress to compute.bio for web monitoring

Core Classes
------------

ProgressDisplay
^^^^^^^^^^^^^^^

The ``ProgressDisplay`` class handles visual formatting and display of pipeline progress. It works with the ResourceManager and TaskScheduler paradigm to show real-time job status.

Basic usage::

    from pype.utils.progress import ProgressDisplay

    progress = ProgressDisplay()

    # Display pipeline status with job table and statistics
    jobs = [(run_id, job_data) for run_id, job_data in scheduler.runtime.items()]
    stats = scheduler.get_stats()
    resource_stats = resource_manager.get_stats()

    progress.show_pipeline_status(jobs, stats, resource_stats)

**Key Features:**

- Rich formatted tables with ASCII box drawing
- Color-coded job status (pending, submitted, running, completed, failed)
- Progress percentage and job counts
- Resource usage statistics (CPU, memory)
- Automatic deduplication (only displays when state changes)

PipelineProgress
^^^^^^^^^^^^^^^^

Dataclass for tracking overall pipeline statistics::

    from pype.utils.progress import PipelineProgress

    progress = PipelineProgress(
        total_jobs=10,
        completed=7,
        running=2,
        submitted=0,
        pending=1,
        failed=0
    )

    print(progress.percent_complete)  # 70.0
    print(progress.summary)
    # [7/10] 70.0% complete | 2 running | 0 submitted | 1 pending | 0 failed

**Properties:**


- ``percent_complete``: Percentage of completed jobs
- ``summary``: One-line text summary of progress

JobProgress
^^^^^^^^^^^

Dataclass representing a single job for display::

    from pype.utils.progress import JobProgress

    job = JobProgress(
        run_id="align_reads_abc123",
        friendly_name="Align sample 1",
        status="running",
        queue_id="12345"
    )

**Fields:**

- ``run_id``: Unique job identifier
- ``friendly_name``: Human-readable job name
- ``status``: Current status (pending, submitted, running, completed, failed)
- ``queue_id``: Optional queue system job ID

Display Modes
-------------

Progress display is automatically integrated into pipeline execution. The ``ProgressDisplay`` class provides two main display modes:

Pipeline Status Table
^^^^^^^^^^^^^^^^^^^^^

Shows all jobs with detailed status information::

    ┌─────────────────────────────────────────────────┐
    │               Pipeline Job Status               │
    ├──────────────┬─────────────┬──────────┬─────────┤
    │ Job ID       │ Name        │ Status   │Queue ID │
    ├──────────────┼─────────────┼──────────┼─────────┤
    │ align_abc123 │ Align read  │ running  │ 12345   │
    │ sort_def456  │ Sort BAM    │ pending  │ -       │
    │ index_ghi789 │ Index BAM   │ pending  │ -       │
    └──────────────┴─────────────┴──────────┴─────────┘

    Progress: [1/3] 33.3% | 1 running | 0 submitted | 2 pending

**Status Colors:**

- **pending**: Dimmed (waiting to start)
- **submitted**: Yellow (submitted to queue)
- **running**: Blue (currently executing)
- **completed**: Green (successfully finished)
- **failed**: Red (execution error)

Summary Statistics
^^^^^^^^^^^^^^^^^^

Displays overall progress and resource usage::

    Progress: [5/10] 50.0% | 2 running | 0 submitted | 3 pending | 0 failed

    RESOURCE USAGE:
      used_mem: 8.25 GB
      max_mem: 32.00 GB
      cpu_percent: 45.2%
      active_tasks: 2

Pipeline Runtime Tracking
--------------------------

Progress information is persisted in ``pipeline_runtime.yaml`` for each pipeline run. This YAML file stores job metadata, status, and execution details.

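
As a rough illustration of how such a runtime file could be summarized once loaded, the sketch below tallies job statuses from a plain dictionary keyed by run ID. The helper ``summarize_runtime`` is hypothetical and not part of Bio_pype. It assumes the file has already been parsed into a ``dict``, that keys wrapped in double underscores hold pipeline-level metadata and not jobs, and that a job without a ``status`` field counts as pending.

```python
from collections import Counter


def summarize_runtime(runtime):
    """Count job statuses in a runtime mapping (hypothetical helper).

    Assumption: keys of the form ``__name__`` are metadata blocks, not jobs.
    Assumption: a job entry with no ``status`` field is treated as pending.
    """
    jobs = {
        run_id: data
        for run_id, data in runtime.items()
        if not (run_id.startswith("__") and run_id.endswith("__"))
    }
    counts = Counter(data.get("status", "pending") for data in jobs.values())
    total = len(jobs)
    # Guard against an empty pipeline to avoid dividing by zero.
    percent = 100.0 * counts["completed"] / total if total else 0.0
    return total, counts, percent


# Illustrative data only; a real file would be parsed from YAML first.
runtime = {
    "__pipeline_metadata__": {"pipeline_name": "genomic_analysis"},
    "align_reads_abc123": {"name": "Align reads", "status": "completed"},
    "sort_bam_def456": {"name": "Sort BAM", "status": "running"},
}

total, counts, percent = summarize_runtime(runtime)
print(f"[{counts['completed']}/{total}] {percent:.1f}% complete")
```

Working on a plain dictionary keeps the sketch independent of any particular YAML parser.
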

YAML Structure
^^^^^^^^^^^^^^

Each job in the pipeline is tracked with the following information::

    job_run_id:
      name: "Human-readable job name"
      status: "pending|submitted|running|completed|failed"
      command: "Full command being executed"
      queue_id: "12345"  # Optional queue system job ID
      log_dir: "/path/to/logs"
      started_at: "2025-01-15T10:30:00"
      completed_at: "2025-01-15T10:45:00"

**Example pipeline_runtime.yaml**::

    __pipeline_metadata__:
      pipeline_name: "genomic_analysis"
      run_name: "sample1_analysis"
      run_hash: "abc123def456"
      created_at: "2025-01-15T10:00:00"

    __pipeline_environment__:
      profile: "hg38"
      queue_system: "slurm"

    align_reads_abc123:
      name: "Align reads"
      status: "completed"
      command: "bwa mem ref.fa sample1.fq"
      queue_id: "12345"
      started_at: "2025-01-15T10:05:00"
      completed_at: "2025-01-15T10:30:00"

    sort_bam_def456:
      name: "Sort BAM"
      status: "running"
      command: "samtools sort aligned.bam"
      queue_id: "12346"
      started_at: "2025-01-15T10:31:00"

Resume Integration
^^^^^^^^^^^^^^^^^^

The progress system integrates with the resume functionality to enable continuing interrupted pipelines. See :ref:`resume` for details on using the resume command.

Compute.bio API Integration
----------------------------

Bio_pype can optionally integrate with compute.bio API for web-based monitoring and remote control of pipeline execution. When configured, progress updates are automatically sent to the API.

Configuration
^^^^^^^^^^^^^

Enable API integration by setting environment variables or adding to ``~/.bio_pype/config``::

    COMPUTE_BIO_API_URL=https://api.compute.bio
    COMPUTE_BIO_TOKEN=your_api_token_here

See :ref:`configuration` for details on compute.bio API configuration.

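
For a custom script that wants to check whether these two variables are set before relying on API features, a minimal sketch might look like the following. The helper ``read_api_settings`` is hypothetical and not part of Bio_pype. Treating the integration as disabled unless both values are present is an assumption made for illustration, and the sketch does not read ``~/.bio_pype/config``.

```python
import os


def read_api_settings(environ=os.environ):
    """Return API settings from the environment, or None if incomplete.

    Hypothetical helper: only the two variable names come from the
    documentation above; requiring both of them is an assumption.
    """
    url = environ.get("COMPUTE_BIO_API_URL", "").strip()
    token = environ.get("COMPUTE_BIO_TOKEN", "").strip()
    if not url or not token:
        return None
    # Drop any trailing slash so paths can be appended consistently.
    return {"url": url.rstrip("/"), "token": token}


# Pass an explicit mapping so the example does not depend on the real environment.
settings = read_api_settings(
    {
        "COMPUTE_BIO_API_URL": "https://api.compute.bio/",
        "COMPUTE_BIO_TOKEN": "your_api_token_here",
    }
)
print("API integration enabled" if settings else "API integration disabled")
```

Passing the environment as a parameter makes the check easy to exercise in tests without touching the real process environment.
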

Features
^^^^^^^^

When API integration is enabled:

- **Automatic progress updates**: Pipeline status sent every 30 seconds (configurable)
- **Worker registration**: Each pipeline run registers with unique worker ID
- **Real-time job tracking**: Job status, queue IDs, and timestamps synced to API
- **Log streaming**: Support for real-time log viewing through web interface
- **Remote commands**: Receive commands from API (e.g., log requests)

The API integration runs in background threads and does not block pipeline execution.

Integration with Pipelines
---------------------------

Progress display is automatically integrated into pipeline execution when using queue systems. The ``ProgressDisplay`` shows real-time updates as jobs are submitted, executed, and completed.

To manually use progress display in custom scripts::

    from pype.utils.progress import ProgressDisplay

    display = ProgressDisplay()

    # Show pipeline status
    display.show_pipeline_status(jobs, stats, resource_stats)

    # Show final summary
    display.show_summary(
        total_jobs=10,
        completed=9,
        failed=1,
        elapsed_time=3600.5
    )

See Also
--------

- :ref:`resume` - Using resume functionality from CLI
- :ref:`pipelines` - Pipeline definition and execution
- :ref:`queues` - Queue system integration
- :ref:`logs` - Understanding Bio_pype logs