.. index:: Logs

.. _logs:

Understanding Bio_pype Logs
===========================

Log Structure
-------------

Bio_pype organizes logs hierarchically in ``PYPE_LOGDIR`` (default: ``~/.bio_pype/logs``).
Each pipeline run creates a unique log folder containing runtime state and nested logs::

    ~/.bio_pype/logs/
    └── 251114144825.886245_6TEJ_genomic_analysis/    # Main run: timestamp_runid_name
        ├── 251114144825.886245_6TEJ_genomic_analysis.log  # Main pipeline log
        ├── pipeline_runtime.yaml                      # Runtime state (jobs, status)
        ├── pipeline_runtime.yaml.lock                 # Lock file for concurrent access
        ├── profile.yaml                               # Profile snapshot
        ├── parallel_run/                              # Queue-specific directory
        │   └── parallel_run.log
        ├── 251114144826_XXXX_align_reads/             # Nested pipeline step
        │   ├── 251114144826_XXXX_align_reads.log
        │   ├── stdout                                 # Job stdout
        │   ├── stderr                                 # Job stderr
        │   └── align_reads/                           # Snippet outputs
        │       ├── align_reads.log
        │       └── profile.yaml
        └── 251114144827_YYYY_sort_bam/                # Another nested step
            └── (similar structure)

**Key components:**

- **Run directory name**: ``<timestamp>_<run_id>_<pipeline_name>``
- **Main log**: ``<timestamp>_<run_id>_<pipeline_name>.log``
- **Runtime state**: ``pipeline_runtime.yaml`` tracks all jobs and their status
- **Profile snapshot**: ``profile.yaml`` preserves the profile used for this run
- **Nested pipelines**: Steps that are pipelines get their own subdirectories
- **Queue directories**: Queue systems may create working directories (e.g., ``parallel_run/``)
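
The naming convention above can be split back into its parts programmatically.
The following is a minimal sketch, not part of Bio_pype: ``parse_run_name`` and
``RunName`` are hypothetical names, and the timestamp format (``yymmddHHMMSS``
with optional fractional seconds) and the absence of underscores in the run ID
are assumptions inferred from the example tree.

.. code-block:: python

    import re
    from datetime import datetime
    from typing import NamedTuple


    class RunName(NamedTuple):
        timestamp: datetime
        run_id: str
        pipeline_name: str


    # Assumed layout: <timestamp>_<run_id>_<pipeline_name>, where the timestamp
    # is 12 digits (yymmddHHMMSS) with optional fractional seconds.
    _RUN_NAME = re.compile(r"^(\d{12})(?:\.(\d{1,6}))?_([^_]+)_(.+)$")


    def parse_run_name(dirname: str) -> RunName:
        """Split a run directory name into timestamp, run ID and pipeline name."""
        match = _RUN_NAME.match(dirname)
        if match is None:
            raise ValueError(f"not a run directory name: {dirname!r}")
        stamp, fraction, run_id, pipeline_name = match.groups()
        when = datetime.strptime(stamp, "%y%m%d%H%M%S")
        if fraction:
            # Right-pad so that ".5" means 500000 microseconds.
            when = when.replace(microsecond=int(fraction.ljust(6, "0")))
        return RunName(when, run_id, pipeline_name)

For the example directory ``251114144825.886245_6TEJ_genomic_analysis``, this
returns the run ID ``6TEJ`` and the pipeline name ``genomic_analysis``.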

Log File Types
--------------

pipeline.log
^^^^^^^^^^^^
The main pipeline log (``<timestamp>_<run_id>_<pipeline_name>.log``) contains overall pipeline execution information:

- Arguments and configuration used
- Step execution order
- Dependencies between steps
- Resource allocations
- Final status

Example pipeline.log::

    2023-12-05 14:30:22 INFO: Starting pipeline rev_compl_low_fa
    2023-12-05 14:30:22 INFO: Using profile: local
    2023-12-05 14:30:23 INFO: Submitting step reverse_fa to queue slurm
    2023-12-05 14:35:45 INFO: Step reverse_fa completed successfully
    ...
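
Log lines in this shape are easy to split into fields for further analysis.
The sketch below assumes that every line follows the
``YYYY-MM-DD HH:MM:SS LEVEL: message`` pattern of the example; ``LogRecord``
and ``parse_log_line`` are hypothetical helpers, not Bio_pype functions.

.. code-block:: python

    import re
    from datetime import datetime
    from typing import NamedTuple, Optional


    class LogRecord(NamedTuple):
        timestamp: datetime
        level: str
        message: str


    # Assumed line format, taken from the example log:
    # "YYYY-MM-DD HH:MM:SS LEVEL: message"
    _LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]+): (.*)$")


    def parse_log_line(line: str) -> Optional[LogRecord]:
        """Return a LogRecord, or None for lines that do not match the format."""
        match = _LINE.match(line.strip())
        if match is None:
            return None
        stamp, level, message = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        return LogRecord(when, level, message)

Subtracting the timestamps of two parsed records gives the elapsed time
between them, for instance between the submission and completion lines of a step.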

job.log
^^^^^^^
Contains snippet-specific information:

- Input validation
- Command execution
- Output generation
- Resource usage

Example job.log::

    2023-12-05 14:30:23 INFO: Processing input file: sample1.fa
    2023-12-05 14:30:23 INFO: Command: reverse_fa -i sample1.fa -o reversed.fa
    2023-12-05 14:30:24 INFO: Generated output: reversed.fa
    2023-12-05 14:30:24 INFO: Peak memory usage: 2.1GB

queue.log (.out / .err files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Contains queue system output:

- Job submission details
- Resource allocation
- Error messages
- Exit codes

Example stdout file (SLURM)::

    Submitted batch job 123456
    slurmstepd: Job 123456 started on node034
    ...
    slurmstepd: Job 123456 completed with exit code 0

pipeline_runtime.yaml
^^^^^^^^^^^^^^^^^^^^^

Tracks pipeline execution state with job metadata and status.

**Location**: ``<run_directory>/pipeline_runtime.yaml``

Contains:

- Job status tracking (pending, running, completed, failed)
- Queue IDs for submitted jobs
- Execution timestamps
- Job dependencies
- Resource requirements
- Pipeline metadata and environment variables

Example runtime file::

    I8R579WD35:
      command: python -m pype.commands snippets test_step1 -i input.txt -o output.txt
      name: test_step1
      status: completed
      completed_at: '2025-11-14T14:48:29.865717'
      dependencies: []
      requirements:
        mem: 1gb
        ncpu: 1
        time: 00:02:00

    YT0IGXG2FZ:
      command: python -m pype.commands snippets test_step2 -i output.txt
      name: test_step2
      status: running
      dependencies:
      - I8R579WD35
      requirements:
        mem: 1gb
        ncpu: 1
        time: 00:03:00

    __pipeline_environment__:
      PYPE_MODULES: /path/to/modules
      PYPE_LOGDIR: /path/to/logs

    __pipeline_metadata__:
      log_directory: /Users/user/.bio_pype/logs/251114144825.886245_6TEJ_genomic_analysis
      pipeline_name: genomic_analysis

**Usage:** The ``pype resume`` command uses this file to continue interrupted pipelines.
See :ref:`resume` for details.
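
To get a quick per-job overview from the runtime file, the statuses can be
collected into a dictionary. This is a minimal sketch that assumes the
two-space layout of the example and skips the ``__...__`` bookkeeping keys.
``job_statuses`` is a hypothetical helper, and it reads the file line by line
only to stay dependency-free; a real YAML parser would be more robust.

.. code-block:: python

    from typing import Dict, Optional


    def job_statuses(runtime_text: str) -> Dict[str, str]:
        """Map each job ID to its status, skipping ``__...__`` keys."""
        statuses: Dict[str, str] = {}
        current: Optional[str] = None
        for line in runtime_text.splitlines():
            if not line.strip():
                continue
            indent = len(line) - len(line.lstrip())
            if indent == 0:
                # Unindented line: a top-level key such as "I8R579WD35:".
                current = line.split(":", 1)[0].strip()
            elif indent == 2 and current and not current.startswith("__"):
                # Direct child of a job entry (two-space indentation assumed).
                key, _, value = line.strip().partition(":")
                if key == "status":
                    statuses[current] = value.strip()
        return statuses

Applied to the example runtime file, this yields ``completed`` for
``I8R579WD35`` and ``running`` for ``YT0IGXG2FZ``.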

Useful Log Commands
-------------------

List all pipeline runs::

    ls -d ~/.bio_pype/logs/*/

List pipeline runs by modification time, newest last::

    ls -ltr ~/.bio_pype/logs/

View runtime state for a specific run::

    cat ~/.bio_pype/logs/251114144825.886245_6TEJ_genomic_analysis/pipeline_runtime.yaml

Monitor active pipeline::

    tail -f ~/.bio_pype/logs/<run_directory>/<timestamp>_<run_id>_<pipeline_name>.log

Check job statuses::

    grep "status:" ~/.bio_pype/logs/251114144825.886245_6TEJ_genomic_analysis/pipeline_runtime.yaml

Count completed jobs (run from inside a run directory)::

    grep -c "status: completed" pipeline_runtime.yaml

View pipeline metadata::

    grep -A 5 "__pipeline_metadata__" pipeline_runtime.yaml

View environment used::

    grep -A 10 "__pipeline_environment__" pipeline_runtime.yaml

Find all failed jobs across runs::

    grep -r "status: failed" ~/.bio_pype/logs/

Search all logs for error messages::

    grep -r "ERROR" ~/.bio_pype/logs/*/

Get resource usage from all logs, including nested job logs::

    grep -r --include="*.log" "memory usage" ~/.bio_pype/logs/
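
The same lookups can be scripted. The sketch below resolves the log root from
``PYPE_LOGDIR`` (falling back to the documented default) and picks the most
recent run by directory name. ``log_root`` and ``latest_run`` are hypothetical
helpers, and sorting by name is chronological only under the assumption that
every run directory starts with a timestamp of the same width.

.. code-block:: python

    import os
    from pathlib import Path
    from typing import Optional


    def log_root() -> Path:
        """Return PYPE_LOGDIR, falling back to the documented default."""
        return Path(os.environ.get("PYPE_LOGDIR", "~/.bio_pype/logs")).expanduser()


    def latest_run(root: Path) -> Optional[Path]:
        """Return the run directory with the greatest name, or None if none exist."""
        if not root.is_dir():
            return None
        runs = [entry for entry in root.iterdir() if entry.is_dir()]
        # Names begin with the timestamp, so the greatest name is the newest run.
        return max(runs, key=lambda entry: entry.name, default=None)

A typical call is ``latest_run(log_root())``, whose result can then be used to
build the path to ``pipeline_runtime.yaml`` for that run.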

Debug Tips
----------

1. Start with pipeline.log for overall status
2. Check job.log for specific step failures
3. See queue.log for system-level issues
4. Use ``PYPE_DEBUG=1`` for verbose logging::

    export PYPE_DEBUG=1
    pype pipelines my_pipeline ...
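
When a run has many nested steps, the first three checks can be combined by
scanning every ``.log`` file under the run directory at once. This is a
minimal sketch: ``iter_error_lines`` is a hypothetical helper, and it assumes
that failures are logged with the literal text ``ERROR``, as in the examples
in this document.

.. code-block:: python

    from pathlib import Path
    from typing import Iterator, Tuple


    def iter_error_lines(run_dir: Path) -> Iterator[Tuple[Path, int, str]]:
        """Yield (file, line number, text) for every line containing "ERROR"."""
        for log_file in sorted(run_dir.rglob("*.log")):
            # Replace undecodable bytes so one bad file does not stop the scan.
            with log_file.open(errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if "ERROR" in line:
                        yield log_file, number, line.rstrip("\n")

Each result names the file and line number, which points directly at the step
whose log should be read in full.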

Common Issues
-------------

Memory Errors::

    2023-12-05 14:35:45 ERROR: MemoryError in step bwa_mem
    2023-12-05 14:35:45 INFO: Peak memory: 32GB, Allocated: 16GB

Solution: Increase the memory in the snippet requirements or the profile.

Queue Timeouts::

    2023-12-05 15:00:00 ERROR: Job exceeded walltime limit

Solution: Increase the walltime in the queue configuration.
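
To decide how much more memory to request after a memory error, the peak and
allocated values can be read from the log line. The sketch below assumes the
exact ``Peak memory: <n>GB, Allocated: <n>GB`` wording of the memory error
example above; ``memory_shortfall`` is a hypothetical helper, not a Bio_pype function.

.. code-block:: python

    import re
    from typing import Optional

    # Assumed wording, copied from the memory error example in this section.
    _MEMORY = re.compile(
        r"Peak memory: (\d+(?:\.\d+)?)GB, Allocated: (\d+(?:\.\d+)?)GB"
    )


    def memory_shortfall(line: str) -> Optional[float]:
        """Return peak minus allocated memory in GB, or None if no match."""
        match = _MEMORY.search(line)
        if match is None:
            return None
        peak, allocated = (float(group) for group in match.groups())
        return peak - allocated

A positive result means the step needed more memory than it was given; for
the example line, the shortfall is 16 GB.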