Understanding Bio_pype Logs#

Log Structure#

Bio_pype organizes logs hierarchically in PYPE_LOGDIR (default: ~/.bio_pype/logs). Each pipeline run creates a unique log folder containing runtime state and nested logs:

~/.bio_pype/logs/
└── 251114144825.886245_6TEJ_genomic_analysis/    # Main run: timestamp_runid_name
    ├── 251114144825.886245_6TEJ_genomic_analysis.log  # Main pipeline log
    ├── pipeline_runtime.yaml                      # Runtime state (jobs, status)
    ├── pipeline_runtime.yaml.lock                 # Lock file for concurrent access
    ├── profile.yaml                               # Profile snapshot
    ├── parallel_run/                              # Queue-specific directory
    │   └── parallel_run.log
    ├── 251114144826_XXXX_align_reads/             # Nested pipeline step
    │   ├── 251114144826_XXXX_align_reads.log
    │   ├── stdout                                 # Job stdout
    │   ├── stderr                                 # Job stderr
    │   └── align_reads/                           # Snippet outputs
    │       ├── align_reads.log
    │       └── profile.yaml
    └── 251114144827_YYYY_sort_bam/                # Another nested step
        └── (similar structure)

Key components:

  • Run directory name: <timestamp>_<run_id>_<pipeline_name>

  • Main log: <timestamp>_<run_id>_<pipeline_name>.log

  • Runtime state: pipeline_runtime.yaml tracks all jobs and their status

  • Profile snapshot: profile.yaml preserves the profile used for this run

  • Nested pipelines: Steps that are pipelines get their own subdirectories

  • Queue directories: Queue systems may create working directories (e.g., parallel_run/)
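
The naming scheme above can be split into its parts with plain shell parameter expansion. This is a minimal sketch, not part of Bio_pype: parse_run_dir is a hypothetical helper name, and it assumes that neither the timestamp nor the run ID contains an underscore (the pipeline name may, as in genomic_analysis).

```shell
# Hypothetical helper: split "<timestamp>_<run_id>_<pipeline_name>" into fields.
# Assumption: only the pipeline name may contain underscores.
parse_run_dir() (
    name=$(basename "$1")        # drop leading path and trailing slash
    timestamp=${name%%_*}        # text before the first underscore
    rest=${name#*_}              # text after the first underscore
    run_id=${rest%%_*}           # text before the next underscore
    pipeline_name=${rest#*_}     # everything that remains
    printf '%s\t%s\t%s\n' "$timestamp" "$run_id" "$pipeline_name"
)

parse_run_dir 251114144825.886245_6TEJ_genomic_analysis
```

For the run directory shown above, this prints three tab-separated fields: 251114144825.886245, 6TEJ and genomic_analysis.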

Log File Types#

pipeline.log#

The main pipeline log, stored in the run directory as <timestamp>_<run_id>_<pipeline_name>.log. It contains overall pipeline execution information:

  • Arguments and configuration used

  • Step execution order

  • Dependencies between steps

  • Resource allocations

  • Final status

Example pipeline.log:

2023-12-05 14:30:22 INFO: Starting pipeline rev_compl_low_fa
2023-12-05 14:30:22 INFO: Using profile: local
2023-12-05 14:30:23 INFO: Submitting step reverse_fa to queue slurm
2023-12-05 14:35:45 INFO: Step reverse_fa completed successfully
...

job.log#

Contains snippet-specific information:

  • Input validation

  • Command execution

  • Output generation

  • Resource usage

Example job.log:

2023-12-05 14:30:23 INFO: Processing input file: sample1.fa
2023-12-05 14:30:23 INFO: Command: reverse_fa -i sample1.fa -o reversed.fa
2023-12-05 14:30:24 INFO: Generated output: reversed.fa
2023-12-05 14:30:24 INFO: Peak memory usage: 2.1GB

queue.log (.out / .err files)#

Contains queue system output:

  • Job submission details

  • Resource allocation

  • Error messages

  • Exit codes

Example stdout file (SLURM):

Submitted batch job 123456
slurmstepd: Job 123456 started on node034
...
slurmstepd: Job 123456 completed with exit code 0

pipeline_runtime.yaml#

Tracks pipeline execution state with job metadata and status.

Location: <run_directory>/pipeline_runtime.yaml

Contains:

  • Job status tracking (pending, running, completed, failed)

  • Queue IDs for submitted jobs

  • Execution timestamps

  • Job dependencies

  • Resource requirements

  • Pipeline metadata and environment variables

Example runtime file:

I8R579WD35:
  command: python -m pype.commands snippets test_step1 -i input.txt -o output.txt
  name: test_step1
  status: completed
  completed_at: '2025-11-14T14:48:29.865717'
  dependencies: []
  requirements:
    mem: 1gb
    ncpu: 1
    time: 00:02:00

YT0IGXG2FZ:
  command: python -m pype.commands snippets test_step2 -i output.txt
  name: test_step2
  status: running
  dependencies:
  - I8R579WD35
  requirements:
    mem: 1gb
    ncpu: 1
    time: 00:03:00

__pipeline_environment__:
  PYPE_MODULES: /path/to/modules
  PYPE_LOGDIR: /path/to/logs

__pipeline_metadata__:
  log_directory: /Users/user/.bio_pype/logs/251114144825_6TEJ_pipeline
  pipeline_name: genomic_analysis

Usage: The pype resume command uses this file to continue interrupted pipelines. See Pipeline Resume for details.
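
Because the runtime file is plain YAML, a per-job overview can be extracted without a YAML parser. The sketch below assumes the layout of the example above (job IDs as unindented keys, name and status indented by two spaces). job_statuses is a hypothetical helper, not a Bio_pype command, and a real YAML parser would be more robust against layout changes.

```shell
# Hypothetical helper: print "<job_id> <name> <status>" (tab-separated) per job.
# Assumption: the file follows the layout of the example runtime file.
job_statuses() {
    awk '
        function emit() {
            # Skip the __pipeline_*__ bookkeeping sections
            if (job != "" && job !~ /^__/) print job "\t" name "\t" status
        }
        # An unindented "key:" line starts a new top-level entry
        /^[^ #-][^:]*:[ ]*$/ {
            emit()
            job = $0; sub(/:[ ]*$/, "", job)
            name = ""; status = ""
            next
        }
        /^  name:/   { name = $2 }
        /^  status:/ { status = $2 }
        END { emit() }
    ' "$1"
}
```

Called as job_statuses <run_directory>/pipeline_runtime.yaml, it prints one line per job and leaves out the __pipeline_environment__ and __pipeline_metadata__ sections.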

Useful Log Commands#

List all pipeline runs:

ls -d ~/.bio_pype/logs/*/

Find recent pipeline runs:

ls -ltr ~/.bio_pype/logs/
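
Since run directory names start with a timestamp, sorting them by name should also order them chronologically. The following sketch relies on that assumption (fixed-width, zero-padded timestamps). latest_run is a hypothetical helper, and it falls back to the default log location when PYPE_LOGDIR is unset.

```shell
# Hypothetical helper: print the most recent run directory.
# Assumption: directory names sort chronologically by their timestamp prefix.
latest_run() (
    logdir=${1:-${PYPE_LOGDIR:-$HOME/.bio_pype/logs}}
    latest=
    for dir in "$logdir"/*/; do
        # Globs expand in sorted order, so the last match is the newest run
        if [ -d "$dir" ]; then
            latest=$dir
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "${latest%/}"
    fi
)
```

The result can be combined with the other commands in this section, for example cat "$(latest_run)/pipeline_runtime.yaml".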

View runtime state for a specific run:

cat ~/.bio_pype/logs/251114144825.886245_6TEJ_genomic_analysis/pipeline_runtime.yaml

Monitor active pipeline:

tail -f ~/.bio_pype/logs/<run_directory>/<timestamp>_<run_id>_<pipeline_name>.log

Check job statuses:

grep "status:" ~/.bio_pype/logs/251114144825.886245_6TEJ_genomic_analysis/pipeline_runtime.yaml

Count completed jobs:

grep -c "status: completed" pipeline_runtime.yaml
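
To see every status at once rather than counting one at a time, the status lines can be tallied in a single pass. This is a small sketch; status_summary is a hypothetical helper, and it assumes status lines are indented by two spaces as in the example runtime file.

```shell
# Hypothetical helper: count jobs per status in a runtime file.
# Assumption: job status lines look like "  status: <value>".
status_summary() {
    awk '
        /^  status:/ { count[$2]++ }
        END { for (s in count) print s, count[s] }
    ' "$1" | sort
}
```

Each output line has the form "<status> <count>", sorted by status name.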

View pipeline metadata:

grep -A 5 "__pipeline_metadata__" pipeline_runtime.yaml

View environment used:

grep -A 10 "__pipeline_environment__" pipeline_runtime.yaml

Find all failed jobs across runs:

grep -r "status: failed" ~/.bio_pype/logs/
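
The recursive search above also matches any log file that happens to contain the same text. A narrower variant, sketched here, only looks at runtime files and prints the directories of the affected runs. failed_runs is a hypothetical helper, and the --include option assumes GNU or BSD grep.

```shell
# Hypothetical helper: list run directories whose runtime file has a failed job.
# Assumption: grep supports -r and --include (GNU and BSD grep do).
failed_runs() (
    logdir=${1:-${PYPE_LOGDIR:-$HOME/.bio_pype/logs}}
    grep -rl --include=pipeline_runtime.yaml 'status: failed' "$logdir" |
        sed 's|/pipeline_runtime\.yaml$||' |   # keep only the directory part
        sort
)
```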

Search all logs for error messages:

grep -r "ERROR" ~/.bio_pype/logs/*/

Get resource usage:

grep -r --include="*.log" "memory usage" ~/.bio_pype/logs/

Debug Tips#

  1. Start with pipeline.log for overall status

  2. Check job.log for specific step failures

  3. See queue.log for system-level issues

  4. Use PYPE_DEBUG=1 for verbose logging:

    export PYPE_DEBUG=1
    pype pipelines my_pipeline ...
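
The first three tips can be combined into one quick look at a run. The sketch below prints the tail of the main log followed by the tail of every non-empty stderr file under the run directory. triage_run is a hypothetical helper, and it assumes the directory layout shown in Log Structure (main log named after the run directory, job stderr stored in files called stderr).

```shell
# Hypothetical helper: show the end of the main log and of any non-empty stderr.
# Usage: triage_run <run_directory> [lines]
triage_run() (
    run=${1%/}
    lines=${2:-20}
    main_log="$run/$(basename "$run").log"   # main log shares the run directory name
    if [ -f "$main_log" ]; then
        echo "== $main_log"
        tail -n "$lines" "$main_log"
    fi
    # -size +0c selects files larger than zero bytes
    find "$run" -type f -name stderr -size +0c | sort |
        while IFS= read -r f; do
            echo "== $f"
            tail -n "$lines" "$f"
        done
)
```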
    

Common Issues#

Memory Errors:

2023-12-05 14:35:45 ERROR: MemoryError in step bwa_mem
2023-12-05 14:35:45 INFO: Peak memory: 32GB, Allocated: 16GB

Solution: Increase the memory in the snippet requirements or in the profile

Queue Timeouts:

2023-12-05 15:00:00 ERROR: Job exceeded walltime limit

Solution: Increase the walltime in the queue configuration
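
A log file can be checked for both patterns in one go. This is only an illustrative sketch: diagnose_log is a hypothetical helper, and it matches the exact message texts from the two examples above, which may differ between tools and queue systems.

```shell
# Hypothetical helper: map the two error messages above to their suggested fix.
# Assumption: messages contain "MemoryError" or "exceeded walltime limit".
diagnose_log() {
    if grep -q 'MemoryError' "$1"; then
        echo "memory: increase memory in the snippet requirements or profile"
    fi
    if grep -q 'exceeded walltime limit' "$1"; then
        echo "walltime: adjust walltime in the queue configuration"
    fi
}
```

A log that matches neither pattern produces no output, so the helper can be run over many log files in a loop.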