Configuration

Configuration Overview

Bio_pype uses a flexible configuration system that allows you to:

  • Customize module locations

  • Set system-specific parameters

  • Define execution environments

  • Manage resource limits

Module Paths

By default, Bio_pype modules (snippets, pipelines, profiles, and queues) are installed in Python’s site-packages directory. You can point Bio_pype at other locations to:

  • Make modules easier to edit and maintain

  • Switch between different module sets (e.g., stable vs. development)

  • Share modules across users or projects

Configuration Methods

1. Local Configuration File

The primary configuration file is located at ~/.bio_pype/config. Example:

PYPE_TMP=/tmp
PYPE_LOGDIR=/tmp/logs
PYPE_SNIPPETS=~/bio_pype/snippets
PYPE_PIPELINES=~/bio_pype/pipelines
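
To make the file format concrete, here is a minimal sketch of how a KEY=VALUE file like the one above could be read. It is not Bio_pype's own loader: the function names `parse_config_text` and `load_config` are hypothetical, and the handling of blank lines, `#` comments, and quoted values is an assumption about the syntax.

```python
from pathlib import Path


def parse_config_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped (assumed syntax)."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding double quotes are dropped, mirroring shell-style values.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def load_config(path: str = "~/.bio_pype/config") -> dict[str, str]:
    """Read the config file if it exists; return an empty dict otherwise."""
    config_path = Path(path).expanduser()
    if not config_path.is_file():
        return {}
    return parse_config_text(config_path.read_text())


example = """PYPE_TMP=/tmp
PYPE_LOGDIR=/tmp/logs
PYPE_SNIPPETS=~/bio_pype/snippets
PYPE_PIPELINES=~/bio_pype/pipelines
"""
settings = parse_config_text(example)
print(settings["PYPE_LOGDIR"])
```

Keeping the text parser separate from the file reader makes the parsing rule easy to check without touching the real ~/.bio_pype/config.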

2. Environment Variables

Environment variables override settings in the configuration file:

export PYPE_SNIPPETS=/custom/path/snippets
export PYPE_MEM="32GB"
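
The precedence rule (environment first, then the configuration file, then a built-in default) can be sketched as below. The helper `resolve_setting` is a hypothetical name used only for illustration; Bio_pype's internal lookup may be organised differently.

```python
import os


def resolve_setting(name, file_settings, environ=None, default=None):
    """Return the value for `name`: environment first, then config file, then default."""
    env = os.environ if environ is None else environ
    if name in env:
        return env[name]
    return file_settings.get(name, default)


# Values as they might come from the configuration file.
file_settings = {"PYPE_SNIPPETS": "~/bio_pype/snippets", "PYPE_TMP": "/tmp"}
# A stand-in for the process environment, so the example does not depend on the shell.
fake_env = {"PYPE_SNIPPETS": "/custom/path/snippets"}

print(resolve_setting("PYPE_SNIPPETS", file_settings, fake_env))
print(resolve_setting("PYPE_TMP", file_settings, fake_env))
print(resolve_setting("PYPE_DOCKER", file_settings, fake_env, default="docker"))
```

Passing the environment as a mapping keeps the function testable; in real use the `environ` argument would simply be left at its default.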

Configuration Variables

| Variable | Description |
|---|---|
| PYPE_MODULES | Base path for all module types; sets the snippets, pipelines, profiles, and queues subdirectories at once |
| PYPE_SNIPPETS | Path to snippet modules (overridden by PYPE_MODULES) |
| PYPE_PROFILES | Path to profile configurations (overridden by PYPE_MODULES) |
| PYPE_PIPELINES | Path to pipeline definitions (overridden by PYPE_MODULES) |
| PYPE_QUEUES | Path to queue system adapters (overridden by PYPE_MODULES) |
| PYPE_HOME | Base directory for Bio_pype state: config file, logs, caches (default: ~/.bio_pype) |
| PYPE_REGISTRY | Registry git URL or local path (default: https://codeberg.org/bio-pype/workflows-registry.git) |
| PYPE_NCPU | Maximum CPUs for parallel local execution |
| PYPE_GPU | Number of GPUs available for local execution |
| PYPE_NPU | Number of NPUs available for local execution |
| PYPE_MAX_JOBS_IN_QUEUE | Maximum number of jobs to keep in the queue at once |
| PYPE_MEM | Maximum memory for local execution |
| PYPE_TMP | Temporary directory (available as %(pype_tmp)s in snippets; default: /tmp) |
| PYPE_LOGDIR | Log file directory (default: ~/.bio_pype/logs) |
| PYPE_DOCKER | Container runtime executable (default: docker) |
| PYPE_CONDA | Conda executable path (default: conda) |
| PYPE_SINGULARITY_CACHE | Singularity image cache directory (default: current working directory) |
| PYPE_PULL_TIMEOUT | Timeout in seconds for container/conda pulls during profile build (default: 3600) |
| PYPE_EDITOR | Editor used by pype profiles edit (default: $EDITOR or vi) |
| PYPE_MONITOR_INTERVAL | Resource-monitor sampling interval in seconds (default: 1.0) |
| PYPE_MONITOR_FLUSH_INTERVAL | Resource-monitor sample flush interval in seconds (default: 5.0) |
| PYPE_QUEUE_POLL_INTERVAL | Seconds between queue poll cycles (default: 10) |
| PYPE_QUEUE_ACCOUNT | Default account/project submitted to the queue scheduler |
| PYPE_QUEUE_PARTITION | Default partition/queue name for job submission |
| PYPE_QUEUE_PARTITIONS_CONFIG | Path to a partition configuration file used for resource matching |
| COMPUTE_BIO_API_URL | compute.bio API endpoint for web monitoring (default: https://app.compute.bio/api/v1) |
| COMPUTE_BIO_TOKEN | API authentication token for compute.bio (optional; leave unset to disable API integration) |
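
The relationship between PYPE_MODULES and the four per-type variables in the list above can be illustrated with a short sketch. It assumes the subdirectories are literally named snippets, pipelines, profiles, and queues, and that an unset variable falls back to the packaged default (represented here as None). The function `resolve_module_paths` is hypothetical.

```python
import os

MODULE_TYPES = ("snippets", "pipelines", "profiles", "queues")


def resolve_module_paths(settings):
    """Map each module type to a directory, letting PYPE_MODULES override the rest."""
    base = settings.get("PYPE_MODULES")
    paths = {}
    for kind in MODULE_TYPES:
        if base:
            # PYPE_MODULES wins: every module type lives under the same base path.
            paths[kind] = os.path.join(os.path.expanduser(base), kind)
        else:
            specific = settings.get(f"PYPE_{kind.upper()}")
            # None stands for "use the packaged default location" (assumption).
            paths[kind] = os.path.expanduser(specific) if specific else None
    return paths


with_base = resolve_module_paths(
    {"PYPE_MODULES": "/opt/pype", "PYPE_SNIPPETS": "/ignored/snippets"}
)
without_base = resolve_module_paths({"PYPE_SNIPPETS": "/custom/path/snippets"})
print(with_base["snippets"])
print(without_base)
```

In the first call the PYPE_SNIPPETS value is ignored because PYPE_MODULES is set, which matches the "overridden by PYPE_MODULES" notes in the descriptions.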

Storage and execution mode

These variables select the storage backend that moves data in and out of a snippet execution. See Storage Backends for full details.

| Variable | Description |
|---|---|
| PYPE_OVERLAY_MODE | Storage backend: direct (default), overlay, or a queue-module name (e.g. scaleway) |
| PYPE_OVERLAY_SCRATCH | Scratch directory for the overlay backend (default: /tmp/pype-overlay) |
| PYPE_SNAPSHOT_REGISTRY | Path to the snapshot→path registry JSON used by cloud backends |
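
As a rough illustration of how the three storage variables fit together, the sketch below turns them into a small description of the selected backend. The function `select_storage_backend` and the shape of its return value are hypothetical, and treating any value other than direct or overlay as a queue-module name is an assumption drawn from the description of PYPE_OVERLAY_MODE.

```python
def select_storage_backend(settings):
    """Describe the storage backend implied by the PYPE_OVERLAY_* settings."""
    mode = settings.get("PYPE_OVERLAY_MODE") or "direct"
    if mode == "direct":
        return {"backend": "direct"}
    if mode == "overlay":
        scratch = settings.get("PYPE_OVERLAY_SCRATCH", "/tmp/pype-overlay")
        return {"backend": "overlay", "scratch": scratch}
    # Anything else is treated as the name of a queue module (assumption).
    return {
        "backend": "queue-module",
        "module": mode,
        "snapshot_registry": settings.get("PYPE_SNAPSHOT_REGISTRY"),
    }


print(select_storage_backend({}))
print(select_storage_backend({"PYPE_OVERLAY_MODE": "overlay"}))
print(select_storage_backend({"PYPE_OVERLAY_MODE": "scaleway"}))
```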

Energy and carbon tracking

These variables enable and tune CO2eq/energy estimation. See Energy and Carbon Tracking for the full guide.

| Variable | Description |
|---|---|
| PYPE_CARBON_COUNTRY | Electricity region/zone (e.g. DK, DE, FR); setting it enables tracking |
| PYPE_CO2EQ_SRC | Carbon-intensity provider: entsoe, electricitymaps, or compute_bio |
| ENTSOE_API_KEY | API token for the entsoe provider |
| ELECTRICITY_MAPS_API_KEY | API token for the electricitymaps provider |
| PYPE_CARBON_FALLBACK_G_PER_KWH | Static carbon intensity in g CO2eq per kWh, used when no provider value is available (default: 300.0) |
| PYPE_CARBON_CPU_TDP_W | CPU TDP in watts for the power model when idle/loaded watts are unknown (default: 100.0) |
| PYPE_POWER_IDLE_W | Measured node idle power in watts (optional) |
| PYPE_POWER_LOADED_W | Measured node full-load power in watts (optional) |
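
To show how these parameters could combine into an estimate, here is a sketch using a common linear power model. It is an illustration only and not necessarily the model Bio_pype implements: power is interpolated between idle and loaded watts by CPU utilization, the TDP scaled by utilization is used when those measurements are missing (an assumption), and emissions are energy in kWh multiplied by carbon intensity in g/kWh. The function names are hypothetical.

```python
def estimate_power_w(cpu_utilization, idle_w=None, loaded_w=None, tdp_w=100.0):
    """Estimate node power in watts for a CPU utilization between 0.0 and 1.0."""
    u = min(max(cpu_utilization, 0.0), 1.0)
    if idle_w is not None and loaded_w is not None:
        # Linear interpolation between measured idle and full-load power.
        return idle_w + (loaded_w - idle_w) * u
    # Without measurements, scale the CPU TDP by utilization (assumed fallback).
    return tdp_w * u


def estimate_co2eq_g(power_w, seconds, intensity_g_per_kwh=None,
                     fallback_g_per_kwh=300.0):
    """Convert average power and duration into grams of CO2eq."""
    energy_kwh = power_w * seconds / 3_600_000.0  # W*s -> kWh
    if intensity_g_per_kwh is None:
        intensity_g_per_kwh = fallback_g_per_kwh
    return energy_kwh * intensity_g_per_kwh


# One hour at 50% utilization on a node measured at 50 W idle and 250 W loaded.
power = estimate_power_w(0.5, idle_w=50.0, loaded_w=250.0)
grams = estimate_co2eq_g(power, seconds=3600)
print(power, grams)
```

With these inputs the node draws about 150 W, which is 0.15 kWh over the hour, or roughly 45 g CO2eq at the fallback intensity of 300 g/kWh.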

Compute.bio API Integration

Bio_pype can optionally integrate with compute.bio for web-based pipeline monitoring and control.

Setup

To enable compute.bio integration:

  1. Obtain an API token from your compute.bio account

  2. Add configuration to ~/.bio_pype/config:

    COMPUTE_BIO_API_URL=https://app.compute.bio/api/v1
    COMPUTE_BIO_TOKEN=your_api_token_here
    

Or set environment variables:

export COMPUTE_BIO_API_URL=https://app.compute.bio/api/v1
export COMPUTE_BIO_TOKEN=your_api_token_here

Features

When configured, Bio_pype automatically:

  • Registers workers: Each pipeline run registers with a unique worker ID

  • Sends progress updates: Pipeline status sent every 30 seconds (default)

  • Tracks jobs: Job status, queue IDs, and timestamps synced to API

  • Supports log streaming: Real-time log viewing through web interface

  • Receives commands: API can request logs and other information

Configuration Options

Fine-tune API integration with environment variables:

| Variable | Description |
|---|---|
| PYPE_API_PROGRESS_INTERVAL | Seconds between progress updates (default: 30) |
| PYPE_API_COMMAND_INTERVAL | Seconds between command polls (default: 60) |

Example:

export PYPE_API_PROGRESS_INTERVAL=60
export PYPE_API_COMMAND_INTERVAL=120
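
A small sketch of how such interval settings might be read with their documented defaults follows. The helper `read_interval` is hypothetical, and falling back to the default for non-numeric or non-positive values is an assumption rather than documented Bio_pype behaviour.

```python
def read_interval(env, name, default):
    """Read a positive number of seconds from `env`, falling back to `default`."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # assumed behaviour for malformed values
    return value if value > 0 else default


# A stand-in for os.environ after the two export lines above.
env = {"PYPE_API_PROGRESS_INTERVAL": "60", "PYPE_API_COMMAND_INTERVAL": "120"}
progress_interval = read_interval(env, "PYPE_API_PROGRESS_INTERVAL", 30.0)
command_interval = read_interval(env, "PYPE_API_COMMAND_INTERVAL", 60.0)
print(progress_interval, command_interval)
```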

Verification

To verify API integration is working:

  1. Run a pipeline with API configured

  2. Check logs for registration message:

    INFO: Worker registered: worker-hostname-abc123 (Pipeline ID: 12345)
    INFO: Started progress watcher (updates every 30s)
    INFO: Started command watcher (polls every 60s)
    
  3. View pipeline progress on compute.bio web interface

Disabling API Integration

API integration is disabled by default. If you want to ensure it’s disabled:

  • Don’t set COMPUTE_BIO_API_URL or COMPUTE_BIO_TOKEN

  • Or remove them from ~/.bio_pype/config

If API credentials are not configured, pipelines run normally without web monitoring and log a warning:

WARNING: compute.bio API not configured. Pipeline progress will not be sent to API.
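
The enable/disable decision can be sketched as follows. This assumes the token is the deciding setting, since the URL has a documented default; the function `api_settings` and the logger name are hypothetical, and the real check inside Bio_pype may differ.

```python
import logging

DEFAULT_API_URL = "https://app.compute.bio/api/v1"
logger = logging.getLogger("pype.example")  # hypothetical logger name


def api_settings(settings):
    """Return URL and token when the integration is configured, else None."""
    token = (settings.get("COMPUTE_BIO_TOKEN") or "").strip()
    if not token:
        logger.warning(
            "compute.bio API not configured. "
            "Pipeline progress will not be sent to API."
        )
        return None
    url = settings.get("COMPUTE_BIO_API_URL") or DEFAULT_API_URL
    return {"url": url, "token": token}


disabled = api_settings({})
enabled = api_settings({"COMPUTE_BIO_TOKEN": "your_api_token_here"})
print(disabled, enabled["url"])
```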

CLI Commands

Bio_pype provides CLI commands for testing and running the compute.bio integration.

Test API connectivity:

pype compute_bio --test

Tests the API connection without creating any records. Verifies that COMPUTE_BIO_API_URL and COMPUTE_BIO_TOKEN are set correctly.

Run listener daemon:

pype compute_bio --run

# With custom log directory
pype compute_bio --run --log /path/to/logs

Starts a persistent daemon that monitors compute.bio for commands sent to pipelines running on this machine. The daemon:

  • Polls for commands every 10 seconds

  • Processes commands for inactive workers (pipelines that finished or crashed)

  • Handles log requests, job cancellation, and other remote commands

  • Uses a lock file to prevent multiple daemons on the same machine
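
The lock-file idea in the last bullet can be illustrated with a generic single-instance guard. This is a sketch of the general technique and not Bio_pype's implementation: the lock path, the function names, and the use of exclusive file creation are all assumptions.

```python
import os
import tempfile


def acquire_lock(lock_path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_EXCL makes creation fail when another process already holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(lock_path):
    """Remove the lock file, ignoring the case where it is already gone."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


with tempfile.TemporaryDirectory() as tmp_dir:
    lock = os.path.join(tmp_dir, "daemon.lock")  # hypothetical lock location
    first = acquire_lock(lock)    # first daemon obtains the lock
    second = acquire_lock(lock)   # a second daemon is refused
    release_lock(lock)
    third = acquire_lock(lock)    # the lock is free again after release
    release_lock(lock)
print(first, second, third)
```

A plain existence check like this leaves a stale lock behind if the process is killed, so a production daemon would normally also verify that the recorded PID is still alive.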

Daemon usage notes:

  • Run in the background: nohup pype compute_bio --run &

  • Only one daemon should run per machine

  • The daemon handles commands for all pipelines registered from this host

  • When the daemon runs in the foreground, press Ctrl+C to stop it gracefully