Energy and Carbon Tracking

Bio_pype can estimate the energy consumption (kWh) and CO2eq footprint (grams) of every snippet it runs and roll those numbers up to a pipeline-level summary. The feature reuses the per-second telemetry already collected by the resource monitor — CPU utilisation, memory, GPU power — and combines it with a power model and an electricity carbon intensity value (gCO2eq/kWh).

It is off by default and adds no external dependencies (standard library only: urllib, sqlite3, xml.etree). When disabled, pipelines run exactly as before.

What you get

Once enabled, three artifacts are produced automatically:

Per-job estimate — written under the co2 key of each job’s resource_consumption in the pipeline runtime, and logged at job completion:

INFO: CO2 estimate: 12.3400 gCO2eq (45.6000 Wh), intensity 280 gCO2eq/kWh [DK]

Pipeline summary — rolled up across all jobs into __pipeline_metadata__.carbon and printed at the end of the run:

Carbon & efficiency
  Energy:       0.4560 kWh
  CO2eq:        127.68 g
  SCI:          4.21 gCO2/GB  (30.32 GB processed)

The SCI line (Software Carbon Intensity) is shown only when input/output data sizes were recorded; it expresses grams of CO2eq per GB of data processed.
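As a worked example of the SCI arithmetic, the sketch below reproduces the figures in the summary above. The function name `sci_g_per_gb` is hypothetical, and applying the 280 gCO2eq/kWh intensity from the job log line to the whole pipeline is an assumption made for illustration; this is not Bio_pype's own code.

```python
from typing import Optional


def sci_g_per_gb(co2eq_g: float, data_gb: float) -> Optional[float]:
    """Grams of CO2eq per GB processed (hypothetical helper).

    Returns None when no data size was recorded, mirroring the fact
    that the SCI line is omitted in that case.
    """
    if data_gb <= 0:
        return None
    return co2eq_g / data_gb


# Assumption: a constant 280 gCO2eq/kWh over the whole run.
energy_kwh = 0.4560
co2eq_g = energy_kwh * 280
print(round(co2eq_g, 2))                        # 127.68
print(round(sci_g_per_gb(co2eq_g, 30.32), 2))   # 4.21
```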

Power profile — a time-binned curve of total instantaneous pipeline power is written to pipeline_power_profile.yaml in the pipeline log directory. Each bin (30 minutes wide by default) records mean power and energy:

- t: '2026-06-14T10:15:00'
  power_w: 412.5
  energy_wh: 206.25
- t: '2026-06-14T10:45:00'
  power_w: 388.1
  energy_wh: 194.05

Concurrent jobs are summed, so the curve reflects the whole workflow’s draw over wall-clock time, not any single job.
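The binning described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name `power_profile`, the `(timestamp, watts)` sample shape, bins aligned to the pipeline start time, and labelling each bin by its start are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def power_profile(samples, start, bin_seconds=30 * 60):
    """Bin per-second power samples from all jobs into fixed-width bins.

    samples: iterable of (timestamp, watts), one entry per job per second.
    start:   pipeline start time; bins are counted from here (assumption).
    """
    joules = defaultdict(float)  # bin index -> accumulated energy in joules
    for ts, watts in samples:
        index = int((ts - start).total_seconds() // bin_seconds)
        joules[index] += watts  # 1 W sustained for 1 s is 1 J
    profile = []
    for index in sorted(joules):
        energy_wh = joules[index] / 3600.0
        profile.append({
            "t": (start + timedelta(seconds=index * bin_seconds)).isoformat(),
            "power_w": energy_wh / (bin_seconds / 3600.0),  # mean over the bin
            "energy_wh": energy_wh,
        })
    return profile


# Two concurrent jobs drawing 200 W and 212.5 W for one full bin.
start = datetime(2026, 6, 14, 10, 15)
samples = []
for second in range(1800):
    ts = start + timedelta(seconds=second)
    samples.append((ts, 200.0))
    samples.append((ts, 212.5))

print(power_profile(samples, start))
# [{'t': '2026-06-14T10:15:00', 'power_w': 412.5, 'energy_wh': 206.25}]
```

Because samples from every job land in the same bin, concurrent draw adds up without any per-job bookkeeping.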

Enabling carbon tracking

Two variables control the feature:

PYPE_CARBON_COUNTRY — the electricity region/zone to price emissions against (e.g. DK, DE, FR). Setting this is what activates tracking: without a region, the resource monitor never creates a carbon cache and no estimates are produced.
PYPE_CO2EQ_SRC — which provider supplies the carbon intensity value.

Add them to ~/.bio_pype/config or export them:

PYPE_CARBON_COUNTRY=DK
PYPE_CO2EQ_SRC=entsoe
ENTSOE_API_KEY=your-entsoe-token

If PYPE_CO2EQ_SRC is left unset but PYPE_CARBON_COUNTRY is set, tracking still runs using the static fallback intensity (PYPE_CARBON_FALLBACK_G_PER_KWH, default 300 gCO2eq/kWh), so you get energy and CO2eq estimates without any network access.
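The activation rules above can be summarised in a few lines. The sketch below is illustrative only: `carbon_settings` is a hypothetical helper, and it reads just an environment-style mapping, whereas Bio_pype also accepts these values from ~/.bio_pype/config.

```python
import os


def carbon_settings(env=None):
    """Return the effective carbon settings, or None if tracking is off.

    Hypothetical helper: only the variable names and the default of
    300 gCO2eq/kWh come from the documentation.
    """
    env = os.environ if env is None else env
    region = env.get("PYPE_CARBON_COUNTRY", "").strip()
    if not region:
        return None  # no region set: tracking stays disabled
    return {
        "region": region,
        # None means "no provider": the static fallback is used instead.
        "source": env.get("PYPE_CO2EQ_SRC", "").strip() or None,
        "fallback_g_per_kwh": float(
            env.get("PYPE_CARBON_FALLBACK_G_PER_KWH", "300")
        ),
    }


print(carbon_settings({}))                             # None
print(carbon_settings({"PYPE_CARBON_COUNTRY": "DK"}))
# {'region': 'DK', 'source': None, 'fallback_g_per_kwh': 300.0}
```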

Carbon intensity providers

A single provider is queried per run, selected by PYPE_CO2EQ_SRC:

| `PYPE_CO2EQ_SRC` | Credential | Source |
| --- | --- | --- |
| `entsoe` | `ENTSOE_API_KEY` | ENTSO-E generation mix, flow-traced to an intensity value. EU coverage; includes a 24-hour forecast used for green-window scheduling. |
| `electricitymaps` | `ELECTRICITY_MAPS_API_KEY` | Electricity Maps direct CO2eq signal. Global coverage, higher precision. |
| `compute_bio` | `COMPUTE_BIO_TOKEN` / `COMPUTE_BIO_API_URL` | compute.bio API proxy (see Configuration). |

There is no fallback chain between providers: if the configured provider fails, the run falls back to PYPE_CARBON_FALLBACK_G_PER_KWH for that hour.
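The "single provider, then static fallback" behaviour can be sketched like this. The function `resolve_intensity` and the stub provider callables are hypothetical and stand in for the real API clients, whose interfaces are not documented here.

```python
def resolve_intensity(source, region, providers, fallback_g_per_kwh=300.0):
    """Return (gCO2eq/kWh, origin) using exactly one provider.

    providers: mapping of provider name -> callable(region) -> number.
    Any failure goes straight to the static fallback; no other
    provider is tried.
    """
    fetch = providers.get(source)
    if fetch is None:
        return fallback_g_per_kwh, "fallback"
    try:
        return float(fetch(region)), source
    except Exception:
        return fallback_g_per_kwh, "fallback"


def failing_provider(region):
    raise TimeoutError("simulated API outage")


# Stub providers for illustration only.
providers = {
    "entsoe": failing_provider,
    "electricitymaps": lambda region: 123.0,
}

# entsoe fails, and electricitymaps is NOT consulted as a backup.
print(resolve_intensity("entsoe", "DK", providers))           # (300.0, 'fallback')
print(resolve_intensity("electricitymaps", "DK", providers))  # (123.0, 'electricitymaps')
```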

Intensity caching

Carbon intensity is fetched at most once per hour per region, regardless of how many jobs or concurrent workers are running. Values are stored in a SQLite cache whose path is fixed at PYPE_HOME/carbon_cache.db (for example ~/.bio_pype/carbon_cache.db).

The cache uses WAL mode and a fetch-lock table so that, across all concurrent processes on a node, only one acquires the lock and calls the API each hour; the others read the cached value. Failed fetches hold the lock for a short TTL to avoid hammering the API, then expire so the next worker can retry.
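One way to get "one fetch per hour per region" out of SQLite is sketched below. The table names, column names, function names and the 60-second failure TTL are assumptions for illustration and are not taken from the Bio_pype schema. As a simplification, a worker that loses the race gets `None` back instead of waiting for the winner's value.

```python
import sqlite3
import time


def open_cache(path):
    db = sqlite3.connect(path, timeout=10)
    db.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    db.execute("CREATE TABLE IF NOT EXISTS intensity ("
               "region TEXT, hour INTEGER, g_per_kwh REAL, "
               "PRIMARY KEY (region, hour))")
    db.execute("CREATE TABLE IF NOT EXISTS fetch_lock ("
               "region TEXT, hour INTEGER, expires_at REAL, "
               "PRIMARY KEY (region, hour))")
    db.commit()
    return db


def try_lock(db, region, hour, ttl_s, now):
    """Atomically claim the fetch for (region, hour); True if we own it."""
    with db:  # a single transaction: drop an expired lock, then try to insert
        db.execute("DELETE FROM fetch_lock "
                   "WHERE region=? AND hour=? AND expires_at<=?",
                   (region, hour, now))
        cur = db.execute("INSERT OR IGNORE INTO fetch_lock VALUES (?,?,?)",
                         (region, hour, now + ttl_s))
        return cur.rowcount == 1


def get_intensity(db, region, fetch, now=None, ttl_s=60):
    """Return the cached value, fetch it once, or return None."""
    now = time.time() if now is None else now
    hour = int(now // 3600)
    row = db.execute("SELECT g_per_kwh FROM intensity "
                     "WHERE region=? AND hour=?", (region, hour)).fetchone()
    if row is not None:
        return row[0]
    if not try_lock(db, region, hour, ttl_s, now):
        return None  # another worker is fetching, or a failure is cooling down
    try:
        value = float(fetch(region))
    except Exception:
        return None  # the lock stays in place until its TTL expires
    with db:
        db.execute("INSERT OR REPLACE INTO intensity VALUES (?,?,?)",
                   (region, hour, value))
    return value
```

A caller that receives `None` would price the interval with the static fallback intensity. The primary key on `fetch_lock` is what makes the claim exclusive: only one `INSERT OR IGNORE` per region and hour can succeed.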

The power model

Power for each telemetry interval is estimated as the sum of CPU, GPU and memory contributions:

CPU — if node idle/loaded power is known (see below), CPU power is interpolated between them by mean CPU utilisation. Otherwise it is PYPE_CARBON_CPU_TDP_W (default 100 W) scaled by utilisation.
GPU — measured watts from NVML when available, else 0 W.
Memory — estimated at ~0.375 W per GB of resident memory.
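The three contributions can be written out as a short sketch. The function names are hypothetical, CPU utilisation is assumed to be a fraction between 0 and 1, and the interpolation between idle and loaded power is assumed to be linear; only the 100 W default and the 0.375 W/GB figure come from the text above.

```python
MEM_W_PER_GB = 0.375  # approximate memory power per GB resident


def cpu_power_w(util, idle_w=None, loaded_w=None, tdp_w=100.0):
    """CPU power for a mean utilisation in [0, 1]."""
    util = min(max(util, 0.0), 1.0)
    if idle_w is not None and loaded_w is not None:
        # Calibrated node: interpolate (linearly, by assumption).
        return idle_w + (loaded_w - idle_w) * util
    # Uncalibrated node: TDP scaled by utilisation.
    return tdp_w * util


def interval_power_w(cpu_util, mem_gb, gpu_w=None, **cpu_kwargs):
    """Total estimated power for one telemetry interval, in watts."""
    gpu = gpu_w if gpu_w is not None else 0.0  # no NVML reading -> 0 W
    return cpu_power_w(cpu_util, **cpu_kwargs) + gpu + MEM_W_PER_GB * mem_gb


# 50% CPU, 16 GB resident, no GPU, default TDP model.
print(interval_power_w(0.5, 16))                             # 56.0
# Same load on a node calibrated at 80 W idle / 240 W loaded.
print(interval_power_w(0.5, 16, idle_w=80.0, loaded_w=240.0))  # 166.0
```

The gap between the two results shows why calibration matters: the TDP model ignores idle draw entirely.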

Energy is power integrated over the interval; CO2eq is energy times the carbon intensity for that interval. Two accounting methods are recorded in the estimate’s method field:

timeline: each per-second sample is priced with the carbon intensity in effect at that moment (most accurate).
summary: a single average carbon intensity is applied to the energy derived from the job summary (used as a fallback when per-sample data is insufficient).
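The difference between the two accounting methods is easiest to see with numbers. The sketch below is illustrative: the function names, argument shapes and returned dictionary keys (other than `method`) are assumptions, and the "summary" variant is assumed to multiply mean power by duration and by a mean intensity.

```python
J_PER_KWH = 3_600_000  # joules in one kilowatt-hour


def co2_timeline(samples, intensity_at):
    """samples: (epoch_second, watts), one per second.
    intensity_at: callable(epoch_second) -> gCO2eq/kWh."""
    energy_kwh = 0.0
    co2eq_g = 0.0
    for t, watts in samples:
        kwh = watts / J_PER_KWH  # energy of a one-second sample
        energy_kwh += kwh
        co2eq_g += kwh * intensity_at(t)
    return {"method": "timeline", "energy_kwh": energy_kwh, "co2eq_g": co2eq_g}


def co2_summary(mean_watts, duration_s, mean_intensity):
    energy_kwh = mean_watts * duration_s / J_PER_KWH
    return {"method": "summary", "energy_kwh": energy_kwh,
            "co2eq_g": energy_kwh * mean_intensity}


# Two-hour job: 100 W while the grid is at 200 g/kWh,
# then 300 W while it is at 400 g/kWh.
samples = [(t, 100.0 if t < 3600 else 300.0) for t in range(7200)]
timeline = co2_timeline(samples, lambda t: 200.0 if t < 3600 else 400.0)
summary = co2_summary(mean_watts=200.0, duration_s=7200, mean_intensity=300.0)

print(round(timeline["co2eq_g"], 2))  # 140.0
print(round(summary["co2eq_g"], 2))   # 120.0
```

Both methods agree on energy (0.4 kWh), but the summary method underestimates CO2eq here because the job drew the most power when the grid was dirtiest.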

Calibrating node power

For more accurate CPU power you can supply measured idle and loaded wattage instead of relying on the TDP estimate:

| Variable | Description |
| --- | --- |
| `PYPE_POWER_IDLE_W` | Node power draw when idle (W). |
| `PYPE_POWER_LOADED_W` | Node power draw at full CPU load (W). |
| `PYPE_CARBON_CPU_TDP_W` | CPU TDP used when idle/loaded values are not available (default 100 W). |

To measure these on a given partition/instance type, run the bundled benchmark snippet, which records idle then loaded power (via Intel RAPL when available, falling back to a TDP estimate):

pype snippets _benchmark_power --output power.json

Feed the resulting idle/loaded watts into PYPE_POWER_IDLE_W / PYPE_POWER_LOADED_W (or into a partition configuration) so subsequent runs price energy against the real hardware.

Green-window scheduling

When the entsoe provider is used, Bio_pype also caches a 24-hour carbon intensity forecast. The forecast makes it possible to find the lowest-carbon contiguous window in which to run a deferrable workload of a given duration before a deadline, which is the basis for scheduling jobs when the grid is cleanest. For now this is available only programmatically (carbon.find_green_window / carbon.forecast_carbon_intensity); CLI support is still in progress.
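The window search itself is a classic sliding-window minimum over the hourly forecast. The sketch below shows the idea only: `find_lowest_carbon_window` is a hypothetical stand-in and does not reflect the signature or return value of carbon.find_green_window, and the forecast values are made up.

```python
def find_lowest_carbon_window(forecast, duration_h, deadline_h=None):
    """Find the contiguous run of `duration_h` hours with the lowest mean.

    forecast:   hourly gCO2eq/kWh values, index 0 = the coming hour.
    deadline_h: the job must finish within this many hours (optional).
    Returns (start_hour_index, mean_intensity), or None if nothing fits.
    """
    horizon = len(forecast) if deadline_h is None else min(deadline_h, len(forecast))
    if duration_h <= 0 or duration_h > horizon:
        return None
    window = sum(forecast[:duration_h])
    best_start, best_sum = 0, window
    for start in range(1, horizon - duration_h + 1):
        # Slide by one hour: add the entering hour, drop the leaving one.
        window += forecast[start + duration_h - 1] - forecast[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum / duration_h


# Made-up 8-hour forecast for illustration.
forecast = [300, 280, 250, 180, 150, 170, 240, 310]

start, mean = find_lowest_carbon_window(forecast, duration_h=3)
print(start, round(mean, 2))   # 3 166.67

# With a 5-hour deadline, the cleanest window that still finishes in time.
start, mean = find_lowest_carbon_window(forecast, duration_h=3, deadline_h=5)
print(start, round(mean, 2))   # 2 193.33
```

The running sum keeps the search linear in the forecast length, which is more than enough for a 24-hour horizon.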

Configuration reference

The main carbon-related variables, for quick reference (the compute_bio credentials are covered in Configuration):

| Variable | Default | Description |
| --- | --- | --- |
| `PYPE_CARBON_COUNTRY` | (unset) | Electricity region/zone; setting it enables tracking. |
| `PYPE_CO2EQ_SRC` | (unset) | Provider: `entsoe`, `electricitymaps` or `compute_bio`. |
| `ENTSOE_API_KEY` | (unset) | Token for the `entsoe` provider. |
| `ELECTRICITY_MAPS_API_KEY` | (unset) | Token for the `electricitymaps` provider. |
| `PYPE_CARBON_FALLBACK_G_PER_KWH` | `300.0` | Static intensity used when no provider value is available. |
| `PYPE_CARBON_CPU_TDP_W` | `100.0` | CPU TDP for the power model when idle/loaded watts are unknown. |
| `PYPE_POWER_IDLE_W` | (unset) | Measured node idle power (W). |
| `PYPE_POWER_LOADED_W` | (unset) | Measured node full-load power (W). |

See Configuration for the complete list of Bio_pype environment variables.