Energy and Carbon Tracking#
Bio_pype can estimate the energy consumption (kWh) and CO2eq footprint (grams) of every snippet it runs and roll those numbers up to a pipeline-level summary. The feature reuses the per-second telemetry already collected by the resource monitor — CPU utilisation, memory, GPU power — and combines it with a power model and an electricity carbon intensity value (gCO2eq/kWh).
It is off by default and adds no external dependencies (standard library
only: urllib, sqlite3, xml.etree). When disabled, pipelines run
exactly as before.
What you get#
Once enabled, three artifacts are produced automatically:
Per-job estimate — written under the co2 key of each job’s
resource_consumption in the pipeline runtime, and logged at job completion:
INFO: CO2 estimate: 12.3400 gCO2eq (45.6000 Wh), intensity 280 gCO2eq/kWh [DK]
Pipeline summary — rolled up across all jobs into
__pipeline_metadata__.carbon and printed at the end of the run:
Carbon & efficiency
Energy: 0.4560 kWh
CO2eq: 127.68 g
SCI: 4.21 gCO2/GB (30.32 GB processed)
The SCI line (Software Carbon Intensity) is shown only when input/output data sizes were recorded; it expresses grams of CO2eq per GB of data processed.
Power profile — a time-binned curve of total instantaneous pipeline power is
written to pipeline_power_profile.yaml in the pipeline log directory. Each
bin (30 minutes wide by default) records mean power and energy:
- t: '2026-06-14T10:15:00'
  power_w: 412.5
  energy_wh: 206.25
- t: '2026-06-14T10:45:00'
  power_w: 388.1
  energy_wh: 194.05
Concurrent jobs are summed, so the curve reflects the whole workflow’s draw over wall-clock time, not any single job.
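The binning described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bio_pype's implementation: the function name `power_profile`, the `(datetime, watts)` sample shape, and the choice to align bins to a `start` time are all hypothetical. Mean power is taken over the full bin width, so a partially covered bin reports a lower mean.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def power_profile(samples, start, bin_seconds=1800, sample_seconds=1.0):
    """Bin pooled (timestamp, watts) samples from all jobs.

    samples: iterable of (datetime, power_w), one entry per job per second.
    start:   datetime of the first bin edge (assumed here: pipeline start).
    """
    energy_wh = defaultdict(float)
    for ts, power_w in samples:
        index = int((ts - start).total_seconds() // bin_seconds)
        # Samples from concurrent jobs land in the same bin and are summed.
        energy_wh[index] += power_w * sample_seconds / 3600.0

    bin_hours = bin_seconds / 3600.0
    return [
        {
            "t": (start + timedelta(seconds=i * bin_seconds)).isoformat(),
            "power_w": wh / bin_hours,  # mean power over the whole bin
            "energy_wh": wh,
        }
        for i, wh in sorted(energy_wh.items())
    ]


if __name__ == "__main__":
    t0 = datetime(2026, 6, 14, 10, 15)
    job_a = [(t0 + timedelta(seconds=s), 300.0) for s in range(1800)]
    job_b = [(t0 + timedelta(seconds=s), 112.5) for s in range(1800)]
    print(power_profile(job_a + job_b, t0))
```

Two jobs drawing 300 W and 112.5 W for the same 30 minutes produce a single bin at roughly 412.5 W and 206.25 Wh, which is the relationship between `power_w` and `energy_wh` in the sample file above.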
Enabling carbon tracking#
Two variables switch the feature on:
PYPE_CARBON_COUNTRY: the electricity region/zone to price emissions against (e.g. DK, DE, FR). Setting this is what activates tracking: without a region, the resource monitor never creates a carbon cache and no estimates are produced.
PYPE_CO2EQ_SRC: which provider supplies the carbon intensity value.
Add them to ~/.bio_pype/config or export them:
PYPE_CARBON_COUNTRY=DK
PYPE_CO2EQ_SRC=entsoe
ENTSOE_API_KEY=your-entsoe-token
If PYPE_CO2EQ_SRC is left unset but PYPE_CARBON_COUNTRY is set, tracking
still runs using the static fallback intensity
(PYPE_CARBON_FALLBACK_G_PER_KWH, default 300 gCO2eq/kWh) so you get energy
numbers without any network access.
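The activation rules above can be summarised as a small decision function. This is a sketch of the documented behaviour only; `carbon_settings` is a hypothetical helper, and it takes a plain mapping rather than reading ~/.bio_pype/config, whose parsing is not described here.

```python
import os


def carbon_settings(env=None):
    """Return the effective carbon settings, or None when tracking is off."""
    env = os.environ if env is None else env

    country = env.get("PYPE_CARBON_COUNTRY")
    if not country:
        return None  # no region set: tracking is disabled

    return {
        "country": country,
        # No provider means the static fallback intensity is used.
        "provider": env.get("PYPE_CO2EQ_SRC") or None,
        "fallback_g_per_kwh": float(
            env.get("PYPE_CARBON_FALLBACK_G_PER_KWH", 300)
        ),
    }


if __name__ == "__main__":
    print(carbon_settings({"PYPE_CARBON_COUNTRY": "DK"}))
```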
Carbon intensity providers#
A single provider is queried per run, selected by PYPE_CO2EQ_SRC:
| PYPE_CO2EQ_SRC | Credential | Source |
|---|---|---|
| entsoe | ENTSOE_API_KEY | ENTSO-E generation mix, flow-traced to an intensity value. EU coverage; includes a 24-hour forecast used for green-window scheduling. |
| | | Electricity Maps direct CO2eq signal. Global coverage, higher precision. |
| | | compute.bio API proxy (see Configuration). |
There is no fallback chain between providers: if the configured provider
fails, the run falls back to PYPE_CARBON_FALLBACK_G_PER_KWH for that hour.
Intensity caching#
Carbon intensity is fetched at most once per hour per region, regardless of
how many jobs or concurrent workers are running. Values are stored in a
SQLite cache at ~/.bio_pype/carbon_cache.db (path fixed as
PYPE_HOME/carbon_cache.db).
The cache uses WAL mode and a fetch-lock table so that, across all concurrent processes on a node, only one acquires the lock and calls the API each hour; the others read the cached value. Failed fetches hold the lock for a short TTL to avoid hammering the API, then expire so the next worker can retry.
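The locking pattern described above can be illustrated with the standard-library sqlite3 module. This is a minimal sketch, not the actual Bio_pype schema: the table names, column names, the 60-second lock TTL, and the `get_intensity` helper are assumptions, and `fetch` stands in for whichever provider call is configured.

```python
import sqlite3
import time


def open_cache(path):
    """Open (or create) the cache with WAL journaling and both tables."""
    con = sqlite3.connect(path, timeout=10)
    con.execute("PRAGMA journal_mode=WAL")
    con.execute(
        "CREATE TABLE IF NOT EXISTS intensity ("
        "region TEXT, hour INTEGER, g_per_kwh REAL, "
        "PRIMARY KEY (region, hour))"
    )
    con.execute(
        "CREATE TABLE IF NOT EXISTS fetch_lock ("
        "region TEXT, hour INTEGER, expires REAL, "
        "PRIMARY KEY (region, hour))"
    )
    con.commit()
    return con


def get_intensity(con, region, fetch, now=None, lock_ttl=60.0, fallback=300.0):
    """Return the cached intensity for this hour, fetching at most once."""
    now = time.time() if now is None else now
    hour = int(now // 3600)
    key = (region, hour)

    row = con.execute(
        "SELECT g_per_kwh FROM intensity WHERE region=? AND hour=?", key
    ).fetchone()
    if row:
        return row[0]

    # Drop an expired lock, then try to insert ours. The primary key
    # guarantees that only one process wins the insert.
    with con:
        con.execute(
            "DELETE FROM fetch_lock WHERE region=? AND hour=? AND expires<=?",
            key + (now,),
        )
        cur = con.execute(
            "INSERT OR IGNORE INTO fetch_lock VALUES (?, ?, ?)",
            key + (now + lock_ttl,),
        )
        won_lock = cur.rowcount == 1

    if not won_lock:
        return fallback  # another worker is fetching, or a fetch just failed

    try:
        value = float(fetch(region))
    except Exception:
        return fallback  # the lock stays in place until its TTL expires

    with con:
        con.execute("INSERT OR REPLACE INTO intensity VALUES (?, ?, ?)",
                    key + (value,))
    return value
```

In this sketch a worker that loses the lock simply returns the fallback value; a real implementation might instead wait briefly and re-read the cache.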
The power model#
Power for each telemetry interval is estimated as the sum of CPU, GPU and memory contributions:
CPU: if node idle/loaded power is known (see below), CPU power is interpolated between them by mean CPU utilisation. Otherwise it is PYPE_CARBON_CPU_TDP_W (default 100 W) scaled by utilisation.
GPU: measured watts from NVML when available, else 0 W.
Memory: estimated at ~0.375 W per GB of resident memory.
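The three contributions can be written out as a single function. This is a sketch of the model as described, not the library's code: `interval_power_w` and its parameter names are hypothetical, and CPU utilisation is assumed to be expressed as a fraction between 0 and 1.

```python
MEMORY_W_PER_GB = 0.375  # memory coefficient quoted in the text


def interval_power_w(cpu_util, mem_gb, gpu_w=None,
                     idle_w=None, loaded_w=None, tdp_w=100.0):
    """Estimated power (W) for one telemetry interval.

    cpu_util: mean CPU utilisation in [0, 1] (assumed scale).
    gpu_w:    measured GPU watts, or None when no measurement is available.
    """
    if idle_w is not None and loaded_w is not None:
        # Calibrated node: interpolate between idle and full-load power.
        cpu_w = idle_w + (loaded_w - idle_w) * cpu_util
    else:
        # Uncalibrated node: scale the TDP by utilisation.
        cpu_w = tdp_w * cpu_util

    gpu_w = 0.0 if gpu_w is None else gpu_w
    return cpu_w + gpu_w + MEMORY_W_PER_GB * mem_gb


if __name__ == "__main__":
    # 50% CPU, 8 GB resident, no GPU, default 100 W TDP
    print(interval_power_w(0.5, 8))
```

With the defaults, 50% utilisation and 8 GB of resident memory give 50 W for the CPU plus 3 W for memory.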
Energy is power integrated over the interval; CO2eq is energy times the carbon
intensity for that interval. Two accounting methods are recorded in the
estimate’s method field:
timeline: each per-second sample is priced with the carbon intensity in effect at that moment (most accurate).
summary: a single average applied to the job summary (used as a fallback when per-sample data is insufficient).
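The difference between the two methods shows up when power draw and carbon intensity change together. The sketch below assumes per-sample `(power_w, g_per_kwh)` pairs for the timeline method and a single mean power, duration and intensity for the summary method; the function names are hypothetical.

```python
JOULES_PER_KWH = 3.6e6


def co2_timeline(samples, sample_seconds=1.0):
    """Price every sample with the intensity in effect at that moment.

    samples: iterable of (power_w, g_per_kwh) pairs.
    """
    return sum(
        power_w * sample_seconds / JOULES_PER_KWH * g_per_kwh
        for power_w, g_per_kwh in samples
    )


def co2_summary(mean_power_w, duration_s, g_per_kwh):
    """Price the whole job with one average intensity."""
    return mean_power_w * duration_s / JOULES_PER_KWH * g_per_kwh


if __name__ == "__main__":
    # One hour: 2000 W during a clean half hour, idle during a dirty one.
    samples = [(2000.0, 200.0)] * 1800 + [(0.0, 400.0)] * 1800
    print(co2_timeline(samples))            # about 200 g
    print(co2_summary(1000.0, 3600, 300.0))  # about 300 g
```

Both calls describe the same 1 kWh of energy, but the summary method overestimates here because it cannot see that all the work happened while the grid was cleaner.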
Calibrating node power#
For more accurate CPU power you can supply measured idle and loaded wattage instead of relying on the TDP estimate:
| Variable | Description |
|---|---|
| PYPE_POWER_IDLE_W | Node power draw when idle (W). |
| PYPE_POWER_LOADED_W | Node power draw at full CPU load (W). |
| PYPE_CARBON_CPU_TDP_W | CPU TDP used when idle/loaded values are not available (default 100 W). |
To measure these on a given partition/instance type, run the bundled benchmark snippet, which records idle then loaded power (via Intel RAPL when available, falling back to a TDP estimate):
pype snippets _benchmark_power --output power.json
Feed the resulting idle/loaded watts into PYPE_POWER_IDLE_W /
PYPE_POWER_LOADED_W (or into a partition configuration) so subsequent runs
price energy against the real hardware.
Green-window scheduling#
When the entsoe provider is used, Bio_pype also caches a 24-hour carbon
intensity forecast. This enables finding the lowest-carbon contiguous window in
which to run a deferrable workload of a given duration before a deadline — the
basis for scheduling jobs when the grid is cleanest. For now this is available
programmatically (carbon.find_green_window / carbon.forecast_carbon_intensity);
it is not yet fully exposed in the CLI.
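The window search itself is a sliding-window minimum over the hourly forecast. The sketch below illustrates the idea only: `lowest_carbon_window` is a hypothetical function, not the signature of carbon.find_green_window, and it assumes the forecast is a list of hourly gCO2eq/kWh values with index 0 being the next hour and the deadline given as a number of hours from now.

```python
def lowest_carbon_window(forecast, duration_h, deadline_h=None):
    """Find the contiguous window with the lowest mean intensity.

    forecast:   hourly gCO2eq/kWh values, index 0 = next hour (assumed).
    duration_h: length of the workload in whole hours.
    deadline_h: the job must finish within this many hours (optional).
    Returns (start_index, mean_intensity), or None if nothing fits.
    """
    horizon = len(forecast)
    if deadline_h is not None:
        horizon = min(horizon, deadline_h)
    if duration_h <= 0 or duration_h > horizon:
        return None

    window = sum(forecast[:duration_h])
    best_start, best_sum = 0, window
    for start in range(1, horizon - duration_h + 1):
        # Slide by one hour: add the new hour, drop the oldest one.
        window += forecast[start + duration_h - 1] - forecast[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window

    return best_start, best_sum / duration_h


if __name__ == "__main__":
    forecast = [300, 280, 150, 120, 130, 310, 400, 90]
    print(lowest_carbon_window(forecast, duration_h=3))
    print(lowest_carbon_window(forecast, duration_h=3, deadline_h=4))
```

For this forecast the best 3-hour window starts at index 2; with a 4-hour deadline only the windows starting at index 0 or 1 are eligible, so the search settles for index 1.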
Configuration reference#
All carbon-related variables, for quick reference:
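| Variable | Default | Description |
|---|---|---|
| PYPE_CARBON_COUNTRY | (unset) | Electricity region/zone; setting it enables tracking. |
| PYPE_CO2EQ_SRC | (unset) | Provider selected for the run, for example entsoe (see Carbon intensity providers). |
| ENTSOE_API_KEY | (unset) | Token for the entsoe provider. |
| | (unset) | Token for the … provider. |
| PYPE_CARBON_FALLBACK_G_PER_KWH | 300 | Static intensity used when no provider value is available. |
| PYPE_CARBON_CPU_TDP_W | 100 | CPU TDP for the power model when idle/loaded watts are unknown. |
| PYPE_POWER_IDLE_W | (unset) | Measured node idle power (W). |
| PYPE_POWER_LOADED_W | (unset) | Measured node full-load power (W). |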
See Configuration for the complete list of Bio_pype environment variables.