.. index:: Carbon tracking, Energy, CO2eq, PYPE_CARBON_COUNTRY, PYPE_CO2EQ_SRC

.. _carbon_tracking:

Energy and Carbon Tracking
==========================

Bio_pype can estimate the **energy consumption** (kWh) and **CO2eq footprint**
(grams) of every snippet it runs and roll those numbers up to a pipeline-level
summary. The feature reuses the per-second telemetry already collected by the
resource monitor (CPU utilisation, memory, GPU power) and combines it with a
power model and an electricity *carbon intensity* value (gCO2eq/kWh).

It is **off by default** and adds **no external dependencies** (standard
library only: ``urllib``, ``sqlite3``, ``xml.etree``). When disabled, pipelines
run exactly as before.

What you get
------------

Once enabled, three artifacts are produced automatically:

**Per-job estimate**: written under the ``co2`` key of each job's
``resource_consumption`` in the pipeline runtime, and logged at job
completion::

    INFO: CO2 estimate: 12.3400 gCO2eq (45.6000 Wh), intensity 280 gCO2eq/kWh [DK]

**Pipeline summary**: rolled up across all jobs into
``__pipeline_metadata__.carbon`` and printed at the end of the run::

    Carbon & efficiency
    Energy: 0.4560 kWh
    CO2eq: 127.68 g
    SCI: 4.21 gCO2/GB (30.32 GB processed)

The *SCI* line (Software Carbon Intensity) is shown only when input/output
data sizes were recorded; it expresses grams of CO2eq per GB of data
processed.

**Power profile**: a time-binned curve of total instantaneous pipeline power,
written to ``pipeline_power_profile.yaml`` in the pipeline log directory. Each
bin (30 minutes wide by default) records mean power and energy::

    - t: '2026-06-14T10:15:00'
      power_w: 412.5
      energy_wh: 206.25
    - t: '2026-06-14T10:45:00'
      power_w: 388.1
      energy_wh: 194.05

Concurrent jobs are summed, so the curve reflects the whole workflow's draw
over wall-clock time, not any single job.
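
The binning behind the power profile can be illustrated with a small sketch.
It is not Bio_pype's implementation: the function name ``power_profile``, the
``(timestamp, watts)`` sample format, the one-sample-per-second assumption and
the choice to align bins to the first sample are all assumptions made for
illustration. It shows how summing concurrent jobs and dividing by the bin
width yields the ``power_w`` and ``energy_wh`` pairs seen in the YAML above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def power_profile(samples, bin_seconds=1800):
    """Bin one-second (timestamp, watts) samples into a total power curve.

    Samples from concurrent jobs simply add up inside a bin. Each sample is
    assumed to represent one second of draw (an assumption of this sketch).
    """
    if not samples:
        return []
    start = min(ts for ts, _ in samples)

    # Accumulate watt-seconds per bin; concurrent jobs land in the same bin.
    watt_seconds = defaultdict(float)
    for ts, watts in samples:
        index = int((ts - start).total_seconds() // bin_seconds)
        watt_seconds[index] += watts

    return [
        {
            "t": (start + timedelta(seconds=index * bin_seconds)).isoformat(),
            "power_w": ws / bin_seconds,   # mean power over the whole bin
            "energy_wh": ws / 3600.0,      # watt-seconds converted to Wh
        }
        for index, ws in sorted(watt_seconds.items())
    ]


# Job A draws 100 W for an hour; job B draws 50 W during the first half hour.
start = datetime(2026, 6, 14, 10, 15)
samples = [(start + timedelta(seconds=s), 100.0) for s in range(3600)]
samples += [(start + timedelta(seconds=s), 50.0) for s in range(1800)]
profile = power_profile(samples)
```

With these inputs the first bin reports 150 W and 75 Wh and the second 100 W
and 50 Wh, so ``energy_wh`` always equals ``power_w`` times the bin width in
hours, as in the sample file (412.5 W over 0.5 h gives 206.25 Wh).
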

Enabling carbon tracking
------------------------

Two variables control the feature:

* ``PYPE_CARBON_COUNTRY``: the electricity region/zone to price emissions
  against (e.g. ``DK``, ``DE``, ``FR``). **Setting this is what activates
  tracking**: without a region, the resource monitor never creates a carbon
  cache and no estimates are produced.
* ``PYPE_CO2EQ_SRC``: which provider supplies the carbon intensity value.

Add them to ``~/.bio_pype/config`` or export them::

    PYPE_CARBON_COUNTRY=DK
    PYPE_CO2EQ_SRC=entsoe
    ENTSOE_API_KEY=your-entsoe-token

If ``PYPE_CO2EQ_SRC`` is left unset but ``PYPE_CARBON_COUNTRY`` is set,
tracking still runs using the static fallback intensity
(``PYPE_CARBON_FALLBACK_G_PER_KWH``, default 300 gCO2eq/kWh), so you get
energy and CO2eq numbers without any network access.

Carbon intensity providers
--------------------------

A single provider is queried per run, selected by ``PYPE_CO2EQ_SRC``:

.. list-table::
   :header-rows: 1
   :widths: 22 18 60

   * - ``PYPE_CO2EQ_SRC``
     - Credential
     - Source
   * - ``entsoe``
     - ``ENTSOE_API_KEY``
     - ENTSO-E generation mix, flow-traced to an intensity value. EU
       coverage; includes a 24-hour forecast used for green-window
       scheduling.
   * - ``electricitymaps``
     - ``ELECTRICITY_MAPS_API_KEY``
     - Electricity Maps direct CO2eq signal. Global coverage, higher
       precision.
   * - ``compute_bio``
     - ``COMPUTE_BIO_TOKEN`` / ``COMPUTE_BIO_API_URL``
     - compute.bio API proxy (see :ref:`configuration`).

There is **no fallback chain between providers**: if the configured provider
fails, the run falls back to ``PYPE_CARBON_FALLBACK_G_PER_KWH`` for that hour.

Intensity caching
-----------------

Carbon intensity is fetched at most **once per hour per region**, regardless
of how many jobs or concurrent workers are running. Values are stored in a
SQLite cache at ``~/.bio_pype/carbon_cache.db`` (path fixed as
``PYPE_HOME/carbon_cache.db``).
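
The once-per-hour-per-region rule can be pictured as a cache keyed by region
and hour bucket. The sketch below is a simplified illustration, not
Bio_pype's code: the ``intensity`` table, its columns, and the helper names
``hour_bucket`` and ``get_intensity`` are hypothetical, an in-memory database
stands in for the real cache file, and cross-process coordination is left out
entirely.

```python
import sqlite3
from datetime import datetime, timezone


def hour_bucket(when):
    """Truncate an aware timestamp to the start of its UTC hour."""
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")


def get_intensity(conn, region, when, fetch):
    """Return the intensity for (region, hour), calling fetch() on a miss."""
    bucket = hour_bucket(when)
    row = conn.execute(
        "SELECT g_per_kwh FROM intensity WHERE region = ? AND hour = ?",
        (region, bucket),
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no network call
    value = fetch(region)
    conn.execute(
        "INSERT OR IGNORE INTO intensity (region, hour, g_per_kwh) "
        "VALUES (?, ?, ?)",
        (region, bucket, value),
    )
    conn.commit()
    return value


# Hypothetical schema; the real cache layout is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE intensity ("
    "region TEXT, hour TEXT, g_per_kwh REAL, PRIMARY KEY (region, hour))"
)

calls = []


def fake_fetch(region):
    """Stand-in for a provider call; records how often it is invoked."""
    calls.append(region)
    return 280.0


t1 = datetime(2026, 6, 14, 10, 5, tzinfo=timezone.utc)
t2 = datetime(2026, 6, 14, 10, 55, tzinfo=timezone.utc)
first = get_intensity(conn, "DK", t1, fake_fetch)
second = get_intensity(conn, "DK", t2, fake_fetch)  # same hour: cache hit
```

Both lookups fall in the 10:00 UTC bucket, so ``fake_fetch`` runs only once; a
lookup in the next hour would trigger a new fetch.
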

The cache uses WAL mode and a fetch-lock table so that, across all concurrent
processes on a node, only one acquires the lock and calls the API each hour;
the others read the cached value. Failed fetches hold the lock for a short TTL
to avoid hammering the API, then expire so the next worker can retry.

The power model
---------------

Power for each telemetry interval is estimated as the sum of CPU, GPU and
memory contributions:

* **CPU**: if node idle/loaded power is known (see below), CPU power is
  interpolated between them by mean CPU utilisation. Otherwise it is
  ``PYPE_CARBON_CPU_TDP_W`` (default 100 W) scaled by utilisation.
* **GPU**: measured watts from NVML when available, else 0 W.
* **Memory**: estimated at ~0.375 W per GB of resident memory.

Energy is power integrated over the interval; CO2eq is energy times the carbon
intensity for that interval. Two accounting methods are recorded in the
estimate's ``method`` field:

* ``timeline``: each per-second sample is priced with the carbon intensity in
  effect at that moment (most accurate).
* ``summary``: a single average applied to the job summary (used as a fallback
  when per-sample data is insufficient).

Calibrating node power
^^^^^^^^^^^^^^^^^^^^^^

For more accurate CPU power you can supply measured idle and loaded wattage
instead of relying on the TDP estimate:

.. list-table::
   :header-rows: 1
   :widths: 32 68

   * - Variable
     - Description
   * - ``PYPE_POWER_IDLE_W``
     - Node power draw when idle (W).
   * - ``PYPE_POWER_LOADED_W``
     - Node power draw at full CPU load (W).
   * - ``PYPE_CARBON_CPU_TDP_W``
     - CPU TDP used when idle/loaded values are not available (default
       100 W).
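
The power model and the effect of calibration can be sketched in a few lines.
This is an illustration under stated assumptions rather than Bio_pype's code:
the function names are hypothetical, CPU utilisation is taken as a fraction
between 0 and 1, and the interpolation between idle and loaded power is
assumed to be linear.

```python
MEMORY_W_PER_GB = 0.375  # per-GB memory estimate quoted in the power model


def estimate_power_w(cpu_util, mem_gb, gpu_w=0.0,
                     idle_w=None, loaded_w=None, tdp_w=100.0):
    """Estimate total power (W) for one telemetry interval.

    cpu_util is a 0..1 fraction (assumption of this sketch).
    """
    cpu_util = max(0.0, min(1.0, cpu_util))
    if idle_w is not None and loaded_w is not None:
        # Calibrated node: assumed linear interpolation idle -> loaded.
        cpu_w = idle_w + (loaded_w - idle_w) * cpu_util
    else:
        # Uncalibrated node: TDP scaled by utilisation.
        cpu_w = tdp_w * cpu_util
    return cpu_w + gpu_w + MEMORY_W_PER_GB * mem_gb


def estimate_co2eq_g(power_w, seconds, intensity_g_per_kwh):
    """Convert power held for `seconds` into grams of CO2eq."""
    energy_kwh = power_w * seconds / 3600.0 / 1000.0
    return energy_kwh * intensity_g_per_kwh


# 50 % CPU utilisation and 16 GB resident memory, no GPU.
calibrated = estimate_power_w(0.5, 16.0, idle_w=80.0, loaded_w=240.0)
uncalibrated = estimate_power_w(0.5, 16.0)
grams = estimate_co2eq_g(calibrated, 3600, 280.0)
```

For the same load the calibrated node (80 W idle, 240 W loaded) comes out at
166 W while the TDP estimate gives 56 W, which is why measured idle and loaded
values can change the reported footprint substantially.
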

To measure these on a given partition/instance type, run the bundled benchmark
snippet, which records idle then loaded power (via Intel RAPL when available,
falling back to a TDP estimate)::

    pype snippets _benchmark_power --output power.json

Feed the resulting idle/loaded watts into ``PYPE_POWER_IDLE_W`` /
``PYPE_POWER_LOADED_W`` (or into a partition configuration) so subsequent runs
price energy against the real hardware.

Green-window scheduling
-----------------------

When the ``entsoe`` provider is used, Bio_pype also caches a 24-hour carbon
intensity forecast. This makes it possible to find the lowest-carbon
contiguous window in which to run a deferrable workload of a given duration
before a deadline, which is the basis for scheduling jobs when the grid is
cleanest. The feature is currently available programmatically
(``carbon.find_green_window`` / ``carbon.forecast_carbon_intensity``) and is
still being surfaced in the CLI.

Configuration reference
-----------------------

Carbon-related variables, for quick reference (the ``compute_bio`` credentials
are covered in :ref:`configuration`):

.. list-table::
   :header-rows: 1
   :widths: 34 16 50

   * - Variable
     - Default
     - Description
   * - ``PYPE_CARBON_COUNTRY``
     - *(unset)*
     - Electricity region/zone; setting it enables tracking.
   * - ``PYPE_CO2EQ_SRC``
     - *(unset)*
     - Provider: ``entsoe``, ``electricitymaps`` or ``compute_bio``.
   * - ``ENTSOE_API_KEY``
     - *(unset)*
     - Token for the ``entsoe`` provider.
   * - ``ELECTRICITY_MAPS_API_KEY``
     - *(unset)*
     - Token for the ``electricitymaps`` provider.
   * - ``PYPE_CARBON_FALLBACK_G_PER_KWH``
     - ``300.0``
     - Static intensity used when no provider value is available.
   * - ``PYPE_CARBON_CPU_TDP_W``
     - ``100.0``
     - CPU TDP for the power model when idle/loaded watts are unknown.
   * - ``PYPE_POWER_IDLE_W``
     - *(unset)*
     - Measured node idle power (W).
   * - ``PYPE_POWER_LOADED_W``
     - *(unset)*
     - Measured node full-load power (W).

See :ref:`configuration` for the complete list of Bio_pype environment
variables.
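
As a quick way to reason about how these variables combine, the sketch below
resolves them from a mapping using the defaults in the table. It is a
hypothetical helper (``carbon_settings`` is not part of Bio_pype) and it only
looks at environment-style key/value pairs; it does not read
``~/.bio_pype/config``.

```python
import os


def _optional_float(value):
    """Parse a float, treating missing or empty values as unset."""
    return float(value) if value not in (None, "") else None


def carbon_settings(env=None):
    """Resolve carbon-tracking settings from an environment-like mapping."""
    env = os.environ if env is None else env
    region = env.get("PYPE_CARBON_COUNTRY") or None
    return {
        # Tracking is active only when a region is configured.
        "enabled": region is not None,
        "region": region,
        # No provider means the static fallback intensity is used.
        "provider": env.get("PYPE_CO2EQ_SRC") or None,
        "fallback_g_per_kwh": float(
            env.get("PYPE_CARBON_FALLBACK_G_PER_KWH", 300.0)
        ),
        "cpu_tdp_w": float(env.get("PYPE_CARBON_CPU_TDP_W", 100.0)),
        "idle_w": _optional_float(env.get("PYPE_POWER_IDLE_W")),
        "loaded_w": _optional_float(env.get("PYPE_POWER_LOADED_W")),
    }


disabled = carbon_settings({})
offline = carbon_settings({"PYPE_CARBON_COUNTRY": "DK"})
```

Here ``disabled["enabled"]`` is ``False`` because no region is set, while
``offline`` is enabled with no provider and therefore relies on the
300 gCO2eq/kWh fallback.
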