.. index:: Carbon tracking, Energy, CO2eq, PYPE_CARBON_COUNTRY, PYPE_CO2EQ_SRC

.. _carbon_tracking:

Energy and Carbon Tracking
==========================

Bio_pype can estimate the **energy consumption** (kWh) and **CO2eq footprint**
(grams) of every snippet it runs and roll those numbers up to a pipeline-level
summary.  The feature reuses the per-second telemetry already collected by the
:ref:`resource monitor <logs>` — CPU utilisation, memory, GPU power — and
combines it with a power model and an electricity *carbon intensity* value
(gCO2eq/kWh).

It is **off by default** and adds **no external dependencies** (standard library
only: ``urllib``, ``sqlite3``, ``xml.etree``).  When disabled, pipelines run
exactly as before.


What you get
------------

Once enabled, three artifacts are produced automatically:

**Per-job estimate** — written under the ``co2`` key of each job's
``resource_consumption`` in the pipeline runtime, and logged at job completion::

    INFO: CO2 estimate: 12.3400 gCO2eq (45.6000 Wh), intensity 280 gCO2eq/kWh [DK]

**Pipeline summary** — rolled up across all jobs into
``__pipeline_metadata__.carbon`` and printed at the end of the run::

    Carbon & efficiency
      Energy:       0.4560 kWh
      CO2eq:        127.68 g
      SCI:          4.21 gCO2/GB  (30.32 GB processed)

The *SCI* line (Software Carbon Intensity) is shown only when input/output data
sizes were recorded; it expresses grams of CO2eq per GB of data processed.
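
As a quick sanity check on the sample summary, the SCI figure is total CO2eq
divided by the data volume.  A minimal sketch (the function name is
illustrative and not part of Bio_pype):

.. code-block:: python

    def sci_g_per_gb(co2eq_g: float, data_gb: float) -> float:
        """Grams of CO2eq per GB of data processed."""
        if data_gb <= 0:
            raise ValueError("data size must be positive")
        return co2eq_g / data_gb

    # Values from the sample summary: 127.68 g over 30.32 GB.
    print(f"{sci_g_per_gb(127.68, 30.32):.2f} gCO2eq/GB")  # prints "4.21 gCO2eq/GB"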

**Power profile** — a time-binned curve of total instantaneous pipeline power is
written to ``pipeline_power_profile.yaml`` in the pipeline log directory.  Each
bin (30 minutes wide by default) records mean power and energy::

    - t: '2026-06-14T10:15:00'
      power_w: 412.5
      energy_wh: 206.25
    - t: '2026-06-14T10:45:00'
      power_w: 388.1
      energy_wh: 194.05

Concurrent jobs are summed, so the curve reflects the whole workflow's draw over
wall-clock time, not any single job.
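
One way such a profile could be derived is sketched below.  It assumes, for
illustration only, that each job is reduced to a ``(start, end, mean_power_w)``
tuple and that bins are labelled by their start time; neither detail is taken
from the Bio_pype implementation.

.. code-block:: python

    from datetime import datetime, timedelta

    def power_profile(jobs, bin_width=timedelta(minutes=30)):
        """Bin the summed power of (possibly concurrent) jobs over wall-clock time.

        jobs: list of (start, end, mean_power_w) tuples (hypothetical input shape).
        Returns one dict per bin with the bin start, mean power and energy.
        """
        if not jobs:
            return []
        t = min(start for start, _, _ in jobs)
        t_end = max(end for _, end, _ in jobs)
        bin_hours = bin_width.total_seconds() / 3600.0
        profile = []
        while t < t_end:
            bin_end = t + bin_width
            energy_wh = 0.0
            for start, end, power_w in jobs:
                # Seconds during which this job overlaps the current bin.
                overlap_s = (min(end, bin_end) - max(start, t)).total_seconds()
                if overlap_s > 0:
                    energy_wh += power_w * overlap_s / 3600.0
            profile.append({
                "t": t.isoformat(),
                # Mean power over the full bin width, even if the bin is partly empty.
                "power_w": round(energy_wh / bin_hours, 2),
                "energy_wh": round(energy_wh, 2),
            })
            t = bin_end
        return profile

    t0 = datetime(2026, 6, 14, 10, 0)
    jobs = [
        (t0, t0 + timedelta(hours=1), 300.0),                          # job A: 60 min
        (t0 + timedelta(minutes=30), t0 + timedelta(hours=1), 100.0),  # job B: last 30 min
    ]
    profile = power_profile(jobs)

In this example the second bin reports 400 W and 200 Wh, because both jobs are
running during that half hour.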


Enabling carbon tracking
-------------------------

Two variables control the feature:

* ``PYPE_CARBON_COUNTRY`` — the electricity region/zone to price emissions
  against (e.g. ``DK``, ``DE``, ``FR``).  **Setting this is what activates
  tracking**: without a region, the resource monitor never creates a carbon
  cache and no estimates are produced.
* ``PYPE_CO2EQ_SRC`` — which provider supplies the carbon intensity value.

Add them to ``~/.bio_pype/config`` or export them::

    PYPE_CARBON_COUNTRY=DK
    PYPE_CO2EQ_SRC=entsoe
    ENTSOE_API_KEY=your-entsoe-token

If ``PYPE_CO2EQ_SRC`` is left unset but ``PYPE_CARBON_COUNTRY`` is set, tracking
still runs using the static fallback intensity
(``PYPE_CARBON_FALLBACK_G_PER_KWH``, default 300 gCO2eq/kWh), so you get energy
and CO2eq estimates without any network access.
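
The activation rules can be summarised in a few lines.  The sketch below is not
Bio_pype's own configuration loader: it reads a plain mapping of environment
variables (it does not parse ``~/.bio_pype/config``), and the function name and
returned dictionary are hypothetical.

.. code-block:: python

    import os

    DEFAULT_FALLBACK_G_PER_KWH = 300.0  # documented default

    def carbon_settings(env=None):
        """Return None when tracking is off, else the resolved settings."""
        env = os.environ if env is None else env
        region = env.get("PYPE_CARBON_COUNTRY", "").strip()
        if not region:
            return None  # no region set: tracking stays disabled
        return {
            "region": region,
            # None means no provider is queried; only the static fallback is used.
            "provider": env.get("PYPE_CO2EQ_SRC", "").strip() or None,
            "fallback_g_per_kwh": float(
                env.get("PYPE_CARBON_FALLBACK_G_PER_KWH", DEFAULT_FALLBACK_G_PER_KWH)
            ),
        }

    print(carbon_settings({"PYPE_CARBON_COUNTRY": "DK"}))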


Carbon intensity providers
--------------------------

A single provider is queried per run, selected by ``PYPE_CO2EQ_SRC``:

.. list-table::
   :header-rows: 1
   :widths: 22 18 60

   * - ``PYPE_CO2EQ_SRC``
     - Credential
     - Source
   * - ``entsoe``
     - ``ENTSOE_API_KEY``
     - ENTSO-E generation mix, flow-traced to an intensity value.  EU coverage;
       includes a 24-hour forecast used for green-window scheduling.
   * - ``electricitymaps``
     - ``ELECTRICITY_MAPS_API_KEY``
     - Electricity Maps direct CO2eq signal.  Global coverage, higher precision.
   * - ``compute_bio``
     - ``COMPUTE_BIO_TOKEN`` / ``COMPUTE_BIO_API_URL``
     - compute.bio API proxy (see :ref:`configuration`).

There is **no fallback chain between providers**: if the configured provider
fails, the run falls back to ``PYPE_CARBON_FALLBACK_G_PER_KWH`` for that hour.
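
The single-provider behaviour can be pictured as follows.  This is a sketch
under stated assumptions: ``fetch`` is a hypothetical callable standing in for
whichever provider client is configured, and the ``(value, source)`` return
shape is illustrative.

.. code-block:: python

    def resolve_intensity(fetch, region, fallback_g_per_kwh=300.0):
        """Query one provider; on any failure use the static fallback.

        fetch: hypothetical callable(region) -> gCO2eq/kWh.
        Returns (intensity, source) where source is "provider" or "fallback".
        """
        try:
            value = float(fetch(region))
            if value < 0:
                raise ValueError("negative intensity")
            return value, "provider"
        except Exception:
            # No second provider is tried: go straight to the static value.
            return fallback_g_per_kwh, "fallback"

    def unreachable(region):
        raise TimeoutError("provider did not answer")

    print(resolve_intensity(lambda region: 280, "DK"))  # prints "(280.0, 'provider')"
    print(resolve_intensity(unreachable, "DK"))         # prints "(300.0, 'fallback')"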


Intensity caching
-----------------

Carbon intensity is fetched at most **once per hour per region**, regardless of
how many jobs or concurrent workers are running.  Values are stored in a
SQLite cache at ``~/.bio_pype/carbon_cache.db``; the location is fixed relative
to the Bio_pype home directory (``PYPE_HOME/carbon_cache.db``).

The cache uses WAL mode and a fetch-lock table so that, across all concurrent
processes on a node, only one acquires the lock and calls the API each hour;
the others read the cached value.  After a failed fetch the lock is held for a
short TTL to avoid hammering the API; it then expires so the next worker can
retry.
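
The locking pattern can be sketched with the standard ``sqlite3`` module.  The
table names, column names, TTL value and function names below are assumptions
made for illustration; they are not Bio_pype's actual schema or API.

.. code-block:: python

    import sqlite3
    import time

    LOCK_TTL_S = 60  # assumed TTL; the real value may differ

    def open_cache(path):
        # isolation_level=None puts the connection in autocommit mode.
        conn = sqlite3.connect(path, timeout=10, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS intensity (region TEXT, hour INTEGER,"
                     " g_per_kwh REAL, PRIMARY KEY (region, hour))")
        conn.execute("CREATE TABLE IF NOT EXISTS fetch_lock (region TEXT, hour INTEGER,"
                     " expires_at REAL, PRIMARY KEY (region, hour))")
        return conn

    def try_lock(conn, region, hour, now):
        # Drop an expired lock so a later worker can retry a failed fetch.
        conn.execute("DELETE FROM fetch_lock WHERE region=? AND hour=? AND expires_at<=?",
                     (region, hour, now))
        try:
            # The primary key makes this insert the atomic "acquire" step.
            conn.execute("INSERT INTO fetch_lock VALUES (?, ?, ?)",
                         (region, hour, now + LOCK_TTL_S))
            return True
        except sqlite3.IntegrityError:
            return False  # another process already holds the lock

    def get_intensity(conn, region, fetch, now=None):
        """Return the cached value for the current hour, fetching at most once."""
        now = time.time() if now is None else now
        hour = int(now // 3600)
        row = conn.execute("SELECT g_per_kwh FROM intensity WHERE region=? AND hour=?",
                           (region, hour)).fetchone()
        if row:
            return row[0]
        if not try_lock(conn, region, hour, now):
            return None  # someone else is fetching, or a recent fetch failed
        try:
            value = float(fetch(region))  # fetch: hypothetical provider callable
        except Exception:
            return None  # the lock stays in place until its TTL expires
        conn.execute("INSERT OR REPLACE INTO intensity VALUES (?, ?, ?)",
                     (region, hour, value))
        return value

A caller that receives ``None`` would use the static fallback intensity for
that interval.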


The power model
---------------

Power for each telemetry interval is estimated as the sum of CPU, GPU and memory
contributions:

* **CPU** — if node idle/loaded power is known (see below), CPU power is
  interpolated between them by mean CPU utilisation.  Otherwise it is
  ``PYPE_CARBON_CPU_TDP_W`` (default 100 W) scaled by utilisation.
* **GPU** — measured watts from NVML when available, else 0 W.
* **Memory** — estimated at ~0.375 W per GB of resident memory.

Energy is power integrated over the interval; CO2eq is energy times the carbon
intensity for that interval.  The accounting method used is recorded in the
estimate's ``method`` field:

* ``timeline`` — each per-second sample is priced with the carbon intensity in
  effect at that moment (most accurate).
* ``summary`` — a single average applied to the job summary (used as a fallback
  when per-sample data is insufficient).
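
The model and the two accounting methods can be sketched as follows.  The
constants come from the text above; the function names, the sample layout and
the convention that CPU utilisation is a node-wide fraction between 0 and 1 are
assumptions made for this example.

.. code-block:: python

    MEM_W_PER_GB = 0.375  # documented memory estimate
    CPU_TDP_W = 100.0     # documented PYPE_CARBON_CPU_TDP_W default

    def sample_power_w(cpu_util, mem_gb, gpu_w=0.0, idle_w=None, loaded_w=None):
        """Estimated power (W) for one sample; cpu_util is a fraction in [0, 1]."""
        cpu_util = min(max(cpu_util, 0.0), 1.0)
        if idle_w is not None and loaded_w is not None:
            cpu_w = idle_w + (loaded_w - idle_w) * cpu_util  # interpolate idle..loaded
        else:
            cpu_w = CPU_TDP_W * cpu_util                     # TDP scaled by utilisation
        return cpu_w + gpu_w + MEM_W_PER_GB * mem_gb

    def timeline_estimate(samples, interval_s=1.0):
        """samples: iterable of (power_w, intensity_g_per_kwh), one per interval."""
        energy_wh = co2_g = 0.0
        for power_w, intensity in samples:
            wh = power_w * interval_s / 3600.0
            energy_wh += wh
            co2_g += wh / 1000.0 * intensity  # price each sample at its own intensity
        return energy_wh, co2_g

    def summary_estimate(mean_power_w, duration_s, mean_intensity):
        """One average power and one average intensity for the whole job."""
        energy_wh = mean_power_w * duration_s / 3600.0
        return energy_wh, energy_wh / 1000.0 * mean_intensity

    # 50% CPU on the TDP model with 16 GB resident memory: 50 W + 6 W.
    print(sample_power_w(0.5, 16))  # prints "56.0"

When power is constant the two methods agree; they diverge when a job draws
more power during hours with a different carbon intensity.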


Calibrating node power
^^^^^^^^^^^^^^^^^^^^^^^

For more accurate CPU power you can supply measured idle and loaded wattage
instead of relying on the TDP estimate:

.. list-table::
   :header-rows: 1
   :widths: 32 68

   * - Variable
     - Description
   * - ``PYPE_POWER_IDLE_W``
     - Node power draw when idle (W).
   * - ``PYPE_POWER_LOADED_W``
     - Node power draw at full CPU load (W).
   * - ``PYPE_CARBON_CPU_TDP_W``
     - CPU TDP used when idle/loaded values are not available (default 100 W).

To measure these on a given partition/instance type, run the bundled benchmark
snippet, which records idle then loaded power (via Intel RAPL when available,
falling back to a TDP estimate)::

    pype snippets _benchmark_power --output power.json

Feed the resulting idle/loaded watts into ``PYPE_POWER_IDLE_W`` /
``PYPE_POWER_LOADED_W`` (or into a partition configuration) so subsequent runs
price energy against the real hardware.


Green-window scheduling
-----------------------

When the ``entsoe`` provider is used, Bio_pype also caches a 24-hour carbon
intensity forecast.  This enables finding the lowest-carbon contiguous window in
which to run a deferrable workload of a given duration before a deadline — the
basis for scheduling jobs when the grid is cleanest.  For now this is available
only programmatically (``carbon.find_green_window`` /
``carbon.forecast_carbon_intensity``); CLI support is still in progress.
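
The underlying search is a sliding-window minimum over the hourly forecast.
The sketch below illustrates the idea only; its name, arguments and return
value are hypothetical and should not be read as the signature of
``carbon.find_green_window``.

.. code-block:: python

    def find_lowest_window(forecast, duration_h, deadline_h=None):
        """Return (start_index, mean_intensity) of the cleanest contiguous window.

        forecast:   hourly gCO2eq/kWh values, index 0 being the next hour.
        duration_h: workload length in whole hours.
        deadline_h: the window must end at or before this hour offset.
        Returns None if no window of that length fits.
        """
        horizon = len(forecast) if deadline_h is None else min(deadline_h, len(forecast))
        if duration_h <= 0 or duration_h > horizon:
            return None
        window_sum = sum(forecast[:duration_h])
        best_start, best_sum = 0, window_sum
        for start in range(1, horizon - duration_h + 1):
            # Slide by one hour: add the new hour, drop the one that left.
            window_sum += forecast[start + duration_h - 1] - forecast[start - 1]
            if window_sum < best_sum:
                best_start, best_sum = start, window_sum
        return best_start, best_sum / duration_h

    forecast = [300, 280, 150, 120, 140, 310]
    print(find_lowest_window(forecast, 2))                # prints "(3, 130.0)"
    print(find_lowest_window(forecast, 2, deadline_h=4))  # prints "(2, 135.0)"

With a deadline four hours out, the cleanest slot (hours 3 to 5) no longer
fits, so the search settles for the next best window.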


Configuration reference
-----------------------

The main carbon-related variables, for quick reference (the ``compute_bio``
credentials are described in :ref:`configuration`):

.. list-table::
   :header-rows: 1
   :widths: 34 16 50

   * - Variable
     - Default
     - Description
   * - ``PYPE_CARBON_COUNTRY``
     - *(unset)*
     - Electricity region/zone; setting it enables tracking.
   * - ``PYPE_CO2EQ_SRC``
     - *(unset)*
     - Provider: ``entsoe``, ``electricitymaps`` or ``compute_bio``.
   * - ``ENTSOE_API_KEY``
     - *(unset)*
     - Token for the ``entsoe`` provider.
   * - ``ELECTRICITY_MAPS_API_KEY``
     - *(unset)*
     - Token for the ``electricitymaps`` provider.
   * - ``PYPE_CARBON_FALLBACK_G_PER_KWH``
     - ``300.0``
     - Static intensity used when no provider value is available.
   * - ``PYPE_CARBON_CPU_TDP_W``
     - ``100.0``
     - CPU TDP for the power model when idle/loaded watts are unknown.
   * - ``PYPE_POWER_IDLE_W``
     - *(unset)*
     - Measured node idle power (W).
   * - ``PYPE_POWER_LOADED_W``
     - *(unset)*
     - Measured node full-load power (W).

See :ref:`configuration` for the complete list of Bio_pype environment
variables.