AI Token Ledger, Explained

The emerging system of record for AI work

No Single Standard · 5-Layer Reference Stack · Open-Source Components · Token = Atomic Unit

The system of record for every unit of AI work.

An AI Token Ledger is a system of record that meters, logs, attributes, and audits every unit of work performed by AI agents and LLM-powered applications. It treats the token — the fundamental unit of LLM input and output — as an atomic unit of economic activity, the same way cloud billing treats API calls or fintech ledgers treat financial transactions.
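Treating the token as an economic atom means every request reduces to simple arithmetic: input and output tokens priced separately, summed per call. A minimal sketch (the model name and per-token prices here are illustrative placeholders, not any provider's actual rates):

```python
# Cost of one LLM call: input and output tokens are priced separately.
# Prices are hypothetical, expressed in USD per 1M tokens.
PRICE_PER_1M = {
    "example-model": {"input": 3.00, "output": 15.00},
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Return the cost of a single call in USD."""
    p = PRICE_PER_1M[model]
    return (tokens_in * p["input"] + tokens_out * p["output"]) / 1_000_000

# 1,200 input tokens + 350 output tokens on the hypothetical model:
cost = call_cost("example-model", 1200, 350)
print(round(cost, 6))  # → 0.00885
```

Everything else in the stack — attribution, budgets, audit — is bookkeeping layered on top of this calculation.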

No single product or protocol owns this space today. What exists is a composable architecture assembled from open-source components spanning observability, API gateways, workflow engines, immutable storage, and policy engines. Think of it as the "Stripe + Ledger + Observability + Policy Engine" for AI tokens — each layer handled by a different tool, none of them yet unified into one canonical system.

The problem it solves is straightforward: as organisations deploy autonomous agents that make API calls, invoke tools, chain reasoning steps, and consume real money in tokens, they need to know what happened, who did it, what it cost, and whether it was authorised. Without a system of record, AI spend is ungovernable and agent behaviour is unauditable.

The Problem

Invisible AI Spend & Unauditable Actions

Tokens are consumed across providers, models, agents, and workflows with no unified accounting. Traditional logging captures requests but not economic attribution or decision provenance.

The Solution

A Composable Metering Stack

By layering an LLM gateway, observability platform, event stream, immutable ledger, and policy engine, teams create a complete audit trail from prompt to cost allocation.

The Result

Full Visibility Into AI Economics

Every API call is metered (tokens, latency, cost). Every agent action is logged (inputs, outputs, tools used). Every decision is auditable (human vs agent vs chain-of-agents). Every cost is allocatable (per user, agent, task, customer).

Agent / User (initiates request) → LLM Gateway (LiteLLM / Portkey) → Observability (Langfuse / OTel) → Event Stream (Kafka / Inngest) → Immutable Ledger (Postgres / immudb) → Governance (OPA / Custom)

AI spend is now a material line item.

As organisations move from single-prompt applications to multi-agent systems that run autonomously for hours or days, the gap between "tokens consumed" and "value delivered" becomes a governance problem. Every untracked agent run is an unaudited financial transaction.

10B+ tokens/day via LiteLLM · 24K+ GitHub stars on Langfuse · 0 unified token ledger standards · 5 infrastructure layers required

Why not just use cloud billing?

Cloud provider bills tell you aggregate spend. They don't tell you which agent made which decision, whether it was authorised, what reasoning chain led to the cost, or how to attribute that cost to a specific customer, project, or workflow. The AI Token Ledger fills the gap between cloud invoices and operational accountability.

Five composable layers. All open-source options.

The system of record for AI work is built from five composable layers. Each layer has open-source options at production maturity.

Layer A

Observability & Tracing

Captures token usage, latency, cost, prompts, responses, and traces across agent steps. The foundational data collection layer.

Langfuse — Open-source LLM engineering platform (YC W23). Tracks traces, sessions, token usage, cost per request. Supports OpenTelemetry ingestion natively since v3. Self-hostable via Docker or Kubernetes. Architecture: Postgres + ClickHouse + Redis + S3. License: MIT (core).

Helicone — Open-source LLM observability (YC W23). Proxy-based: change one line (your base URL) to start logging. Rust-based AI gateway for low-latency proxying. Tracks cost, latency, usage, caching, and analytics.

OpenTelemetry GenAI Semantic Conventions — Emerging industry standard for LLM telemetry. Defines span attributes for gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.model, and more. Agent-level conventions for create_agent and invoke_agent. Status: experimental.
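The attribute names above come straight from the GenAI semantic conventions; a conforming span just carries them as key-value pairs. This sketch builds that attribute set as a plain dict rather than using the OpenTelemetry SDK, so the values shown are illustrative:

```python
# Sketch of the span attributes the OTel GenAI semantic conventions define
# for an LLM call (attribute names from the spec; values illustrative).
def genai_span_attributes(model: str, tokens_in: int, tokens_out: int) -> dict:
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": tokens_in,
        "gen_ai.usage.output_tokens": tokens_out,
    }

attrs = genai_span_attributes("example-model", 900, 210)
print(attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"])  # → 1110
```

Because the names are standardised, any backend that understands the conventions (Langfuse, Datadog, a custom ledger) can aggregate token usage without vendor-specific parsing.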

OpenLLMetry (by Traceloop) — Extends OTel semantic conventions for GenAI. Instrumentation libraries for Python, TypeScript, Go, and Ruby. Contributors lead the OTel GenAI SIG working group.

Layer B

API Gateway / Proxy

Centralises all LLM calls through a single control point. Normalises requests across providers. Enforces rate limits, budgets, and routing. Captures usage at the edge before it reaches the model.

LiteLLM — Python SDK + Proxy Server (AI Gateway). Calls 100+ LLMs in OpenAI-compatible format. Multi-tenant cost tracking per project/user. Virtual keys for access control. Budget enforcement per key/team. 8ms P95 latency at 1K RPS. Native Langfuse integration. License: MIT.

Portkey AI Gateway — <1ms latency, 122KB footprint. Routes to 1,600+ models. Load balancing, fallbacks, retries, guardrails, cost tracking. 10B+ tokens daily in production. Open-source LLM pricing database for 2,300+ models. License: MIT.

Helicone AI Gateway — Rust-based, fully open-source. Smart routing, caching, rate limiting, tracing, fallbacks. Supports 20+ providers.

This layer is where token accounting becomes enforceable.

The gateway sees every request before it reaches the provider, making it the natural metering point.
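The enforcement idea can be sketched in a few lines: check the key's remaining budget before forwarding, record usage after. This mirrors what LiteLLM's virtual keys and budget enforcement do conceptually, but the code below is a standalone illustration with a stubbed provider call, not LiteLLM's actual API:

```python
# Standalone sketch of gateway-side budget enforcement per virtual key.
class BudgetExceeded(Exception):
    pass

class MeteringGateway:
    def __init__(self, budgets: dict[str, float]):
        self.budgets = budgets          # key -> remaining budget (USD)
        self.ledger: list[dict] = []    # every metered call

    def complete(self, key: str, cost_fn, **request) -> str:
        if self.budgets.get(key, 0.0) <= 0.0:
            raise BudgetExceeded(f"key {key!r} is out of budget")
        # Forward to the provider (stubbed here) and meter the result.
        response, tokens_in, tokens_out = "ok", 1000, 250
        cost = cost_fn(tokens_in, tokens_out)
        self.budgets[key] -= cost
        self.ledger.append({"key": key, "cost": cost, **request})
        return response

gw = MeteringGateway({"team-a": 0.01})
gw.complete("team-a", lambda i, o: (i * 3 + o * 15) / 1e6, model="example-model")
print(round(gw.budgets["team-a"], 6))  # → 0.00325
```

Because every request must pass through this choke point, the budget check cannot be bypassed by individual applications, which is why the gateway is where accounting becomes enforcement.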

Layer C

Event & Workflow Engine

Treats every agent action as a durable event. Provides replayability, fault tolerance, and audit trails across multi-step workflows. Your "transaction log backbone."

Temporal — Durable execution platform. Fork of Uber's Cadence. Workflows survive process failures, network partitions, and infrastructure outages. Complete execution history. SDKs for Go, Java, PHP, TypeScript, Python. Used by Stripe, Netflix, Datadog, Coinbase, Snap. License: MIT.

Inngest — Event-driven workflow engine. Serverless-first: functions invoked via HTTP, no worker fleet. Built-in retries, scheduling, concurrency control. Self-hostable. License: SSPL + delayed Apache 2.0.

Apache Kafka — Distributed event streaming platform. Append-only commit log. De facto backbone for high-throughput event pipelines. License: Apache 2.0.

Layer D

Immutable Ledger / Storage

Stores the permanent, tamper-proof record of all token transactions. Makes the system auditable and compliant. This is the most underdeveloped layer in the current stack — no dominant open-source "AI token ledger" exists.

PostgreSQL with append-only tables — Most common pragmatic choice. INSERT-only patterns, triggers to prevent UPDATE/DELETE, timestamped versioning. No cryptographic tamper-proofing, but operationally sufficient.

immudb — Open-source immutable database. Zero-trust: cryptographically coherent via Merkle trees. SQL and Key-Value access. Millions of TPS. FIPS-compliant verification. Time-travel queries. Used in financial services, government, defence. License: BSL 1.1.
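The value of cryptographic coherence is that tampering with history becomes detectable, not just forbidden. immudb achieves this with Merkle trees and verifiable proofs; the toy stand-in below shows the core idea with a simple hash chain, where each entry commits to the hash of the previous one:

```python
import hashlib
import json

# Toy tamper-evidence sketch: each ledger entry commits to the hash of the
# previous one, so rewriting history breaks the chain. (immudb itself uses
# Merkle trees and cryptographic proofs; this is a simplified stand-in.)
def append(chain: list[dict], record: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    chain.append({"record": record, "hash": digest})

def verify(chain: list[dict]) -> bool:
    prev = "0" * 64
    for entry in chain:
        body = json.dumps(entry["record"], sort_keys=True)
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

ledger: list[dict] = []
append(ledger, {"event_id": "e1", "tokens_in": 900, "cost": 0.004})
append(ledger, {"event_id": "e2", "tokens_in": 120, "cost": 0.001})
print(verify(ledger))            # → True
ledger[0]["record"]["cost"] = 0  # tamper with history...
print(verify(ledger))            # → False
```

An append-only Postgres table without such hashing prevents casual edits but cannot prove their absence; this is the gap between "operationally sufficient" and tamper-evident.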

ClickHouse — Column-oriented OLAP database. Used by Langfuse as analytics backend. Excellent for aggregation queries over token usage data. License: Apache 2.0.

AWS QLDB — Amazon's managed ledger database. Discontinued — AWS announced end of service. immudb is the primary open-source alternative.

Layer E

Policy & Governance

Enforces rules about who can spend what, which tools agents can use, and when human approval is required. Separates policy logic from application code.

Open Policy Agent (OPA) — General-purpose policy engine. CNCF Graduated project (accepted 2018, graduated 2021). Policy-as-code using the Rego declarative language. Used by Netflix, Intuit, and thousands of Kubernetes clusters. Applicable to AI governance: budget caps per agent, allowed tool lists, human-in-the-loop approval checkpoints, model access control. License: Apache 2.0.
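The checks listed above (budget caps, tool allow-lists, approval gates) are small predicates over request context. In OPA they would be written as Rego rules evaluated by the policy engine; the sketch below expresses the equivalent logic in plain Python, with hypothetical policy fields and agent names:

```python
# The governance checks described in the text, sketched as plain Python.
# (In OPA these would be Rego rules; field and agent names are hypothetical.)
POLICY = {
    "budget_cap_usd": {"research-agent": 50.0},
    "allowed_tools": {"research-agent": {"web_search", "summarise", "send_email"}},
    "requires_human_approval": {"send_email"},
}

def authorise(agent: str, tool: str, spent_usd: float, human_approved: bool) -> bool:
    if spent_usd >= POLICY["budget_cap_usd"].get(agent, 0.0):
        return False                                  # budget cap exceeded
    if tool not in POLICY["allowed_tools"].get(agent, set()):
        return False                                  # tool not on allow-list
    if tool in POLICY["requires_human_approval"] and not human_approved:
        return False                                  # human-in-the-loop gate
    return True

print(authorise("research-agent", "web_search", 12.0, False))  # → True
print(authorise("research-agent", "send_email", 12.0, False))  # → False (needs approval)
```

The benefit of externalising this into a policy engine is that the rules can change without redeploying the agents they govern.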

When to use what across the stack.

The AI Token Ledger is assembled from best-of-breed open-source tools. Here's how they map to needs.

Need | Use This | GitHub | Why
Track token usage + cost per request | Langfuse | langfuse/langfuse | Purpose-built LLM observability with native OTel support
Centralise LLM calls + enforce budgets | LiteLLM | BerriAI/litellm | OpenAI-compatible proxy with multi-tenant spend tracking
Fast gateway with routing + guardrails | Portkey | Portkey-AI/gateway | Sub-millisecond latency, 1,600+ model support
Proxy-first observability | Helicone | Helicone/helicone | One-line integration, Rust gateway, cost analytics
Standardised telemetry schema | OTel GenAI SemConv | open-telemetry/semantic-conventions | Vendor-neutral standard for LLM span attributes
Durable multi-step workflows | Temporal | temporalio/temporal | Complete execution history, survives any failure
Serverless event-driven workflows | Inngest | inngest/inngest | No worker fleet, built-in retries and scheduling
Tamper-proof immutable storage | immudb | codenotary/immudb | Cryptographic verification, Merkle tree integrity
High-speed cost analytics | ClickHouse | ClickHouse/ClickHouse | Column-oriented OLAP, used by Langfuse internally
Policy-as-code governance | OPA | open-policy-agent/opa | CNCF Graduated, declarative policy engine

Proven Combinations

LiteLLM + Langfuse

The most documented integration. LiteLLM routes and meters; Langfuse traces and analyses. Endorsed by both projects. Lemonade (insurer) runs this pairing in production.

Helicone + n8n

Helicone publishes an official n8n nodes package for workflow integration.

OpenTelemetry + Langfuse

Langfuse v3 natively ingests OTel spans using GenAI semantic conventions. Compliant with v1.37+ of the spec.

OpenTelemetry + Datadog

Datadog LLM Observability natively supports OTel GenAI SemConv (v1.37+), allowing teams to instrument once and analyse across platforms.

What teams actually build with this stack.

The AI Token Ledger pattern serves any scenario where token consumption needs to be tracked, attributed, governed, or billed.

FinOps

FinOps for AI

Track and allocate token spend per team, project, and customer. Enforce budget caps. Generate chargeback reports. The gateway + observability layers provide the data; the ledger makes it auditable.

Compliance

Agent Audit Trails

Log every decision an autonomous agent makes — which tools it invoked, what reasoning chain it followed, what data it accessed. Critical for regulated industries and enterprise risk management.

Multi-Agent

Multi-Agent Cost Attribution

When Agent A calls Agent B which calls Agent C, each consuming tokens across different models, attribute the full cost chain back to the originating request. Requires trace propagation across the full stack.
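Trace propagation makes the roll-up itself trivial: if every call in the chain carries the trace_id of the originating request, attribution is a group-by over ledger events. A sketch with hypothetical events (in a real stack the trace_id comes from OpenTelemetry context propagation):

```python
# Sketch of cost roll-up across an agent chain: every call carries the
# trace_id of the originating request, so the full chain is attributable.
events = [
    {"trace_id": "t1", "agent": "agent-a", "cost": 0.004},
    {"trace_id": "t1", "agent": "agent-b", "cost": 0.009},  # called by agent-a
    {"trace_id": "t1", "agent": "agent-c", "cost": 0.002},  # called by agent-b
    {"trace_id": "t2", "agent": "agent-a", "cost": 0.001},
]

def cost_by_trace(events: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for e in events:
        totals[e["trace_id"]] = totals.get(e["trace_id"], 0.0) + e["cost"]
    return totals

print(round(cost_by_trace(events)["t1"], 6))  # → 0.015
```

The hard part in practice is not the aggregation but ensuring the trace context survives every hop between agents, models, and tools; lose it once and the chain's costs become orphaned.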

Regulatory

Compliance & Regulatory Reporting

Demonstrate to auditors that AI systems operated within defined policies. Show human-in-the-loop controls were respected. Prove data handling complied with governance rules. Immutable ledger + policy engine enables this.

Product

AI-Powered Product Billing

If your product bills customers for AI features, you need accurate token metering per customer request. The gateway captures raw usage; the ledger provides the billing record of truth.

Incident Response

Incident Investigation & Replay

When an agent produces a bad outcome, replay the exact sequence of calls, prompts, and tool invocations that led to it. Temporal and Inngest provide native replay; Langfuse provides the trace detail.

From general-purpose infra to AI system of record.

The tools that compose an AI Token Ledger emerged from different eras of infrastructure, converging around the token as the atomic unit of AI economics.

2017–2019

Foundations

OpenTelemetry project formed (merger of OpenTracing and OpenCensus). Open Policy Agent accepted into CNCF (2018). Temporal forked from Uber's Cadence. These general-purpose infrastructure tools laid the groundwork.

2020–2022

LLM Era Begins

GPT-3 launches (2020). Teams start wrapping LLM calls in ad-hoc logging. immudb gains production adoption in financial services and government. No AI-specific observability tooling exists yet.

2023

Observability Explodes

Langfuse launches (YC W23). Helicone launches (YC W23). LiteLLM gains traction as the de facto LLM proxy. Portkey AI Gateway open-sourced. Teams start combining these tools into bespoke stacks. The pattern of "gateway + observability + ledger" emerges.

2024

Standards Emerge

OpenTelemetry GenAI Semantic Conventions published (experimental). Defines standard span attributes for LLM operations. Agent-level semantic conventions added. OpenLLMetry project leads the OTel GenAI SIG working group. AWS announces QLDB discontinuation; immudb becomes primary open-source immutable DB alternative.

2025–2026

Convergence (Current)

Langfuse v3 adds native OpenTelemetry ingestion. Datadog supports OTel GenAI SemConv v1.37+ natively. LiteLLM processes 10B+ tokens daily. Portkey gateway handles 400B+ tokens for 200+ enterprises. The five-layer reference architecture solidifies. But still: no unified token ledger standard, no open protocol for cross-provider usage accounting, no "GAAP for AI usage."

Should you build an AI Token Ledger stack?

The five-layer stack adds real infrastructure overhead. That trade-off is worth it in specific scenarios and wasteful in others.

Build this stack when

You're running autonomous agents that make unsupervised API calls and consume real budget.

You need to attribute AI costs to specific customers, projects, or business units for chargeback.

You operate in regulated industries where agent decisions must be auditable.

You're scaling past prototype stage and need to answer "what did that agent do and why did it cost $X?"

You have multi-provider LLM usage (OpenAI + Anthropic + Bedrock, etc.) that needs unified tracking.

Skip this complexity when

You're running a single LLM integration with predictable, low-volume usage — provider dashboards are sufficient.

Your AI spend is immaterial and doesn't warrant governance infrastructure.

You don't have agents making autonomous decisions — just human-initiated, single-turn completions.

You're in early prototyping and the overhead of a five-layer stack would slow you down — come back when you have production traffic.

Minimum Viable Stack

If you're starting from zero, the smallest meaningful combination is:

1. LiteLLM (gateway + metering) — centralise all LLM calls
2. Langfuse (observability + tracing) — capture traces and costs
3. PostgreSQL (append-only tables) — store the ledger

Add Temporal/Inngest (workflow durability), immudb (tamper-proofing), and OPA (governance) as your requirements grow.

How we see it. What we recommend.

Our take

The "AI Token Ledger" is not a product you buy — it's an architecture you compose. The industry is converging toward treating the token as the atomic unit of AI economics, but no one has built the unified system of record yet. For South African businesses, the additional consideration is latency and cost overhead: every observability proxy between your application and a US/EU-hosted LLM API adds round-trip time. Choose lightweight gateways (Portkey at 122KB, LiteLLM at 8ms P95) and self-host observability (Langfuse via Docker) close to your compute to minimise the penalty.

Enterprise

AI Spend Governance Framework

For organisations running multiple AI initiatives across departments, the priority is centralising all LLM calls through a single gateway (LiteLLM or Portkey) and feeding traces to Langfuse. This gives finance teams cost attribution by project and business unit. Add OPA policies to enforce per-department budget caps and model access controls. The immutable ledger layer (PostgreSQL append-only or immudb) becomes critical for audit compliance.

Studio

Production-Ready Agent Observability

For teams building and shipping AI products, instrument from day one. Wire LiteLLM + Langfuse into your agent stack before you have production traffic. Design your trace schema around the OpenTelemetry GenAI semantic conventions so your telemetry is portable. Use Temporal or Inngest for any multi-step agent workflow that needs to survive failures and remain auditable.

Dojo

Token Economics Literacy

Understanding how tokens translate to cost, how cost attribution works across agent chains, and how observability stacks compose is becoming essential knowledge for AI practitioners. The five-layer reference architecture is a practical framework for reasoning about AI system design. Start with the OTel GenAI Semantic Conventions spec — it's the closest thing to a shared vocabulary for this space.

Go deeper. Start building.

Core Open-Source Repositories

Tool | GitHub | Stars | License
Langfuse | langfuse/langfuse | 24K+ | MIT
LiteLLM | BerriAI/litellm | — | MIT
Portkey AI Gateway | Portkey-AI/gateway | — | MIT
Helicone | Helicone/helicone | — | Apache 2.0
Helicone AI Gateway | Helicone/ai-gateway | — | Open Source
Temporal | temporalio/temporal | — | MIT
Inngest | inngest/inngest | — | SSPL + Apache 2.0
immudb | codenotary/immudb | — | BSL 1.1
Open Policy Agent | open-policy-agent/opa | — | Apache 2.0
ClickHouse | ClickHouse/ClickHouse | — | Apache 2.0
Apache Kafka | apache/kafka | — | Apache 2.0

Standards & Specifications

Industry Standards

OTel GenAI Semantic Conventions — Emerging standard for LLM telemetry ↗
OTel GenAI Metrics Spec — Token usage, latency, and cost metrics ↗
OTel GenAI Agent Spans — Agent-level tracing conventions ↗
OpenLLMetry Semantic Conventions — Extended GenAI conventions by Traceloop ↗
OTel GenAI Span YAML — Machine-readable spec ↗

Key Documentation

Getting Started

Langfuse Self-Hosting Guide — Architecture and deployment options ↗
Langfuse OTel Integration — OTel spans to Langfuse data model ↗
LiteLLM Proxy Docs — Gateway setup with spend tracking ↗
LiteLLM + Langfuse Integration — Combined gateway + observability ↗
Portkey Open Source Docs — Gateway, pricing DB, community ↗
immudb Documentation — Immutable database setup and SDKs ↗
OPA Documentation — Policy-as-code setup and integration ↗
Temporal Documentation — Durable execution platform ↗

Ledger Schema — Conceptual Primitive

For teams designing their own AI Token Ledger, the minimum viable record should capture:

event_id         — Unique identifier for this token transaction
agent_id         — Which agent (or agent chain) made the call
human_id         — Which human initiated or is responsible
workflow_id      — Which workflow / session this belongs to
tokens_in        — Input tokens consumed
tokens_out       — Output tokens generated
cost             — Calculated cost in currency
model            — Which model was called
provider         — Which provider served the request
tools_used       — Which tools / functions were invoked
timestamp        — When it happened (UTC)
trace_id         — OpenTelemetry trace ID for full context
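The record above maps directly onto a typed, immutable value. A minimal Python sketch (field values are hypothetical; frozen=True mirrors the append-only intent by making records unmodifiable after creation):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class TokenLedgerEvent:
    """One token transaction, mirroring the minimum viable record above."""
    event_id: str
    agent_id: str
    human_id: str
    workflow_id: str
    tokens_in: int
    tokens_out: int
    cost: float
    model: str
    provider: str
    tools_used: tuple[str, ...]
    trace_id: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # always UTC
    )

event = TokenLedgerEvent(
    event_id="evt-001", agent_id="agent-a", human_id="user-42",
    workflow_id="wf-7", tokens_in=1200, tokens_out=350, cost=0.00885,
    model="example-model", provider="example-provider",
    tools_used=("web_search",), trace_id="t1",
)
print(event.tokens_in + event.tokens_out)  # → 1550
```

The trace_id field is what ties a ledger row back to its full OpenTelemetry trace, so the economic record and the behavioural record stay joinable.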


Disclaimer

Sources & Attribution

Content validated March 2026. OpenTelemetry is a trademark of the Cloud Native Computing Foundation. Langfuse, LiteLLM, Helicone, Portkey, Temporal, Inngest, immudb, ClickHouse, Apache Kafka, and Open Policy Agent are trademarks of their respective owners. This is an independent educational explainer by Imbila.AI.
