The emerging system of record for AI work
An AI Token Ledger is a system of record that meters, logs, attributes, and audits every unit of work performed by AI agents and LLM-powered applications. It treats the token — the fundamental unit of LLM input and output — as an atomic unit of economic activity, the same way cloud billing treats API calls or fintech ledgers treat financial transactions.
No single product or protocol owns this space today. What exists is a composable architecture assembled from open-source components spanning observability, API gateways, workflow engines, immutable storage, and policy engines. Think of it as the "Stripe + Ledger + Observability + Policy Engine" for AI tokens — each layer handled by a different tool, none of them yet unified into one canonical system.
The problem it solves is straightforward: as organisations deploy autonomous agents that make API calls, invoke tools, chain reasoning steps, and consume real money in tokens, they need to know what happened, who did it, what it cost, and whether it was authorised. Without a system of record, AI spend is ungovernable and agent behaviour is unauditable.
Tokens are consumed across providers, models, agents, and workflows with no unified accounting. Traditional logging captures requests but not economic attribution or decision provenance.
By layering an LLM gateway, observability platform, event stream, immutable ledger, and policy engine, teams create a complete audit trail from prompt to cost allocation.
Every API call is metered (tokens, latency, cost). Every agent action is logged (inputs, outputs, tools used). Every decision is auditable (human vs agent vs chain-of-agents). Every cost is allocatable (per user, agent, task, customer).
As organisations move from single-prompt applications to multi-agent systems that run autonomously for hours or days, the gap between "tokens consumed" and "value delivered" becomes a governance problem. Every untracked agent run is an unaudited financial transaction.
Cloud provider bills tell you aggregate spend. They don't tell you which agent made which decision, whether it was authorised, what reasoning chain led to the cost, or how to attribute that cost to a specific customer, project, or workflow. The AI Token Ledger fills the gap between cloud invoices and operational accountability.
The system of record for AI work is built from five composable layers. Each layer has open-source options at production maturity.
Captures token usage, latency, cost, prompts, responses, and traces across agent steps. The foundational data collection layer.
Langfuse — Open-source LLM engineering platform (YC W23). Tracks traces, sessions, token usage, cost per request. Supports OpenTelemetry ingestion natively since v3. Self-hostable via Docker or Kubernetes. Architecture: Postgres + ClickHouse + Redis + S3. License: MIT (core).
Helicone — Open-source LLM observability (YC W23). Proxy-based: change one line (your base URL) to start logging. Rust-based AI gateway for low-latency proxying. Tracks cost, latency, usage, caching, and analytics.
OpenTelemetry GenAI Semantic Conventions — Emerging industry standard for LLM telemetry. Defines span attributes for gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.model, and more. Agent-level conventions for create_agent and invoke_agent. Status: experimental.
OpenLLMetry (by Traceloop) — Extends OTel semantic conventions for GenAI. Instrumentation libraries for Python, TypeScript, Go, and Ruby. Contributors lead the OTel GenAI SIG working group.
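For illustration, the core usage attributes named by the convention can be assembled like this. This is a dependency-free sketch: in real instrumentation these would be set on an OpenTelemetry span via an SDK, and since the GenAI conventions are still experimental, the exact attribute set may change.

```python
# Sketch: the span attributes a GenAI-instrumented client would attach to
# one LLM call, using attribute names from the (experimental) OTel GenAI
# semantic conventions. A plain dict stands in for a real OTel span.

def genai_span_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build the core usage attributes for one LLM request span."""
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_span_attributes("gpt-4o", input_tokens=912, output_tokens=128)
print(attrs["gen_ai.usage.input_tokens"])  # 912
```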
Centralises all LLM calls through a single control point. Normalises requests across providers. Enforces rate limits, budgets, and routing. Captures usage at the edge before it reaches the model.
LiteLLM — Python SDK + Proxy Server (AI Gateway). Calls 100+ LLMs in OpenAI-compatible format. Multi-tenant cost tracking per project/user. Virtual keys for access control. Budget enforcement per key/team. 8ms P95 latency at 1K RPS. Native Langfuse integration. License: MIT.
Portkey AI Gateway — <1ms latency, 122KB footprint. Routes to 1,600+ models. Load balancing, fallbacks, retries, guardrails, cost tracking. 10B+ tokens daily in production. Open-source LLM pricing database for 2,300+ models. License: MIT.
Helicone AI Gateway — Rust-based, fully open-source. Smart routing, caching, rate limiting, tracing, fallbacks. Supports 20+ providers.
The gateway sees every request before it reaches the provider, making it the natural metering point.
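A toy sketch of what gateway-side metering and budget enforcement look like. This is not LiteLLM's or Portkey's actual code: the provider call is stubbed, and the `PRICE_PER_1K` rates and `BUDGETS` values are invented for the example.

```python
import time

# Illustrative gateway wrapper: meter every call (tokens, latency, cost)
# and refuse requests once a per-key budget is exhausted. All names and
# numbers here are assumptions for the sketch.

PRICE_PER_1K = {"input": 0.005, "output": 0.015}  # hypothetical USD rates
BUDGETS = {"team-alpha": 10.00}                   # per-key spend caps
SPENT: dict[str, float] = {}

def fake_provider_call(prompt: str) -> dict:
    """Stand-in for the upstream LLM provider."""
    return {"text": "ok", "input_tokens": len(prompt.split()), "output_tokens": 4}

def gateway_call(api_key: str, prompt: str) -> dict:
    if SPENT.get(api_key, 0.0) >= BUDGETS[api_key]:
        raise PermissionError(f"budget exhausted for {api_key}")
    start = time.perf_counter()
    resp = fake_provider_call(prompt)
    cost = (resp["input_tokens"] * PRICE_PER_1K["input"]
            + resp["output_tokens"] * PRICE_PER_1K["output"]) / 1000
    SPENT[api_key] = SPENT.get(api_key, 0.0) + cost
    # The metering record the gateway would emit downstream:
    return {
        "api_key": api_key,
        "latency_ms": (time.perf_counter() - start) * 1000,
        "input_tokens": resp["input_tokens"],
        "output_tokens": resp["output_tokens"],
        "cost_usd": cost,
    }

print(gateway_call("team-alpha", "summarise this document")["input_tokens"])  # 3
```

The point of the pattern is that metering happens before the request leaves your infrastructure, so usage records exist even if the provider's own billing lags.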
Treats every agent action as a durable event. Provides replayability, fault tolerance, and audit trails across multi-step workflows. Your "transaction log backbone."
Temporal — Durable execution platform. Fork of Uber's Cadence. Workflows survive process failures, network partitions, and infrastructure outages. Complete execution history. SDKs for Go, Java, PHP, TypeScript, Python. Used by Stripe, Netflix, Datadog, Coinbase, Snap. License: MIT.
Inngest — Event-driven workflow engine. Serverless-first: functions invoked via HTTP, no worker fleet. Built-in retries, scheduling, concurrency control. Self-hostable. License: SSPL + delayed Apache 2.0.
Apache Kafka — Distributed event streaming platform. Append-only commit log. De facto backbone for high-throughput event pipelines. License: Apache 2.0.
Stores the permanent, tamper-proof record of all token transactions. Makes the system auditable and compliant. This is the most underdeveloped layer in the current stack — no dominant open-source "AI token ledger" exists.
PostgreSQL with append-only tables — Most common pragmatic choice. INSERT-only patterns, triggers to prevent UPDATE/DELETE, timestamped versioning. No cryptographic tamper-proofing, but operationally sufficient.
immudb — Open-source immutable database. Zero-trust: cryptographically coherent via Merkle trees. SQL and Key-Value access. Millions of TPS. FIPS-compliant verification. Time-travel queries. Used in financial services, government, defence. License: BSL 1.1.
ClickHouse — Column-oriented OLAP database. Used by Langfuse as analytics backend. Excellent for aggregation queries over token usage data. License: Apache 2.0.
AWS QLDB — Amazon's managed ledger database. Discontinued — AWS announced end of service. immudb is the primary open-source alternative.
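The append-only PostgreSQL pattern described above can be sketched with SQLite's equivalent triggers, so the example runs anywhere without a database server. In production you would use PostgreSQL with BEFORE UPDATE/DELETE triggers that raise an exception; the table columns here are illustrative.

```python
import sqlite3

# Append-only ledger table: INSERTs succeed, UPDATEs and DELETEs are
# rejected by triggers. SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token_ledger (
    event_id    TEXT PRIMARY KEY,
    agent_id    TEXT NOT NULL,
    tokens_in   INTEGER NOT NULL,
    tokens_out  INTEGER NOT NULL,
    cost_usd    REAL NOT NULL,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Reject mutations: the ledger is INSERT-only.
CREATE TRIGGER no_update BEFORE UPDATE ON token_ledger
BEGIN SELECT RAISE(ABORT, 'token_ledger is append-only'); END;
CREATE TRIGGER no_delete BEFORE DELETE ON token_ledger
BEGIN SELECT RAISE(ABORT, 'token_ledger is append-only'); END;
""")

conn.execute(
    "INSERT INTO token_ledger (event_id, agent_id, tokens_in, tokens_out, cost_usd) "
    "VALUES ('evt-1', 'agent-a', 900, 120, 0.006)"
)
try:
    conn.execute("UPDATE token_ledger SET cost_usd = 0 WHERE event_id = 'evt-1'")
except sqlite3.IntegrityError as e:
    print(e)  # token_ledger is append-only
```

As the text notes, this gives no cryptographic tamper-proofing — anyone with DDL rights can drop the triggers — but it is operationally sufficient for many teams.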
Enforces rules about who can spend what, which tools agents can use, and when human approval is required. Separates policy logic from application code.
Open Policy Agent (OPA) — General-purpose policy engine. CNCF Graduated project (accepted 2018, graduated 2021). Policy-as-code using the Rego declarative language. Used by Netflix, Intuit, and thousands of Kubernetes clusters. Applicable to AI governance: budget caps per agent, allowed tool lists, human-in-the-loop approval checkpoints, model access control. License: Apache 2.0.
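To make the governance idea concrete, here is a Python stand-in for the kinds of rules listed above. OPA itself evaluates policies written in Rego, not Python; the policy names, agents, and limits below are assumptions invented for the sketch.

```python
# Illustrative policy-as-code check: budget caps per agent, tool
# allow-lists, and human-in-the-loop approval. A Rego policy evaluated
# by OPA would encode the same decision; this sketch only shows its shape.

POLICY = {
    "budget_cap_usd": {"research-agent": 50.0, "support-agent": 10.0},
    "allowed_tools": {"support-agent": {"search_kb", "create_ticket", "send_refund"}},
    "requires_human_approval": {"send_refund"},
}

def authorize(agent: str, tool: str, spent_usd: float, human_approved: bool) -> tuple[bool, str]:
    if spent_usd >= POLICY["budget_cap_usd"].get(agent, 0.0):
        return False, "budget cap exceeded"
    allowed = POLICY["allowed_tools"].get(agent)
    if allowed is not None and tool not in allowed:
        return False, f"tool {tool!r} not on allow-list"
    if tool in POLICY["requires_human_approval"] and not human_approved:
        return False, "human approval required"
    return True, "allowed"

print(authorize("support-agent", "send_refund", spent_usd=2.0, human_approved=False))
# (False, 'human approval required')
```

The value of keeping this logic in a policy engine rather than application code is that finance or risk teams can change limits without redeploying agents.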
The AI Token Ledger is assembled from best-of-breed open-source tools. Here's how they map to needs.
| Need | Use This | GitHub | Why |
|---|---|---|---|
| Track token usage + cost per request | Langfuse | langfuse/langfuse | Purpose-built LLM observability with native OTel support |
| Centralise LLM calls + enforce budgets | LiteLLM | BerriAI/litellm | OpenAI-compatible proxy with multi-tenant spend tracking |
| Fast gateway with routing + guardrails | Portkey | Portkey-AI/gateway | Sub-millisecond latency, 1,600+ model support |
| Proxy-first observability | Helicone | Helicone/helicone | One-line integration, Rust gateway, cost analytics |
| Standardised telemetry schema | OTel GenAI SemConv | open-telemetry/semantic-conventions | Vendor-neutral standard for LLM span attributes |
| Durable multi-step workflows | Temporal | temporalio/temporal | Complete execution history, survives any failure |
| Serverless event-driven workflows | Inngest | inngest/inngest | No worker fleet, built-in retries and scheduling |
| Tamper-proof immutable storage | immudb | codenotary/immudb | Cryptographic verification, Merkle tree integrity |
| High-speed cost analytics | ClickHouse | ClickHouse/ClickHouse | Column-oriented OLAP, used by Langfuse internally |
| Policy-as-code governance | OPA | open-policy-agent/opa | CNCF Graduated, declarative policy engine |
LiteLLM + Langfuse is the most documented pairing in this stack: LiteLLM routes and meters; Langfuse traces and analyses. Endorsed by both projects. Lemonade (the insurer) runs this pairing in production.
Helicone publishes an official n8n nodes package for workflow integration.
Langfuse v3 natively ingests OTel spans using GenAI semantic conventions. Compliant with v1.37+ of the spec.
Datadog LLM Observability natively supports OTel GenAI SemConv (v1.37+), allowing teams to instrument once and analyse across platforms.
The AI Token Ledger pattern serves any scenario where token consumption needs to be tracked, attributed, governed, or billed.
Track and allocate token spend per team, project, and customer. Enforce budget caps. Generate chargeback reports. The gateway + observability layers provide the data; the ledger makes it auditable.
Log every decision an autonomous agent makes — which tools it invoked, what reasoning chain it followed, what data it accessed. Critical for regulated industries and enterprise risk management.
When Agent A calls Agent B which calls Agent C, each consuming tokens across different models, attribute the full cost chain back to the originating request. Requires trace propagation across the full stack.
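The attribution mechanic can be sketched in a few lines: every nested call carries the originating trace_id, and the ledger rolls costs up per trace. The agent names and costs below are invented for the example.

```python
import uuid
from collections import defaultdict

# Sketch of cross-agent cost attribution: the trace_id set at the
# originating request propagates through the whole call chain, so every
# ledger entry rolls up to the request that started it.

LEDGER: list = []

def record(trace_id: str, agent: str, cost_usd: float) -> None:
    LEDGER.append({"trace_id": trace_id, "agent": agent, "cost_usd": cost_usd})

def agent_c(trace_id: str) -> None:
    record(trace_id, "agent-c", 0.002)

def agent_b(trace_id: str) -> None:
    record(trace_id, "agent-b", 0.004)
    agent_c(trace_id)          # trace_id propagates down the chain

def agent_a(trace_id: str) -> None:
    record(trace_id, "agent-a", 0.010)
    agent_b(trace_id)

trace_id = str(uuid.uuid4())
agent_a(trace_id)

totals = defaultdict(float)
for entry in LEDGER:
    totals[entry["trace_id"]] += entry["cost_usd"]
print(round(totals[trace_id], 3))  # 0.016
```

In a real stack the trace_id would be the W3C/OpenTelemetry trace context propagated automatically by instrumentation, not a hand-passed argument.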
Demonstrate to auditors that AI systems operated within defined policies. Show human-in-the-loop controls were respected. Prove data handling complied with governance rules. The combination of immutable ledger and policy engine enables this.
If your product bills customers for AI features, you need accurate token metering per customer request. The gateway captures raw usage; the ledger provides the billing record of truth.
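A minimal sketch of turning metered events into per-customer billing totals. The price table and 20% markup are assumptions for the example, and real billing would read from the immutable ledger rather than an in-memory list.

```python
from collections import defaultdict

# Sketch: aggregate gateway metering records into a per-customer billing
# summary. Rates and margin are hypothetical.

PRICE_PER_1K = {"input": 0.005, "output": 0.015}  # hypothetical USD rates
MARGIN = 1.20                                     # resell at a 20% markup

events = [
    {"customer": "acme", "tokens_in": 1000, "tokens_out": 200},
    {"customer": "acme", "tokens_in": 500,  "tokens_out": 100},
    {"customer": "zen",  "tokens_in": 2000, "tokens_out": 400},
]

def invoice(events: list) -> dict:
    totals = defaultdict(float)
    for e in events:
        cost = (e["tokens_in"] * PRICE_PER_1K["input"]
                + e["tokens_out"] * PRICE_PER_1K["output"]) / 1000
        totals[e["customer"]] += cost * MARGIN
    return {c: round(v, 4) for c, v in totals.items()}

print(invoice(events))
```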
When an agent produces a bad outcome, replay the exact sequence of calls, prompts, and tool invocations that led to it. Temporal and Inngest provide native replay; Langfuse provides the trace detail.
The tools that compose an AI Token Ledger emerged from different eras of infrastructure, converging around the token as the atomic unit of AI economics.
OpenTelemetry project formed (merger of OpenTracing and OpenCensus). Open Policy Agent accepted into CNCF (2018). Temporal forked from Uber's Cadence. These general-purpose infrastructure tools laid the groundwork.
GPT-3 launches (2020). Teams start wrapping LLM calls in ad-hoc logging. immudb gains production adoption in financial services and government. No AI-specific observability tooling exists yet.
Langfuse launches (YC W23). Helicone launches (YC W23). LiteLLM gains traction as the de facto LLM proxy. Portkey AI Gateway open-sourced. Teams start combining these tools into bespoke stacks. The pattern of "gateway + observability + ledger" emerges.
OpenTelemetry GenAI Semantic Conventions published (experimental). Defines standard span attributes for LLM operations. Agent-level semantic conventions added. OpenLLMetry project leads the OTel GenAI SIG working group. AWS announces QLDB discontinuation; immudb becomes primary open-source immutable DB alternative.
Langfuse v3 adds native OpenTelemetry ingestion. Datadog supports OTel GenAI SemConv v1.37+ natively. LiteLLM processes 10B+ tokens daily. Portkey gateway handles 400B+ tokens for 200+ enterprises. The five-layer reference architecture solidifies. But still: no unified token ledger standard, no open protocol for cross-provider usage accounting, no "GAAP for AI usage."
The five-layer stack adds real infrastructure overhead. That trade-off is worth it in specific scenarios and wasteful in others.
You're running autonomous agents that make unsupervised API calls and consume real budget.
You need to attribute AI costs to specific customers, projects, or business units for chargeback.
You operate in regulated industries where agent decisions must be auditable.
You're scaling past prototype stage and need to answer "what did that agent do and why did it cost $X?"
You have multi-provider LLM usage (OpenAI + Anthropic + Bedrock, etc.) that needs unified tracking.
You're running a single LLM integration with predictable, low-volume usage — provider dashboards are sufficient.
Your AI spend is immaterial and doesn't warrant governance infrastructure.
You don't have agents making autonomous decisions — just human-initiated, single-turn completions.
You're in early prototyping and the overhead of a five-layer stack would slow you down — come back when you have production traffic.
If you're starting from zero, the smallest meaningful combination is:
1. LiteLLM (gateway + metering) — centralise all LLM calls
2. Langfuse (observability + tracing) — capture traces and costs
3. PostgreSQL (append-only tables) — store the ledger
Add Temporal/Inngest (workflow durability), immudb (tamper-proofing), and OPA (governance) as your requirements grow.
The "AI Token Ledger" is not a product you buy — it's an architecture you compose. The industry is converging toward treating the token as the atomic unit of AI economics, but no one has built the unified system of record yet. For South African businesses, the additional consideration is latency and cost overhead: every observability proxy between your application and a US/EU-hosted LLM API adds round-trip time. Choose lightweight gateways (Portkey at 122KB, LiteLLM at 8ms P95) and self-host observability (Langfuse via Docker) close to your compute to minimise the penalty.
For organisations running multiple AI initiatives across departments, the priority is centralising all LLM calls through a single gateway (LiteLLM or Portkey) and feeding traces to Langfuse. This gives finance teams cost attribution by project and business unit. Add OPA policies to enforce per-department budget caps and model access controls. The immutable ledger layer (PostgreSQL append-only or immudb) becomes critical for audit compliance.
For teams building and shipping AI products, instrument from day one. Wire LiteLLM + Langfuse into your agent stack before you have production traffic. Design your trace schema around the OpenTelemetry GenAI semantic conventions so your telemetry is portable. Use Temporal or Inngest for any multi-step agent workflow that needs to survive failures and remain auditable.
Understanding how tokens translate to cost, how cost attribution works across agent chains, and how observability stacks compose is becoming essential knowledge for AI practitioners. The five-layer reference architecture is a practical framework for reasoning about AI system design. Start with the OTel GenAI Semantic Conventions spec — it's the closest thing to a shared vocabulary for this space.
| Tool | GitHub | Stars | License |
|---|---|---|---|
| Langfuse | langfuse/langfuse | 24K+ | MIT |
| LiteLLM | BerriAI/litellm | — | MIT |
| Portkey AI Gateway | Portkey-AI/gateway | — | MIT |
| Helicone | Helicone/helicone | — | Apache 2.0 |
| Helicone AI Gateway | Helicone/ai-gateway | — | Open Source |
| Temporal | temporalio/temporal | — | MIT |
| Inngest | inngest/inngest | — | SSPL + Apache 2.0 |
| immudb | codenotary/immudb | — | BSL 1.1 |
| Open Policy Agent | open-policy-agent/opa | — | Apache 2.0 |
| ClickHouse | ClickHouse/ClickHouse | — | Apache 2.0 |
| Apache Kafka | apache/kafka | — | Apache 2.0 |
OTel GenAI Semantic Conventions — Emerging standard for LLM telemetry ↗
OTel GenAI Metrics Spec — Token usage, latency, and cost metrics ↗
OTel GenAI Agent Spans — Agent-level tracing conventions ↗
OpenLLMetry Semantic Conventions — Extended GenAI conventions by Traceloop ↗
OTel GenAI Span YAML — Machine-readable spec ↗
Langfuse Self-Hosting Guide — Architecture and deployment options ↗
Langfuse OTel Integration — OTel spans to Langfuse data model ↗
LiteLLM Proxy Docs — Gateway setup with spend tracking ↗
LiteLLM + Langfuse Integration — Combined gateway + observability ↗
Portkey Open Source Docs — Gateway, pricing DB, community ↗
immudb Documentation — Immutable database setup and SDKs ↗
OPA Documentation — Policy-as-code setup and integration ↗
Temporal Documentation — Durable execution platform ↗
For teams designing their own AI Token Ledger, the minimum viable record should capture:
- event_id — Unique identifier for this token transaction
- agent_id — Which agent (or agent chain) made the call
- human_id — Which human initiated or is responsible
- workflow_id — Which workflow / session this belongs to
- tokens_in — Input tokens consumed
- tokens_out — Output tokens generated
- cost — Calculated cost in currency
- model — Which model was called
- provider — Which provider served the request
- tools_used — Which tools / functions were invoked
- timestamp — When it happened (UTC)
- trace_id — OpenTelemetry trace ID for full context
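One way to express that record in code is a frozen dataclass. The field names follow the list above; the types and example values are reasonable assumptions, not a published schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The minimum viable ledger record, as a dataclass. `frozen=True`
# mirrors the append-only principle: records never mutate.
@dataclass(frozen=True)
class TokenLedgerEvent:
    event_id: str
    agent_id: str
    human_id: str
    workflow_id: str
    tokens_in: int
    tokens_out: int
    cost: float
    model: str
    provider: str
    tools_used: tuple        # e.g. ("web_search",)
    trace_id: str            # OpenTelemetry trace ID
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

evt = TokenLedgerEvent(
    event_id="evt-001", agent_id="research-agent", human_id="user-42",
    workflow_id="wf-7", tokens_in=912, tokens_out=128, cost=0.0065,
    model="gpt-4o", provider="openai", tools_used=("web_search",),
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
)
print(evt.tokens_in + evt.tokens_out)  # 1040
```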
We publish twice-weekly AI briefings covering tools, frameworks, and developments that matter for South African businesses. No hype, no fluff — just what's useful and what's not.
Content validated March 2026. OpenTelemetry is a trademark of the Cloud Native Computing Foundation. Langfuse, LiteLLM, Helicone, Portkey, Temporal, Inngest, immudb, ClickHouse, Apache Kafka, and Open Policy Agent are trademarks of their respective owners. This is an independent educational explainer by Imbila.AI.