Token Metering
Token metering is the practice of measuring AI inference usage at the token level — counting individual input tokens (the prompt) and output tokens (the generated response) separately. This granular metering enables usage-based pricing models where customers are billed based on their actual compute consumption rather than flat subscription fees.
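Separate input and output counts are the core of the metering record. A minimal sketch of such a per-request usage record (the class and field names here are illustrative assumptions, not any particular vendor's schema):

```python
from dataclasses import dataclass

# Illustrative per-request usage record: input and output tokens are
# counted separately, since they are typically priced separately.
@dataclass(frozen=True)
class TokenUsage:
    tenant_id: str
    input_tokens: int   # tokens in the prompt
    output_tokens: int  # tokens in the generated response

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

usage = TokenUsage(tenant_id="acme", input_tokens=12, output_tokens=348)
print(usage.total_tokens)  # 360
```

Keeping the two counts distinct, rather than storing only a total, is what allows asymmetric pricing later in the billing pipeline.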
Token metering is the foundation of AI inference billing. Because different requests consume vastly different amounts of compute (a 10-token prompt vs. a 10,000-token document), flat-rate pricing either overcharges light users or undercharges heavy ones. Token-level metering lets operators charge fairly based on actual resource consumption.
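The pricing arithmetic itself is simple once the counts exist. A sketch, using hypothetical per-token prices (output tokens are commonly priced higher than input tokens):

```python
# Hypothetical per-token prices in dollars, for illustration only.
INPUT_PRICE = 0.000003   # $ per input token
OUTPUT_PRICE = 0.000015  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Bill each token class at its own rate."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A short prompt vs. a long document: a single flat fee would
# misprice one or the other.
print(f"{request_cost(10, 200):.6f}")        # small request
print(f"{request_cost(10_000, 2_000):.6f}")  # large request
```

With these assumed rates the small request costs about $0.00303 and the large one about $0.06, a ~20x spread that flat-rate pricing cannot reflect.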
Accurate token metering requires counting tokens at the model serving layer, before any billing logic runs. The metering system must handle concurrent requests from multiple tenants, aggregate usage per billing period, and provide real-time visibility to both operators and tenants.
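The aggregation step above can be sketched as a concurrency-safe accumulator keyed by tenant and billing period. The structure and names here are illustrative assumptions, not a production metering pipeline:

```python
import threading
from collections import defaultdict

# Sketch of a thread-safe usage aggregator keyed by (tenant, period).
# A lock guards the shared totals so concurrent requests from multiple
# tenants do not lose updates.
class UsageAggregator:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._totals: dict[tuple[str, str], dict[str, int]] = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0}
        )

    def record(self, tenant_id: str, period: str,
               input_tokens: int, output_tokens: int) -> None:
        with self._lock:
            bucket = self._totals[(tenant_id, period)]
            bucket["input_tokens"] += input_tokens
            bucket["output_tokens"] += output_tokens

    def totals(self, tenant_id: str, period: str) -> dict[str, int]:
        # Return a copy so callers cannot mutate shared state.
        with self._lock:
            return dict(self._totals[(tenant_id, period)])

agg = UsageAggregator()
agg.record("acme", "2024-06", 120, 480)
agg.record("acme", "2024-06", 30, 70)
print(agg.totals("acme", "2024-06"))  # {'input_tokens': 150, 'output_tokens': 550}
```

Real-time visibility then reduces to reading these running totals; end-of-period billing reads the same totals once the period closes.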
Hoonify AI's platform meters every request at the token level with separate input and output token counts. Operators configure per-token pricing in the admin portal, and tenants see their real-time usage in the tenant portal.