NPS-Release

Cognon Budget Specification

Version: 0.6 · Date: 2026-05-14


1. Overview

The Cognon Budget mechanism lets an agent declare a maximum token-consumption cap for a given request. A node uses this cap to trim response fields, limit the number of records returned, or reject over-budget requests.

To address the differences in how each LLM counts tokens, NPS introduces Cognon (CGN) as a standardized unit of measure.


2. Cognon (CGN)

2.1 Definition

CGN is the standard token-accounting unit inside the NPS protocol suite. Native tokens from each LLM are converted to CGN through exchange rates.

CGN is defined in two named profiles with non-overlapping conformance requirements (issue #40). Every CGN value carried on the wire MUST be unambiguously associated with exactly one profile; counterparties MUST NOT mix the two.

| Profile | Purpose | Used by |
| --- | --- | --- |
| CGN-Estimate | Estimation, budget hints, telemetry, sampling-tolerant flows | X-NWP-Budget enforcement, CapsFrame token_est, push-stream per-event cgn_est reporting |
| CGN-Billing | Commercial settlement, dispute and chargeback handling | Invoiced metering and signed accounting records exchanged between counterparties |

CGN-Estimate (estimation-grade)

CGN-Billing (settlement-grade)

A node that emits CGN-Billing records MUST satisfy all of the following:

A node that issues a CGN-denominated commercial charge MUST mark it as CGN-Billing on the wire and in headers (see §4.2). Charges presented as CGN-Estimate, or with no profile marker, are non-conformant for settlement and MAY be disputed by the counterparty without reference to a tokenizer trust tier.

2.2 Default Calculation (Fallback) — CGN-Estimate only

When the tokenizer cannot be determined, CGN-Estimate MAY fall back to:

CGN = ceil(UTF-8_bytes / 4)

This formula reflects the average behavior of mainstream LLM tokenizers (≈ 4 bytes/token for English, ≈ 3 bytes/token for Chinese) and acts as the most conservative baseline.

The byte-size fallback MUST NOT be used for CGN-Billing under any circumstances. If a node cannot resolve a verified_tokenizer for a request that would be billed, it MUST refuse to issue a CGN-Billing record for that request and either (a) downgrade the surface to CGN-Estimate (non-billable telemetry only) or (b) reject the request with a billing-class error.
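
The fallback rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `fallback_cgn` and the `profile` argument values are assumptions made for this example.

```python
def fallback_cgn(payload: str, profile: str = "estimate") -> int:
    """Byte-size fallback: CGN = ceil(UTF-8 bytes / 4), CGN-Estimate only."""
    # Assumption: the profile is passed as "estimate" or "billing".
    if profile != "estimate":
        # The byte-size fallback must never produce a CGN-Billing value.
        raise ValueError("byte-size fallback is not allowed for CGN-Billing")
    byte_len = len(payload.encode("utf-8"))
    # Integer ceiling division avoids floating-point rounding.
    return (byte_len + 3) // 4


if __name__ == "__main__":
    print(fallback_cgn("hello world"))  # 11 bytes
    print(fallback_cgn("你好"))          # 6 bytes
```

A caller that needs a billable value would catch the error and either downgrade the surface to CGN-Estimate or reject the request, as described above.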

2.3 Canonical Conversion Profile (CGN v1)

The canonical model-token conversion algorithm is cgn.v1:

CGN = ceil(((input_tokens * input_weight)
          + (output_tokens * output_weight)
          + (thinking_tokens * thinking_weight))
          * model_coefficient / scale)

All missing token classes are treated as 0. The result is a uint32. The default weights are input_weight = 1, output_weight = 4, thinking_weight = 2, scale = 1000, and model_coefficient = 1.
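
As a rough sketch, the cgn.v1 formula with the defaults listed above could be computed as follows. The function name `cgn_v1` is hypothetical, and the sketch assumes that weights, coefficient, and scale are integers; the real coefficient values live in cgn-profiles.yaml and are not reproduced here.

```python
from typing import Optional

UINT32_MAX = 2**32 - 1


def cgn_v1(
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    thinking_tokens: Optional[int] = None,
    *,
    input_weight: int = 1,
    output_weight: int = 4,
    thinking_weight: int = 2,
    model_coefficient: int = 1,
    scale: int = 1000,
) -> int:
    """Weighted token sum, scaled by the model coefficient, rounded up."""
    # Missing token classes are treated as 0.
    weighted = (
        (input_tokens or 0) * input_weight
        + (output_tokens or 0) * output_weight
        + (thinking_tokens or 0) * thinking_weight
    )
    numerator = weighted * model_coefficient
    cgn = -(-numerator // scale)  # integer ceiling division
    if cgn > UINT32_MAX:
        # Assumption: overflow is reported rather than silently wrapped.
        raise OverflowError("CGN result does not fit in uint32")
    return cgn
```

With the default parameters, `cgn_v1(1000, 500, 250)` yields a weighted sum of 3500 and therefore 4 CGN after rounding up.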

The machine-readable source of truth for provider/model coefficients, unknown-model behavior, and conformance vectors is cgn-profiles.yaml. It currently defines profiles for DeepSeek chat/reasoner, OpenAI general/reasoning models, Anthropic Haiku/Sonnet/Opus classes, Ollama-local models, and a default unknown fallback.

Unknown providers or model identifiers MUST use default.unknown for CGN-Estimate, SHOULD emit cgn_profile_defaulted telemetry, and MUST NOT be used for CGN-Billing. Operators MAY override model-pattern mappings locally, but such overrides MUST carry a different profile id or version so counterparties can distinguish them from the canonical table.

Profile applicability. cgn.v1 and cgn-profiles.yaml are normative for CGN-Estimate and tolerate the documented ±5 % drift between table values and the model’s native count. For CGN-Billing, both counterparties MUST agree on a specific profile version (pinned at session-start time or earlier and recorded inside the signed metering record) and MUST use the matching verified_tokenizer-derived native count; the ±5 % envelope does NOT apply. default.unknown applies only to CGN-Estimate; CGN-Billing has no fallback row.
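
The unknown-model rules can be illustrated with a small lookup. This is a sketch under stated assumptions: the table below is a hypothetical in-memory stand-in for cgn-profiles.yaml, the key format and the profile id `example.profile` are placeholders, and the names `select_profile`, `ProfileChoice`, and `BillingProfileError` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical stand-in for cgn-profiles.yaml; keys and ids are placeholders.
EXAMPLE_TABLE: Dict[str, str] = {"example-provider/example-model": "example.profile"}


@dataclass
class ProfileChoice:
    profile_id: str
    defaulted: bool  # True: caller should emit cgn_profile_defaulted telemetry


class BillingProfileError(Exception):
    """Raised when no known profile exists for a CGN-Billing request."""


def select_profile(provider: str, model: str, grade: str,
                   table: Dict[str, str] = EXAMPLE_TABLE) -> ProfileChoice:
    profile_id = table.get(f"{provider}/{model}")
    if profile_id is not None:
        return ProfileChoice(profile_id, defaulted=False)
    if grade == "billing":
        # CGN-Billing has no fallback row.
        raise BillingProfileError(f"no profile for {provider}/{model}")
    # CGN-Estimate falls back to default.unknown.
    return ProfileChoice("default.unknown", defaulted=True)
```

The sketch does not cover version pinning for CGN-Billing; it only shows where the estimate and billing paths diverge for unknown models.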


3. Tokenizer Resolution Chain

When an agent issues a request, the node resolves the tokenizer in this order:

1. Explicit declaration by the agent (X-NWP-Tokenizer header)
   ↓ not declared
2. Auto-match from agent configuration / IdentFrame
   ↓ match failed
3. Default calculation (UTF-8 bytes / 4)

3.1 Explicit Declaration (highest priority)

The agent declares its tokenizer in the request header:

X-NWP-Tokenizer: cl100k_base

If the node supports the declared tokenizer, it MUST use the corresponding algorithm to count tokens. If the node does not support that tokenizer, it SHOULD fall back to auto-match (§3.2).

3.2 Auto-Match

The node infers the agent’s tokenizer from two IdentFrame metadata fields, metadata.model_family and metadata.tokenizer. When either field is present in the IdentFrame, the node uses the matching tokenizer.

Estimation-only caveat (normative — issue #39). Both metadata.model_family and metadata.tokenizer reach the node as declared_tokenizer under the three-tier tokenizer trust model defined in NPS-3-NIP §5.1 — Trust boundary for unsigned metadata. The X-NWP-Tokenizer request header in §3.1 carries the same trust class. Auto-matched values from this section MUST be treated as estimation hints only and MUST NOT drive billing, settlement, quota elevation, reputation scoring, or any security-relevant decision. Settlement-grade and policy-grade flows MUST instead consume a verified_tokenizer (CA- or platform-attested) signal, falling back to observed_tokenizer_profile only for Node-internal abuse detection. A Node that bills or grants elevated quota off a declared_tokenizer is non-conformant.

3.3 Default Fallback

When the tokenizer cannot be determined, use ceil(UTF-8_bytes / 4) to compute CGN.
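
The three-step chain in this section might be sketched as below. Several details are assumptions made for illustration: the function name `resolve_tokenizer`, the `family_map` lookup from model family to tokenizer, the choice to check metadata.tokenizer before metadata.model_family, and the use of a plain dict with exact header names.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_tokenizer(
    headers: Dict[str, str],
    ident_metadata: Dict[str, str],
    supported: Set[str],
    family_map: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (tokenizer, source); tokenizer is None when the fallback applies."""
    # 1. Explicit declaration, used only if the node supports it.
    declared = headers.get("X-NWP-Tokenizer")
    if declared in supported:
        return declared, "declared"

    # 2. Auto-match from IdentFrame metadata (field order is an assumption).
    meta_tokenizer = ident_metadata.get("tokenizer")
    if meta_tokenizer in supported:
        return meta_tokenizer, "auto-match"
    mapped = family_map.get(ident_metadata.get("model_family", ""))
    if mapped in supported:
        return mapped, "auto-match"

    # 3. Nothing resolved: caller uses ceil(UTF-8 bytes / 4).
    return None, "fallback"
```

Whatever the chain returns is an estimation hint only; per §3.2, none of these sources may drive billing or other policy decisions.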


4. Request & Response

4.1 Request Headers

| Header | Required | Description |
| --- | --- | --- |
| X-NWP-Budget | optional | Maximum CGN budget (uint32) |
| X-NWP-Tokenizer | optional | Tokenizer identifier used by the agent |

4.2 Response Headers

| Header | Profile | Description |
| --- | --- | --- |
| X-NWP-Tokens | CGN-Estimate | Actual CGN consumed by this response (estimation-grade) |
| X-NWP-Tokens-Native | CGN-Estimate | Native token consumption for this response (when the tokenizer is known) |
| X-NWP-Tokenizer-Used | both | Tokenizer identifier actually used by the node |
| X-NWP-Tokens-Profile | both | Either estimate or billing. Absent or estimate MUST be treated as CGN-Estimate by the counterparty. |
| X-NWP-Billing-Record | CGN-Billing | Reference (URI or content-hash) to the signed metering record for this response. MUST be present iff the response is billed under CGN-Billing. |
| X-NWP-Billing-Tokenizer-Tier | CGN-Billing | MUST be verified_tokenizer. Absent → not billable. |

A response that omits both X-NWP-Billing-Record and X-NWP-Billing-Tokenizer-Tier MUST be interpreted by the counterparty as CGN-Estimate, regardless of any commercial agreement; nodes MUST NOT settle off CGN-Estimate-only responses.
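
A counterparty could classify a response from its headers roughly as follows. This sketch takes a deliberately conservative reading: it treats a response as billable only when all three billing signals agree, which is stricter than the "omits both" wording above. The function name `response_profile` is hypothetical, and headers are assumed to arrive as a dict with exact names.

```python
from typing import Dict


def response_profile(headers: Dict[str, str]) -> str:
    """Return "billing" only when every billing marker is present and valid."""
    profile = headers.get("X-NWP-Tokens-Profile", "estimate")
    record = headers.get("X-NWP-Billing-Record")
    tier = headers.get("X-NWP-Billing-Tokenizer-Tier")
    if profile == "billing" and record and tier == "verified_tokenizer":
        return "billing"
    # Anything else is treated as CGN-Estimate and must not be settled.
    return "estimate"
```

A settlement pipeline would then refuse to process any response for which this returns "estimate".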

4.3 Over-Budget Handling

When the response would exceed X-NWP-Budget:

  1. Node SHOULD trim the response first (fewer fields or records) to fit within budget.
  2. If trimming is impossible (e.g. a single record already exceeds budget), node MUST return an NWP-BUDGET-EXCEEDED error.
  3. Node MUST NOT silently truncate structured data (truncation can produce incomplete structures on the agent side).
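
The three rules above can be sketched as record-level trimming. This is one possible strategy, assuming records are dropped whole from the end of the result set; the names `trim_to_budget` and `BudgetExceeded`, and the caller-supplied `cost` function, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class BudgetExceeded(Exception):
    """Stands in for the NWP-BUDGET-EXCEEDED error in this sketch."""


def trim_to_budget(records: Sequence[T], budget: int,
                   cost: Callable[[T], int]) -> Tuple[List[T], int]:
    """Keep whole records, in order, while the running CGN total fits."""
    kept: List[T] = []
    used = 0
    for record in records:
        c = cost(record)
        if used + c > budget:
            break  # drop this record and the rest; never cut a record in half
        kept.append(record)
        used += c
    if records and not kept:
        # Not even the first record fits, so trimming is impossible.
        raise BudgetExceeded(f"first record exceeds budget {budget}")
    return kept, used
```

Because only complete records are kept, the agent never receives a partially serialized structure.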

5. Token Estimate in CapsFrame

The token_est field in a CapsFrame is in CGN:

{
  "frame": "0x04",
  "anchor_ref": "sha256:...",
  "count": 2,
  "data": [...],
  "token_est": 180,
  "tokenizer_used": "cl100k_base"
}

6. Implementation Notes

6.1 General

6.2 CGN-Estimate

6.3 CGN-Billing


7. Node-Operator CGN Limit (cgn_limit)

While X-NWP-Budget is an agent-declared per-request cap, cgn_limit is the node operator’s server-side cap on the CGN a single request may consume. Both caps exist independently; the effective budget for any request is:

effective_budget = min(cgn_limit, X-NWP-Budget)   // cgn_limit = 0 means no node-side cap

If X-NWP-Budget is absent, effective_budget = cgn_limit. If cgn_limit is 0 (the default), only the agent-supplied X-NWP-Budget applies.
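
A minimal sketch of this computation, assuming an absent X-NWP-Budget is represented as `None` and "no cap" as the uint32 maximum. The function name `effective_budget` mirrors the formula; an X-NWP-Budget value of 0 is passed through unchanged because this section does not define special handling for it.

```python
from typing import Optional

UINT32_MAX = 0xFFFFFFFF  # used here to represent "no cap"


def effective_budget(cgn_limit: int = 0,
                     x_nwp_budget: Optional[int] = None) -> int:
    """Combine the node cap (0 = disabled) with the optional agent cap."""
    agent_cap = UINT32_MAX if x_nwp_budget is None else x_nwp_budget
    if cgn_limit > 0:
        return min(cgn_limit, agent_cap)
    # Node cap disabled: only the agent-supplied budget applies.
    return agent_cap
```

For example, a node with `cgn_limit = 5000` serving a request with `X-NWP-Budget: 1200` ends up with an effective budget of 1200.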

7.1 NWM declaration

Nodes that set cgn_limit > 0 MUST publish it in the NWM under token_budget.cgn_limit so agents can discover the cap before sending requests:

{
  "token_budget": {
    "cgn_limit": 5000,
    "profile": "cgn.v1"
  }
}

7.2 Enforcement in AnchorNodeMiddleware

AnchorNodeOptions.CgnLimit (uint32, default 0) sets the per-request node cap. The middleware enforces it by:

  1. Reading X-NWP-Budget from the request header (agent cap, may be absent).
  2. Computing effective_budget = cgn_limit > 0 ? min(cgn_limit, x_nwp_budget_or_max) : x_nwp_budget_or_max.
  3. Passing effective_budget to the response builder and CGN-Estimate accumulator.
  4. If the CGN tally of the response would exceed effective_budget: trim first (fewer fields / records); if trimming is impossible, return NWP-CGN-LIMIT-EXCEEDED (HTTP 400, NPS status NPS-CLIENT-REQUEST-TOO-LARGE).

7.3 CGN-Estimate vs. CGN-Billing enforcement

| Profile | cgn_limit behaviour |
| --- | --- |
| CGN-Estimate | Advisory: node SHOULD trim; MAY exceed if trimming is impossible and the overage is flagged in X-NWP-Tokens. |
| CGN-Billing | Strict: node MUST NOT emit a response that exceeds effective_budget; NWP-CGN-LIMIT-EXCEEDED is mandatory on overage. |
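
The difference between the two profiles can be sketched as a small decision function. The action labels ("send", "trim", "send-flagged", "reject"), the function name `enforce`, and the result shape are assumptions for illustration. For CGN-Estimate the sketch takes the permitted "exceed and flag" path; a node may equally choose to reject.

```python
from typing import Any, Dict


def enforce(profile: str, estimated_cgn: int, effective_budget: int,
            can_trim: bool) -> Dict[str, Any]:
    """Decide what to do with a response given its estimated CGN."""
    if estimated_cgn <= effective_budget:
        return {"action": "send"}
    if can_trim:
        # Both profiles try trimming (fewer fields or records) first.
        return {"action": "trim"}
    if profile == "billing":
        # Strict: never emit an over-budget CGN-Billing response.
        return {
            "action": "reject",
            "error": "NWP-CGN-LIMIT-EXCEEDED",
            "effective_budget": effective_budget,
            "estimated_cgn": estimated_cgn,
        }
    # Advisory: CGN-Estimate may exceed, with the overage reported in X-NWP-Tokens.
    return {"action": "send-flagged", "x_nwp_tokens": estimated_cgn}
```

The reject branch carries `effective_budget` and `estimated_cgn` so that an error body can include both values.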

7.4 Error code

| Error Code | HTTP Status | NPS Status | Description |
| --- | --- | --- | --- |
| NWP-CGN-LIMIT-EXCEEDED | 400 | NPS-CLIENT-REQUEST-TOO-LARGE | Response would exceed the effective CGN budget (min(cgn_limit, X-NWP-Budget)); trimming was not possible. Response body SHOULD include effective_budget and estimated_cgn. |

8. Streaming and Subscription Budget Policy

The X-NWP-Budget cap applies to synchronous request/response operations (QueryFrame → CapsFrame / StreamFrame batch). The following continuous-push operations are subject to modified rules:

8.1 Streaming Queries (QueryFrame stream: true)

8.2 SubscribeFrame / Push Streams (topology.stream, event subscriptions)

Long-running push streams (e.g. topology.stream via SubscribeFrame) represent an ongoing series of events with no fixed response size. Budget semantics differ:

| Aspect | Behavior |
| --- | --- |
| X-NWP-Budget enforcement | Not applied by the node; push events are generated independently of any per-request budget cap |
| X-NWP-Tokens reporting | The node SHOULD include this header on each push event (DiffFrame) reporting the CGN for that event’s payload |
| Agent-side enforcement | The Agent is responsible for tracking cumulative CGN across events and disconnecting when its session budget is exhausted |

Rationale: Enforcing X-NWP-Budget on push streams would require the node to buffer future events, which is incompatible with real-time topology change delivery. Agent-side enforcement is the correct locus for subscription-stream budget control.
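
Agent-side enforcement might look like the tracker below. It is a sketch, not a prescribed client design: the class name `SessionBudget` is hypothetical, events are assumed to expose their headers as a dict and their payload as bytes, and when X-NWP-Tokens is absent the sketch falls back to the ceil(UTF-8 bytes / 4) estimate from §2.2.

```python
from typing import Dict


class SessionBudget:
    """Tracks cumulative CGN across push events for one subscription."""

    def __init__(self, session_budget: int) -> None:
        self.session_budget = session_budget
        self.consumed = 0

    def record_event(self, headers: Dict[str, str], payload: bytes) -> bool:
        """Add one event's CGN; return True when the agent should disconnect."""
        reported = headers.get("X-NWP-Tokens")
        if reported is not None:
            cgn = int(reported)
        else:
            # Header is only a SHOULD, so estimate from the payload size.
            cgn = (len(payload) + 3) // 4
        self.consumed += cgn
        return self.consumed >= self.session_budget
```

The agent would call `record_event` for every DiffFrame it receives and close the subscription as soon as the method returns True.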


Copyright: LabAcacia / INNO LOTUS PTY LTD · Apache 2.0