
FinOps Guard

Control token costs and prevent oversized responses from overwhelming your AI model's context window. Define global limits that apply to every new server.

Access via Settings → FinOps Guard in Vinkius Cloud.


Why FinOps matters

A single verbose API endpoint can return payloads that consume 10,000+ tokens — a significant portion of a model's context window. Without guardrails, one tool call can crowd out the conversation history and degrade response quality. FinOps Guard ensures no single response dominates the context budget.

Token savings: 30-50%
Array items range: 5–500
Payload protection: OOM Shield

Global FinOps configuration

Define default response limits and compression settings applied to every new server.

Response Limits

Max Array Items: maximum number of items in array responses. Range 5 to 500, default 50 items.
Max Payload Size (OOM Shield): range 100 KB to 50 MB, default 2.0 MB.

Compression

TOON Compression
Compress tool descriptions to reduce token overhead from schema definitions.

These defaults are inherited by new servers. Each server can override these settings individually from its detail page.


Response Limits

Max Array Items

Controls the maximum number of items returned in array responses. When an API returns a list with 500 records, FinOps truncates it to the configured limit (default: 50) and appends a count notice.
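
A minimal sketch of the truncation described above, assuming it applies to a top-level list and that the count notice is a short string. The function name `truncate_array`, the `notice` field, and its wording are hypothetical; this page does not document the exact format FinOps Guard emits.

```python
from typing import Any

DEFAULT_MAX_ARRAY_ITEMS = 50  # default stated on this page


def truncate_array(items: list[Any], max_items: int = DEFAULT_MAX_ARRAY_ITEMS) -> dict[str, Any]:
    """Keep the first max_items entries and note how many existed in total."""
    if not 5 <= max_items <= 500:
        raise ValueError("max_items must be within the documented 5-500 range")
    kept = items[:max_items]
    result: dict[str, Any] = {"items": kept}
    if len(items) > max_items:
        # Hypothetical notice wording; the real notice format is an assumption.
        result["notice"] = f"Showing {len(kept)} of {len(items)} items"
    return result


records = [{"id": i} for i in range(500)]
response = truncate_array(records)
print(len(response["items"]), response["notice"])  # prints: 50 Showing 50 of 500 items
```

Lists at or below the limit pass through unchanged and carry no notice, so the model only sees the extra text when data was actually dropped.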

Value          Use case
5–20           Tight context budgets, small model context windows
50 (default)   Balanced: enough data for the AI without overwhelming the context
100–500        Large context windows, APIs where complete lists matter

OOM Shield

An absolute byte ceiling for response bodies. Any response exceeding this limit is truncated before reaching the AI model.

The slider ranges from 100 KB to 50 MB with a default of 2 MB. This protects against:

  • APIs that return entire file contents in response bodies
  • Unfiltered database dumps from misconfigured endpoints
  • Binary payloads accidentally returned as base64
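
A byte ceiling of this kind can be sketched as follows. This is an illustration under stated assumptions, not the FinOps Guard implementation: the function name `enforce_byte_ceiling` is hypothetical, the body is assumed to be UTF-8 text, and 2 MB is assumed to mean 2 × 1024 × 1024 bytes.

```python
DEFAULT_MAX_PAYLOAD_BYTES = 2 * 1024 * 1024  # 2 MB default; binary megabytes are an assumption


def enforce_byte_ceiling(body: bytes, max_bytes: int = DEFAULT_MAX_PAYLOAD_BYTES) -> tuple[str, bool]:
    """Return (text, was_truncated), using at most max_bytes of the body."""
    if len(body) <= max_bytes:
        return body.decode("utf-8", errors="replace"), False
    # errors="ignore" drops a multi-byte character that the slice cut in half
    # (along with any other invalid bytes), so the result is valid text.
    return body[:max_bytes].decode("utf-8", errors="ignore"), True


text, truncated = enforce_byte_ceiling(("é" * 10).encode("utf-8"), max_bytes=5)
print(repr(text), truncated)  # prints: 'éé' True
```

Cutting on a raw byte count can split a multi-byte character, which is why the sketch decodes the slice leniently instead of returning the bytes as-is.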

WARNING

The OOM Shield is your last line of defense against runaway responses. If it is disabled, a single 50 MB response could consume your entire context window budget and cause out-of-memory errors in the runtime.


TOON Compression

This setting is shared with the MCP Defaults tab. When enabled here, TOON compression applies globally. See the MCP Defaults page for details on compatibility and behavior.


Inheritance

FinOps settings follow the global → per-server inheritance model:

  1. Configure here — Set your preferred limits globally
  2. New servers inherit — Every new server starts with these defaults
  3. Override per-server — Adjust limits from each server's detail page
  4. Existing servers unaffected — Changing global defaults does not overwrite existing per-server settings
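
The four steps above can be modeled roughly as shown below. The class and method names (`FinOpsSettings`, `Registry`, `create_server`, `override`, `set_defaults`) are hypothetical. The sketch also assumes that a new server takes a snapshot of the global defaults at creation time, which is one plausible reading of step 4, and that TOON compression defaults to off, which this page does not state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FinOpsSettings:
    max_array_items: int = 50                  # documented default
    max_payload_bytes: int = 2 * 1024 * 1024   # 2 MB default, binary units assumed
    toon_compression: bool = False             # assumed default, not stated on this page


class Registry:
    """Hypothetical model of global defaults plus per-server settings."""

    def __init__(self, defaults: FinOpsSettings) -> None:
        self.defaults = defaults
        self.servers: dict[str, FinOpsSettings] = {}

    def create_server(self, name: str) -> None:
        # Step 2: a new server starts with the current global defaults.
        self.servers[name] = self.defaults

    def override(self, name: str, **changes) -> None:
        # Step 3: adjust one server without touching the others.
        self.servers[name] = replace(self.servers[name], **changes)

    def set_defaults(self, **changes) -> None:
        # Step 4: only servers created afterwards see the new defaults.
        self.defaults = replace(self.defaults, **changes)


registry = Registry(FinOpsSettings())
registry.create_server("crm")
registry.override("crm", max_array_items=20)
registry.set_defaults(max_array_items=100)
registry.create_server("billing")
print(registry.servers["crm"].max_array_items, registry.servers["billing"].max_array_items)  # prints: 20 100
```

Immutable settings objects keep the snapshot semantics simple: changing the defaults builds a new object, so servers created earlier keep the values they started with.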


Frequently Asked Questions

What does FinOps Guard protect against?

FinOps Guard prevents oversized API responses from overwhelming your AI model's context window. Without limits, a single verbose endpoint can return 10,000+ tokens worth of data — crowding out conversation history and degrading response quality. FinOps sets array ceilings and byte limits to keep every response within budget.

What is the OOM Shield?

OOM Shield is an absolute byte ceiling for response bodies. The slider ranges from 100 KB to 50 MB (default: 2 MB). Any response exceeding this limit is truncated before reaching the AI model. It protects against file contents in response bodies, unfiltered database dumps, and base64-encoded binary payloads.

How does Max Array Items work?

When an API returns a list with hundreds of records, FinOps truncates the array to the configured limit (default: 50 items, range: 5–500) and appends a count notice. The AI model receives enough data to reason effectively without wasting context window space on excess records.

Does FinOps Guard stack with TOON compression?

Yes. FinOps Guard handles response payload truncation, while TOON compression handles tool description optimization. FinOps controls the data volume and TOON controls the schema overhead; together, they can reduce total token consumption by as much as 50-70%.

Can I disable FinOps Guard for specific servers?

Yes. FinOps Guard follows the global-to-per-server inheritance model. Each server can have its own Max Array Items, OOM Shield, and TOON compression settings. Override them from the server's Edit modal without affecting other servers.

How much token savings does FinOps typically provide?

Combined with TOON compression, FinOps Guard typically saves 30-50% on token consumption per request. The exact savings depend on the verbosity of your upstream API responses and the configured array limits.