FinOps Guard
Control token costs and prevent oversized responses from overwhelming your AI model's context window. Define global limits that apply to every new server.
Access via Settings → FinOps Guard in Vinkius Cloud.
Why FinOps matters
A single verbose API endpoint can return payloads that consume 10,000+ tokens — a significant portion of a model's context window. Without guardrails, one tool call can crowd out the conversation history and degrade response quality. FinOps Guard ensures no single response dominates the context budget.
Global FinOps configuration
Response Limits
Max Array Items
Controls the maximum number of items returned in array responses. When an API returns a list with 500 records, FinOps truncates it to the configured limit (default: 50) and appends a count notice.
| Value | Use case |
|---|---|
| 5–20 | Tight context budgets, small model context windows |
| 50 (default) | Balanced — enough data for the AI without overwhelming the context |
| 100–500 | Large context windows, APIs where complete lists matter |
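The truncation behavior described above can be sketched as follows. This is a minimal illustrative sketch, not the actual FinOps Guard implementation; the function and field names (`truncateArray`, `items`, `notice`) are assumptions.

```typescript
// Hypothetical sketch of Max Array Items truncation.
// When a response array exceeds the configured limit, the list is cut
// to the limit and a count notice is appended so the model knows
// records were omitted.
interface TruncatedList<T> {
  items: T[];
  notice?: string; // present only when truncation occurred
}

function truncateArray<T>(items: T[], maxItems = 50): TruncatedList<T> {
  if (items.length <= maxItems) {
    return { items }; // under the limit: pass through unchanged
  }
  return {
    items: items.slice(0, maxItems),
    notice: `Showing ${maxItems} of ${items.length} items (truncated)`,
  };
}
```

With the default limit of 50, a 500-record list is reduced to 50 items plus a one-line notice, while a 3-item list passes through untouched.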
OOM Shield
An absolute byte ceiling for response bodies. Any response exceeding this limit is truncated before reaching the AI model.
The slider ranges from 100 KB to 50 MB with a default of 2 MB. This protects against:
- APIs that return entire file contents in response bodies
- Unfiltered database dumps from misconfigured endpoints
- Binary payloads accidentally returned as base64
WARNING
The OOM Shield is your last line of defense against runaway responses. If disabled, a single 50 MB response could consume your entire context window budget and cause out-of-memory errors in the runtime.
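The byte ceiling is simple to reason about: it is a hard cap on the raw response body, applied before any parsing. A minimal sketch, assuming a hypothetical `applyOomShield` helper (not the actual implementation):

```typescript
// Hypothetical sketch of the OOM Shield: an absolute byte ceiling
// applied to the raw response body before it reaches the AI model.
const DEFAULT_OOM_LIMIT_BYTES = 2 * 1024 * 1024; // 2 MB default

function applyOomShield(
  body: Uint8Array,
  limitBytes: number = DEFAULT_OOM_LIMIT_BYTES
): Uint8Array {
  if (body.byteLength <= limitBytes) {
    return body; // within budget: pass through unchanged
  }
  // Over budget: hard-truncate at the ceiling. Everything past the
  // limit is dropped, regardless of content type.
  return body.slice(0, limitBytes);
}
```

Because the cap operates on bytes rather than tokens, it catches pathological cases (base64 blobs, database dumps) that array limits alone would miss.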
TOON Compression
Shared with the MCP Defaults tab. When enabled here, TOON compression applies globally. See the MCP Defaults page for details on compatibility and behavior.
Inheritance
FinOps settings follow the global → per-server inheritance model:
- Configure here — Set your preferred limits globally
- New servers inherit — Every new server starts with these defaults
- Override per-server — Adjust limits from each server's detail page
- Existing servers unaffected — Changing global defaults does not overwrite existing per-server settings
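The inheritance rules above amount to a merge at resolution time: a server's stored overrides win, and any field the server never set falls back to the current global default. A sketch under that assumption, with illustrative field names (`maxArrayItems`, `oomShieldBytes`, `toonCompression`) that are not the product's actual schema:

```typescript
// Hypothetical sketch of global → per-server settings resolution.
interface FinOpsSettings {
  maxArrayItems: number;
  oomShieldBytes: number;
  toonCompression: boolean;
}

// A server stores only the fields it explicitly overrides.
type FinOpsOverrides = Partial<FinOpsSettings>;

function resolveSettings(
  globalDefaults: FinOpsSettings,
  perServer: FinOpsOverrides = {}
): FinOpsSettings {
  // Overrides take precedence; unset fields inherit the global value.
  return { ...globalDefaults, ...perServer };
}
```

This also explains why changing a global default never overwrites per-server settings: overrides are stored separately, so a server with no override simply starts resolving to the new global value, while an overridden field keeps its own value.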
Frequently Asked Questions
What does FinOps Guard protect against?
FinOps Guard prevents oversized API responses from overwhelming your AI model's context window. Without limits, a single verbose endpoint can return 10,000+ tokens worth of data — crowding out conversation history and degrading response quality. FinOps sets array ceilings and byte limits to keep every response within budget.
What is the OOM Shield?
OOM Shield is an absolute byte ceiling for response bodies. The slider ranges from 100 KB to 50 MB (default: 2 MB). Any response exceeding this limit is truncated before reaching the AI model. It protects against file contents in response bodies, unfiltered database dumps, and base64-encoded binary payloads.
How does Max Array Items work?
When an API returns a list with hundreds of records, FinOps truncates the array to the configured limit (default: 50 items, range: 5–500) and appends a count notice. The AI model receives enough data to reason effectively without wasting context window space on excess records.
Does FinOps Guard stack with TOON compression?
Yes. FinOps Guard handles response payload truncation, while TOON compression handles tool description optimization. Together, they can reduce total token consumption by 50-70% — FinOps controls the data volume, TOON controls the schema overhead.
Can I disable FinOps Guard for specific servers?
Yes. FinOps Guard follows the global-to-per-server inheritance model. Each server can have its own Max Array Items, OOM Shield, and TOON compression settings. Override them from the server's Edit modal without affecting other servers.
How much token savings does FinOps typically provide?
Combined with TOON compression, FinOps Guard typically saves 30-50% on token consumption per request. The exact savings depend on the verbosity of your upstream API responses and the configured array limits.