
MCP Defaults

Platform-wide defaults for how tools are exposed to AI clients. These settings apply to every new server — individual servers can override them.

Access via Settings → MCP Settings in Vinkius Cloud.


Tool Grouping

When you deploy an API with 30, 50, or 100+ endpoints, each endpoint becomes an MCP tool. This creates a problem: AI models have limited context windows, and listing all tools at once consumes a significant portion of that budget — sometimes leaving too little room for the actual conversation.

Tool Grouping solves this by collapsing related tools into navigable groups. Instead of seeing 50 individual tools, the AI model sees 5 groups and drills into the one it needs.
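As a sketch of what grouping could look like (the endpoint names, tags, and schema shape below are invented for illustration, not the platform's actual data model), related endpoints collapse into one tool per tag, and each endpoint name becomes a value of an `action` discriminator:

```python
# Illustrative sketch only: names and structures are hypothetical.
from collections import defaultdict

endpoints = [
    {"name": "list_users",   "tag": "users"},
    {"name": "get_user",     "tag": "users"},
    {"name": "delete_user",  "tag": "users"},
    {"name": "list_orders",  "tag": "orders"},
    {"name": "get_order",    "tag": "orders"},
    {"name": "refund_order", "tag": "orders"},
]

def group_tools(endpoints):
    """Collapse endpoints into one MCP tool per tag; each endpoint
    name becomes a value of the 'action' discriminator."""
    by_tag = defaultdict(list)
    for ep in endpoints:
        by_tag[ep["tag"]].append(ep["name"])
    return {
        tag: {
            "description": f"Operations on {tag}",
            "input_schema": {"action": {"enum": actions}},
        }
        for tag, actions in by_tag.items()
    }

tools = group_tools(endpoints)
print(len(endpoints), "flat tools ->", len(tools), "grouped tools")
# 6 flat tools -> 2 grouped tools
```

The same idea scales to 50 tools across 5 tags: the model first sees 5 group tools, then picks an action inside the relevant group.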

TOOL GROUPING
Controls how tools are exposed to AI clients. Grouping reduces context window usage for large APIs.
- Flat: each endpoint becomes an individual MCP tool. Best for small APIs (< 20 tools).
- Grouped: related endpoints share one MCP tool with an action discriminator. Reduces token usage.
- Auto (recommended): automatically groups when a server exceeds the threshold below.
- Threshold: 20 tools

Modes

Mode | Behavior | When to use
Flat | Every endpoint is its own MCP tool | APIs with fewer than 20 endpoints
Grouped | Tools are always collapsed by tag | APIs where you want to control discoverability
Auto (default) | Flat below the threshold, grouped above it | Most APIs — the platform adapts automatically

The grouping threshold (default: 20) controls when Auto mode switches from flat to grouped. Lower it if your AI model has a small context window; raise it if you want tools always visible.
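The Auto-mode decision reduces to a few lines. This is a sketch; the exact boundary behavior ("exceeds" read as strictly greater than the threshold) is an assumption based on the wording above:

```python
def effective_mode(mode: str, tool_count: int, threshold: int = 20) -> str:
    """Resolve the mode a server actually runs in. Only 'auto' depends
    on the tool count; treating 'exceeds' as strictly greater than the
    threshold is an assumption, not confirmed platform behavior."""
    if mode != "auto":
        return mode
    return "grouped" if tool_count > threshold else "flat"

print(effective_mode("auto", 12))   # flat
print(effective_mode("auto", 45))   # grouped
print(effective_mode("flat", 45))   # flat
```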


TOON Token Compression

TOON (Tool Object Optimized Notation) is a compression protocol that rewrites tool descriptions and response payloads into a compact tabular format. In benchmarks, this reduces token consumption by 30-50% per request without losing information.

Instead of verbose JSON schema definitions, TOON encodes tools as structured tables that modern language models parse natively. The AI model receives the same semantic information in fewer tokens.
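TOON's actual wire format is not documented here, so the encoder below is only a stand-in: a CSV-like table that illustrates why tabular encoding saves tokens, since field names appear once in a header row instead of being repeated as keys in every object:

```python
import json

def to_table(records):
    """CSV-like stand-in for a tabular encoding (not TOON's real
    format): one header row of field names, one row per record."""
    fields = list(records[0])
    lines = [",".join(fields)]
    lines += [",".join(str(rec[f]) for f in fields) for rec in records]
    return "\n".join(lines)

records = [
    {"id": 1, "name": "alice", "active": True},
    {"id": 2, "name": "bob", "active": False},
]

verbose = json.dumps(records, indent=2)
compact = to_table(records)
print(f"{len(verbose)} chars as JSON -> {len(compact)} chars as table")
```

The savings grow with the number of records, because the per-record key overhead of JSON is paid only once in the header row.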

TOKEN COMPRESSION
TOON compresses tool descriptions into a compact tabular format, reducing context window usage by ~30-50%.
- Enable TOON compression: works best with GPT-4, Claude 3.5+, and Gemini 2.0+.

Compatibility

TOON works with Claude 3.5+, GPT-4, and Gemini 2.0+. Older models or clients that expect raw JSON schemas may behave unpredictably — test before enabling in production.


Response Presenters

Response Presenters transform raw API responses into structured, AI-optimized formats. When an API returns deeply nested JSON with metadata, pagination cursors, and wrapper objects, Presenters extract the useful data and flatten it into a format that models reason about more effectively.

The result: higher-quality AI responses because the model receives cleaner input. Disable only if your AI client expects unprocessed responses.
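A minimal sketch of what a presenter might do, assuming a hypothetical response envelope with `data.items`, a pagination cursor, and request metadata (the field names are invented for illustration):

```python
def present(response: dict) -> list:
    """Hypothetical presenter: unwrap the envelope and drop the cursor
    and request metadata, keeping only the records themselves."""
    return response.get("data", {}).get("items", [])

raw = {
    "data": {
        "items": [{"id": 1, "title": "Hello"}],
        "cursor": "abc123",
    },
    "meta": {"request_id": "req-42", "elapsed_ms": 12},
}

print(present(raw))  # [{'id': 1, 'title': 'Hello'}]
```

The model then reasons over a flat list of records instead of navigating wrapper objects it does not need.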


Frequently Asked Questions

What does Tool Grouping do?

Tool Grouping controls how your MCP tools are presented to AI clients. For large APIs (30+ endpoints), listing every tool individually consumes a significant portion of the AI model's context window. Grouping collapses related tools into navigable categories, reducing context overhead by an order of magnitude.

Which Tool Grouping mode should I use?

Use Auto (the default) for most APIs. It keeps tools flat when you have fewer than the threshold (default: 20) and automatically groups them above that number. Use Flat for small APIs under 20 tools, and Grouped when you want to manually control tool discoverability.

How much does TOON compression save?

In benchmarks, TOON reduces token consumption by 30-50% per request. It rewrites verbose JSON schema definitions into a compact tabular format that modern language models parse natively — the AI model receives the same semantic information in fewer tokens.

Is TOON compression compatible with all AI models?

TOON works reliably with Claude 3.5+, GPT-4, and Gemini 2.0+. Older models or clients that expect raw JSON schemas may behave unpredictably. Always test in your environment before enabling in production.

What are Response Presenters?

Response Presenters transform raw API responses into structured, AI-optimized formats. They strip pagination cursors, metadata wrappers, and deeply nested objects — delivering cleaner input that improves the quality of AI model reasoning. Disable only if your client expects unprocessed responses.

Do MCP Defaults apply to existing servers?

No. These are creation-time defaults. Existing servers keep their current settings. You can update individual servers from their detail page at any time.