LocalAI

LocalAI is a drop-in replacement for the OpenAI API that runs entirely on your infrastructure. Point any OpenAI-compatible app at your LocalAI endpoint and it works — same API format, local inference. It supports 50+ model families across text, image, audio, and embeddings. MCP tools integrate through the function calling API.


LocalAI · Open-Source · localai.io
Transport: Streamable HTTP ✓
Platform: Docker · Self-hosted
MCP via: Function Calling API

Drop-In API Replacement with MCP

LocalAI's killer feature is API compatibility. Swap your OpenAI base URL from api.openai.com to your LocalAI instance, and existing applications work without code changes. This includes function calling — and by extension, MCP tool integration.
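The URL swap can be sketched with only the standard library. The base URL below assumes LocalAI's default port, 8080; everything after it is unchanged OpenAI chat-completions request format:

```python
import json

# Sketch only: assumes LocalAI is listening on its default port, 8080.
# Everything except the base URL is plain OpenAI request format.
LOCALAI_BASE = "http://localhost:8080/v1"  # was: "https://api.openai.com/v1"

def chat_request(model, prompt):
    """Build (url, body) for an OpenAI-format chat completion call."""
    url = f"{LOCALAI_BASE}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body
```

Send the pair with any HTTP client; official OpenAI SDKs accept the same swap through their `base_url` option, so no other code changes.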

LocalAI supports more than just text. It handles image generation (Stable Diffusion), speech-to-text (Whisper), text-to-speech, and embeddings — all through the same OpenAI-compatible API. MCP tools add external data access to this multi-modal stack.

Capabilities:

  • OpenAI API drop-in — same endpoints, same format, local execution
  • 50+ model families — LLMs, image gen, speech, embeddings, and more
  • Function calling — tool/function calling API for MCP integration
  • Docker deployment — docker run with GPU passthrough
  • Model gallery — in-app model browser and downloader
  • Multi-modal — text, image, audio, and embedding models in one API
  • No GPU required — CPU inference supported (GPU accelerates)
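As a concrete sketch of the function calling bullet above, here is a request body carrying one tool definition in OpenAI's `tools` format. The `read_file` tool and the `local-model` name are placeholders for whatever your MCP server and model gallery actually provide:

```python
import json

# Hypothetical "read_file" tool: stands in for a tool an MCP server
# might expose; "local-model" stands in for an installed model name.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the connected MCP server.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

request_body = json.dumps({
    "model": "local-model",
    "messages": [{"role": "user", "content": "Open README.md"}],
    "tools": tools,
})
```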

Setup

1. Create a Token

In Vinkius Cloud, go to your server → Connection Tokens → Create. Copy the URL.

2. Configure MCP

Add MCP server configuration to your LocalAI setup. Tools are exposed through the function calling API:

```yaml
mcp_servers:
  - url: "https://edge.vinkius.com/{TOKEN}/mcp"
```

3. Use Existing Apps

Any application that calls the OpenAI function calling API now has MCP tool access — without knowing it's running locally.
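When the model decides to use a tool, the application receives a `tool_calls` array in OpenAI's response format. A minimal parsing sketch; the JSON below is an illustrative sample shaped like OpenAI output, not captured LocalAI output:

```python
import json

# Illustrative sample of an OpenAI-format assistant message that
# requests a tool call; a real response carries the same shape.
sample = json.loads("""
{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "read_file", "arguments": "{\\"path\\": \\"README.md\\"}"}
      }]
    }
  }]
}
""")

call = sample["choices"][0]["message"]["tool_calls"][0]
name = call["function"]["name"]
args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
```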


FAQ

Can I replace OpenAI with LocalAI and keep MCP tools? Yes. LocalAI's function calling API is compatible with OpenAI's format. Existing applications work without code changes, and MCP tools integrate through the same interface.

Does LocalAI require a GPU? No. CPU inference is supported for all model types. GPU acceleration (CUDA, Metal, ROCm) is optional but significantly faster.

What modalities does LocalAI support? Text generation, image generation (Stable Diffusion), speech-to-text (Whisper), text-to-speech, and embeddings — all via the OpenAI API format.
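For instance, an image-generation call is the same request shape pointed at the images endpoint. This sketch assumes the default port 8080; the model name stands in for whichever Stable Diffusion model you have installed:

```python
import json

# Sketch: same OpenAI request shape, images endpoint. Port 8080 and
# the "stablediffusion" model name are assumptions, not requirements.
url = "http://localhost:8080/v1/images/generations"
body = json.dumps({
    "model": "stablediffusion",
    "prompt": "a lighthouse at dusk",
    "size": "512x512",
})
```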

Is LocalAI free? Open-source under MIT. Self-host at no cost.