LocalAI

LocalAI is a drop-in replacement for the OpenAI API that runs entirely on your infrastructure. Point any OpenAI-compatible app at your LocalAI endpoint and it works — same API format, local inference. It supports 50+ model families across text, image, audio, and embeddings. MCP tools integrate through the function calling API.


LocalAI · Open-Source · localai.io
Transport: Streamable HTTP ✓
Platform: Docker · Self-hosted
MCP via: Function Calling API

Drop-In API Replacement with MCP

LocalAI's killer feature is API compatibility. Swap your OpenAI base URL from api.openai.com to your LocalAI instance, and existing applications work without code changes. This includes function calling — and by extension, MCP tool integration.
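The URL swap can be sketched with only the standard library. The base URL below assumes LocalAI's default port, 8080; everything after it is unchanged OpenAI chat-completions request format:

```python
import json

# Sketch only: assumes LocalAI is listening on its default port, 8080.
# Everything except the base URL is plain OpenAI request format.
LOCALAI_BASE = "http://localhost:8080/v1"  # was: "https://api.openai.com/v1"

def chat_request(model, prompt):
    """Build (url, body) for an OpenAI-format chat completion call."""
    url = f"{LOCALAI_BASE}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body
```

Send the pair with any HTTP client; official OpenAI SDKs accept the same swap through their `base_url` option, so no other code changes.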

LocalAI supports more than just text. It handles image generation (Stable Diffusion), speech-to-text (Whisper), text-to-speech, and embeddings — all through the same OpenAI-compatible API. MCP tools add external data access to this multi-modal stack.

Capabilities:

  • OpenAI API drop-in — same endpoints, same format, local execution
  • 50+ model families — LLMs, image gen, speech, embeddings, and more
  • Function calling — tool/function calling API for MCP integration
  • Docker deployment — docker run with GPU passthrough
  • Model gallery — in-app model browser and downloader
  • Multi-modal — text, image, audio, and embedding models in one API
  • No GPU required — CPU inference supported (GPU accelerates)
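As a concrete sketch of the function calling bullet above, here is a request body carrying one tool definition in OpenAI's `tools` format. The `read_file` tool and the `local-model` name are placeholders for whatever your MCP server and model gallery actually provide:

```python
import json

# Hypothetical "read_file" tool: stands in for a tool an MCP server
# might expose; "local-model" stands in for an installed model name.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the connected MCP server.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

request_body = json.dumps({
    "model": "local-model",
    "messages": [{"role": "user", "content": "Open README.md"}],
    "tools": tools,
})
```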

Setup

1. Create a Token

In Vinkius Cloud, go to your server → Connection Tokens → Create. Copy the URL.

2. Configure MCP

Add MCP server configuration to your LocalAI setup. Tools are exposed through the function calling API:

```yaml
mcp_servers:
  - url: "https://edge.vinkius.com/{TOKEN}/mcp"
```

3. Use Existing Apps

Any application that calls the OpenAI function calling API now has MCP tool access — without knowing it's running locally.
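When the model decides to use a tool, the application receives a `tool_calls` array in OpenAI's response format. A minimal parsing sketch; the JSON below is an illustrative sample shaped like OpenAI output, not captured LocalAI output:

```python
import json

# Illustrative sample of an OpenAI-format assistant message that
# requests a tool call; a real response carries the same shape.
sample = json.loads("""
{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "read_file", "arguments": "{\\"path\\": \\"README.md\\"}"}
      }]
    }
  }]
}
""")

call = sample["choices"][0]["message"]["tool_calls"][0]
name = call["function"]["name"]
args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
```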


FAQ

Can I replace OpenAI with LocalAI and keep MCP tools? Yes. LocalAI's function calling API is compatible with OpenAI's format. Existing applications work without code changes, and MCP tools integrate through the same interface.

Does LocalAI require a GPU? No. CPU inference is supported for all model types. GPU acceleration (CUDA, Metal, ROCm) is optional but significantly faster.

What modalities does LocalAI support? Text generation, image generation (Stable Diffusion), speech-to-text (Whisper), text-to-speech, and embeddings — all via the OpenAI API format.
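For instance, an image-generation call is the same request shape pointed at the images endpoint. This sketch assumes the default port 8080; the model name stands in for whichever Stable Diffusion model you have installed:

```python
import json

# Sketch: same OpenAI request shape, images endpoint. Port 8080 and
# the "stablediffusion" model name are assumptions, not requirements.
url = "http://localhost:8080/v1/images/generations"
body = json.dumps({
    "model": "stablediffusion",
    "prompt": "a lighthouse at dusk",
    "size": "512x512",
})
```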

Is LocalAI free? Open-source under MIT. Self-host at no cost.