Run AI on Your Own Computer With Lemonade Server

June 10, 2026 · 4 min read · AI Developer Tools Lemonade

You do not need a cloud subscription to run AI. You do not need to send your questions to a server in another city, agree to any terms about your data, or pay per token.

If your computer has a modern graphics card — or even just a recent processor — you can run a capable AI model right there on your own hardware. Lemonade Server is the piece that makes this straightforward: it handles the download, figures out how to use your hardware, and serves everything through a standard interface that AI tools already understand.

Here is how to go from zero to a working local AI in three steps.

What You Need

Requirement Details
Operating system Windows, macOS, or Linux
RAM 8 GB minimum, 16 GB recommended
Storage 5–20 GB free for models
GPU (optional but faster) NVIDIA, AMD Radeon, or Apple Silicon
Time About 10 minutes (most of it is the model download)

No GPU? Not a problem. Lemonade falls back to CPU automatically. It is slower, but it works on any machine.

Step 1: Install Lemonade Server

On Windows, download and run the installer:

Download lemonade.msi

Run the .msi file like any other Windows application. Once installed, Lemonade starts a local server at http://localhost:13305 and adds itself to your system tray.

macOS users can grab the .pkg from the same releases page. Linux installation is covered in the official docs.

Step 2: Pull a Model

A model is the AI brain — a file, usually a few gigabytes, that your computer loads to do the actual thinking. Lemonade downloads and manages these for you.

Open a terminal and run:

lemonade pull Gemma-4-E2B-it-GGUF

This fetches a compact, capable model (around 2.5 GB) that runs comfortably on most hardware. Progress shows in the terminal. When it finishes, the model is stored locally — no re-downloading.

Want to see everything available?

lemonade list

Step 3: Start Chatting

Launch the browser-based chat interface:

lemonade launch claude

This opens a local web UI at http://localhost:13305 where you can chat with your model — exactly like a cloud AI, except the response never leaves your machine.

Or skip the UI and run a model directly from the terminal:

lemonade run Gemma-4-E2B-it-GGUF
Terminal — Lemonade Server
$ lemonade pull Gemma-4-E2B-it-GGUF
Downloading model... 2.5 GB
✓ Saved to local cache
$ lemonade launch claude
→ Chat UI at http://localhost:13305
$
Server running · Gemma-4-E2B-it-GGUF loaded · localhost:13305

What You Can Do With It

Once Lemonade is running, it speaks the same API language as the big cloud providers — which means tools you might already use can simply point at your own machine instead.

Use case How
Browser chat lemonade launch claude → opens local UI
Claude Code Set API base to http://localhost:13305/api/v1
Open WebUI Works as an Ollama-compatible server
AnythingLLM / Dify / n8n Same API base URL
Your own scripts Standard POST /api/v1/chat/completions
Image generation Stable Diffusion, runs locally
Speech-to-text Whisper, local transcription
Text-to-speech Local voice output

The multimodal features — image, speech, and voice — are covered in depth in the series below.

Which Hardware Does It Use?

Lemonade detects your hardware on install and picks the fastest available path. You do not configure any of this.

What you have How it runs
NVIDIA GPU CUDA — the fastest option
AMD Radeon RX 5000–9000 ROCm — full GPU acceleration
Any GPU (fallback) Vulkan — broad compatibility
AMD Ryzen AI (NPU) FLM — dedicated AI silicon
No GPU CPU — works everywhere, slower

Go Deeper

That is the quick start. If you want to understand why GPU acceleration for AI is more complicated than it sounds, why AMD cards were left out of local AI for so long, and what changed — the full series is below.

Read the series: Local AI on the Hardware You Already Own →


Lemonade Server is open-source. Source, releases, and full docs: github.com/lemonade-sdk/lemonade