Goodbye Cloud Costs: n8n + Ollama Setup

Introduction

Pay-per-token bills got you flinching every time a prompt runs? You’re not alone. The combo of n8n for automation and Ollama for on-device LLMs gives you private, fast, and (after hardware) essentially zero-marginal-cost AI. In this guide, you’ll spin up Ollama locally, wire it into n8n’s AI nodes, and build a practical workflow (chat + RAG) that never leaves your machine. We’ll cover models that work well today, an OpenAI-compatible route (handy for existing n8n nodes), embeddings, and common gotchas like ports, model names, and token limits. By the end, you’ll have a share-worthy local AI stack that feels cloud-grade without the cloud tax. (Ollama’s OpenAI-compatible API is officially “experimental,” but it works great for most n8n use cases.) (Ollama Documentation)


Why n8n + Ollama Right Now

Running LLMs locally brings three big wins:

  • Cost control: No per-token fees. Your GPU/CPU does the work.
  • Privacy: Data never leaves your box—huge for sensitive automations.
  • Velocity: n8n’s AI nodes + Ollama’s local models let you prototype fast, then scale carefully.

Ollama ships an OpenAI-compatible endpoint (/v1/*) so tools expecting OpenAI can talk to your local model. This is perfect for mixing n8n’s OpenAI-style nodes with your local backend. Just remember it’s experimental and occasionally changes; if you need every feature, Ollama’s native REST API is the source of truth. (Ollama Documentation)


What You’ll Build

  • Local LLM chat inside n8n using Ollama (e.g., Llama 3.1 or Mistral).
  • Local RAG using n8n’s Embeddings Ollama node to vectorize your files and answer questions with citations. (n8n Docs)

We’ll show both:

  1. OpenAI-compatible path (point n8n’s OpenAI nodes to Ollama), and
  2. Native Ollama nodes (Embeddings + Model).

Prerequisites

  • A machine with a recent CPU; for best results, an NVIDIA GPU with up-to-date drivers.
  • Docker (recommended) or native installs.
  • n8n 1.x+ and a current Ollama release.

Step 1 — Run Ollama Locally

Docker (recommended):

# --gpus all requires the NVIDIA Container Toolkit; omit that flag on CPU-only machines
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  --gpus all \
  ollama/ollama

Then pull a model:

# Examples: Llama 3.1 8B, Mistral 7B
ollama pull llama3.1
ollama pull mistral

List installed models:

ollama list

The default API listens on http://localhost:11434. Ollama also exposes OpenAI-compatible endpoints under /v1. (GitHub)

Quick API smoke test (chat completions):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role":"user","content":"Say hello from local AI"}]
  }'

OpenAI-compat is experimental; if you hit quirks, prefer Ollama’s native API. (Ollama Documentation)
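
If you do hit quirks, the same smoke test against the native API looks like this (set "stream": false to get a single JSON response instead of a stream):

curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role":"user","content":"Say hello from local AI"}],
    "stream": false
  }'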


Step 2 — Run n8n

Docker:

docker run -d --name n8n \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  --restart unless-stopped \
  n8nio/n8n

Open http://localhost:5678 to access the UI.
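
If n8n and Ollama both run as containers, localhost inside the n8n container points at n8n itself, not at Ollama. One minimal fix (a sketch; the network name is arbitrary, and the container names match the docker run commands above) is a shared Docker network:

# Put both containers on one network so n8n can reach Ollama by container name
docker network create local-ai
docker network connect local-ai ollama
docker network connect local-ai n8n
# In n8n, use http://ollama:11434 (or http://ollama:11434/v1) instead of localhost

The same hostname swap applies to every localhost URL in the rest of this guide.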


Step 3 — Connect n8n to Ollama (Two Paths)

Path A: Use n8n’s OpenAI-Style Nodes with Ollama (OpenAI-Compatible)

This is handy when you already built workflows around OpenAI nodes.

  1. In n8n → Credentials → OpenAI, set:
    • Base URL: http://localhost:11434/v1
    • API Key: any non-empty string (Ollama ignores it)
  2. In your OpenAI Chat or OpenAI nodes, set Model to the exact name you pulled (e.g., llama3.1, mistral).

This works because Ollama speaks a subset of OpenAI’s API. Just note some params differ (e.g., legacy /v1/completions vs chat), and not all features are 1:1. If a node expects OpenAI-only features (like Assistants), use native Ollama nodes instead. (Ollama)

Common issue: “Can’t connect to OpenAI with custom base URL.”
Solution: make sure the Base URL ends in /v1, the model name exactly matches an entry in ollama list, and the Ollama server is reachable from wherever n8n runs. Community threads confirm the pattern. (n8n Community)
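
You can confirm all three conditions in a few seconds from the host where n8n runs (the /v1/models listing is part of Ollama's OpenAI-compatible surface on current builds):

# Which models are installed, and is the API reachable from here?
curl http://localhost:11434/api/tags
# What the OpenAI-style credential will see
curl http://localhost:11434/v1/models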

Path B: Use n8n’s Native Ollama Nodes

n8n ships dedicated Ollama AI nodes (and an Embeddings Ollama node) so you can stay native and avoid OpenAI-compat edge cases.

  • Embeddings Ollama → generate vectors locally.
  • Ollama Model (message/completion sub-nodes) → run prompts against local models.
    Docs & integration pages outline params and model names. (n8n Docs)

Step 4 — Build a Private Chatbot in n8n (Local LLM)

Minimal flow:

  1. Webhook (POST) receives a JSON body with a prompt field.
  2. Ollama Model (Message) node:
    • Model: llama3.1 (or your pick)
    • System Prompt: “You are a helpful assistant. Keep replies concise.”
    • User Message: {{$json.body.prompt}} (the Webhook node nests POST bodies under body)
  3. Respond to Webhook with the model’s text (test it with the curl call below).
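
To smoke-test the flow, call the webhook’s test URL while the workflow is listening. This assumes you set the webhook path to ask-local; swap webhook-test for webhook once the workflow is activated:

curl -X POST http://localhost:5678/webhook-test/ask-local \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain in one sentence why local LLMs cut costs"}'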

OpenAI-compatible variant: replace step 2 with OpenAI Chat node pointing at Ollama’s /v1 as above. (Ollama)


Step 5 — Local RAG with Embeddings Ollama

  1. Read Binary Files (PDFs, docs).
  2. Split Text (n8n text ops, e.g., 1–2k chars with overlap).
  3. Embeddings Ollama node:
    • Model: an embeddings model (e.g., mxbai-embed-large, or another embedding model from the Ollama library)
  4. Store vectors in your DB or a lightweight vector store (SQLite table with cosine distance, or a plugin you prefer).
  5. Retriever step → Top-k chunks (see the Function-node sketch below).
  6. Ollama Model (Message) with a prompt that includes retrieved context (“Answer using only the CONTEXT…”).
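
For step 5, a bare-bones retriever can live entirely in a Function node. This sketch assumes each incoming item already carries the chunk text, its stored embedding, and the embedding of the question (field names are illustrative; adapt them to however you store vectors):

// Top-k retrieval by cosine similarity inside an n8n Function node.
// Assumes items shaped like { json: { chunk, embedding: number[], queryEmbedding: number[] } }.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const k = 3;
const scored = items.map(item => ({
  item,
  score: cosine(item.json.embedding, item.json.queryEmbedding),
}));
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, k).map(s => s.item);

A real vector store does this ranking for you; the point is that nothing here requires a cloud service.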

n8n’s docs cover the Embeddings Ollama node and how sub-nodes handle parameters across items. (n8n Docs)


Recommended Local Models (2025 Snapshot)

  • General chat: Llama 3.1 8B/70B (choose to fit your VRAM), Mistral 7B Instruct for speed.
  • Coding: Code Llama or newer code-tuned variants.
  • Embeddings: the mxbai family or other text-embedding models available in the Ollama library.
    Check the Ollama Library for current tags and sizes; pull what fits your hardware. (Ollama)

Pro tip: Start with 7–9B models for responsiveness, then graduate to bigger models if quality demands it.


Advanced Techniques

Function Calling & Tools

Recent community tests highlight strong tool-use with modern models (e.g., Llama 3.1, Mistral). When building tool-use flows in n8n, keep tool schemas compact and validate model outputs. (Model quality varies—benchmark in your own setup.) (Collabnix)
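
As a sanity check outside n8n, you can pass an OpenAI-style tool schema straight to Ollama’s /v1 endpoint (a sketch: the tool name and parameters here are made up, tool support depends on the model, and the shape of tool_calls in the response can vary):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role":"user","content":"What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }]
  }'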

Streaming & Latency

n8n nodes that support streaming pair nicely with Ollama’s token streaming for snappy UX. If parts of your flow can’t stream to the client, collect the chunks in a Function node and return the full reply at the end.
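
You can watch the stream itself outside n8n; the native API streams newline-delimited JSON by default (pass "stream": true if you use the /v1 route instead):

# -N turns off curl's buffering so tokens print as they arrive
curl -N http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role":"user","content":"Tell me a short joke"}]}'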

Multi-model Routing

Use a small model for quick classification, then route “hard” queries to a bigger local model. n8n’s branching makes this trivial.
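
A sketch of the glue code: assuming the small model’s verdict (say, the literal word HARD or EASY) arrives as items[0].json.text, a Function node can turn it into a boolean that an IF node branches on:

// Normalize the classifier's reply into a flag for an IF node.
// Field name and labels are illustrative; match them to your classifier prompt.
const reply = (items[0].json.text || '').trim().toUpperCase();
items[0].json.needsBigModel = reply.startsWith('HARD');
return items;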


Common Pitfalls & Fixes

  • Port/Network: Ollama must be reachable from wherever n8n runs, at http://localhost:11434 by default. If n8n runs in a separate Docker container, localhost points at the n8n container itself; put both containers on the same Docker network (see the docker network example in Step 2) and use the container name (http://ollama:11434). (n8n Community)
  • Model names: Use the exact name from ollama list (e.g., llama3.1). Mismatches cause 404s. (GitHub)
  • OpenAI-compat quirks: Some params/endpoints differ (and may change). If a node fails oddly, try the native Ollama nodes. (Ollama Documentation)
  • Context window: Smaller models have smaller windows. Chunk and retrieve; don’t paste books into a single prompt.
  • GPU memory: Bigger models need more VRAM. Start with 7B if you’re unsure. Ollama Library lists sizes. (Ollama)
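
When something misbehaves, a quick local checklist usually finds the culprit (container name ollama assumed from the Docker command earlier):

ollama list                           # exact model names n8n must use
ollama ps                             # which models are loaded and how much memory they take
curl http://localhost:11434/api/tags  # is the API reachable from this host?
docker logs --tail 50 ollama          # driver and out-of-memory errors surface here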

Practical Example: n8n RAG Workflow (JSON Import)

Paste into n8n → Import from URL/Clipboard, then set your own file paths and DB step.

{
  "nodes": [
    {
      "id": "webhook",
      "name": "Ask (Webhook)",
      "type": "n8n-nodes-base.webhook",
      "parameters": { "httpMethod": "POST", "path": "ask-local" },
      "typeVersion": 1,
      "position": [260, 240]
    },
    {
      "id": "split",
      "name": "Split to Chunks",
      "type": "n8n-nodes-base.function",
      "parameters": {
        "functionCode": "const text = items[0].json.text || '';\nconst size = 1500, overlap = 200;\nconst chunks = [];\nfor (let i=0;i<text.length;i+= (size-overlap)) {\n  chunks.push({ chunk: text.slice(i, i+size) });\n}\nreturn chunks.map(c => ({ json: c }));"
      },
      "typeVersion": 2,
      "position": [620, 240]
    },
    {
      "id": "embed",
      "name": "Embeddings (Ollama)",
      "type": "n8n-nodes-langchain.embeddingsOllama",
      "parameters": {
        "model": "mxbai-embed-large",
        "text": "={{$json.chunk}}"
      },
      "typeVersion": 1,
      "position": [880, 240],
      "credentials": { "ollamaApi": "Ollama Local" }
    },
    {
      "id": "retrieve",
      "name": "Top-k Retrieve",
      "type": "n8n-nodes-base.function",
      "parameters": {
        "functionCode": "// TODO: Replace with your vector DB lookup.\n// This stub just echoes the first few chunks for demo.\nreturn items.slice(0,3);"
      },
      "typeVersion": 2,
      "position": [1140, 240]
    },
    {
      "id": "llm",
      "name": "Answer (Ollama Model)",
      "type": "n8n-nodes-langchain.llmMessageOllama",
      "parameters": {
        "model": "llama3.1",
        "systemMessage": "You are a concise expert. Cite facts from CONTEXT.",
        "userMessage": "Question: {{$json.question}}\\n\\nCONTEXT: {{ $items(\"retrieve\").map(i => i.json.chunk).join(\"\\n---\\n\") }}"
      },
      "typeVersion": 1,
      "position": [1400, 240],
      "credentials": { "ollamaApi": "Ollama Local" }
    },
    {
      "id": "respond",
      "name": "Respond",
      "type": "n8n-nodes-base.respondToWebhook",
      "parameters": { "responseBody": "={{$json.text || $json}}"},
      "typeVersion": 1,
      "position": [1660, 240]
    }
  ],
  "connections": {
    "Ask (Webhook)": { "main": [[{ "node": "Split to Chunks", "type": "main", "index": 0 }]] },
    "Split to Chunks": { "main": [[{ "node": "Embeddings (Ollama)", "type": "main", "index": 0 }]] },
    "Embeddings (Ollama)": { "main": [[{ "node": "Top-k Retrieve", "type": "main", "index": 0 }]] },
    "Top-k Retrieve": { "main": [[{ "node": "Answer (Ollama Model)", "type": "main", "index": 0 }]] },
    "Answer (Ollama Model)": { "main": [[{ "node": "Respond", "type": "main", "index": 0 }]] }
  }
}

Swap the “Top-k Retrieve” stub for your vector DB (Postgres+pgvector, SQLite+cosine, etc.). The important part is Embeddings Ollama → store vectors → Answer (Ollama Model) with retrieved context. (n8n Docs)
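
If you go the Postgres+pgvector route, the storage side is only a few statements. This is a sketch: DATABASE_URL stands in for your connection string, the 1024-dimension column assumes mxbai-embed-large, and the vector literal is a placeholder for the question embedding you computed:

psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"
psql "$DATABASE_URL" -c "CREATE TABLE IF NOT EXISTS chunks (id bigserial PRIMARY KEY, chunk text, embedding vector(1024));"
# Retrieval: smallest cosine distance (<=>) to the question embedding, top 3
psql "$DATABASE_URL" -c "SELECT chunk FROM chunks ORDER BY embedding <=> '[0.12, -0.03, ...]'::vector LIMIT 3;"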


Troubleshooting FAQ

The OpenAI node won’t connect to my local endpoint.
Confirm Base URL = http://localhost:11434/v1 and that your model name exists in ollama list. If n8n runs in Docker, use the container hostname instead of localhost (e.g., http://ollama:11434/v1). (GitHub)

Responses differ from OpenAI.
That’s normal—models differ. Choose the best local model for your task and tune prompts. (n8n Community)

Which models are current “safe bets”?
Llama 3.1 (8B/70B), Mistral 7B for speed, code-tuned variants for dev tasks; check Ollama Library for fresh tags. (Ollama)


Conclusion

Local AI is no longer a hack—it’s a strategy. With n8n as your automation brain and Ollama as your on-device LLM engine, you get privacy, speed, and freedom from cloud bills.

Key takeaways:

  • Point n8n’s OpenAI nodes at http://localhost:11434/v1 or use native Ollama nodes. (Ollama)
  • Use Embeddings Ollama for private RAG. (n8n Docs)
  • Start with smaller models; scale up as needed. (Ollama)

Next steps: wire this into a real workflow—support inbox triage, on-prem knowledge search, or code review summaries—then benchmark models and iterate.

What will you build first with your new local AI stack?


Sources & Further Reading:

  • Ollama OpenAI-compatible API overview and blog. (Ollama Documentation)
  • n8n Embeddings Ollama node docs & integration page. (n8n Docs)
  • Ollama Library (models & tags) and GitHub. (Ollama)
  • n8n OpenAI node docs and community threads on OpenAI-compatible setups. (n8n Docs)
