Build Your Private AI Powerhouse: Ollama + Open-WebUI + n8n with Docker
A complete guide to setting up your own private AI stack using Ollama, Open-WebUI, and n8n with Docker for privacy-focused language model usage and automation.

Want to leverage the power of Large Language Models (LLMs) and workflow automation without sending your data to third-party clouds? This guide will walk you through setting up a powerful, private, and local AI stack using Docker, featuring:
- Ollama: Run open-source LLMs (like Llama 3.1, Gemma 3, Qwen 3, DeepSeek-R1, etc.) directly on your hardware.
- Open-WebUI: A user-friendly web interface to chat with your local Ollama models.
- n8n: A self-hosted workflow automation tool (like Zapier) to connect APIs and build intelligent automations, capable of leveraging your local Ollama instance.
All components will communicate over a dedicated Docker network for seamless integration.
Benefits:
- Privacy: Your data and prompts stay entirely on your local machine.
- Control: You choose the models, tools, and configurations.
- Cost-Effective: Utilizes open-source software and your existing hardware.
- Customization: Tailor models and workflows to your specific needs.
Let's dive in!
Prerequisites
Before starting, ensure you have the following installed and configured on your server (this guide assumes a Linux host, like Ubuntu):
Docker Engine: The core containerization platform. (Official Docker Install Guide)
Docker Compose: Useful for managing multi-container applications (often included with Docker Desktop or installed as a plugin). (Official Docker Compose Install Guide)
(For GPU Acceleration) Nvidia GPU: A CUDA-compatible Nvidia graphics card.
(For GPU Acceleration) Nvidia Drivers: Appropriate drivers installed for your Linux distribution. Verify with:
```bash
nvidia-smi
```
(For GPU Acceleration) Nvidia Container Toolkit: Allows Docker containers to access the host's GPU. Install it by following the official guide:
Add NVIDIA package repositories:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
Update package list and install:
```bash
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
```
Configure Docker runtime and restart Docker:
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
Verify toolkit (optional but recommended):
```bash
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```
Step 1: Create a Dedicated Docker Network
We'll create a custom bridge network so our containers can easily find and communicate with each other using their service names.
```bash
docker network create ollama-net
```
This creates a bridge network named `ollama-net`.
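To confirm the network exists (and, later, to see which containers are attached to it), you can inspect it:

```bash
docker network inspect ollama-net
```

Once the containers from the following steps are running, they'll appear under the "Containers" key in this output.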
Step 2: Set Up Ollama Container
Now, let's run the Ollama server container, enabling GPU access and connecting it to our network.
```bash
docker run -d \
  --gpus all \
  --network ollama-net \
  -v ollama_data:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  ollama/ollama:latest
```
Let's break down this command:
- `-d`: Run the container in detached mode (in the background).
- `--gpus all`: Crucial for GPU acceleration. Makes the host's Nvidia GPUs available to the container via the Nvidia Container Toolkit. Omit this if you only have a CPU (a CPU-only variant is shown after this list).
- `--network ollama-net`: Attaches the container to our custom network.
- `-v ollama_data:/root/.ollama`: Creates a named Docker volume `ollama_data` and mounts it inside the container where Ollama stores its downloaded models. This ensures your models persist even if the container is removed and recreated.
- `-p 11434:11434`: Publishes Ollama's API port (11434) to the same port on your host server, making it accessible from your local network.
- `--name ollama`: Assigns a convenient name to the container.
- `--restart unless-stopped`: Ensures the container automatically restarts if it crashes or the server reboots.
- `ollama/ollama:latest`: Specifies the official Ollama image to use.
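If your host has no supported GPU, the CPU-only variant is simply the same command with the `--gpus all` flag dropped:

```bash
docker run -d \
  --network ollama-net \
  -v ollama_data:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  ollama/ollama:latest
```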
Verify Ollama is running:
```bash
docker ps
```
You should see the `ollama` container listed as "Up".
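You can also confirm the API itself is responding from the host:

```bash
curl http://localhost:11434/
# Ollama's root endpoint replies with "Ollama is running"
```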
Step 3: Pull Initial LLM Models
With Ollama running, we can now download some versatile local models. We'll use `docker exec` to run commands inside the `ollama` container. Here, we'll grab relatively lightweight (under 10B parameters) but powerful models.
Qwen3 (8B parameters, ~5.2GB):
```bash
docker exec -it ollama ollama pull qwen3
```
This is probably the best all-around model you can use as of this writing. It supports agentic use and tool calling, offers both thinking and non-thinking modes, and has a context window of up to 128K tokens.
Below are some other models I've had good overall success with as well.
DeepSeek-R1 (7B parameters, ~4.7GB):
```bash
docker exec -it ollama ollama pull deepseek-r1
```
Gemma3 (4B parameters, ~3.3GB):
```bash
docker exec -it ollama ollama pull gemma3
```
Llama 3.1 (8B parameters, ~4.9GB):
```bash
docker exec -it ollama ollama pull llama3.1
```
Depending on your hardware, you may be able to scale the model size up or down, but I've found these tend to work admirably well.
Note: Model availability and tags can change. Check the Ollama Library for the latest models and specific tags (like different sizes or fine-tunes). Pulling models can take some time depending on their size and your internet speed.
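Once the pulls finish, you can confirm what's installed (names, tags, and sizes) with Ollama's built-in list command:

```bash
docker exec -it ollama ollama list
```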
Pro Tip 1: Set Up an Alias for Ollama Commands
To make working with Ollama in Docker more convenient, you can set up a bash alias in your `~/.bashrc` file (or the equivalent for whichever shell you use):
```bash
echo 'alias ollama="docker exec -it ollama ollama"' >> ~/.bashrc
source ~/.bashrc
```
With this alias, you can run Ollama commands as if Ollama were installed natively on your system. Instead of typing:
```bash
docker exec -it ollama ollama run qwen3
```
you can simply type:
```bash
ollama run qwen3
```
Any command typed after the alias functions exactly as if you had typed out the full Docker exec command.
Note: This would cause conflicts if you also have Ollama installed natively on your host system. In that case, consider a slightly different alias name, like `docker-ollama`.
Pro Tip 2: Test Model Performance with Verbose Mode
When trying out a new model, you can use Ollama's verbose mode to see detailed performance statistics, including tokens per second:
```bash
docker exec -it ollama ollama run qwen3 --verbose
```
Or if you set up the alias from Pro Tip 1:
```bash
ollama run qwen3 --verbose
```
The verbose output will show you:
- Model load time and total response duration
- Prompt evaluation count and rate (tokens/sec)
- Token generation count and rate (tokens/sec)
This information is invaluable for:
- Comparing performance between different models
- Determining if your hardware can efficiently run larger models
- Troubleshooting slow response times
- Optimizing your setup for better performance
For example, you might see output like:
```
total duration:       50.742290897s
load duration:        19.662375ms
prompt eval count:    28 token(s)
prompt eval duration: 155.80633ms
prompt eval rate:     179.71 tokens/s
eval count:           1205 token(s)
eval duration:        50.565964637s
eval rate:            23.83 tokens/s
```
If you see low tokens/sec numbers (below 10), your hardware might be struggling with the model size, and you should consider using a smaller model for better responsiveness.
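For a quick, repeatable benchmark, you can also pass a one-off prompt instead of opening an interactive session; the stats print once the response completes:

```bash
docker exec -it ollama ollama run qwen3 "Explain Docker bridge networks in two sentences." --verbose
```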
Step 4: Set Up Open-WebUI Container
Open-WebUI provides a fantastic chat interface for your Ollama models. Let's run its container, connect it to Ollama, and enable GPU support via the CUDA image (useful if the UI itself leverages any GPU-accelerated features).
```bash
docker run -d \
  -p 3000:8080 \
  --gpus all \
  --network ollama-net \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda
```
Key parts of this command:
- `-p 3000:8080`: Maps port 3000 on your host server to the default port 8080 inside the Open-WebUI container. You'll access the UI via port 3000.
- `--gpus all`: Provides GPU access (use the `:cuda` image tag). Omit this and use the `:main` tag if you don't have/need GPU support for the WebUI itself.
- `--network ollama-net`: Connects Open-WebUI to the same network as Ollama.
- `-v open-webui:/app/backend/data`: Creates a named volume `open-webui` to store UI settings, chat history, user data, etc., persistently.
- `-e OLLAMA_BASE_URL=http://ollama:11434`: This is how Open-WebUI finds Ollama. We're telling it that the Ollama API is available at the hostname `ollama` (which Docker resolves to the Ollama container's IP within the `ollama-net` network) on port 11434.
- `--name open-webui`: Names the container.
- `--restart always`: Restarts the container in all cases, including after a manual stop once the Docker daemon restarts (stricter than `unless-stopped`).
- `ghcr.io/open-webui/open-webui:cuda`: The official Open-WebUI image built for CUDA/GPU systems. Use `:main` for CPU-only.
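The first startup can take a minute or two while the container initializes. To check that it came up cleanly:

```bash
docker logs open-webui --tail 20
curl -I http://localhost:3000
# An HTTP 200 response means the UI is being served
```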
Step 5: Set Up n8n Container
Now, let's add the n8n workflow automation engine to our network.
```bash
docker run -d \
  --name n8n \
  --network ollama-net \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  -e GENERIC_TIMEZONE="America/Los_Angeles" \
  -e N8N_SECURE_COOKIE=false \
  --restart unless-stopped \
  docker.n8n.io/n8nio/n8n
```
Explanation:
- `--name n8n`: Names the container.
- `--network ollama-net`: Connects n8n to the same network, allowing it to make HTTP requests directly to `http://ollama:11434`.
- `-p 5678:5678`: Exposes n8n's web UI on port 5678 of your host server.
- `-v n8n_data:/home/node/.n8n`: Creates a named volume `n8n_data` for persisting workflows, credentials, and execution data. Crucial!
- `-e GENERIC_TIMEZONE="America/Los_Angeles"`: Sets the timezone for accurate scheduling and logs (adjust if you're not in Pacific Time).
- `-e N8N_SECURE_COOKIE=false`: Allows logging into n8n over plain HTTP on your local network. Remove this or set it to `true` if you later configure HTTPS access.
- `--restart unless-stopped`: Keeps n8n running.
- `docker.n8n.io/n8nio/n8n`: The official n8n image.
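Before wiring up workflows, it's worth sanity-checking that containers on `ollama-net` can reach Ollama by hostname. One way is a throwaway container with curl (the `curlimages/curl` image here is just one convenient choice; any image with curl works):

```bash
# Should return a JSON list of your pulled models
docker run --rm --network ollama-net curlimages/curl -s http://ollama:11434/api/tags
```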
Step 6: (Optional) Set Up Watchtower for Automatic Updates
To keep your containers updated automatically, you can run Watchtower. This command configures it to check daily and only update the containers we've just set up.
```bash
docker run -d \
  --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --restart unless-stopped \
  containrrr/watchtower \
  --cleanup \
  --interval 86400 \
  ollama open-webui n8n
```
Explanation:
- `-v /var/run/docker.sock:/var/run/docker.sock`: Allows Watchtower to interact with the Docker daemon.
- `--cleanup`: Removes old images after updates.
- `--interval 86400`: Checks for updates every 24 hours (86,400 seconds).
- `ollama open-webui n8n`: Specifies that only these containers should be monitored and updated by Watchtower.
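If you'd rather trigger a check manually instead of waiting for the schedule, Watchtower also supports a one-shot mode:

```bash
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower \
  --run-once --cleanup \
  ollama open-webui n8n
```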
Step 7: Accessing Your Services
Everything should now be running! You can access the web interfaces from another computer on your local network:
- Open-WebUI: `http://<your_server_ip>:3000`
- n8n: `http://<your_server_ip>:5678` (requires initial owner account setup on first visit)
- Ollama API (directly): `http://<your_server_ip>:11434` (useful for direct API calls or testing)

Replace `<your_server_ip>` with the actual local IP address of the server running Docker.
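For example, a quick way to verify the Ollama API is reachable from another machine on your LAN:

```bash
# Lists the models you pulled in Step 3
curl http://<your_server_ip>:11434/api/tags
```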
Conclusion & Next Steps
Congratulations! You now have a powerful, private AI and automation stack running locally.
- Explore Open-WebUI: Chat with your different models, customize prompts, and explore its settings.
- Build n8n Workflows: Start automating tasks. Try using the HTTP Request node in n8n to call your Ollama API at `http://ollama:11434/api/chat` to add AI capabilities to your automations (an example request follows below).
- Add More Models: Use `docker exec -it ollama ollama pull <model_name:tag>` to expand your collection.
- Secure Access: If exposing these services (especially n8n) beyond your trusted local network, implement proper security measures like HTTPS (using a reverse proxy like Nginx Proxy Manager or Traefik with Let's Encrypt) and strong authentication.
- Backup Volumes: Regularly back up your named Docker volumes (`ollama_data`, `open-webui`, `n8n_data`) to prevent data loss (a sketch of one approach follows below).
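For the n8n HTTP Request node, it helps to see the request shape first. Here's a test call against Ollama's chat endpoint you can run on the server itself (this assumes you pulled `qwen3` in Step 3; inside an n8n workflow, point the node at `http://ollama:11434/api/chat` instead of localhost):

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [
    { "role": "user", "content": "Say hello in five words." }
  ],
  "stream": false
}'
```

And for backups, one common pattern is to tar a named volume from a throwaway container (shown here for `n8n_data`; repeat for `ollama_data` and `open-webui`):

```bash
# Writes n8n_data.tar.gz into the current directory on the host
docker run --rm \
  -v n8n_data:/data \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/n8n_data.tar.gz -C /data .
```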
Enjoy experimenting with your self-hosted AI powerhouse!