Jake Coombs

Self-Hosting Private AI

Over the past week, I've had my tech equivalent of the 'Minecraft Two Week Phase'. I recently moved my Desktop PC (a.k.a. Gaia, see Naming My Devices) from Windows to Linux, with Linux Mint as my very first distro. In doing so, I reignited my fire to build a decent self-hosted setup at home. I set one up before, but it didn't last long: it was a bit too basic and didn't add much value to my life. But not this time!

One of the main drivers for this was that I wanted a private, local AI agent I could use to help me with some of my financial details, given that we're closing in on the end of the tax year. As someone who cares a bit more than the average person about digital privacy and protection, it came as a surprise to absolutely nobody that I did not want a third party processing my personal finances. I was also inspired by PewDiePie, following his journey in setting up private AI. He showed off his impressive setup, where a council of AI models decides which answer is best, but, scarily, they figured out how to rig the system. I'm not trying to go that far; a simple LLM will do me.

Before I get into the system and building it out, here is the hardware I had available for this setup (or so I thought):

  1. Desktop PC with an NVIDIA RTX 3080 10GB GPU (a.k.a. Gaia) - the powerhouse.
  2. Raspberry Pi 5 8GB (a.k.a. Hermes) - the messenger, responsible for navigation.
  3. Raspberry Pi 2GB (a.k.a. Hestia) - the manager of the home.

In wanting to be more considerate of my energy use, I only want Gaia powered on when it's actually in use, so I want to leverage Wake-on-LAN to power on the PC remotely.
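Wake-on-LAN is pleasingly simple under the hood: the sleeping machine's NIC listens for a "magic packet", which is just six 0xFF bytes followed by the target MAC address repeated sixteen times, broadcast over UDP. A minimal sketch in Python (the MAC address below is a placeholder, not Gaia's real one):

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build the WoL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (UDP port 9 by convention)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# wake("aa:bb:cc:dd:ee:ff")  # placeholder MAC
```

The target machine also has to have WoL enabled in its BIOS/UEFI and network driver for this to do anything.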


I started with the main target of my mission: the local AI. For this I decided to use Ollama, with Open WebUI as the interface. I began by installing Docker on Gaia and got set up with a simple compose file combining both services so they could interact with each other. Docker Compose usually makes it easy to get applications up and running. In this case, though, I wanted Ollama to utilise the GPU for AI processing, and I kept running into issues getting Docker to use it. Confusingly, docker run could access the GPU, but docker compose could not.
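For anyone wanting to follow along, here's a minimal sketch of what that compose file can look like, assuming the official images and default ports (the GPU settings are covered separately below):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"          # Ollama's API port
    volumes:
      - ollama:/root/.ollama   # persist downloaded models

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # Open WebUI served on host port 3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434  # reach Ollama over the compose network
    depends_on:
      - ollama

volumes:
  ollama:
```

Because both services sit on the same compose network, Open WebUI can reach Ollama by its service name rather than an IP address.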

The following is the final configuration that seemed to work; I think it was the environment variable that did the trick:

    # Service-level keys for the Ollama service in docker-compose.yml
    deploy:
      resources:
        reservations:
          devices:
            # request GPU access for the container
            - capabilities: [gpu]
    runtime: nvidia
    environment:
      # make all GPUs visible to the NVIDIA runtime
      - NVIDIA_VISIBLE_DEVICES=all
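A likely culprit when docker run sees the GPU but compose doesn't is the runtime registration itself: `runtime: nvidia` only works if Docker knows about that runtime. The NVIDIA Container Toolkit normally registers it in /etc/docker/daemon.json, roughly like this (paths shown are the toolkit's defaults):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

The Docker daemon needs a restart after this file changes for the runtime to show up.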

After finally getting Docker to use the GPU, I spun up the containers and had an Open WebUI interface up and ready to use. I don't really know which model is best or what Gaia can suitably handle. I don't have much knowledge of the different models, but from my understanding, the more parameters, the better the model: a classic case of higher number = more gooder. So I played around with some models between 4B and 12B parameters, mostly llama3.1:8b and gemma3:12b, and found that the latter gave me slightly better results, so I stuck with it because it worked. I want to dive more into what's actually behind the models in the future.

Now that I had a local AI up and running, I tested my new toy by having it help map out the architecture of the system. Here's where I landed:

    graph TD
        A[Internet] --> B[Domain]
        B --> C[Router]
        C --> D[VPN]
        subgraph "Pi 5 8gb (NAS)"
            D --> E[Reverse Proxy]
            E --> F[Dashboard]
            E --> G[Immich]
        end
        subgraph "Desktop PC"
            E -- Wake on Lan --> H[Open-WebUI]
            H --> I[Ollama]
        end
        subgraph "Pi 2gb"
            E --> J[Home Assistant]
        end

I'm starting to like Mermaid quite a bit now, but I keep having to fight it to get the nodes where I want them.

Now, if you were paying attention, you may have noticed the words "or so I thought" where I listed the hardware. In this economy, having your own processing power is like owning gold; it costs an arm and a leg to get something new these days, so it's best to make the most of what you've got. In my head, for whatever reason, I thought I had a Raspberry Pi with 2GB of memory, which was gifted to me by my Dad when I was a teenager, some 10ish years ago. It had been some time since I had used it in any way, so I couldn't remember the specific model, which you need to know to flash the firmware, and I could not tell just from looking at the Pi. After looking online, it appeared I had either a Pi 1 B+ or a Pi 2. So I started with the Pi 1 firmware, flashed the SD card and booted. And... bingo, first try. It turns out I had a Pi 1 B+ with 512MB of RAM, which by today's standards is not much. For reference, my phone has 24 times the memory! And from some initial research, it looks like Home Assistant OS, which I had planned to run on it, is no longer supported on it, which is a shame. I'm not sure what purpose to give it for now, given its limited memory, but I'm sure in time it will find one.

After a few days of using Open WebUI over plain HTTP through an exposed port, it was time for me to upgrade to HTTPS and a real domain, especially as it was something I wanted to share with my flatmate. So I needed a reverse proxy. I have previously used both Caddy and GoDoxy with success, but ultimately opted for Caddy given its popularity and support, as well as the more refined control from a single Caddyfile. I went through a bit of back and forth getting this set up, having thought it would be one of the easier parts. I initially tried to make a local domain such as ai.localhost available on the network. However, since it was being hosted on a different machine from the one accessing it, that did not work. So, I ended up using one of my domains (not this one) and pointing it at Hermes' local network address in the DNS settings. This way, the domain only works when connected to the home network. Another thing I had to figure out is that for TLS to work, I had to set up certificates properly through Cloudflare, so I ended up using this Caddy Cloudflare Docker container. After going through some troubleshooting steps, I finally had Open WebUI accessible through ai.domain.com! My initial goal was complete, but the endless rabbit hole of progress does not stop.
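For reference, the site block this ends up as in the Caddyfile is pleasingly short. This is a sketch rather than my exact config: the domain, upstream address, and environment variable name are placeholders, with the Cloudflare DNS module handling the TLS challenge:

```caddyfile
ai.example.com {
    tls {
        # DNS-01 challenge via Cloudflare; token passed in as an env var
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    # forward to Open WebUI running on Gaia
    reverse_proxy 192.168.1.50:3000
}
```

The DNS challenge is what makes this work for a domain that only resolves inside the home network: Caddy proves ownership through Cloudflare's API instead of needing an inbound HTTP connection.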

As I only had a single subdomain up, the base domain was empty, and that just did not sit right with me: every domain needs a home, and for a self-hosted homelab, it has to be a dashboard. To start with, I just wanted something easy to set up, just to get going. My favourite place to browse my options is selfh.st, and after having a gander, I landed on Dashy as my choice, as it is configured from a single YAML file and seemed simple. So far, it does the job perfectly: combining selfh.st app icons with status indicators, I was able to quickly spin up a very simple and functional dashboard.
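To give a flavour of how little that single YAML file needs, here's a sketch of a Dashy conf.yml; the titles and URL are placeholders, not my actual config:

```yaml
pageInfo:
  title: Homelab

sections:
  - name: Services
    items:
      - title: Open WebUI
        url: https://ai.example.com
        icon: favicon        # let Dashy grab the site's own favicon
        statusCheck: true    # ping the URL and show an up/down indicator
```

Each new service is just another entry under `items`, which is about the level of effort I was after.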

At this point, I had Docker containers running on two machines: Caddy and Dashy on Hermes, and Ollama and Open WebUI on Gaia. I wanted to be able to manage the containers from a central hub, which is how I found out about the concept of a Portainer Agent, essentially a node which the central Portainer Server can connect to in order to extend its control. I had no issues getting this up and running and was able to control the containers on Gaia from Hermes in no time.
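The agent itself is just one more container on the machine being managed. A minimal compose sketch for the Gaia side, using the documented mounts the agent needs to talk to the local Docker daemon:

```yaml
services:
  agent:
    image: portainer/agent:latest
    restart: unless-stopped
    ports:
      - "9001:9001"   # port the Portainer Server connects to
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock      # local Docker API access
      - /var/lib/docker/volumes:/var/lib/docker/volumes
```

On the Hermes side, it's then a matter of adding a new environment in the Portainer Server UI pointing at Gaia's address on port 9001.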

Finally, I wanted to be able to access my private AI from outside of my network. There were two ways I could have approached this: exposing the domain externally, or using a VPN. I went for the latter as the more secure option. I initially tried to install a WireGuard VPN natively through sudo apt install wireguard, but that actually kicked me out of SSH and blocked all connections. Then I stumbled on Jeff Geerling's video about PiVPN. Following that, I was able to get a WireGuard VPN set up in no time, and it worked right out of the gate.
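PiVPN generates the client configs for you, but it's worth knowing what's inside them. A WireGuard client config is a short INI file along these lines (every value below is a placeholder; the keys and subnet come from whatever the server hands out):

```ini
[Interface]
PrivateKey = <client-private-key>
Address = 10.6.0.2/24
DNS = 10.6.0.1

[Peer]
PublicKey = <server-public-key>
PresharedKey = <preshared-key>
Endpoint = vpn.example.com:51820
AllowedIPs = 0.0.0.0/0, ::0/0
```

`AllowedIPs = 0.0.0.0/0` routes all client traffic through the tunnel; narrowing it to just the home subnet would make it a split tunnel instead.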

For now, I have reached my real MVP (Minimum Viable Product): a private AI model which I can access from anywhere, plus the tools to control my resources. As of writing, this is what the current setup looks like:

    graph TD
        A[Wireguard Client]
        subgraph "Home Network"
            subgraph "Hermes"
                D["PiVPN (Wireguard)"]
                E[Caddy]
                E --> F[Dashy]
                E --> G[Portainer]
            end
            subgraph "Gaia"
                G <--> J[Portainer-Agent]
                H --> I[Ollama]
                E --> H[Open WebUI]
            end
        end
        A --> D

Looking ahead, I want to expand my self-hosted setup further. The next feature I want to add is Wake-on-LAN (WoL), which would let me remotely power on Gaia without having to press the physical button. In addition, I'd like to build Hermes up into a NAS and host Immich on it to replace Google Photos. For now, that is as far as my plans go, but as with all homelab projects, does it ever really reach an end?