4 Comments
Donnie Ferris:

I got this setup and working with an old Intel NUC (6th Gen i5, 2C/4T @ 2.2 GHz) and it’s hella slow. I imagine upgrading the RAM from 8GB to 32GB would help a little, but this machine runs my Jellyfin server and a couple other services (and does so very nicely), so I think it’s time I got a dedicated LLM machine. I’ve been looking at options and, based on my (**VERY**) limited budget, I’m inclined to get an HP Elite T655 (embedded AMD Ryzen R2314, 4c/4t, up to 3.5 GHz, with a Radeon Vega 6 iGPU). Apparently I can run `llama.cpp` with iGPU acceleration on that chip using the Vulkan backend (the Intel oneAPI/IPEX-LLM route only applies to Intel iGPUs). A reasonably performant dedicated local LLM on a power-sipping thin client with compiled iGPU support for $150 or so?! I think that would be an AWESOME Between The Clouds follow-up article to this one!
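
From what I’ve read, the Vulkan-enabled build would look roughly like this (a sketch only; it assumes a recent `llama.cpp` checkout with Vulkan dev packages installed, and the model path is just an example):

```bash
# Install Vulkan dev packages first (Debian/Ubuntu example):
#   sudo apt install libvulkan-dev glslc
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Offload as many layers as the iGPU can hold (-ngl); model path is illustrative
./build/bin/llama-cli -m models/llama-3.2-3b-instruct-q4_k_m.gguf -ngl 99 -p "Hello"
```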

Between the Clouds Newsletter:

Donnie,

Thank you for sharing your setup here. This is great feedback on the hardware and the real-world performance you’re seeing. I really like the HP Elite machines; I think they are a great foundation for labs and LLMs. I will definitely get something on the books to explore iGPU acceleration further, so stay tuned!

Donnie Ferris:

Nice write-up, Brandon. I’ve been wanting to get into local AI - not only for integration with Home Assistant (without any cloud dependency) but also for dev projects and, in general, having the “ChatGPT subscription” experience without the monthly cost.

While I’m sure you intended this article to be accessible to laypersons, you lost me at “3B parameters”, “1B parameters”, etc. I have zero understanding of such terms; that might make for a good follow-up post! Even better would be a post discussing which of your recommended LLMs are best suited for various tasks - what to use (and how to set it up and optimize it) for Home Assistant integration, code development, self-hosted services, etc. (Those are my primary use cases.)

Last but not least, when I used your docker-compose.yml and tried to bring up the containers, I got the following error message, and I’m not sure how to fix it:

```
Error response from daemon: pull access denied for openwebui/openwebui, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
```

Between the Clouds Newsletter:

Hey, really appreciate the comment and great questions!

You're absolutely right that terms like "3B parameters" and "4-bit quantized" can be a bit cryptic if you're just getting into local AI. A follow-up post breaking those terms down in plain language is a great idea, and I will add one to the list!

To give a quick explanation here: when we say something like “3B parameters,” it refers to the size of the model, specifically the number of learned weights (parameters) it contains, which roughly tracks how capable and complex it is. More parameters generally means more capability, but also more hardware required to run the model (RAM, CPU/GPU). That said, smaller models in the 1B-4B range can still perform very well for many home tasks like Home Assistant voice interaction, basic coding, and general Q&A.
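
For a rough sense of scale (a back-of-envelope estimate that ignores runtime overhead like the context cache): a 3B-parameter model quantized to 4 bits needs about 3 billion × 0.5 bytes ≈ 1.5 GB of RAM just for the weights, while the same model at 16-bit precision would need ≈ 6 GB. That is why quantized small models fit comfortably on modest hardware.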

I really like your idea about creating a use-case guide for different local AI needs, like:

- Home Assistant integration (voice commands, automation triggers)
- Dev assistance and code generation
- Setting up local AI tools for various home lab or self-hosted services

As for the Docker Compose error you ran into: I checked, and it looks like the Open WebUI image has moved; it is now published under the `open-webui` organization on GitHub Container Registry. Just replace `openwebui/openwebui` in your docker-compose.yml with the image below (I have updated the post as well):

`ghcr.io/open-webui/open-webui:main`
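
If it helps, here is a minimal sketch of what that service could look like in docker-compose.yml; the host port and volume name are illustrative, so adjust them to match the rest of your file:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"   # host:container; Open WebUI listens on 8080 internally
    volumes:
      - open-webui:/app/backend/data   # persist chats and settings
    restart: unless-stopped

volumes:
  open-webui:
```

After updating the image name, `docker compose pull && docker compose up -d` should pull it cleanly.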
