Sandboxing Local LLMs

Ollama running in LXD with GPU pass-through

Jeremy Cheng
9 min read · Feb 27, 2024

BACKGROUND

Yup… I caught the AI bug too, so I have been exploring all sorts of open-source Gen-AI tech, both for work and in my free time. Out of all the tools I have played with so far, Ollama stands out from the rest because I have used it in my daily life ever since the day I discovered it. From a pure user perspective, being able to run LLMs locally on my own computer or server just feels much better than using a managed solution like ChatGPT, for more reasons than privacy alone.

From a developer's perspective (and open-source fandom aside), Ollama being open source gives me far more flexibility in how I leverage it for my projects. It also puts me at ease that my interactions with these models, and the output derived from them, are far less likely to leak when everything is self-hosted, versus blindly trusting that a company will truly sandbox my data just because I opted in on a settings screen. Sure, local models are not quite GPT-4 yet, but they are catching up by the day. I also find that the big players' new open-source models land in Ollama pretty quickly, which is super nice; the lag is not bad at all.

As you can probably guess, there's no way I would start installing a bunch of random software directly on the host OS of my computers and servers…
