
Exactly How Much vRAM (and Which GPU) Can Serve Your LLM?

What if you still want to run a massive model on your small GPU?

Thuwarakesh Murallie
AI Advances
5 min read · Feb 7, 2025


Image from Canva

The world has never changed this quickly.

In just a few years, everything we do will involve some form of “AI.” If you’re reading this, you don’t need an introduction to LLMs and GPUs.

After using closed models like GPT and Claude, you’ll eventually want to try open-source models. That’s a fair move.

The first (and one of the best) ways to access open-source models is through an API provider like Groq or Together AI. But your company may want something even safer.

As far as I can tell, open-source models have two significant benefits.

  1. Cost: inference with open-source models costs drastically less.
  2. Privacy: with open models, you can keep everything on-premises, including the data you send for inference.

To truly benefit from that privacy, you must host the LLM on your own servers. Tools like Ollama have made this easy. But there’s still a key question to answer, and it’s one that concerns your CFO.

GPUs are expensive, so how much do you actually need to spend on them?

To answer that question, you first need to estimate how much vRAM it takes to serve your LLM. This post is…
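As a rough rule of thumb, you can multiply the parameter count by the bytes per parameter for your chosen precision, then add about 20% for the KV cache and runtime overhead. Here is a minimal Python sketch of that estimate; the function name and the 1.2 overhead factor are illustrative assumptions, not exact figures from this post:

```python
def estimate_vram_gb(params_billions: float, bits_per_param: int = 16, overhead: float = 1.2) -> float:
    """Back-of-envelope vRAM estimate for serving an LLM.

    params_billions : model size in billions of parameters (e.g. 7 for a 7B model)
    bits_per_param  : precision of the loaded weights (16 for FP16/BF16, 8 or 4 when quantized)
    overhead        : multiplier for KV cache, activations, and framework overhead (~20%, an assumption)
    """
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead


if __name__ == "__main__":
    # A 7B model in FP16 needs roughly 7 * 2 * 1.2 ≈ 16.8 GB,
    # so it fits on a 24 GB card but not on a 16 GB one.
    print(f"7B @ FP16 : {estimate_vram_gb(7, 16):.1f} GB")
    # The same model quantized to 4 bits needs roughly 7 * 0.5 * 1.2 ≈ 4.2 GB.
    print(f"7B @ 4-bit: {estimate_vram_gb(7, 4):.1f} GB")
```

Quantizing the weights to 8 or 4 bits shrinks the footprint proportionally, which is how a “massive” model can still squeeze into a small GPU.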
