SelfHostLLM

Calculate the GPU memory you need for LLM inference

Open Source
Developer Tools
Artificial Intelligence
GitHub

Calculate GPU memory requirements and max concurrent requests for self-hosted LLM inference. Support for Llama, Qwen, DeepSeek, Mistral and more. Plan your AI infrastructure efficiently.
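For a sense of the arithmetic behind this kind of calculator, here is a minimal Python sketch of a back-of-envelope estimate: weights take roughly parameter count times bytes per parameter, each in-flight request adds a KV cache that grows with layer count, hidden size, and context length, and concurrency is whatever fits in the VRAM left over. Every formula and number below is an illustrative assumption, not SelfHostLLM's actual methodology.

# Back-of-envelope GPU memory / concurrency estimate.
# Illustrative assumptions only; not SelfHostLLM's actual formulas.

def estimate_weights_gb(params_b: float, bytes_per_param: float, overhead_frac: float = 0.1) -> float:
    """VRAM to hold the model weights, plus a small overhead fudge factor."""
    weights_gb = params_b * bytes_per_param  # e.g. 7B params at 2 bytes (fp16) ~ 14 GB
    return weights_gb * (1 + overhead_frac)

def estimate_kv_cache_gb(num_layers: int, hidden_size: int, context_len: int,
                         bytes_per_value: float = 2.0) -> float:
    """KV cache per request: 2 (K and V) * layers * hidden size * context length * bytes."""
    return 2 * num_layers * hidden_size * context_len * bytes_per_value / 1e9

def max_concurrent_requests(gpu_vram_gb: float, weights_gb: float, kv_per_request_gb: float) -> int:
    """How many requests fit in the VRAM left over after loading the weights."""
    free_gb = gpu_vram_gb - weights_gb
    return max(0, int(free_gb // kv_per_request_gb))

if __name__ == "__main__":
    # Hypothetical 7B model in fp16 on a single 24 GB GPU, 4k context per request.
    weights = estimate_weights_gb(params_b=7, bytes_per_param=2)
    kv = estimate_kv_cache_gb(num_layers=32, hidden_size=4096, context_len=4096)
    print(f"weights ~ {weights:.1f} GB, KV cache per request ~ {kv:.2f} GB")
    print("max concurrent requests ~", max_concurrent_requests(24, weights, kv))

Under those assumed numbers, a hypothetical 7B model in fp16 on a 24 GB card leaves room for roughly four 4k-context requests; the calculator lets you vary the model, quantization, and context length instead of hard-coding them.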

Top comment

Built to simplify planning for self-hosted AI deployments.

Unlike other AI infrastructure tools, SelfHostLLM lets you precisely estimate GPU requirements and concurrency for Llama, Qwen, DeepSeek, Mistral, and more using custom config.

But now I want to see Apple silicon added to the mix!

Update: Now there's a Mac version too!

Comment highlights

Super useful — sizing GPU memory and concurrency upfront saves a ton of headaches. Love that it works with different models.

No way, this is exactly what I needed! Figuring out GPU memory for LLMs has always been such a headache—super smart to automate it. Any plans to support multi-GPU setups?

Here is the Mac version: https://selfhostllm.org/mac/

Hi all, I'm the creator of SelfHostLLM.org.

You can read more about why I created it here:

https://www.linkedin.com/posts/e...

Love how SelfHostLLM lets you actually estimate GPU needs for different LLMs, no more guessing and overbuying. Super smart idea, really impressed!