VPS with 16 vCPUs and 8GB DDR4/5?

CloudHopper · March 29

I'm looking for a high-compute VPS to run a Large Language Model, (AI chatbot, text classifier etc).

I already have a heavily quantised model running fairly well on a VPS with 8 Epyc vCPUs and 16GB of RAM, but it only uses 5.5GB and its bottleneck is compute so I'm looking for an optimised solution*

Storage and bandwidth requirements are minimal, so the specs I'm looking for are something like:

Compute: 16x fast vCPUs, (or maybe more?)
Ram: 8GB DDR4/5
Storage: 50GB NVMe
Bandwidth: 1TB - 100mbps
IPv4 - optional

Looking for offers at around €20 p/m, and open to Quarterly payments if it sweetens the deal.

*I know I should use a GPU but this is LET 🤷‍♀️

sh97 · March 29

You can build your own config with @crunchbits

https://crunchbits.com/vps

Your required specs come around 15$
That said, you'd probably be better off with a VDS.

davide · March 29

With the latest llama and this model:

-rw-r--r-- 1 user user  22G Sep 20  2023 WizardLM-Uncensored-SuperCOT-Storytelling.Q5_K_M.gguf

I get about 1 token per second on an Intel E3-1245 v3. Cost of cpu + motherboard + memory: 100€, hosted in basement. Just saying.

This is the piece on eBay (no aff), you can ask him to dump VAT and eBay fees too.

CloudHopper · March 29

@sh97 said:
You can build your own config with @crunchbits

https://crunchbits.com/vps

Your required specs come around 15$
That said, you'd probably be better off with a VDS.

Oh wow, I could get 32 x vCPUs and 8GB of RAM for $19 a month! 😲

Looks to be sold out at the moment, but I'll keep a close eye on their stock because I didn't think a CPU/RAM ratio of 4/1 would be possible.

A VDS would make sense if I was going to use it enough, but for now I'm only experimenting so I'm looking to do it as cheaply as possible.

Very much appreciate the tip about @crunchbits though 👍

dev_vps · March 29

@CloudHopper said:
I'm looking for a high-compute VPS to run a Large Language Model, (AI chatbot, text classifier etc).

Go for Ryzen based dedicated VDS or a physical server.

CloudHopper · March 29

@davide said:
With the latest llama and this model:
-rw-r--r-- 1 user user  22G Sep 20  2023 WizardLM-Uncensored-SuperCOT-Storytelling.Q5_K_M.gguf
I get about 1 token per second on an Intel E3-1245 v3. Cost of cpu + motherboard + memory: 100€, hosted in basement. Just saying.

This is the piece on eBay (no aff), you can ask him to dump VAT and eBay fees too.

Nice, that's a great example of how quantisation is making these things available to us mortals. And, for me, Q5 is really the sweet spot. I see a quality improvement over Q4, but I've only got slower inference and no obvious quality improvements from Q6-Q8 models.

I've also told my model that that it runs on a server with limited resources, which doesn't improve the inference speed but it makes it get to the point with fewer tokens.

I'm definitely thinking about a dedicated server for running them locally at some point, but I'll know when I've exceeded the viability of doing it on low end boxes and then a self-hosted server will be the only viable option

Howdy, Stranger!

Categories

In this Discussion

VPS with 16 vCPUs and 8GB DDR4/5?

Comments

Howdy, Stranger!

Quick Links

Categories

In this Discussion

VPS with 16 vCPUs and 8GB DDR4/5?

Comments