VPS with 16 vCPUs and 8GB DDR4/5?
CloudHopper
Member
in Requests
I'm looking for a high-compute VPS to run a Large Language Model (AI chatbot, text classifier, etc.).
I already have a heavily quantised model running fairly well on a VPS with 8 EPYC vCPUs and 16GB of RAM, but it only uses 5.5GB, and its bottleneck is compute, so I'm looking for a compute-optimised solution.*
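For context, this is roughly how I'm running it now (a minimal sketch using llama-cpp-python; the model path and thread count are placeholders you'd adjust for your own box):

```python
# Sketch: load a quantised GGUF model and pin the thread count.
# Model path is hypothetical; n_threads should match your vCPU count,
# since inference here is compute-bound rather than memory-bound.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q5_K_M.gguf",  # placeholder path
    n_threads=8,   # one thread per vCPU
    n_ctx=2048,
)

out = llm("Classify the sentiment of: 'great service'", max_tokens=32)
print(out["choices"][0]["text"])
```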
Storage and bandwidth requirements are minimal, so the specs I'm looking for are something like:
Compute: 16 fast vCPUs (or maybe more?)
RAM: 8GB DDR4/5
Storage: 50GB NVMe
Bandwidth: 1TB at 100Mbps
IPv4: optional
Looking for offers at around €20 p/m, and open to quarterly payments if it sweetens the deal.
*I know I should use a GPU but this is LET 🤷‍♀️
Comments
You can build your own config with @crunchbits
https://crunchbits.com/vps
Your required specs come to around $15.
That said, you'd probably be better off with a VDS.
With the latest llama.cpp and this model:
I get about 1 token per second on an Intel E3-1245 v3. Cost of CPU + motherboard + memory: €100, hosted in my basement. Just saying.
This is the piece on eBay (no aff), you can ask him to dump VAT and eBay fees too.
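If anyone wants to reproduce that kind of number, this is roughly how I time it (a minimal sketch with llama-cpp-python; the model path and thread count are placeholders):

```python
# Rough tokens-per-second check (sketch; model path is hypothetical).
import time
from llama_cpp import Llama

llm = Llama(model_path="./model.Q5_K_M.gguf", n_threads=8, verbose=False)

start = time.perf_counter()
out = llm("Write one sentence about basements.", max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated / elapsed:.2f} tokens/sec")
```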
Oh wow, I could get 32 vCPUs and 8GB of RAM for $19 a month! 😲
Looks to be sold out at the moment, but I'll keep a close eye on their stock, because I didn't think a vCPU-to-RAM ratio of 4:1 would be possible.
A VDS would make sense if I were going to use it enough, but for now I'm only experimenting, so I'm looking to do it as cheaply as possible.
Very much appreciate the tip about @crunchbits though 👍
Go for a Ryzen-based dedicated VDS or a physical server.
Nice, that's a great example of how quantisation is making these things available to us mortals. For me, Q5 is really the sweet spot: I see a quality improvement over Q4, but Q6-Q8 models only give me slower inference with no obvious quality gain.
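In case it's useful to anyone tuning this, here's the kind of side-by-side check I ran (a sketch; the GGUF filenames and directory are hypothetical):

```python
# Sketch: compare quant levels on the same prompt, timing each one.
# Assumes one GGUF file per quant level under ./models/ (placeholder names).
import time
from llama_cpp import Llama

PROMPT = "Summarise the benefits of quantisation in one sentence."

for quant in ["Q4_K_M", "Q5_K_M", "Q6_K"]:
    llm = Llama(model_path=f"./models/model.{quant}.gguf",
                n_threads=8, verbose=False)
    t0 = time.perf_counter()
    out = llm(PROMPT, max_tokens=64)
    dt = time.perf_counter() - t0
    tps = out["usage"]["completion_tokens"] / dt
    print(f"{quant}: {tps:.2f} tok/s -> {out['choices'][0]['text'][:60]!r}")
```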
I've also told my model that it runs on a server with limited resources, which doesn't improve the inference speed but does make it get to the point with fewer tokens.
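Something like this, as a sketch (the model path is a placeholder, and the chat_format depends on your model; chatml here is just an assumption):

```python
# Sketch: a system prompt that nudges the model to answer tersely.
from llama_cpp import Llama

llm = Llama(model_path="./model.Q5_K_M.gguf",  # placeholder path
            n_threads=8, chat_format="chatml")  # format depends on the model

resp = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You run on a small server with limited resources. "
                    "Answer as briefly as possible."},
        {"role": "user", "content": "What is quantisation?"},
    ],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```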
I'm definitely thinking about a dedicated server for running these locally at some point. I'll know when I've outgrown low-end boxes, and at that point a self-hosted server will be the only viable option.