Anyone got 768GB or maybe 1TB spare DDR4?

Neoon · June 18

Kinda not a serious ask, but I'm asking anyway.
768GB to 1TB of octa-channel DDR4 will do.

Apparently GLM 5.2 runs well enough.
So, anyone? Like LET range, yk

Neoon · June 19

Actually, a Q2 quant runs with 256GB; 512GB would be better though, for a Q4 quant.
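
A rough way to sanity-check those RAM figures is nominal bits per weight times parameter count. This is only a sketch: the 700B parameter count below is a made-up placeholder (the thread never states the model's size), and real quant formats usually store somewhat more than their nominal bit width, so treat the output as a lower bound.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Hypothetical parameter count, chosen only for illustration.
PARAMS_BILLION = 700.0

for label, bits in (("Q2", 2.0), ("Q4", 4.0), ("Q8", 8.0)):
    size = weight_size_gb(PARAMS_BILLION, bits)
    print(f"{label}: ~{size:.0f} GB for weights only")
```

Whatever is left of the RAM after the weights has to cover the KV cache, the OS, and the runtime.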

barbarza · June 19

$7..... (per megabyte)

Neoon · June 19

@barbarza said:
$7

nikio · June 19

I'm saving up for a house deposit using sticks of 16GB ddr4. I think I'm half way there at 64gb total.

stablecloud · June 19

That's a lot of chrome tabs

Hotmarer · June 19

I've tested it and I can't confirm this. The model's weight in 4-bit quantization is less than 500 GB. I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
People forget that model weights are not the big problem, because the models still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...
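
To put a number on the KV cache point, here is a minimal sketch using the usual estimate for standard (grouped-query) attention: two tensors (keys and values) per layer, per KV head, per token. The layer count, head count, and head size are hypothetical, not taken from any model in this thread, and architectures with compressed caches (e.g. latent attention) will need less than this.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_elem: int = 2,  # 2 bytes = fp16/bf16 cache entries
) -> int:
    """KV cache size for one sequence: keys + values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


# Hypothetical architecture, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 60, 8, 128

for context in (8_192, 32_768, 131_072):
    size_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, context) / 2**30
    print(f"{context:>7} tokens: {size_gib:.2f} GiB of KV cache")
```

The cache grows linearly with context length, so going from 8k to 128k context multiplies it by 16.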

Neoon · June 19

@Hotmarer said:
I've tested it and I can't confirm this. The model's weight in 4-bit quantization is less than 500 GB. I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
People forget that model weights are not the big problem, because the models still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...

You have 8x RTX PRO just laying around?

plumberg · June 19

@Neoon said:

@Hotmarer said:
I've tested it and I can't confirm this. The model's weight in 4-bit quantization is less than 500 GB. I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
People forget that model weights are not the big problem, because the models still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...

You have 8x RTX PRO just laying around?

In them basement

Neoon · June 19

@plumberg said:

@Neoon said:

@Hotmarer said:
I've tested it and I can't confirm this. The model's weight in 4-bit quantization is less than 500 GB. I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
People forget that model weights are not the big problem, because the models still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...

You have 8x RTX PRO just laying around?

In them basement

Of course, the basement, and upstairs he sells cat and dog food, the usual.

plumberg · June 19

@Neoon said:

@plumberg said:

@Neoon said:

@Hotmarer said:
I've tested it and I can't confirm this. The model's weight in 4-bit quantization is less than 500 GB. I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
People forget that model weights are not the big problem, because the models still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...

You have 8x RTX PRO just laying around?

In them basement

Of course, the basement, and upstairs he sells cat and dog food, the usual.

Don't forget them baguettes

Neoon · June 19

@plumberg said:

@Neoon said:

@plumberg said:

@Neoon said:

@Hotmarer said:
I've tested it and I can't confirm this. The model's weight in 4-bit quantization is less than 500 GB. I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
People forget that model weights are not the big problem, because the models still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...

You have 8x RTX PRO just laying around?

In them basement

Of course, the basement, and upstairs he sells cat and dog food, the usual.

Don't forget them baguettes

Nah, either you sell cat and dog food or baguettes.

OhJohn · June 19

@Hotmarer said: These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.

I actually do run huge models on CPU/RAM-only machines and yes, they are slow, but for background tasks, or if you're fine with the speed (e.g. around 10-12 t/s on gpt-oss:120b, a mid-range model, and 2-4 t/s on large models such as GLM/Deepseek/Minimax/Kimi), those setups do work.
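
For a feel of where single-digit t/s on CPU comes from: token generation is mostly memory-bandwidth bound, so a crude ceiling is RAM bandwidth divided by the bytes of weights read per token. The sketch below assumes octa-channel DDR4-3200 and a hypothetical MoE model with 40B active parameters at 4 bits; neither number comes from this thread's hardware. It ignores NUMA effects, KV cache reads, and compute limits, so real speeds will be lower.

```python
def ddr_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus = 8 bytes per transfer)."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


def max_tokens_per_s(bandwidth_gbs: float, active_params_billion: float, bits_per_weight: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gbs / gb_read_per_token


# Assumed setup: 8 channels of DDR4-3200, hypothetical 40B active params at 4-bit.
bandwidth = ddr_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
ceiling = max_tokens_per_s(bandwidth, active_params_billion=40.0, bits_per_weight=4.0)

print(f"peak bandwidth: {bandwidth:.1f} GB/s")
print(f"decode ceiling: {ceiling:.1f} t/s")
```

This is also why channel count matters more than raw capacity for this use case.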

@Neoon have you tried unsloth ui/studio? Or are you still on llama.cpp (which is probably the best idea)? I have no time atm to switch from my old ollama setup to llama.cpp. Write a perfect bash script to install and perfectly configure llama.cpp, put it somewhere in your repos, and I might try it out and get you access via VPN... (which really depends on what you want to do with that). It's a 768GB machine, so e.g. the 4-bit quant should fit.

Neoon · June 19

@OhJohn said:

@Hotmarer said: These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.

I actually do run huge models on CPU/RAM-only machines and yes, they are slow, but for background tasks, or if you're fine with the speed (e.g. around 10-12 t/s on gpt-oss:120b, a mid-range model, and 2-4 t/s on large models such as GLM/Deepseek/Minimax/Kimi), those setups do work.

@Neoon have you tried unsloth ui/studio? Or are you still on llama.cpp (which is probably the best idea)? I have no time atm to switch from my old ollama setup to llama.cpp. Write a perfect bash script to install and perfectly configure llama.cpp, put it somewhere in your repos, and I might try it out and get you access via VPN... (which really depends on what you want to do with that). It's a 768GB machine, so e.g. the 4-bit quant should fit.

Unsloth studio is cancer on Windows, won't even start, so I didn't bother.
VPN, uuuh, hot, I am into VPNs.

For RÄM only, use ik_llama.cpp instead if you wanna use only a specific model.
Otherwise llama.cpp, everything else is just a wrapper and/or trash.

There already is: https://pastebin.com/raw/gKYBcXqc
You just mod it a little bit: https://pastebin.com/raw/s7bgVsyH

dbadude · June 19

anyone got a spare nvidia with 300GB nvram? So i can eat some bread and wash my clothes.

davide · June 19

@dbadude said:
anyone got a spare nvidia with 300GB nvram? So i can eat some bread and wash my clothes.

Dam bro sorry to hear that, see how these poor guys cope with hunger, they press mosquitos into burgers and fry them:

Neoon · June 19

@dbadude said:
anyone got a spare nvidia with 300GB nvram? So i can eat some bread and wash my clothes.

Bro, did NVIDIA even release a 300GB GPU? Ragebait I sense.
At least if you shitpost, shitpost with confidence.

aphex · June 19

@OhJohn said: I actually do run huge models on CPU/RAM-only machines and yes, they are slow, but for background tasks, or if you're fine with the speed (e.g. around 10-12 t/s on gpt-oss:120b, a mid-range model, and 2-4 t/s on large models such as GLM/Deepseek/Minimax/Kimi), those setups do work.

I am quite intrigued. What is the power cost like, for example, running a dual-CPU box at 100% on all cores for 10 hours vs. 1 hour of multi-GPU time (let's say 100 t/s, a generous estimate)?

Is this far more efficient if you have work you don't mind batching overnight? Or is it simply leftover hardware?
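
The comparison being asked about comes down to watts times hours. Here is a minimal sketch of that arithmetic; every wattage and speed below is a placeholder assumption (nobody in the thread posted measurements), with the CPU box set to 10 t/s so that 10 hours on CPU and 1 hour on GPUs produce the same number of tokens.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy used at a constant power draw, in kWh."""
    return power_watts * hours / 1000.0


def joules_per_token(power_watts: float, tokens_per_s: float) -> float:
    """Energy per generated token (1 W = 1 J/s)."""
    return power_watts / tokens_per_s


# All values are hypothetical placeholders, not measurements.
CPU_WATTS, CPU_HOURS, CPU_TPS = 600.0, 10.0, 10.0
GPU_WATTS, GPU_HOURS, GPU_TPS = 5200.0, 1.0, 100.0
PRICE_PER_KWH = 0.30  # assumed electricity price

for name, watts, hours, tps in (
    ("dual CPU", CPU_WATTS, CPU_HOURS, CPU_TPS),
    ("multi GPU", GPU_WATTS, GPU_HOURS, GPU_TPS),
):
    kwh = energy_kwh(watts, hours)
    print(
        f"{name}: {kwh:.1f} kWh, "
        f"{joules_per_token(watts, tps):.0f} J/token, "
        f"cost {kwh * PRICE_PER_KWH:.2f}"
    )
```

With numbers in this range the two come out close on energy, so the answer hinges on the real measured draw and speed of the machines involved.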

OhJohn · June 19

@aphex said: Or simply leftover hardware

It's actually idle time on hardware that is needed once a week, where hourly compute on e.g. a hyperscaler would cost more (for running it 4 or 5 times a month) than the server costs for a month.

So it has a few days of idling per week that I use for self-hosted LLM experiments, testing, etc.

Otherwise any LLM subscription by the big AI names would be cheaper (at least with the number of tokens i produce).
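
The rent-vs-hourly reasoning above can be written down as a simple break-even check. The prices and run lengths here are invented for illustration; plug in real quotes to get a meaningful answer.

```python
def monthly_cloud_cost(hourly_rate: float, hours_per_run: float, runs_per_month: int) -> float:
    """Total monthly bill when paying per hour for each run."""
    return hourly_rate * hours_per_run * runs_per_month


def cheaper_option(server_monthly: float, hourly_rate: float,
                   hours_per_run: float, runs_per_month: int) -> str:
    """Return 'server' or 'cloud', whichever costs less per month (ties go to server)."""
    cloud = monthly_cloud_cost(hourly_rate, hours_per_run, runs_per_month)
    return "server" if server_monthly <= cloud else "cloud"


# Hypothetical prices: a flat monthly server vs. hourly hyperscaler compute.
SERVER_MONTHLY = 300.0
HOURLY_RATE = 12.0
HOURS_PER_RUN = 8.0

for runs in (1, 2, 4, 5):
    cloud = monthly_cloud_cost(HOURLY_RATE, HOURS_PER_RUN, runs)
    choice = cheaper_option(SERVER_MONTHLY, HOURLY_RATE, HOURS_PER_RUN, runs)
    print(f"{runs} runs/month: cloud {cloud:.0f} vs server {SERVER_MONTHLY:.0f} -> {choice}")
```

Once the server wins on the weekly job alone, any LLM tinkering in its idle days is effectively free compute.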

dbadude · June 19

@Neoon said:

@dbadude said:
anyone got a spare nvidia with 300GB nvram? So i can eat some bread and wash my clothes.

Bro, did NVIDIA even release a 300GB GPU? Ragebait I sense.
At least if you shitpost, shitpost with confidence.

just a joke, i dont know a thing about nvidia hardware

OhJohn · June 19

@dbadude said: i dont know a thing about nvidia hardware

welcome to the club.
