
Anyone got 768GB or maybe 1TB spare DDR4?

Neoon (Community Contributor, Veteran)

Kinda not a serious ask, but I'm asking anyway.
768GB to 1TB of DDR4, octa-channel, will do.

Apparently GLM 5.2 runs well enough.
So, anyone? Something in the usual LET price range, you know.
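For context on what this ask implies: spreading 768GB or 1TB over eight memory channels means fairly large modules, and peak memory bandwidth scales with the number of channels. A minimal sketch, assuming DDR4-3200 and one DIMM per channel (both are assumptions, not from the post); each DDR4 channel is 64 bits (8 bytes) wide.

```python
# Sketch: per-DIMM capacity and theoretical peak bandwidth for a multi-channel
# DDR4 setup. The speed grade (DDR4-3200) and DIMMs per channel are assumptions.

def dimm_size_gb(total_gb: int, channels: int, dimms_per_channel: int = 1) -> float:
    """Capacity each DIMM needs so that all slots together reach total_gb."""
    return total_gb / (channels * dimms_per_channel)


def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: transfers/s x 8-byte channel width x channels."""
    return mt_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    for total in (768, 1024):
        per_dimm = dimm_size_gb(total, channels=8)
        print(f"{total} GB over 8 channels, 1 DIMM each: {per_dimm:.0f} GB per DIMM")
    print(f"DDR4-3200 x 8 channels: {peak_bandwidth_gbs(3200, 8):.1f} GB/s peak")
```

Sustained bandwidth in practice is lower than this theoretical peak, and adding a second DIMM per channel halves the module size but usually does not add bandwidth.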

Thanked by 1: blanehol

Comments

  • Neoon (Community Contributor, Veteran)

    Actually, a Q2 quant runs in 256GB; 512GB would be better, though, for a Q4 quant.
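    The weight footprint behind this rule of thumb is roughly parameters x bits per weight / 8. A minimal sketch; the parameter count and the effective bits-per-weight values are hypothetical placeholders, not published figures for GLM 5.2 or any specific quant format (real formats also store per-block scales, which is why the values are not exactly 2 or 4).

```python
# Sketch: rough memory needed just for model weights at a given quantization.
# The parameter count and bits-per-weight values are illustrative assumptions.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    params_b = 700.0  # hypothetical total parameter count, in billions
    # Hypothetical effective bits per weight, including quantization overhead.
    for name, bpw in (("~Q2", 2.5), ("~Q4", 4.5), ("~Q8", 8.5), ("BF16", 16.0)):
        print(f"{name:>5}: {weights_gb(params_b, bpw):7.1f} GB")
```

    With these assumed numbers, a ~Q2 quant lands just under 256GB and a ~Q4 quant needs the 512GB tier; either way, some headroom is still needed for the KV cache and the OS.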

  • barbarza (Member)
    edited June 19

    $7... (per megabyte)

  • Neoon (Community Contributor, Veteran)
  • nikio (Member)

    I'm saving up for a house deposit using sticks of 16GB DDR4. I think I'm halfway there at 64GB total.

    Thanked by 2: Sharmaishaan72, t0m
  • stablecloud (Member, Patron Provider)

    That's a lot of Chrome tabs.

  • Hotmarer (Member)
    edited June 19

    I've tested it and I can't confirm this. The model's weights in 4-bit quantization come to less than 500 GB. I also tested the model on 8x RTX PRO 6000 cards, and the performance difference is huge. These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.
    People forget that the model weights are not the only problem, because you still need VRAM/RAM for the KV cache and context. It is not worth running such a model with an 8k context...
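    To put a number on the KV-cache point: for standard multi-head or grouped-query attention, the cache stores one key and one value vector per layer, per KV head, per token. A minimal sketch; the layer count, KV-head count, and head dimension are hypothetical, and models using MLA-style attention (e.g. DeepSeek) cache a compressed latent instead, so this formula does not apply to them.

```python
# Sketch: KV-cache size for standard (MHA/GQA) attention.
# Model config values are hypothetical; MLA-style models need a different formula.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: float = 2.0,
                batch: int = 1) -> float:
    """2 (K and V) x layers x kv_heads x head_dim x tokens x bytes x batch, in GB."""
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_elem * batch)
    return total_bytes / 1e9


if __name__ == "__main__":
    cfg = dict(n_layers=92, n_kv_heads=8, head_dim=128)  # hypothetical config
    for ctx in (8_192, 32_768, 131_072):
        size = kv_cache_gb(**cfg, context_tokens=ctx)  # fp16 cache (2 bytes/elem)
        print(f"{ctx:>7} tokens: {size:6.1f} GB")
```

    The cache grows linearly with context length and batch size and sits on top of the weight memory, which is why a box that barely fits the weights ends up limited to short contexts.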

  • Neoon (Community Contributor, Veteran)

    @Hotmarer said:
    I tested the model also on 8x RTX PRO 6000 cards, and the performance difference is huge.

    You have 8x RTX PRO just laying around?

  • plumberg (Veteran, Megathread Squad)

    @Neoon said:

    You have 8x RTX PRO just laying around?

    In them basement

  • Neoon (Community Contributor, Veteran)

    @plumberg said:

    In them basement

    Of course, in the basement, and upstairs he sells cat and dog food, the usual.

    Thanked by 1: plumberg
  • plumberg (Veteran, Megathread Squad)

    @Neoon said:

    Of course, in the basement, and upstairs he sells cat and dog food, the usual.

    Don't forget them baguettes

  • Neoon (Community Contributor, Veteran)

    @plumberg said:

    Don't forget them baguettes

    Nah, either you sell cat and dog food or baguettes.

  • OhJohn (Member)
    edited June 19

    @Hotmarer said: These online stories that every big model can be run on 1 TB of RAM are false. Any model run this way will be unusable.

    I actually do run huge models on CPU/RAM-only machines, and yes, they are slow, but for background tasks, or if you're fine with the speed (e.g. around 10-12 t/s on gpt-oss:120b, a mid-range model, and 2-4 t/s on large models such as GLM/DeepSeek/MiniMax/Kimi), those setups do work.

    @Neoon, have you tried Unsloth UI/Studio? Or are you still on llama.cpp (which is probably the best idea)? I have no time at the moment to switch from my old Ollama setup to llama.cpp. Write a perfect bash script that installs and fully configures llama.cpp, put it somewhere in your repos, and I might try it out and give you access via VPN... (which really depends on what you want to do with it). It's a 768GB machine, so e.g. the 4-bit quant should fit.
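    A rough way to see why speeds like these are plausible: during decoding, each generated token has to stream the active weights from RAM at least once, so memory bandwidth sets a ceiling of about bandwidth / bytes of active weights per token. This is why mixture-of-experts models with few active parameters stay usable on CPU. A minimal sketch; the bandwidth, active-parameter counts, and bits per weight are hypothetical, and real throughput is usually well below this ceiling (compute limits, NUMA effects, KV-cache reads).

```python
# Sketch: bandwidth-bound ceiling on decode speed for CPU/RAM inference.
# Every generated token streams the active weights from RAM once, so
#   tokens/s <= memory bandwidth / bytes of active weights per token.
# All numbers below are hypothetical placeholders.

def max_tokens_per_s(bandwidth_gbs: float, active_params_billion: float,
                     bits_per_weight: float) -> float:
    """Upper bound on tokens/s given sustained bandwidth in GB/s."""
    bytes_per_token_gb = active_params_billion * bits_per_weight / 8
    return bandwidth_gbs / bytes_per_token_gb


if __name__ == "__main__":
    bw = 200.0  # hypothetical sustained RAM bandwidth, GB/s
    for label, active_b, bpw in (("small-active MoE", 5.0, 4.5),
                                 ("large-active MoE", 35.0, 4.5),
                                 ("dense 120B", 120.0, 4.5)):
        ceiling = max_tokens_per_s(bw, active_b, bpw)
        print(f"{label:>17}: <= {ceiling:5.1f} tok/s")
```

    Total model size only decides whether the model fits in RAM; the active parameter count per token is what mostly decides decode speed.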

  • Neoon (Community Contributor, Veteran)

    @OhJohn said:

    @Neoon, have you tried Unsloth UI/Studio? Or are you still on llama.cpp (which is probably the best idea)? I have no time at the moment to switch from my old Ollama setup to llama.cpp. Write a perfect bash script that installs and fully configures llama.cpp, put it somewhere in your repos, and I might try it out and give you access via VPN... (which really depends on what you want to do with it). It's a 768GB machine, so e.g. the 4-bit quant should fit.

    Unsloth Studio is cancer on Windows; it won't even start, so I didn't bother.
    VPN? Ooh, hot, I'm into VPNs.

    For RAM-only setups, use ik_llama.cpp instead if you only want to run one specific model.
    Otherwise, use llama.cpp; everything else is just a wrapper and/or trash.

    There already is one: https://pastebin.com/raw/gKYBcXqc
    You just have to mod it a little: https://pastebin.com/raw/s7bgVsyH

  • dbadude (Member)

    Anyone got a spare NVIDIA card with 300GB of VRAM? So I can eat some bread and wash my clothes.

  • davide (Member)
    edited June 19

    @dbadude said:
    Anyone got a spare NVIDIA card with 300GB of VRAM? So I can eat some bread and wash my clothes.

    Damn bro, sorry to hear that. See how these poor guys cope with hunger: they press mosquitoes into burgers and fry them.

    Thanked by 1: dbadude
  • Neoon (Community Contributor, Veteran)
    edited June 19

    @dbadude said:
    Anyone got a spare NVIDIA card with 300GB of VRAM? So I can eat some bread and wash my clothes.

    Bro, did NVIDIA even release a 300GB GPU? Ragebait I sense.
    At least if you shitpost, shitpost with confidence.

    Thanked by 1: dbadude
  • aphex (Member)

    @OhJohn said: I actually do run huge models on CPU/RAM-only machines, and yes, they are slow, but for background tasks, or if you're fine with the speed (e.g. around 10-12 t/s on gpt-oss:120b, a mid-range model, and 2-4 t/s on large models such as GLM/DeepSeek/MiniMax/Kimi), those setups do work.

    I am quite intrigued. What does the power consumption cost look like, for example running a dual-CPU machine at 100% on all cores for 10 hours versus 1 hour of multi-GPU time (let's say 100 t/s, a generous estimate)?

    Is this by far more efficient if you have stuff you don't care about that you can batch overnight? Or is it simply leftover hardware?
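    The comparison comes down to energy = average power x time, multiplied by the electricity price. If the machine is on anyway (as in the reply below), only the draw above idle really counts. A minimal sketch; the wattages and the price per kWh are hypothetical placeholders, so measured wall-plug numbers are needed for a real answer.

```python
# Sketch: energy and electricity cost of the same batch job on two setups.
# Power draws and the electricity price are hypothetical placeholders.

def job_energy_kwh(load_w: float, hours: float, idle_w: float = 0.0) -> float:
    """Energy in kWh above idle_w; pass idle_w=0 to count the total draw."""
    return (load_w - idle_w) * hours / 1000


def job_cost(load_w: float, hours: float, price_per_kwh: float,
             idle_w: float = 0.0) -> float:
    return job_energy_kwh(load_w, hours, idle_w) * price_per_kwh


if __name__ == "__main__":
    price = 0.30  # hypothetical price per kWh
    setups = {
        "dual-CPU box, 10 h": (500.0, 10.0),   # hypothetical average draw (W), hours
        "multi-GPU box, 1 h": (4000.0, 1.0),   # hypothetical average draw (W), hours
    }
    for name, (watts, hours) in setups.items():
        kwh = job_energy_kwh(watts, hours)
        print(f"{name}: {kwh:.1f} kWh, cost {job_cost(watts, hours, price):.2f}")
    # Marginal cost if the CPU box idles at a hypothetical 150 W anyway.
    print(f"CPU box above idle: {job_energy_kwh(500.0, 10.0, idle_w=150.0):.1f} kWh")
```

    With these made-up figures both setups land in the same ballpark, so the answer depends heavily on actual measured draw and on whether the hardware would be powered on regardless.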

  • OhJohn (Member)
    edited June 19

    @aphex said: Or simply leftover hardware

    It's actually idle time on hardware that is needed once a week, where renting hourly compute on e.g. a hyperscaler (to run the job 4 or 5 times a month) would cost more than the server does for a whole month.

    So it sits idle a few days per week, and I use that time for self-hosted LLM experiments, testing, etc.

    Otherwise, any LLM subscription from the big AI names would be cheaper (at least for the number of tokens I produce).
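    The reasoning here is a simple break-even: renting costs hourly rate x hours per run x runs per month, and once that exceeds the monthly server price, keeping the server wins and its idle days come for free. A minimal sketch; the server price, hourly rate, and run length are hypothetical placeholders.

```python
# Sketch: renting hourly compute per run vs. keeping a monthly server.
# Prices and run length are hypothetical; the point is the break-even count.
import math


def monthly_rental_cost(hourly_rate: float, hours_per_run: float,
                        runs_per_month: int) -> float:
    return hourly_rate * hours_per_run * runs_per_month


def break_even_runs(monthly_server: float, hourly_rate: float,
                    hours_per_run: float) -> int:
    """Smallest runs/month at which renting costs at least as much as the server."""
    return math.ceil(monthly_server / (hourly_rate * hours_per_run))


if __name__ == "__main__":
    server = 300.0  # hypothetical monthly dedicated server price
    rate = 12.0     # hypothetical hourly rate for a comparable rented box
    hours = 8.0     # hypothetical length of one weekly job
    for runs in (4, 5):
        rent = monthly_rental_cost(rate, hours, runs)
        print(f"{runs} runs/month: rent {rent:.0f} vs server {server:.0f}")
    print("break-even at", break_even_runs(server, rate, hours), "runs/month")
```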

    Thanked by 1: aphex
  • dbadude (Member)

    @Neoon said:

    @dbadude said:
    Anyone got a spare NVIDIA card with 300GB of VRAM? So I can eat some bread and wash my clothes.

    Bro, did NVIDIA even release a 300GB GPU? Ragebait I sense.
    At least if you shitpost, shitpost with confidence.

    Just a joke, I don't know a thing about NVIDIA hardware.

  • OhJohn (Member)

    @dbadude said: I don't know a thing about NVIDIA hardware

    Welcome to the club.
