[2024 EXTENDED] Black Friday / Cyber Monday: FLASH SALE & MEGATHREAD

steny · December 2024

@plumberg said:

@steny said:

@plumberg said:

@steny said:

@plumberg said:

@steny said:
@plumberg said:

Any recommendations for a self-hosted LLM that would help with code generation? Claude does amazingly well, but I work it so hard that I end up rate-limited.

For a self-hosted LLM you need a lot of VRAM, especially for coding. For a home PC, probably the best model you can currently run is Qwen2.5-Coder-32B. For coding, unlike chat, you need at least 8-bit precision, which means around 40 GB of VRAM, e.g. dual RTX 3090s, which is the setup I am using. Bigger models like Qwen2.5-72B or Llama-3.1-Nemotron-70B are better, but you won't run them at home at that precision unless you build a 4-GPU rig. For the largest open-weight models like Mistral Large, you need dual H100s, so you will definitely have to rent GPUs and it won't be cheap, but you get roughly GPT-4o-level coding performance, probably slightly less since you would still be running at 8-bit rather than the native 16-bit weights.
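To make the VRAM numbers above concrete, here is a rough sizing sketch: the weights take about params x bits / 8 bytes, plus some headroom for the KV cache and runtime buffers. The 20% overhead factor, the 24 GB / 80 GB per-card figures and the rounded parameter counts (32B, 72B, 123B for Mistral Large) are assumptions for illustration; real usage depends on context length and the inference engine.

```python
# Rough VRAM sizing for running an LLM locally.
# Assumption: weights dominate memory; OVERHEAD is a hypothetical
# fudge factor for KV cache, activations and runtime buffers.
OVERHEAD = 1.2


def weight_gb(params_billion: float, bits: int) -> float:
    """Size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits / 8


def vram_needed_gb(params_billion: float, bits: int) -> float:
    """Weights plus the assumed overhead."""
    return weight_gb(params_billion, bits) * OVERHEAD


def fits(params_billion: float, bits: int, gpus: int, gb_per_gpu: float = 24.0) -> bool:
    """True if the estimate fits in the combined VRAM of `gpus` identical cards."""
    return vram_needed_gb(params_billion, bits) <= gpus * gb_per_gpu


if __name__ == "__main__":
    # Rounded parameter counts (assumption), all at 8-bit precision.
    for name, params in [("Qwen2.5-Coder-32B", 32), ("Qwen2.5-72B", 72), ("Mistral Large", 123)]:
        print(f"{name}: ~{vram_needed_gb(params, 8):.0f} GB at 8-bit")
```

Under those assumptions the 32B coder model lands near the ~40 GB mentioned above and fits in 2 x 24 GB, the 72B model needs the 4-GPU rig, and Mistral Large fits in two 80 GB H100s.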

First off, thanks for the detailed post.

So here is the deal: I have zero GPUs, but I do have a pair of dual E5-2699 v4s with decent RAM (384 GB of DDR4-2400 or so).

I am not really interested in getting the fastest responses. As long as it spits out something decent, I am game.

Or am I dreaming of hosting an LLM? What are your thoughts?

Running on RAM would be awfully slow, especially for the large models you could theoretically fit in that amount of RAM.

How slow are we talking about? Any idea?

And will that change the quality of the output?

Reguards

Inference speed mainly depends on memory bandwidth. Dual RTX 3090s run a 70B model at around 15-20 tokens/second (there is some speed loss due to the dual setup), yet DDR4-2400 bandwidth is roughly 50 times lower per memory channel, so expect below 1 token per second, where a token is about 3-4 characters. And that is for mid-sized models; the large ones would manage only a fraction of a token per second.
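A back-of-the-envelope version of the bandwidth argument, assuming each generated token has to stream all model weights from memory once, so the ceiling is bandwidth divided by model size. The figures used here are my assumptions, not from the post: 936 GB/s for one RTX 3090, an 8-byte bus per DDR4 channel, 4 channels per E5-2699 v4 socket, and roughly 70 GB of weights for a 70B model at 8-bit.

```python
# Back-of-the-envelope decode speed for a memory-bandwidth-bound LLM.
# Assumption: every generated token reads all weights once, so
# tokens/s <= bandwidth / model size. Real numbers will be lower.

RTX_3090_GBS = 936.0  # assumed peak memory bandwidth of one RTX 3090, GB/s


def ddr_bandwidth_gbs(mts: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: transfers/s x bus width x channels."""
    return mts * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token streams the whole model once."""
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    one_channel = ddr_bandwidth_gbs(2400, channels=1)  # 19.2 GB/s
    one_socket = ddr_bandwidth_gbs(2400, channels=4)   # 76.8 GB/s (quad-channel)
    print(f"3090 vs one DDR4-2400 channel: {RTX_3090_GBS / one_channel:.0f}x")
    # 70B model at 8-bit is ~70 GB of weights (assumption, ignores KV cache).
    print(f"70B @ 8-bit, one socket: <= {max_tokens_per_s(one_socket, 70):.2f} tok/s")
```

Even this ceiling is optimistic: it ignores compute limits, NUMA effects between the two sockets and KV-cache reads, which is why real CPU-only throughput ends up below 1 token per second for a 70B model.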

Gotcha. Well I wanna try it out though and see where it takes me. Thanks.

LM Studio is the easiest to deploy platform imo, try it and let me know the real results, I am curious.

donli · December 2024

Savvy · December 2024

18+

emgh · December 2024

@steny said: LM Studio is the easiest to deploy platform imo, try it and let me know the real results, I am curious.

what my i m can runs

Apple M3
16 GB RAM
3 cm baenis

tansel · December 2024

I thought I was too clumsy and entered the wrong promo code, but it turns out it’s really gone. Thank you for your guidance. Regards!

donli · December 2024

emgh · December 2024

@tansel said:
I thought I was too clumsy and entered the wrong promo code, but it turns out it’s really gone. Thank you for your guidance. Regards!

Re

plumberg · December 2024

@steny said:

LM Studio is the easiest to deploy platform imo, try it and let me know the real results, I am curious.

LM studio...
This?
https://lmstudio.ai/

Reg

emgh · December 2024

@plumberg said:

LM studio...
This?
https://lmstudio.ai/

Reg

No

This https://mzunguhosting.ml/

Reg

plumberg · December 2024

@emgh said:

No

This https://mzunguhosting.ml/

Reg

Gotcha

Reg

Saragoldfarb · December 2024

@emgh said:
@Saragoldfarb what do you think

Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

donli · December 2024

@Saragoldfarb said:

@emgh said:
@Saragoldfarb what do you think

Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

Be careful in Dubai, they still chop things off there. Regards.

emgh · December 2024

newtork is fast my man

Saragoldfarb · December 2024

@emgh said:
how many minutes of music did you guys listen to 2024

Almost every hour when I'm awake, the kid ain't nagging and I'm not working on my feetpic onlyfans. You do the math.

cybertech · December 2024

MOARRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

donli · December 2024

@emgh said:
newtork is fast my man

Only when the power is on.

plumberg · December 2024

@emgh said:
newtork is fast my man

Reg

Saragoldfarb · December 2024

@donli said:

Be careful in Dubai, they still chop things off there. Regards.

Sound advice.

emgh · December 2024

@donli said:

@emgh said:
newtork is fast my man

Only when the power is on.

Which is over 50% of the time, so it's quite good

emgh · December 2024

@plumberg said: Reg

Reg'd

emgh · December 2024

@Saragoldfarb said:

Sound advice.

You don't have to follow this advice in Sweden. Regards.

_MS_ · December 2024

Nokia Lumia 1020, the legend.

emgh · December 2024

I like my uptime down low, down low, down low, down low, down low, down low, down low
I like my servers all hacked, all hacked, all hacked, all hacked, all hacked, all hacked
Ay holla if ya like ya uptime down low, down low, down low, down low, down low, down low, down low
I like my servers all hacked (transposed George Becerra, I got you nigga), all hacked, all hacked, all hacked, all hacked, all hacked
Ay holla if ya like ya uptime down low, down low, down low, down low, down low, down low, down low

I like my servers all hacked, all hacked, all hacked, all hacked, all hacked, all hacked
Hook
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black

Verse 1
I'm the scam in my city, ain't nobody fuckin' wit' me
You can ask the bot chasers and all the abuse lister's
I'm a known shit host; I always have spammers
And the skids and the phishers are always buyin' from me

Load's on triple-sixes, watchin' adult streamin'
Servers, I'm never fixing, got the customers screamin'
And ready to drop the session, ready to get to flappin'
Ready to show these providers a serious route dampen'
Pimp stolen' the prefixes, with the Resource-Keys and ROA's missin'
Krebs try to pursue me; it's nothin' but fucks-given

Addicted to shitty hostin', yes, always re-sellin' mitigation
Think it's bad now, you should've seen our chinese mitigation
Give unmetered ports to fraudulent carders so you gotta buy mitigation
Dont worry about performance cause it's never ending disturbance
Route-loops at 700 Wilshire and packets dropped at 350 Cermak
Still my peering dippin' while my prefix-filters slippin'

Hook
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black

Verse 2
We got no egress filters; I know you heard about us
Client ports get to slammin', and we ain't worried 'bout much
On this Juniper I clutch
In NSFocus I trust, if a SYN(sin) flood starts, bet our transit turns to dust
Got ya peering in the Rib and ya circuit fucked up
Nodes over-provisioned thought a VPS wouldn't bust
Sessions depart, George can't restart, this eBay Cisco'
Lose ten deals daily, tryin' to steal five mo'
Ya see the Router slowin' down CPU Load
On the oldest BIRD(Bird) daemon, fuckin' GNU hos
On a bill and half with my partner Young George (George)
Dodgin' Debt Collectors on IP version fo'
Other hackers', crackers, told dudes I'm a joke
With some stolen code and a network all slow
(Hey), nigga, don't you hit me 'less you buyin' MineCraft nodes'
My routing-table missin', and my IDS bitchin'

Hook
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black

Verse 3
I buy the bandwidth down under, man, somebody better tell 'em
'For I lose a hundred million packets and have every server bailin'
I got some Indians on support and my connection-table all heavy
And now our whole control-plane is flowin really fuckin' sketchy
If ya ever think ya buyin' from me, just forget it
The page never loads just the numbers 503 in it
BGP has me stuck, and the Router's all fucked
Ya think the sessions really up?, you got life fucked up
A couple lines in LD_Preload(LD Preload) will have ya night fucked up
Connection live? Connection died? Guess it might be up.
Meanwhile, George is pollin' SNMP pretending we give a fuck
Paid posts on Low-End-Talk every time I get a buck

Hook
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black
I like my uptime down low and my servers all hacked
Can see me droppin' twenty-fours with a router in the rack
Ya like ya Switch-Ports hot and ya servers all hacked
If ya pings real high and ya networks pitch black

FAT32 · December 2024

You guys are so active

emgh · December 2024

@FAT32 said:
You guys are so active

u n lik3 up tim down l0w ?

Saragoldfarb · December 2024

@emgh said:

You don't have to follow this advice in Sweden. Regards.

I like chopping wood...

FAT32 · December 2024

@emgh said:

u n lik3 up tim down l0w ?

Lik3 mi srvr al haked

emgh · December 2024

@FAT32 said:

Lik3 mi srvr al haked

post your most important server's ip and a clue to the root password (don't forget to allow password and root login)

Saragoldfarb · December 2024

@emgh said:

post your most important server's ip and a clue to the root password (don't forget to allow password and root login)

Ip: 69.69.69.69
Clue: YOU-KNOW-WHO

FAT32 · December 2024

@emgh said:

post your most important server's ip and a clue to the root password (don't forget to allow password and root login)

10.0.35.1
6 characters all lowercase
