[2024 EXTENDED] Black Friday / Cyber Monday: FLASH SALE & MEGATHREAD

Comments

  • @plumberg said:

    @steny said:

    @plumberg said:

    @steny said:

    @plumberg said:

    @steny said:
    @plumberg said:

    any recommendations for selfhosted llm which would help with code generation? Claude does amazingly well, but I end up making it work so much that I am rate-limited.

    You need quite a lot of

    @plumberg said:
    any recommendations for selfhosted llm which would help with code generation? Claude does amazingly well, but I end up making it work so much that I am rate-limited.

    For a self-hosted LLM you need a lot of VRAM, especially for coding. For a home PC, probably the best model you can run currently is Qwen2.5-Coder-32B. For coding, unlike chat, you need at least 8-bit precision (half of 16-bit), so you will need around 40 GB of VRAM, e.g. dual RTX 3090, which is the setup I am using. Bigger models like Qwen2.5-72B or Llama-Nemotron-70B are better, but you won't run them at home at that precision unless you build a 4x GPU rig. For the largest open-weight models like Mistral Large, you need dual H100s, so you definitely need to rent GPUs. It won't be cheap, but you get roughly GPT-4o-level coding performance, probably slightly less since you will still be running at 8-bit precision only.
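    The VRAM figures above can be sanity-checked with back-of-the-envelope arithmetic. This is only a rough sketch: the 20% overhead for KV cache and activations is an assumed number, not something stated in the post, and the function names are made up for illustration. The 24 GB per RTX 3090 is the card's spec-sheet figure.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an ASSUMED overhead fit in the given VRAM."""
    needed_gb = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= vram_gb


if __name__ == "__main__":
    # Parameter counts are taken from the model names in the post.
    for label, size_b, vram in [("32B on 2x 24 GB", 32, 48),
                                ("72B on 2x 24 GB", 72, 48),
                                ("72B on 4x 24 GB", 72, 96)]:
        print(label, "at 8-bit:", fits_in_vram(size_b, 8, vram))
```

    Under these assumptions a 32B model at 8-bit needs a bit under 40 GB, which lines up with the dual RTX 3090 setup, while a 72B model at the same precision only fits once you go to four cards.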

    First off, thanks for the detailed post.

    So here is the deal. I have 0 GPUs. But I have a pair of dual E5-2699v4 with decent RAM (384 GB DDR4, 2400 speed or something).

    I am not really interested in getting the fastest responses. As long as it spits out decent output, I am game.

    Or am I dreaming of hosting an LLM? What are your thoughts?

    Running on RAM would be awfully slow, especially with those large models you could theoretically run with that amount of RAM.

    How slow are we talking about? Any idea?

    And will that change the quality of the output?

    Reguards

    Inference speed depends mainly on memory bandwidth. Dual RTX 3090 runs a 70B model at around 15-20 tokens/second. There is some speed loss from the dual setup, yet DDR4-2400 bandwidth is about 50 times lower, so expect below 1 token per second, where a token is about 3-4 characters. And that is just for mid-sized models; the large ones would run at fractions of a token per second.
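    A rough way to put numbers on the bandwidth argument: if every generated token has to stream all the weights from memory once, then tokens/second is capped at bandwidth divided by model size. The sketch below makes several assumptions that are not in the post: 936 GB/s is the RTX 3090 spec-sheet bandwidth, the DDR figure is the theoretical peak of a single channel (which is where the roughly 50x ratio comes from; multi-channel systems have a higher ceiling), and the 70B model is assumed to be quantized to 4 bits. The function names are made up for illustration.

```python
def ddr_bandwidth_gbs(mega_transfers_per_s: float, channels: int = 1) -> float:
    """Theoretical peak DDR bandwidth in GB/s (8 bytes per transfer per channel)."""
    return mega_transfers_per_s * 8 * channels / 1000


def max_tokens_per_s(bandwidth_gbs: float, params_billion: float,
                     bits_per_weight: int) -> float:
    """Upper bound on decode speed if each token reads all weights once."""
    model_gb = params_billion * bits_per_weight / 8
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    gpu_bw = 936.0                 # ASSUMED: RTX 3090 spec-sheet bandwidth, GB/s
    ram_bw = ddr_bandwidth_gbs(2400)  # DDR4-2400, one channel
    print("bandwidth ratio:", round(gpu_bw / ram_bw, 1))
    print("GPU ceiling tok/s:", round(max_tokens_per_s(gpu_bw, 70, 4), 1))
    print("RAM ceiling tok/s:", round(max_tokens_per_s(ram_bw, 70, 4), 2))
```

    These are ceilings, not predictions. Real throughput comes in lower (the 15-20 tokens/second quoted for the dual 3090 sits under the computed GPU ceiling), and the actual number on a given server depends on how many memory channels are populated and on the software used.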

    Gotcha. Well I wanna try it out though and see where it takes me. Thanks.

    LM Studio is the easiest to deploy platform imo, try it and let me know the real results, I am curious.

    Thanked by 3plumberg uptown r3k
  • image

    Thanked by 4Savvy emgh FAT32 r3k
  • 18+

  • emgh Member, Megathread Squad

    @steny said: LM Studio is the easiest to deploy platform imo, try it and let me know the real results, I am curious.

    what my i m can runs

    Apple M3
    16 GB RAM
    3 cm baenis

  • I thought I was too clumsy and entered the wrong promo code, but it turns out it’s really gone. Thank you for your guidance. Regards!

    Thanked by 3emgh uptown r3k
  • image

    Thanked by 4vr10 uptown Savvy r3k
  • emgh Member, Megathread Squad

    @tansel said:
    I thought I was too clumsy and entered the wrong promo code, but it turns out it’s really gone. Thank you for your guidance. Regards!

    Re

    Thanked by 2uptown r3k
  • plumberg Veteran, Megathread Squad

    @steny said:

    LM Studio is the easiest to deploy platform imo, try it and let me know the real results, I am curious.

    LM studio...
    This?
    https://lmstudio.ai/

    Reg

    Thanked by 3emgh uptown r3k
  • emgh Member, Megathread Squad

    @plumberg said:

    LM studio...
    This?
    https://lmstudio.ai/

    Reg

    No

    This https://mzunguhosting.ml/

    Reg

  • plumberg Veteran, Megathread Squad

    @emgh said:

    No

    This https://mzunguhosting.ml/

    Reg

    Gotcha

    Reg

    Thanked by 3emgh uptown r3k
  • Saragoldfarb Member, Megathread Squad

    @emgh said:
    @Saragoldfarb what do you think

    Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

  • donli Member
    edited December 2024

    @Saragoldfarb said:

    @emgh said:
    @Saragoldfarb what do you think

    Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

    Be careful in Dubai, they still chop things off there. Regards.

  • emgh Member, Megathread Squad

    newtork is fast my man

  • Saragoldfarb Member, Megathread Squad

    @emgh said:
    how many minutes of music did you guys listen to 2024

    Almost every hour when I'm awake, the kid ain't nagging and I'm not working on my feetpic onlyfans. You do the math.

    Thanked by 3emgh uptown r3k
  • MOARRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

    Thanked by 5emgh FAT32 uptown Savvy r3k
  • @emgh said:
    newtork is fast my man

    Only when the power is on.

    Thanked by 3emgh uptown r3k
  • plumberg Veteran, Megathread Squad

    @emgh said:
    newtork is fast my man

    Reg

  • Saragoldfarb Member, Megathread Squad

    @donli said:

    @Saragoldfarb said:

    @emgh said:
    @Saragoldfarb what do you think

    Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

    Be careful in Dubai, they still chop things off there. Regards.

    Sound advice.

    Thanked by 3emgh uptown r3k
  • emgh Member, Megathread Squad

    @donli said:

    @emgh said:
    newtork is fast my man

    Only when the power is on.

    Which is over 50 % of the time so it's quite good

    Thanked by 2uptown r3k
  • emgh Member, Megathread Squad

    Reg'd

    Thanked by 3plumberg uptown r3k
  • emgh Member, Megathread Squad

    @Saragoldfarb said:

    @donli said:

    @Saragoldfarb said:

    @emgh said:
    @Saragoldfarb what do you think

    Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

    Be careful in Dubai, they still chop things off there. Regards.

    Sound advice.

    You don't have to follow this advice in Sweden. Regards.

    Thanked by 3plumberg uptown r3k
  • Nokia Lumia 1020, the legend.

    Thanked by 3emgh uptown r3k
  • emgh Member, Megathread Squad

    I like my uptime down low, down low, down low, down low, down low, down low, down low
    I like my servers all hacked, all hacked, all hacked, all hacked, all hacked, all hacked
    Ay holla if ya like ya uptime down low, down low, down low, down low, down low, down low, down low
    I like my servers all hacked (transposed George Becerra, I got you nigga), all hacked, all hacked, all hacked, all hacked, all hacked
    Ay holla if ya like ya uptime down low, down low, down low, down low, down low, down low, down low

    I like my servers all hacked, all hacked, all hacked, all hacked, all hacked, all hacked
    Hook
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black

    Verse 1
    I'm the scam in my city, ain't nobody fuckin' wit' me
    You can ask the bot chasers and all the abuse lister's
    I'm a known shit host; I always have spammers
    And the skids and the phishers are always buyin' from me

    Load's on triple-sixes, watchin' adult streamin'
    Servers, I'm never fixing, got the customers screamin'
    And ready to drop the session, ready to get to flappin'
    Ready to show these providers a serious route dampen'
    Pimp stolen' the prefixes, with the Resource-Keys and ROA's missin'
    Krebs try to pursue me; it's nothin' but fucks-given

    Addicted to shitty hostin', yes, always re-sellin' mitigation
    Think it's bad now, you should've seen our chinese mitigation
    Give unmetered ports to fraudulent carders so you gotta buy mitigation
    Dont worry about performance cause it's never ending disturbance
    Route-loops at 700 Wilshire and packets dropped at 350 Cermak
    Still my peering dippin' while my prefix-filters slippin'

    Hook
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black

    Verse 2
    We got no egress filters; I know you heard about us
    Client ports get to slammin', and we ain't worried 'bout much
    On this Juniper I clutch
    In NSFocus I trust, if a SYN(sin) flood starts, bet our transit turns to dust
    Got ya peering in the Rib and ya circuit fucked up
    Nodes over-provisioned thought a VPS wouldn't bust
    Sessions depart, George can't restart, this eBay Cisco'
    Lose ten deals daily, tryin' to steal five mo'
    Ya see the Router slowin' down CPU Load
    On the oldest BIRD(Bird) daemon, fuckin' GNU hos
    On a bill and half with my partner Young George (George)
    Dodgin' Debt Collectors on IP version fo'
    Other hackers', crackers, told dudes I'm a joke
    With some stolen code and a network all slow
    (Hey), nigga, don't you hit me 'less you buyin' MineCraft nodes'
    My routing-table missin', and my IDS bitchin'

    Hook
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black

    Verse 3
    I buy the bandwidth down under, man, somebody better tell 'em
    'For I lose a hundred million packets and have every server bailin'
    I got some Indians on support and my connection-table all heavy
    And now our whole control-plane is flowin really fuckin' sketchy
    If ya ever think ya buyin' from me, just forget it
    The page never loads just the numbers 503 in it
    BGP has me stuck, and the Router's all fucked
    Ya think the sessions really up?, you got life fucked up
    A couple lines in LD_Preload(LD Preload) will have ya night fucked up
    Connection live? Connection died? Guess it might be up.
    Meanwhile, George is pollin' SNMP pretending we give a fuck
    Paid posts on Low-End-Talk every time I get a buck

    Hook
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black
    I like my uptime down low and my servers all hacked
    Can see me droppin' twenty-fours with a router in the rack
    Ya like ya Switch-Ports hot and ya servers all hacked
    If ya pings real high and ya networks pitch black

    Thanked by 4lukast__ uptown admax r3k
  • FAT32 Administrator, Deal Compiler Extraordinaire

    You guys are so active :joy:

  • emgh Member, Megathread Squad

    @FAT32 said:
    You guys are so active :joy:

    u n lik3 up tim down l0w ?

    Thanked by 4FAT32 uptown Savvy r3k
  • Saragoldfarb Member, Megathread Squad

    @emgh said:

    @Saragoldfarb said:

    @donli said:

    @Saragoldfarb said:

    @emgh said:
    @Saragoldfarb what do you think

    Sorry, I'm drunk. I'll discuss in the morning depending who I wake up with.

    Be careful in Dubai, they still chop things off there. Regards.

    Sound advice.

    You don't have to follow this advice in Sweden. Regards.

    I like chopping wood...

    Thanked by 2uptown r3k
  • FAT32 Administrator, Deal Compiler Extraordinaire

    @emgh said:

    @FAT32 said:
    You guys are so active :joy:

    u n lik3 up tim down l0w ?

    Lik3 mi srvr al haked

  • emgh Member, Megathread Squad

    @FAT32 said:

    @emgh said:

    @FAT32 said:
    You guys are so active :joy:

    u n lik3 up tim down l0w ?

    Lik3 mi srvr al haked

    post your most important server's ip and a clue to the root password (don't forget to allow password and root login)

  • Saragoldfarb Member, Megathread Squad

    @emgh said:

    @FAT32 said:

    @emgh said:

    @FAT32 said:
    You guys are so active :joy:

    u n lik3 up tim down l0w ?

    Lik3 mi srvr al haked

    post your most important server's ip and a clue to the root password (don't forget to allow password and root login)

    Ip: 69.69.69.69
    Clue: YOU-KNOW-WHO

  • FAT32 Administrator, Deal Compiler Extraordinaire

    @emgh said:

    @FAT32 said:

    @emgh said:

    @FAT32 said:
    You guys are so active :joy:

    u n lik3 up tim down l0w ?

    Lik3 mi srvr al haked

    post your most important server's ip and a clue to the root password (don't forget to allow password and root login)

    10.0.35.1
    6 characters all lowercase

This discussion has been closed.