LLM (deepseek?) on KimSufi server

Comments

  • jnd Member
    edited January 2025

    @beanman109 said:

    @Levi said:

    @beanman109 said: I got a 90 line HTML/JS code prompt for a clock / countdown timer website completed in about 1-2 minutes (rough guess)

    That's bad... very bad. In cgpt 4o it is like 10 - 15 seconds or less.

    It's running on a E3-1275 v5 that I pay $100 a year for - cheaper than ChatGPT ¯_(ツ)_/¯

    $100 per year is quite expensive for such slow output. I mean, I just checked one of the better coder models on OpenRouter, and you can go through more than 400M tokens before reaching your $100:
    Qwen2.5 Coder 32B Instruct
    qwen/qwen-2.5-coder-32b-instruct
    Created Nov 11, 2024
    33,000 context (might be too low for larger projects)
    $0.07/M input tokens, $0.16/M output tokens
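
    As a rough check on the 400M figure, here is a minimal sketch of the budget arithmetic using only the two prices listed above. The input/output mix (`output_share`) is an assumption for illustration, not something measured.

```python
# Prices quoted above for qwen/qwen-2.5-coder-32b-instruct (USD per million tokens).
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.16


def tokens_for_budget(budget_usd: float, output_share: float) -> float:
    """Total tokens (input + output) that a budget buys.

    output_share is the assumed fraction of all tokens that are output
    tokens, between 0.0 (all input) and 1.0 (all output).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    blended_price_per_m = (
        (1.0 - output_share) * INPUT_PRICE_PER_M
        + output_share * OUTPUT_PRICE_PER_M
    )
    return budget_usd / blended_price_per_m * 1_000_000


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        total = tokens_for_budget(100.0, share)
        print(f"output share {share:.0%}: about {total / 1e6:.0f}M tokens")
```

    Even in the worst case, where every token is billed at the output rate, $100 covers about 625M tokens, so "more than 400M" holds for any mix.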

  • beanman109 Member, Host Rep, Megathread Squad

    @jnd said:

    $100 per year is quite expensive for such slow output. I mean I just checked one of the better coder models at OpenRouter, you can go through more than 400M tokens before reaching your $100:
    Qwen2.5 Coder 32B Instruct
    qwen/qwen-2.5-coder-32b-instruct
    Created Nov 11, 2024
    33,000 context (might be too low for larger projects)
    $0.07/M input tokens, $0.16/M output tokens

    sir this is a website about selfhosting and servers
    to me it adds up to pay $100 per year for a box that can selfhost a remote LLM reasonably well for what i intend to use it for, that's not including the fact that server can have other uses while it's not cpu maxed outputting prompts

    also i don't understand LLM tokens so i don't know how quickly i would go through 400M of them, could smash them out in a month for all i know

  • beanman109 Member, Host Rep, Megathread Squad

    @jnd said: $100 per year is quite expensive for such slow output. I mean I just checked one of the better coder models at OpenRouter, you can go through more than 400M tokens before reaching your $100:

    i just tried openrouter with $5 of credit, the $5 turned into $4.4 after they deducted fees? then i made a single prompt and now i'm at $4.37

    this aint it chief
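
    Taking the three balances in this post at face value, a quick sketch of what they imply. It assumes every later prompt costs about the same as that first one, which may not hold, and the post doesn't say which model was used.

```python
def summarize_credit(paid: float, after_fees: float, after_prompt: float) -> dict:
    """Derive fee and per-prompt cost from three observed balances (USD)."""
    fee = paid - after_fees
    cost_per_prompt = after_fees - after_prompt
    return {
        "fee_usd": round(fee, 2),
        "fee_pct": round(fee / paid * 100, 1),
        "cost_per_prompt_usd": round(cost_per_prompt, 2),
        # Whole prompts the remaining balance covers at the same cost each.
        "prompts_left": int(after_prompt / cost_per_prompt),
    }


if __name__ == "__main__":
    # Balances reported above: $5 paid, $4.40 after fees, $4.37 after one prompt.
    for key, value in summarize_credit(5.00, 4.40, 4.37).items():
        print(f"{key}: {value}")
```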

  • jnd Member

    @beanman109 said:

    sir this is a website about selfhosting and servers
    to me it adds up to pay $100 per year for a box that can selfhost a remote LLM reasonably well for what i intend to use it for, that's not including the fact that server can have other uses while it's not cpu maxed outputting prompts

    also i don't understand LLM tokens so i don't know how quickly i would go through 400M of them, could smash them out in a month for all i know

    Sure, I know, because I tried to self-host an LLM too and ended up giving away the VPS because it was a waste of time without a GPU (and with one you will pay much more). I'm just saying that you will be able to slowly run some tiny model with not-so-great output quality, or you can pay pennies for something better. Or run it on your home PC.

    Roughly speaking, one token is one word. You can try your typical task and see how many tokens it consumes. For small tasks it won't be much at all.
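
    To try that on a typical task, a rough estimate can be sketched as below. The default of 0.75 words per token is a commonly cited rule of thumb for English prose that slightly refines the one-word-per-token approximation; it is an assumption, real tokenizers differ by model, and source code usually needs more tokens per word. `estimate_tokens` and `estimate_cost_usd` are hypothetical helpers, not part of any library.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a whitespace word count.

    words_per_token=0.75 is an assumed rule-of-thumb ratio for English
    prose; it is not exact and varies between tokenizers.
    """
    if words_per_token <= 0:
        raise ValueError("words_per_token must be positive")
    word_count = len(text.split())
    return round(word_count / words_per_token)


def estimate_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of a number of tokens at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    prompt = "Write an HTML/JS page with a clock and a countdown timer"
    tokens = estimate_tokens(prompt)
    # 0.07 USD per million is the input price quoted earlier in the thread.
    print(f"about {tokens} input tokens, "
          f"about {estimate_cost_usd(tokens, 0.07):.8f} USD")
```

    For an exact count, the usage figures returned by the provider for a real request are the number to trust.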

  • HostSlick 🚩 Host Rep Tag Suspended
    edited January 2025

    I planned to buy 50 or 100 single/dual EPYC servers for the start, when we get another batch of racks that will be delivered in summer.
    Time to offer GPU options, I guess. :smile:

    The Gigabyte G292-Z20 can fit up to 4 GPUs.

  • @HostSlick said: Time to offer GPU Options i guess.

    Unfeasible. Either you concentrate entirely on the GPU business or you make GPU hosting a side gig (a very expensive one). GPUs age very fast, faster than CPUs. A CPU can be milked for a decade; a GPU gets 2-3 years before two generations pass, and then you are done. The power draw alone is insane, and no one wants to pay for a 3-year-old card...

  • HostSlick 🚩 Host Rep Tag Suspended
    edited January 2025

    @Levi said:

    @HostSlick said: Time to offer GPU Options i guess.

    Unfeasable. Either you concentrate entirely on GPU business or make GPU hosting a side gig (very expensive one). GPU's goes old very fast, faster than CPU. If CPU can be milked for decade, GPU - 2 - 3 years before 2 generations pass. And you are done. Power draw alone is insane and no one wants to pay for 3 years old card...

    Side gig to try it out. I'm getting the said Gigabyte servers anyway for normal uses. Why not do it when a client asks for a quote and you can fulfill it?

    And what do you think about the Nvidia Tesla A100? I recently got an offer of 4000€ each.

    When a client asks for it and has the money to rent it, why not. I don't expect this to go in bulk.
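
    Whether a side gig like this pays off mostly comes down to how fast a card has to earn back its price. A minimal sketch of that arithmetic, using the 4000€ offer and the 2-3 year lifespan argued above. It deliberately ignores power, rack space, the host server and idle time, so the real break-even rent would be higher.

```python
def monthly_break_even_eur(purchase_price_eur: float, useful_life_months: int) -> float:
    """Rent needed per month just to recover the purchase price."""
    if useful_life_months <= 0:
        raise ValueError("useful_life_months must be positive")
    return purchase_price_eur / useful_life_months


if __name__ == "__main__":
    price = 4000.0  # EUR per card, the offer mentioned above
    for years in (2, 3):
        rent = monthly_break_even_eur(price, years * 12)
        print(f"{years} years: about {rent:.0f} EUR/month per card")
```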

  • beanman109 Member, Host Rep, Megathread Squad

    @HostSlick said: And what you think about Nvidia Tesla A100? I recently got offer 4000€ each

    Sounds like those badboys have been mined on

  • 4000€ for gpu :o

  • wadhah Member, Host Rep

    @HostSlick said:

    Side gig to try it out. The said Gigabyte servers i get myself anyway for normal uses. Why not do it then when client asks for quote and u can fullfill?

    And what you think about Nvidia Tesla A100? I recently got offer 4000€ each

    When client asks for it and has the money to rent it, why not. I dont except this going in bulk.

    Not sure if I'm allowed to link sites or not, so this is a screenshot from a Google search for the A100.

    Is this not the same card you were offered? At less than half price? I can link the website if allowed (or just google "a100 price"); the 80 GB version is 18.5k.

    Thanked by 1 farsighter
  • HostSlick 🚩 Host Rep Tag Suspended

    @wadhah said:

    Not sure if i'm allowed to link sites or not so this is a screenshot from a goole search for the a100

    Is this not the same card you were offered? At less than half price? I can link the website if allowed (or just google a100 price), the 80gb version is 18.5k

    40gb one

  • wadhah Member, Host Rep

    @HostSlick said:

    40gb one

    Holy shit man, is the offer you got reliable or just random?

  • HostSlick 🚩 Host Rep Tag Suspended
    edited January 2025

    @wadhah said:

    Holy shit man, is the offer you got reliable or just random?

    Reliable supplier, but the price applies to a minimum of 10 pieces.

    On eBay you can find a lot for 4500-4600 though, so.

    Thanked by 1 wadhah
  • Buy an M2 Apple device

  • @beanman109 said:

    @cainyxues said:
    @beanman109 isn't there an uncensored model too [just saying]

    Not as far as I know? Unless the locally run version is uncensored

    Yeah, the locally hosted version is uncensored. Btw, sorry to hear that phi-4 was not up to the mark. I saw the reviews online, and they were quite on point, so I thought maybe it was good.

  • gks Member

    @rattlecattle said:
    Been running Deepseek R1 the distilled models, on a 128 GB dedi with a 8 GB GTX 1080 GPU. Its performance is acceptable so far.

    Can only run the distilled models of deepseek r1. Running the actual deepseek r1 isn't possible on consumer hardware anyway.

    Also the distilled models are not the same as the actual r1. Its more like say the base LLama model fine tuned with DeepSeek R1.

    Would a Ryzen with DDR5 and 2 x 8 GB GPUs work?

  • @gks said:
    Ryzen, DDR 5, and 2 x 8 GB gpu works?

    Why not? It will run for sure, but what's the GPU model? [AMD cards are generally bad for this, from what I have heard]

  • gks Member

    @cainyxues said:

    @gks said:
    Ryzen, DDR 5, and 2 x 8 GB gpu works?

    Why not? It will run for sure, but what's the GPU model [AMD are generally bad as I have heard]

    A used Nvidia card from eBay would be a fine start. It's not worth buying a new, expensive one for a homelab.

    Thanked by 1 cainyxues
  • @gks said:

    @cainyxues said:

    @gks said:
    Ryzen, DDR 5, and 2 x 8 GB gpu works?

    Why not? It will run for sure, but what's the GPU model [AMD are generally bad as I have heard]

    Nvidia used one on eBay would be fine start. Not worth for buying new expensive one for homelab

    Yup, that would be fine, but it might also be worth looking into M-series or K-series GPUs, as they have more VRAM and also work great for LLMs.

  • @gks said:

    Ryzen, DDR 5, and 2 x 8 GB gpu works?

    It does. You can even get away with older Xeons. The GPU, single or multiple, matters the most. If you run via Ollama, it will automatically offload the layers to both GPUs. The ideal config is to fit the complete model in the combined GPU memory.
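
    A back-of-the-envelope way to check whether a model fits in the combined GPU memory is sketched below. This is not how Ollama itself decides what to offload; the 20% overhead for context/KV cache and runtime buffers is an assumed round number, and both function names are hypothetical.

```python
from typing import Sequence


def model_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    gpu_vram_gb: Sequence[float],
    overhead_fraction: float = 0.2,  # assumed headroom, not a measured value
) -> bool:
    """True if weights plus assumed overhead fit in the combined VRAM."""
    needed_gb = model_weights_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= sum(gpu_vram_gb)


if __name__ == "__main__":
    gpus = [8, 8]  # the 2 x 8 GB setup asked about above
    for size_b in (8, 14, 32):
        verdict = "fits" if fits_in_vram(size_b, 4, gpus) else "does not fit"
        print(f"{size_b}B model at 4-bit: {verdict} in {sum(gpus)} GB")
```

    When the model does not fit, the remaining layers stay in system RAM and run on the CPU, which is where the slowdown comes from.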

    That being said, there is something like https://github.com/exo-explore/exo which aims to combine multiple devices into one powerful inference cluster. Haven't used it though.

    Thanked by 1 cainyxues
  • @rattlecattle said:
    It does. Even can get away with older Xeons. The GPU - single or multiple matters the most. If running via Ollama it would automatically offload the layers to both the GPUs. The ideal config is to fit in the complete model in the combined GPU memory.

    That being said there is something like https://github.com/exo-explore/exo which aims to combine multiple devices as one powerful inference cluster. Haven't used though.

    How does it do that? From what I have learnt, the slower the data transfer, the more it bottlenecks performance. [There was a very famous YouTuber who tried this with Mac minis over an Ethernet connection, if I remember correctly, and it bottlenecked.]

  • Neoon Community Contributor, Veteran

    @rattlecattle said:

    It does. Even can get away with older Xeons. The GPU - single or multiple matters the most. If running via Ollama it would automatically offload the layers to both the GPUs. The ideal config is to fit in the complete model in the combined GPU memory.

    That being said there is something like https://github.com/exo-explore/exo which aims to combine multiple devices as one powerful inference cluster. Haven't used though.

    So we buy more KS-Game-LE to make an A.I cluster? ok

    Thanked by 2 cainyxues tux
  • gks Member

    @cainyxues said:

    @rattlecattle said:
    It does. Even can get away with older Xeons. The GPU - single or multiple matters the most. If running via Ollama it would automatically offload the layers to both the GPUs. The ideal config is to fit in the complete model in the combined GPU memory.

    That being said there is something like https://github.com/exo-explore/exo which aims to combine multiple devices as one powerful inference cluster. Haven't used though.

    How does it do that since I have learnt that the more the slow data transfer the more it will bottleneck the performance [There was a very famous youtuber who tried this with mac minis through ethernet connection if I remember and it bottlenecked ]

    I prefer dedicated servers with the GPUs on the motherboard bus rather than using Ethernet for shuffling data. I have lots of data on damn old HDDs. The new trend in AI now is to generate SQL for the data warehouse and to generate reports and dashboards using AI itself.

    Thanked by 1 cainyxues
  • allthemtings Member, Megathread Squad

    @Neoon said:

    So we buy more KS-Game-LE to make an A.I cluster? ok

    That one guy who ordered bulk LE-B's with the 1245v5's

    Thanked by 1 beanman109
  • rattlecattle Member
    edited January 2025

    @cainyxues said:

    @rattlecattle said:
    It does. Even can get away with older Xeons. The GPU - single or multiple matters the most. If running via Ollama it would automatically offload the layers to both the GPUs. The ideal config is to fit in the complete model in the combined GPU memory.

    That being said there is something like https://github.com/exo-explore/exo which aims to combine multiple devices as one powerful inference cluster. Haven't used though.

    How does it do that since I have learnt that the more the slow data transfer the more it will bottleneck the performance [There was a very famous youtuber who tried this with mac minis through ethernet connection if I remember and it bottlenecked ]

    Plain Ethernet would definitely bottleneck. Best to have the setup in a single system or to use something like GPUDirect RDMA for interconnection.
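
    To put rough numbers on that, here is a sketch comparing how long the same payload takes over different links. The figures are theoretical peak rates that ignore latency and protocol overhead, and the 1 GB payload is a hypothetical size for illustration; how much data actually moves between nodes depends on how the model is split.

```python
# Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes), one direction.
LINKS_GB_PER_S = {
    "1 Gbit/s Ethernet": 0.125,
    "10 Gbit/s Ethernet": 1.25,
    "PCIe 3.0 x16": 15.75,
}


def transfer_seconds(payload_gb: float, link_gb_per_s: float) -> float:
    """Best-case time to move a payload over a link, ignoring latency."""
    if link_gb_per_s <= 0:
        raise ValueError("link_gb_per_s must be positive")
    return payload_gb / link_gb_per_s


if __name__ == "__main__":
    payload_gb = 1.0  # hypothetical amount of data to move between devices
    for name, bandwidth in LINKS_GB_PER_S.items():
        seconds = transfer_seconds(payload_gb, bandwidth)
        print(f"{name}: {seconds:.3f} s for {payload_gb} GB")
```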

    @Neoon said: So we buy more KS-Game-LE to make an A.I cluster? ok

    Why not. Probably time to hook up the electric toothbrush. Every device counts. :smiley:

    Thanked by 1 cainyxues
  • gks Member

    @beanman109 said:

    @jnd said: $100 per year is quite expensive for such slow output. I mean I just checked one of the better coder models at OpenRouter, you can go through more than 400M tokens before reaching your $100:

    i just tried openrouter with $5 of credit, the $5 turned into $4.4 after they deducted fees? then i made a single prompt and now i'm at $4.37

    this aint it chief

    If your project is not continuous, you can use an Azure trial account to experiment for 30 days with 200 USD of credit. You may also find cheap trial-like offers for about 10 USD. I can't promise they are genuine, but a few online services offer them.

    Azure has AI services, including ChatGPT models. For learning purposes you may spend about 10 USD but get 200 USD of credit. I use a vector DB, AI Search, Cosmos DB, various analytics services, Data Factory, etc., along with Azure AI and IoT. The credit expires after the month, so it is obviously a throwaway account.

    It's not useful for serious projects, but for learning it is cheap. You can use a React web app that works with a Python API for ChatGPT; that way you can also stop paying 20 USD per month for ChatGPT.

    ChatGPT batch processing reduces cost a lot.

    I wish these cloud platforms would open-source their stack, or offer a meaningful way for students and early adopters to learn things, because the cloud vendor tools themselves saddle you with technical debt, which costs a lot when they stop supporting them or the ecosystem is too weak.

  • You guys should also try Groq; it's free and fast. OpenRouter also provides Llama models for free, right? There is also the Google Gemini free tier, with Flash 2.0 and real-time interaction.

  • @HostSlick said: Time to offer GPU Options i guess.

    You could try looking at AMD APUs - they can be configured to use as much system RAM as you like, take up far less space and consume far less power. The Ryzen "AI" chips are great for this. Similar performance to M4 chips, but with the advantage of being able to use far more RAM (and running x86).

  • @cainyxues said:
    you guys should also try groq, its free and fast also openrouter provides llama models for free right? also there is google gemini free tier with its flash 2.0 with real time interaction

    This thread was supposed to be about running it on cheap dedis, for all kinds of reasons. There are countless threads about LLMs on other services.

  • Adam1 Member
    edited January 2025

    @rattlecattle said: That being said there is something like https://github.com/exo-explore/exo which aims to combine multiple devices as one powerful inference cluster. Haven't used though.

    A YT channel I watch has dabbled with it, with mixed results clustering M4 Mac Minis. It'll probably get better, though, so it is something to watch.
