
LLM (deepseek?) on KimSufi server

Comments

  • Why not try Google Colab, it's free?

  • @Fengfeng said: Why not try Google Colab, it's free?

    pls stop going offtopic

  • I have a spare PC and am wondering if I can get this running on it:

    i5 9600kf
    32gb ddr4 ram 2400 MHz
    240gb M2 SSD
    128gb SSD
    1tb HDD
    RTX 2060

    Would like to know as well if I can upgrade the GPU to get this running.

  • @Adam1 said:

    @cainyxues said:
    You guys should also try Groq; it's free and fast. OpenRouter also provides Llama models for free, right? There is also the Google Gemini free tier, with Flash 2.0 and real-time interaction.

    This thread was supposed to be about running it on cheap dedis, for all kinds of reasons. There are countless threads about LLMs on other services.

    Oh sorry, it was just that people started talking about that, so I recommended some.

  • @webpro85 said:
    I have a spare PC and am wondering if I can get this running on it:

    i5 9600kf
    32gb ddr4 ram 2400 MHz
    240gb M2 SSD
    128gb SSD
    1tb HDD
    RTX 2060

    Would like to know as well if I can upgrade the GPU to get this running.

    You can probably run 8-16 billion parameter models with ease; the RAM is good, and so is the VRAM. You can try ollama with a web UI, or even LM Studio.

    Thanked by 2: webpro85, noob404
  • @beanman109 said:

    @kvz12 said:

    @beanman109 said:
    I still need to find a use for a 32GB dedi, what LLM would everyone recommend running locally? Not interested in Deepseek for obvious reasons

    What are the obvious reasons?

    Bias in the responses from the Chinese government, end of story - we don't need to turn this into a political thread, I just want answers that aren't deepseek.

    Does not affect you in any way unless you decide to ask questions about those topics. Unlikely.

  • beanman109 Member, Host Rep, Megathread Squad
    edited January 2025

    @webpro85 said:
    I have a spare PC and am wondering if I can get this running on it:

    i5 9600kf
    32gb ddr4 ram 2400 MHz
    240gb M2 SSD
    128gb SSD
    1tb HDD
    RTX 2060

    Would like to know as well if I can upgrade the GPU to get this running.

    As @cainyxues mentioned, you'll easily be able to run 7b-14b models on this.
    I'm a big fan of ollama and open-webui at the moment; it took me all of 30 seconds to get it set up with docker-compose:

    version: '3.3'
    services:
      openWebUI:
        image: ghcr.io/open-webui/open-webui:main
        restart: always
        ports:
          - "8080:8080"
        extra_hosts:
          - "host.docker.internal:host-gateway"
        volumes:
          - /root/llm/open-webui:/app/backend/data
    
      ollama:
        image: ollama/ollama:latest
        ports:
          - "11434:11434"
        volumes:
          - /root/llm/ollama:/root/.ollama
    

    edit: just saw you have a GPU; here's a docker-compose that will utilise it if you want:

    version: '3.3'
    
    services:
      openWebUI:
        image: ghcr.io/open-webui/open-webui:main
        restart: always
        ports:
          - "8080:8080"
        extra_hosts:
          - "host.docker.internal:host-gateway"
        volumes:
          - /root/llm/open-webui:/app/backend/data
    
      ollama:
        image: ollama/ollama:latest       # The standard image already includes NVIDIA GPU support
        runtime: nvidia                   # Needs the NVIDIA Container Toolkit installed on the host
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all              # Expose every GPU on the host
                  capabilities:
                    - gpu                 # Allow the container to use the GPU
        ports:
          - "11434:11434"
        volumes:
          - /root/llm/ollama:/root/.ollama
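
Once the containers are up, a quick way to confirm that ollama is reachable on the published port is to ask it which models it has pulled. This is a minimal sketch, assuming the server listens on `http://localhost:11434` as in the compose files above; the helper names `model_names` and `list_models` are made up for illustration.

```python
import json
import urllib.request

# Assumption: ollama is reachable on the port published by the compose file.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> list:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_models(base_url: str = OLLAMA_URL) -> list:
    """Ask a running ollama server which models it has stored locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(json.load(resp))


# Usage, with the containers running:
#   print(list_models())
```

An empty list just means no model has been pulled yet; a connection error points at the port mapping or the container not running.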
    
  • HostSlick 🚩 Host Rep Tag Suspended

    @Adam1 said:

    @HostSlick said: Time to offer GPU Options i guess.

    You could try looking at AMD APUs: they can be configured to use as much system RAM as you like, take up far less space, and consume far less power. The Ryzen "AI" chips are great for this. Similar performance to M4 chips, but with the advantage of being able to use far more RAM (and running x86).

    I hadn't thought about this yet, but I will read up on it and investigate in the next weeks/months. Thanks for the tip! :smile:

    Thanked by 1: host_c
  • allthemtings Member, Megathread Squad
    edited January 2025
  • With a 16GB M1 Pro, which has 200GB/s of memory bandwidth, I can run the DeepSeek R1 distill 14b. I still couldn't get a grip on the limits: chat window size? Max accumulated tokens?
    For tok/sec, memory bandwidth is crucial; that is what I gather from Twitter posts.
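
The bandwidth point can be turned into a rough ceiling. A common rule of thumb (not from this thread) is that generating one token on a dense model reads every weight once, so tokens/s cannot exceed memory bandwidth divided by the model's size in memory. The sketch below assumes that rule; the 9 GB figure for a quantized 14b model is an illustrative guess, not a measurement.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical numbers: 200 GB/s of bandwidth, ~9 GB of quantized 14b weights.
ceiling = max_tokens_per_second(200, 9)
print(f"at most ~{ceiling:.0f} tokens/s")
```

Real throughput lands below this bound, since it ignores compute, cache effects, and the memory the context itself uses.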

  • Very interesting. I have a KS-LE-E and a KS-LE-B with 64 and 32GB of RAM sitting mostly idle, and would really like to try this for AI tasks that are not realtime, so tokens/s is not too important. I will spin up ollama and test a bit. Thanks for getting me onto the thought of actually running an LLM on those servers.

    Thanked by 1: Adam1
  • @charger said: I will spin up ollama and test a bit.

    pls update with your findings :)

    Thanked by 1: khalequzzaman
  • Neoon Community Contributor, Veteran
    edited January 2025

    32b running on a KS-LE-B, its just 11$/m though

    3t/s maybe

  • allthemtings Member, Megathread Squad

    @Neoon said:
    32b running on a KS-LE-B, its just 11$/m though

    3t/s maybe

    Anyone tested this with the LE-B with 1245v5 w/ iGPU?

    Thanked by 1: plumberg
  • Neoon Community Contributor, Veteran

    @Neoon said:
    32b running on a KS-LE-B, its just 11$/m though

    3t/s maybe

  • A 7950X3D with 48GB of RAM and a 7900 XTX runs deepseek 32b perfectly, but 70b answers questions very slowly. When I run 32b, it uses all of the GPU memory (24GB) and 40GB of system memory. This is my own PC.

    Thanked by 2: wuck, cainyxues
  • Neoon Community Contributor, Veteran

    Anyone of you tried the 72b model on the 128GB Kimsufi?

  • Neoon Community Contributor, Veteran
    edited January 2025

    Do you guys think, it will run well on a swap file?

  • @Neoon said:

    @Neoon said:
    32b running on a KS-LE-B, its just 11$/m though

    3t/s maybe

    I assume this is deepseek-r1:32b and ran on the e3 1270?

    here is 1245 v5 (32gb) for comparison

    and 1245 v6 (32gb), maybe slightly faster?

    Interestingly, they all started with the same joke, but at least the second one was unique.

    Thanked by 1: ariq01
  • Neoon Community Contributor, Veteran

    @Neoon said:
    Do you guys think, it will run well on a swap file?

    It actually runs on 64gig without a swap file.
    But fuck hell, even slower.

    Maybe on bigger context sizes, it needs 60GB+

  • Neoon Community Contributor, Veteran
    edited January 2025

    72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

    edit: here is the joke:

    Why don’t scientists trust atoms?
    Because they make up everything! 😄

    Thanked by 1: BasToTheMax
  • Just to mess around I installed Ollama on one of the KS-LE-1s that I just received that has 32GB RAM and 2x480GB SSDs.

    I have installed the Deepseek R1, Deepseek V3 and Phi4 models.

    They work but are extremely slow.

  • Neoon Community Contributor, Veteran

    @barbarza said:
    Just to mess around I installed Ollama on one of the KS-LE-1s that I just received that has 32GB RAM and 2x480GB SSDs.

    I have installed the Deepseek R1, Deepseek V3 and Phi4 models.

    They work but are extremely slow.

    Yeah, but it's private; nobody knows what you ask.
    Let's wait for the GAME delivery. It should handle up to 32b, and given it's DDR4, it might be faster than any regular KS.

    Still, for $11-12/m, it's a steal.

    Thanked by 1: barbarza
  • So how much ram is needed for each model?

  • Neoon Community Contributor, Veteran
    edited January 2025

    @BasToTheMax said:
    So how much ram is needed for each model?

    32b * 1.2 (in GB) is roughly what you need; at least that's what I read.
    The 70b just crashed on my 64GB machine.

    I'm surprised that it ran in the first place.

  • @Neoon said:

    @BasToTheMax said:
    So how much ram is needed for each model?

    32b * 1.2 (in GB) is roughly what you need; at least that's what I read.
    The 70b just crashed on my 64GB machine.

    I'm surprised that it ran in the first place.

    Oh okay. So the 8B model would use about 9.6 GB of RAM.
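
The rule of thumb quoted above (parameter count in billions times 1.2, read as GB of RAM) is easy to put in a helper. This is a sketch of that rough rule only; the function names are made up, and actual usage depends on quantization and context size. The thread itself reports deepseek-r1:32b running on 32GB machines, so treat the result as a conservative guess.

```python
RAM_FACTOR = 1.2  # GB per billion parameters, the rough figure quoted in the thread


def estimated_ram_gb(params_billion: float, factor: float = RAM_FACTOR) -> float:
    """Rough RAM estimate in GB for a model with the given parameter count."""
    return round(params_billion * factor, 1)


def probably_fits(params_billion: float, ram_gb: float) -> bool:
    """True if the rough estimate is within the machine's RAM."""
    return estimated_ram_gb(params_billion) <= ram_gb


for size in (8, 32, 70):
    print(f"{size}b -> ~{estimated_ram_gb(size)} GB")
```

By this estimate the 70b model needs about 84 GB, which is consistent with it crashing on a 64GB machine.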

  • @Adam1 said:

    @charger said: I will spin up ollama and test a bit.

    pls update with your findings :)

    KS-LE-B with E3-1245 v6 and 32gb of ram:
    deepseek-r1:32b Prompt eval: 2.79 t/s Response: 1.34 t/s Total: 1.36 t/s

    KS-LE-E with E5-1650 v3 and 64gb of ram:
    deepseek-r1:32b Prompt eval: 3.24 t/s Response: 1.85 t/s Total: 1.86 t/s

    So performance is not fantastic, but honestly, for a few bucks a month and with them mostly idling anyway, I see a use case where tokens/s is not super important, like background jobs and such.
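
For anyone wanting comparable figures from their own box: the post does not say which tool produced these numbers, but ollama's `/api/generate` response includes token counts and durations (in nanoseconds) from which prompt-eval and response rates can be derived. A minimal sketch, assuming a local server on port 11434; `summarize` and `benchmark` are illustrative names, not part of ollama.

```python
import json
import urllib.request


def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens/s."""
    if duration_ns <= 0:
        return 0.0
    return token_count / (duration_ns / 1e9)


def summarize(response: dict) -> dict:
    """Derive prompt-eval and response rates from an /api/generate reply."""
    return {
        "prompt_eval": tokens_per_second(
            response.get("prompt_eval_count", 0),
            response.get("prompt_eval_duration", 0),
        ),
        "response": tokens_per_second(
            response.get("eval_count", 0),
            response.get("eval_duration", 0),
        ),
    }


def benchmark(model: str, prompt: str, base_url: str = "http://localhost:11434") -> dict:
    """Run one non-streaming generation and return its token rates."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # may take minutes on a slow CPU
        return summarize(json.load(resp))


# Usage, with the server running and the model pulled:
#   print(benchmark("deepseek-r1:32b", "Tell me a joke."))
```

A single run is noisy; averaging a few prompts of similar length gives steadier numbers.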

  • @Neoon said:
    72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

    edit: here is the joke:

    Why don’t scientists trust atoms?
    Because they make up everything! 😄

    Have you tried the new DeepSeek R1 Dynamic 1.58-bit that just got released? They achieved an 80% size reduction. I'm interested in how well it can perform on a low/medium-end CPU.

  • Neoon Community Contributor, Veteran
    edited January 2025

    @Cybr said:

    @Neoon said:
    72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

    edit: here is the joke:

    Why don’t scientists trust atoms?
    Because they make up everything! 😄

    Have you tried the new DeepSeek R1 Dynamic 1.58-bit that just got released? They achieved an 80% size reduction. I'm interested in how well it can perform on a low/medium-end CPU.

    If it's on ollama, fine; too lazy to compile shit.
    edit: seems like with some params, it compiles fine for CPU only.

    I wasn't going to install all these crap nvidia dependencies.

  • Cybr Member
    edited January 2025

    @Neoon said:

    @Cybr said:

    @Neoon said:
    72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

    edit: here is the joke:

    Why don’t scientists trust atoms?
    Because they make up everything! 😄

    Have you tried the new DeepSeek R1 Dynamic 1.58-bit that just got released? They achieved an 80% size reduction. I'm interested in how well it can perform on a low/medium-end CPU.

    If it's on ollama, fine; too lazy to compile shit.
    edit: seems like with some params, it compiles fine for CPU only.

    I wasn't going to install all these crap nvidia dependencies.

    Looks like it is on ollama, but minimum VRAM+RAM=80GB, so your low end box probably won't have enough ram to even try it CPU only.
