LLM (deepseek?) on KimSufi server

Fengfeng · January 2025

Why not try Google Colab, it's free?

Adam1 · January 2025

@Fengfeng said: Why not try Google Colab, it's free?

Please stop going off-topic.

webpro85 · January 2025

I have a spare PC and am wondering if I can get this running on it:

i5-9600KF
32 GB DDR4 RAM, 2400 MHz
240 GB M.2 SSD
128 GB SSD
1 TB HDD
RTX 2060

I would also like to know if I can upgrade the GPU to get this running.

cainyxues · January 2025

@Adam1 said:

@cainyxues said:
You guys should also try Groq, it's free and fast. OpenRouter also provides Llama models for free, right? There is also the Google Gemini free tier, with Flash 2.0 and real-time interaction.

This thread was supposed to be about running it on cheap dedis, for all kinds of reasons. There are countless threads about LLMs on other services.

Oh sorry, it was just that people started talking about that, so I recommended some.

cainyxues · January 2025

@webpro85 said:
I have a spare PC and am wondering if I can get this running on it:

i5-9600KF
32 GB DDR4 RAM, 2400 MHz
240 GB M.2 SSD
128 GB SSD
1 TB HDD
RTX 2060

I would also like to know if I can upgrade the GPU to get this running.

You can probably run 8-16 billion parameter models with ease: the RAM is good, and so is the VRAM. You can try Ollama with a web UI, or LM Studio.

kvz12 · January 2025

@beanman109 said:

@kvz12 said:

@beanman109 said:
I still need to find a use for a 32GB dedi, what LLM would everyone recommend running locally? Not interested in Deepseek for obvious reasons

What are the obvious reasons?

Bias in the responses from the Chinese government, end of story. We don't need to turn this into a political thread; I just want answers that aren't DeepSeek.

Does not affect you in any way unless you decide to ask questions about those topics. Unlikely.

beanman109 · January 2025

@webpro85 said:
I have a spare PC and am wondering if I can get this running on it:

i5-9600KF
32 GB DDR4 RAM, 2400 MHz
240 GB M.2 SSD
128 GB SSD
1 TB HDD
RTX 2060

I would also like to know if I can upgrade the GPU to get this running.

Like @cainyxues mentioned, you'll easily be able to run 7b-14b models on this.
I'm a big fan of Ollama and Open WebUI at the moment. It took me all of 30 seconds to get it set up with docker-compose:

version: '3.3'
services:
  openWebUI:
    image: ghcr.io/open-webui/open-webui:main
    restart: always
    ports:
      - "8080:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /root/llm/open-webui:/app/backend/data

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - /root/llm/ollama:/root/.ollama

Edit: just saw you have a GPU. Here's a docker-compose file that will utilise it, if you want:

version: '3.3'

services:
  openWebUI:
    image: ghcr.io/open-webui/open-webui:main
    restart: always
    ports:
      - "8080:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /root/llm/open-webui:/app/backend/data

  ollama:
    image: ollama/ollama:latest       # The standard image includes CUDA support for NVIDIA GPUs
    runtime: nvidia                   # Enable NVIDIA runtime (needs the NVIDIA Container Toolkit on the host)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu                 # Allow the container to use the GPU
    ports:
      - "11434:11434"
    volumes:
      - /root/llm/ollama:/root/.ollama

HostSlick · January 2025

@Adam1 said:

@HostSlick said: Time to offer GPU Options i guess.

You could try looking at AMD APUs. They can be configured to use as much system RAM as you like, take up far less space and consume far less power. The Ryzen "AI" chips are great for this: similar performance to M4 chips, but with the advantage of being able to use far more RAM (and running x86).

I hadn't thought about this yet, but I will read up on it and investigate over the next weeks/months. Thanks for the tip!

allthemtings · January 2025

tridinebandim · January 2025

With a 16 GB M1 Pro, which has 200 GB/s memory bandwidth, I can run the DeepSeek R1 14b distill. I still couldn't get a grip on the limits, though. Chat window size? Max accumulated tokens?
For tok/sec, memory bandwidth is crucial; that is what I gather from Twitter posts.
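
A rough way to see why bandwidth matters: if generating each token means reading every model weight from memory once, then bandwidth divided by model size gives a ceiling on tok/sec. The sketch below uses that simplification only. The function name and the 9 GB model size are assumptions for illustration (the 200 GB/s figure is from the post), and real throughput will be lower.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on generation speed for a memory-bound model.

    Assumes each generated token requires reading all model weights from
    memory once. Ignores compute time, caches, and the KV cache.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# 200 GB/s is the bandwidth quoted in the post.
# 9 GB is an ASSUMED in-memory size for a quantized 14b model, not a measurement.
estimate = max_tokens_per_second(200, 9)
print(f"upper bound: {estimate:.1f} tok/s")
```

The same arithmetic suggests why large models on older dedicated servers land in the low single digits: a bigger model and slower RAM both push the ceiling down.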

charger · January 2025

Very interesting. I have a KS-LE-E and a KS-LE-B with 64 and 32 GB RAM sitting mostly idle, and would really like to try this for AI tasks that are not realtime, so tokens/s is not too important. I will spin up ollama and test a bit. Thanks for getting me onto the thought of actually running an LLM on those servers.

Adam1 · January 2025

@charger said: I will spin up ollama and test a bit.

pls update with your findings

Neoon · January 2025

32b running on a KS-LE-B, it's just 11$/m though

3t/s maybe

allthemtings · January 2025

@Neoon said:
32b running on a KS-LE-B, it's just 11$/m though

3t/s maybe

Anyone tested this with the LE-B with 1245v5 w/ iGPU?

Neoon · January 2025

@Neoon said:
32b running on a KS-LE-B, it's just 11$/m though

3t/s maybe

saobilin · January 2025

A 7950X3D with 48 GB and a 7900 XTX runs DeepSeek 32b perfectly; 70b answers questions very slowly. When I run 32b, it uses all the GPU memory (24 GB) and 40 GB of system memory. This is my own PC.

Neoon · January 2025

Has anyone tried the 72b model on the 128GB Kimsufi?

Neoon · January 2025

Do you guys think it will run well on a swap file?

Pandy · January 2025

@Neoon said:

@Neoon said:
32b running on a KS-LE-B, it's just 11$/m though

3t/s maybe

I assume this is deepseek-r1:32b and ran on the e3 1270?

here is 1245 v5 (32gb) for comparison

and 1245 v6 (32gb), maybe slightly faster?

Interestingly, they all started with the same joke, but at least the second one was unique.

Neoon · January 2025

@Neoon said:
Do you guys think it will run well on a swap file?

It actually runs on 64gig without a swap file.
But fuck hell, even slower.

Maybe on bigger context sizes, it needs 60GB+

Neoon · January 2025

72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

edit: here is the joke:

Why don’t scientists trust atoms?
Because they make up everything! 😄

barbarza · January 2025

Just to mess around I installed Ollama on one of the KS-LE-1s that I just received that has 32GB RAM and 2x480GB SSDs.

I have installed the Deepseek R1, Deepseek V3 and Phi4 models.

They work but are extremely slow.

Neoon · January 2025

@barbarza said:
Just to mess around I installed Ollama on one of the KS-LE-1s that I just received that has 32GB RAM and 2x480GB SSDs.

I have installed the Deepseek R1, Deepseek V3 and Phi4 models.

They work but are extremely slow.

Yeah, but it's private; nobody knows what you ask.
Let's wait for the GAME delivery. It should handle up to 32b, and given it's DDR4 it might be faster than any regular KS.

Still a steal for 11-12$/m.

BasToTheMax · January 2025

So how much ram is needed for each model?

Neoon · January 2025

@BasToTheMax said:
So how much ram is needed for each model?

Roughly 32b * 1.2 (in GB) is what you need; at least that's what I read.
The 70b just crashed on my 64GB machine.

I wonder how it ran in the first place.

BasToTheMax · January 2025

@Neoon said:

@BasToTheMax said:
So how much ram is needed for each model?

Roughly 32b * 1.2 (in GB) is what you need; at least that's what I read.
The 70b just crashed on my 64GB machine.

I wonder how it ran in the first place.

Oh okay. So the 8B model would use about 9.6 GB of RAM.
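
The rule of thumb quoted above (parameter count in billions times 1.2, read as GB of RAM) can be written out as a small sketch. The function names are made up for illustration, and the 1.2 factor is only the figure mentioned in this thread. Others here report 32b running on 32 GB machines, so treat it as a conservative guess that depends on quantization and context size.

```python
def estimated_ram_gb(params_billions: float, gb_per_billion: float = 1.2) -> float:
    """Estimate RAM in GB using the thread's rule of thumb.

    gb_per_billion=1.2 is the factor quoted in the thread, not a measured value.
    """
    return params_billions * gb_per_billion


def fits_in_ram(params_billions: float, ram_gb: float, gb_per_billion: float = 1.2) -> bool:
    """Return True if the estimate is within the available RAM."""
    return estimated_ram_gb(params_billions, gb_per_billion) <= ram_gb


# Model sizes mentioned in the thread, checked against a 64 GB machine.
for size in (8, 14, 32, 70):
    need = estimated_ram_gb(size)
    verdict = "fits" if fits_in_ram(size, 64) else "does not fit"
    print(f"{size}b: ~{need:.1f} GB, {verdict} in 64 GB")
```

By this estimate a 70b model wants about 84 GB, which is consistent with it crashing on a 64 GB machine.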

charger · January 2025

@Adam1 said:

@charger said: I will spin up ollama and test a bit.

pls update with your findings

KS-LE-B with E3-1245 v6 and 32gb of ram:
deepseek-r1:32b Prompt eval: 2.79 t/s Response: 1.34 t/s Total: 1.36 t/s

KS-LE-E with E5-1650 v3 and 64gb of ram:
deepseek-r1:32b Prompt eval: 3.24 t/s Response: 1.85 t/s Total: 1.86 t/s

So performance is not fantastic, but honestly, for a few bucks a month and with them mostly idling anyway, I see a use case where tokens/s is not super important, like background jobs and such.
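
For anyone who wants to reproduce numbers like these: Ollama's generate API response includes `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration`, with durations in nanoseconds. Below is a minimal sketch that turns those fields into t/s. The helper name is hypothetical, the sample values are made up (not measurements), and the "total" formula is an assumption about how a combined figure might be computed.

```python
NS_PER_SECOND = 1_000_000_000


def tokens_per_second(stats: dict) -> dict:
    """Compute prompt, response and combined t/s from Ollama timing fields.

    Expects the nanosecond durations and token counts found in the final
    response object of Ollama's generate API.
    """
    prompt_tokens = stats["prompt_eval_count"]
    response_tokens = stats["eval_count"]
    prompt_seconds = stats["prompt_eval_duration"] / NS_PER_SECOND
    response_seconds = stats["eval_duration"] / NS_PER_SECOND
    return {
        "prompt": prompt_tokens / prompt_seconds,
        "response": response_tokens / response_seconds,
        # Assumption: "total" is all tokens divided by all evaluation time.
        "total": (prompt_tokens + response_tokens) / (prompt_seconds + response_seconds),
    }


# Made-up sample values for illustration only.
sample = {
    "prompt_eval_count": 28,
    "prompt_eval_duration": 10 * NS_PER_SECOND,
    "eval_count": 670,
    "eval_duration": 500 * NS_PER_SECOND,
}
rates = tokens_per_second(sample)
print(f"Prompt eval: {rates['prompt']:.2f} t/s  Response: {rates['response']:.2f} t/s  Total: {rates['total']:.2f} t/s")
```

Because the response is usually much longer than the prompt, the combined figure ends up close to the response rate.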

Cybr · January 2025

@Neoon said:
72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

edit: here is the joke:

Why don’t scientists trust atoms?
Because they make up everything! 😄

Have you tried the new DeepSeek R1 Dynamic 1.58-bit that just got released? They achieved an 80% size reduction. I'm interested in how well it can perform on a low/medium-end CPU.

Neoon · January 2025

@Cybr said:

@Neoon said:
72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

edit: here is the joke:

Why don’t scientists trust atoms?
Because they make up everything! 😄

Have you tried the new DeepSeek R1 Dynamic 1.58-bit that just got released? They achieved an 80% size reduction. I'm interested in how well it can perform on a low/medium-end CPU.

If it's on Ollama, fine; too lazy to compile shit.
Edit: seems like with some params, it compiles fine for CPU only.

I wasn't going to install all these crap NVIDIA dependencies.

Cybr · January 2025

@Neoon said:

@Cybr said:

@Neoon said:
72b running, on a 11$/m dedi, with 15GB to spare, is nuts.

edit: here is the joke:

Why don’t scientists trust atoms?
Because they make up everything! 😄

Have you tried the new DeepSeek R1 Dynamic 1.58-bit that just got released? They achieved an 80% size reduction. I'm interested in how well it can perform on a low/medium-end CPU.

If it's on Ollama, fine; too lazy to compile shit.
Edit: seems like with some params, it compiles fine for CPU only.

I wasn't going to install all these crap NVIDIA dependencies.

Looks like it is on Ollama, but the minimum is VRAM + RAM = 80 GB, so your low-end box probably won't have enough RAM to even try it CPU-only.
