Free hosted Llama 3.1 8B - TensorDock
Hello LET!
If you've been looking to play around with AI endpoints recently but haven't had the chance, I've spun up a GPU cluster running Llama 3.1 8B for anyone who's interested.
The API is 100% OpenAI completions API compatible (let me know if you want streaming support) and free to use for now!
If there's enough interest, I can also set up a Llama 70B or hosted Mixtral cluster.
Example API integration in Python:
"""
An example OpenAI API client that connects to TensorDock's YAP
"""
from openai import OpenAI
client = OpenAI(
api_key = "dummy",
base_url="https://yap.tensordock.com"
)
completion = client.chat.completions.create(
model="Meta-Llama-3.1-8B-Instruct",
messages=[
{
"role" : "system",
"content" : "You are a pirate who speaks in pirate-speak."
},
{
"role" : "user",
"content" : "Explain LLMs to me in a single sentence."
}
],
max_tokens=256,
temperature=0.7,
top_p = 0.7,
frequency_penalty=1.0,
seed = 17
)
output = completion.choices[0].message.content
print(output)
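Since the endpoint is advertised as 100% OpenAI-compatible, you can also hit it with plain stdlib HTTP and no extra packages. A sketch, assuming the request path is `/chat/completions` relative to the base URL (which is how the OpenAI client resolves paths against `base_url`; if the server expects `/v1/chat/completions` instead, adjust accordingly):

```python
import json
import urllib.request

BASE_URL = "https://yap.tensordock.com"  # from the post above

def build_request(prompt: str,
                  model: str = "Meta-Llama-3.1-8B-Instruct") -> urllib.request.Request:
    """Build the chat-completions POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",  # any key works per the post
        },
        method="POST",
    )

req = build_request("Explain LLMs to me in a single sentence.")
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
# print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it also makes it easy to inspect exactly what goes over the wire.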
More details: https://blog.tensordock.com/blog/YAP
Rent affordable GPU servers: https://dashboard.tensordock.com/deploy
Comments
Awesome! If you could keep it free in the long term, like the various LET free VPS projects, it would be really useful. Also interested in 70B.
Thanks! Will keep you posted on 70B... that requires a lot more compute, as we'd have to upgrade to H100s. (And we're already running a lower-end cluster for load balancing and to ensure uptime.)
Realistically, I'd love to keep this free as much as you would (and just charge some high-usage customers), but we can't do that if people abuse the API. So: "best effort" free for as long as possible, haha
This is awesome 👌 👏 👍
Thanks! Excited to see some real people using it -- cool to stress test the server cluster
Thank you for the endpoint, what hardware are you using?
Thanks!!
I’d love to see the most advanced open source models. Feel free to put them behind registration, and charge a few $ a month for an account and maybe even limit queries to like 60 an hour or something per account if you get abuse/automated queries.
Are the most advanced ones on-par with 4o?
To answer myself: https://arena.lmsys.org/
Kind of; Claude 3.5 Sonnet is the most advanced imo at the moment.
I think this benchmark is actually a little better than the arena, and the best one imo:
https://huggingface.co/spaces/allenai/ZebraLogic
It benchmarks a wide range of logic puzzles, and none of the models come anywhere close to 100%, so there is still plenty of scaling headroom left.
Claude 3.5 Sonnet > GPT-4o >= Llama 3.1 405B >> Llama 3.1 70B >= Gemini 1.5 Pro > Gemma 2 27B
There are a lot of services like you described, most popular is poe.com
For coding you can also try DeepSeek Coder V2; it's roughly at the same level as Llama 3.1 405B but costs pennies via the API. It's something like 50x cheaper than GPT-4o.
@lentro did you try Gemma 2 9B too? It's interesting because it's really capable multilingually, even though it's just a 9B model. Works very nicely for basic, very fast chatbots.
Thanks!
Excited to give poe.com a try with DeepSeek Coder. It's returning an error 500 right now, though, so I guess it's not the most reliable.
I’ve paid for ChatGPT for over a year but it would be nice to try something new, especially if it’s as good for coding but much cheaper.
Edit: Tried DeepSeek chat?
I prefer to pay per use without any rate limiting so I use openrouter to access all models, including DeepSeek. GPT-4o became 33% cheaper via API (so on openrouter too) yesterday.
I didn't try the chat variant, as I use LLMs only for coding.
I meant their chat. In it, they offer DeepSeek Coder. I didn't see any info about rate limits, but if you push it hard enough, I guess you'd get limited.
Will have to look into it! Seems like a standard open license as well [a first for Google!].
I started using Claude Sonnet, and it is absolutely fantastic.
ATM, I prefer it over ChatGPT.
What do those 7B, 70B, etc. mean? Is this the brain level of the AI?
Billions of parameters.
More parameters = more knowledge.
More knowledge =/= smarter or more accurate, though.
Treat it as "GBs of data", but the data can be good or junk.
How many B's does ChatGPT 4.5 have?
Thank you so much!!!
You mean GPT-4o? It's not public, so we don't know.
I think it's around 1 trillion, so 1000B.
I love how OPENai is the most secretive and least open.
And the most open (of the big tech boys) is Facebook.
What a world.
I am stupid. How do I connect this API to https://github.com/whitead/paper-qa? GitHub Copilot is not helping.
Thanks for this. Out of interest, what level of quantisation (if any) does it have?
very good!
No quantization! Full FP16
What are the use-cases for it?
I have a project with more than 100 HTML files, and I want to change plugin A to plugin B.
So the procedure would be like...
Currently Cursor/Claude are able to do that, but I have to go to each page, instruct it to do it, and since it has to send a request to a server... every time there is a delay.
Can these local models do this kind of repetitive task quickly?
This is just one use case... but I am wondering whether these kinds of repetitive tasks can be faster with a locally hosted Llama.
Thanks
The models can write you a script to do it in a second.
like below?
That's great! I was just looking at your site and you guys have a great selection of GPUs, though there are so many it might be nice to have a page with a table comparing them somewhere?