Free hosted Llama 3.1 8B - TensorDock
Hello LET!
If you've been looking to play around with AI endpoints recently but haven't had the chance, I've spun up a GPU cluster running Llama 3.1 8B for anyone who's interested.
The API is 100% OpenAI completions API compatible (let me know if you want streaming support) and free to use for now!
If there's enough interest, I can also set up a Llama 70B or hosted Mixtral cluster.
Example API integration in Python:
"""
An example OpenAI API client that connects to TensorDock's YAP
"""
from openai import OpenAI
client = OpenAI(
api_key = "dummy",
base_url="https://yap.tensordock.com"
)
completion = client.chat.completions.create(
model="Meta-Llama-3.1-8B-Instruct",
messages=[
{
"role" : "system",
"content" : "You are a pirate who speaks in pirate-speak."
},
{
"role" : "user",
"content" : "Explain LLMs to me in a single sentence."
}
],
max_tokens=256,
temperature=0.7,
top_p = 0.7,
frequency_penalty=1.0,
seed = 17
)
output = completion.choices[0].message.content
print(output)
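Since the endpoint is advertised as 100% OpenAI-compatible, you can also hit it with plain stdlib HTTP and no extra packages. A sketch, assuming the request path is `/chat/completions` relative to the base URL (which is how the OpenAI client resolves paths against `base_url`; if the server expects `/v1/chat/completions` instead, adjust accordingly):

```python
import json
import urllib.request

BASE_URL = "https://yap.tensordock.com"  # from the post above

def build_request(prompt: str,
                  model: str = "Meta-Llama-3.1-8B-Instruct") -> urllib.request.Request:
    """Build the chat-completions POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",  # any key works per the post
        },
        method="POST",
    )

req = build_request("Explain LLMs to me in a single sentence.")
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
# print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it also makes it easy to inspect exactly what goes over the wire.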
More details: https://blog.tensordock.com/blog/YAP
Rent affordable GPU servers: https://dashboard.tensordock.com/deploy
Comments
Awesome! If you could keep it free in the long term, like the various LET free VPS projects, it would be really useful. Also interested in 70B.
Thanks! Will keep you posted on 70B... that requires a lot more compute, as we'd have to upgrade to H100s. (And we're already running a lower-end cluster for load balancing and to ensure uptime.)
Realistically, I'd love to keep this free as much as you would (and just charge some high-usage customers), but we can't do that if people abuse the API. So: "best effort" free for as long as possible, haha
This is awesome 👌 👏 👍
Thanks! Excited to see some real people using it -- cool to stress test the server cluster
Thank you for the endpoint, what hardware are you using?
Thanks!!
I’d love to see the most advanced open source models. Feel free to put them behind registration, and charge a few $ a month for an account and maybe even limit queries to like 60 an hour or something per account if you get abuse/automated queries.
Are the most advanced ones on-par with 4o?
To answer myself: https://arena.lmsys.org/
Kind of; Claude 3.5 Sonnet is the most advanced imo at the moment.
I think this benchmark is actually a little better than the arena, and the best one imo:
https://huggingface.co/spaces/allenai/ZebraLogic
It benchmarks a wide range of logic puzzles, and none of the models come anywhere close to 100%, so there is still plenty of scaling headroom left.
Claude 3.5 Sonnet > GPT-4o >= Llama 3.1 405B >> Llama 3.1 70B >= Gemini 1.5 Pro > Gemma 2 27B
There are a lot of services like you described, most popular is poe.com
For coding you can also try DeepSeek Coder V2; it's roughly at the same level as Llama 3.1 405B but costs pennies via the API. It's something like 50x cheaper than GPT-4o.
@lentro did you try Gemma 2 9B too? It's interesting because it's really capable multilingually, even though it's just a 9B model. Works very nicely for basic, very fast chatbots.
Thanks!
Excited to give poe.com a try with DeepSeek Coder. It's returning an error 500 right now, though, so I guess it's not the most reliable.
I’ve paid for ChatGPT for over a year but it would be nice to try something new, especially if it’s as good for coding but much cheaper.
Edit: Tried DeepSeek chat?
I prefer to pay per use without any rate limiting so I use openrouter to access all models, including DeepSeek. GPT-4o became 33% cheaper via API (so on openrouter too) yesterday.
I didn't try the chat variant, as I use LLMs only for coding.
I meant their chat. In it, they offer DeepSeek Coder. I didn't see any info about rate limits, but if you push it hard enough, I guess you'd get limited.
Will have to look into it! Seems like a standard open license as well [a first for Google!].
I started using Claude Sonnet, and it is absolutely fantastic.
ATM, I prefer it over ChatGPT.
What do those 7B, 70B, etc. mean? Is this the brain level of the AI?
Billions of parameters.
More parameters = more knowledge.
More knowledge =/= smarter or more accurate, though.
Treat it as "GBs of data", but the data can be good or junk.
How many B's does ChatGPT 4.5 have?
Thank you so much!!!
You mean GPT-4o? It's not public, so we don't know.
I think it's around 1 trillion, so 1000B.
I love how OPENai is the most secretive and least open.
And the most open (of the big tech boys) is Facebook.
What a world.
I am stupid. How do I connect this API to https://github.com/whitead/paper-qa? GitHub Copilot is not helping.
Thanks for this. Out of interest, what level of quantisation (if any) does it have?
very good!
No quantization! Full FP16
What are the use-cases for it?
I have a project with more than 100 HTML files, and I want to change plugin A to plugin B.
So the procedure would be like...
Currently Cursor/Claude are able to do that, but I have to go to each page, instruct it to do it, and since it has to send a request to a server... every time there is a delay.
Can these local models do this kind of repetitive task quickly?
This is just one use case... but I am wondering whether these kinds of repetitive tasks can be faster with a locally hosted Llama.
Thanks
The models can write you a script to do it in a second.
like below?
That's great! I was just looking at your site and you guys have a great selection of GPUs, though there are so many it might be nice to have a page with a table comparing them somewhere?