Free hosted Llama 3.1 8B - TensorDock

lentro Member, Host Rep
edited August 6 in General

Hello LET!

If you've been looking to play around with AI endpoints recently but haven't had the chance, I've spun up a GPU cluster running Llama 3.1 8B for anyone who's interested.

The API is 100% OpenAI completions API compatible (let me know if you want streaming support) and free to use for now!

If there's enough interest, I can also set up a Llama 70B or hosted Mixtral cluster.

Example API integration in Python:

"""
An example OpenAI API client that connects to TensorDock's YAP
"""
from openai import OpenAI

client = OpenAI(
    api_key = "dummy",
    base_url="https://yap.tensordock.com"
)
completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {
            "role" : "system",
            "content" : "You are a pirate who speaks in pirate-speak."
        },
        {
            "role" : "user",
            "content" : "Explain LLMs to me in a single sentence."
        }
    ],
    max_tokens=256,
    temperature=0.7,
    top_p = 0.7,
    frequency_penalty=1.0,
    seed = 17
)

output = completion.choices[0].message.content
print(output)
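
If you'd rather not use the Python SDK, the same call can be made over plain HTTP. A minimal sketch with requests, assuming the endpoint follows the usual OpenAI layout (the client above simply appends /chat/completions to the base_url):

import requests

# Assumed path: base_url + /chat/completions, as the OpenAI client builds it
resp = requests.post(
    "https://yap.tensordock.com/chat/completions",
    headers={"Authorization": "Bearer dummy", "Content-Type": "application/json"},
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",
        "messages": [
            {"role": "user", "content": "Explain LLMs to me in a single sentence."}
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])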

More details: https://blog.tensordock.com/blog/YAP
Rent affordable GPU servers: https://dashboard.tensordock.com/deploy

Comments

  • Void Member

    Awesome! If you could make it free in the long term as well like various LET free VPS projects, it would be so useful. Also interested in 70B.

    Thanked by 1lentro
  • lentro Member, Host Rep

    @Void said:
    Awesome! If you could make it free in the long term as well like various LET free VPS projects, it would be so useful. Also interested in 70B.

    Thanks! Will keep you posted on 70B... that requires a lot more compute power, as we'd have to upgrade to H100s. [And we're already running a lower-end cluster to load balance / ensure uptime.]

    I'd love to keep this free as much as you would (and just charge some high-usage customers), but realistically we can't do so if people abuse the API. So "best effort" free for as long as possible haha :)

    Thanked by 3Void plumberg jasonxu
  • plumberg Veteran

    This is awesome 👌 👏 👍

    Thanked by 1lentro
  • lentro Member, Host Rep
    edited August 6

    @plumberg said: This is awesome 👌 👏 👍

    Thanks! Excited to see some real people using it -- cool to stress test the server cluster :)

    Thanked by 1plumberg
  • SlowDD Member

    Thank you for the endpoint, what hardware are you using?

    Thanked by 1lentro
  • emgh Member
    edited August 6

    Thanks!!

    I’d love to see the most advanced open source models. Feel free to put them behind registration, and charge a few $ a month for an account and maybe even limit queries to like 60 an hour or something per account if you get abuse/automated queries.

    Are the most advanced ones on-par with 4o?

    Thanked by 1lentro
  • emgh Member

    To answer myself: https://arena.lmsys.org/

    Thanked by 1lentro
  • @emgh said:
    Are the most advanced ones on-par with 4o?

    Kind of - Claude 3.5 Sonnet is the most advanced imo at the moment

    @emgh said:
    To answer myself: https://arena.lmsys.org/

    I think this benchmark is actually a little better than the arena, and the best imo:

    https://huggingface.co/spaces/allenai/ZebraLogic

    It benchmarks a wide range of logic and none of the models are anywhere close to 100% whatsoever, so there is still plenty of scaling left.

    Thanked by 1emgh
  • AXYZE Member

    @emgh said:
    Thanks!!

    I’d love to see the most advanced open source models. Feel free to put them behind registration, and charge a few $ a month for an account and maybe even limit queries to like 60 an hour or something per account if you get abuse/automated queries.

    Are the most advanced ones on-par with 4o?

    Claude 3.5 Sonnet > GPT-4o >= Llama 3.1 405B >> Llama 3.1 70B >= Gemini 1.5 Pro > Gemma 2 27B

    There are a lot of services like you described; the most popular is poe.com

    For coding you can also try DeepSeek Coder V2, it's roughly at the same level as Llama 3.1 405B, but it costs pennies via API. It's like 50x cheaper than GPT-4o.

    @lentro did you try Gemma 2 9B too? It's interesting because it's really capable multilingually, even though it's just a 9B model. Works very nicely for basic, very fast chatbots.

    Thanked by 3emgh lentro abtdw
  • emgh Member
    edited August 7

    @AXYZE said:
    Claude 3.5 Sonnet > GPT-4o >= Llama 3.1 405B >> Llama 3.1 70B >= Gemini 1.5 Pro > Gemma 2 27B

    There are a lot of services like you described; the most popular is poe.com

    For coding you can also try DeepSeek Coder V2, it's roughly at the same level as Llama 3.1 405B, but it costs pennies via API. It's like 50x cheaper than GPT-4o.

    Thanks!

    Excited to give poe.com a try with DeepSeek Coder. Error 500 currently though so I guess not the most reliable.

    I’ve paid for ChatGPT for over a year but it would be nice to try something new, especially if it’s as good for coding but much cheaper.

    Edit: Tried DeepSeek chat?

  • AXYZE Member
    edited August 7

    @emgh said:
    Excited to give poe.com a try with DeepSeek Coder. Error 500 currently though so I guess not the most reliable.

    I've paid for ChatGPT for over a year but it would be nice to try something new, especially if it's as good for coding but much cheaper.

    Edit: Tried DeepSeek chat?

    I prefer to pay per use without any rate limiting so I use openrouter to access all models, including DeepSeek. GPT-4o became 33% cheaper via API (so on openrouter too) yesterday.

    I didn't try the chat variant as I use LLMs only for coding.
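
    A minimal sketch of how that looks with the same OpenAI client as in the OP, assuming OpenRouter's OpenAI-compatible endpoint (the model id below is illustrative, check their catalogue):

    from openai import OpenAI

    # Same OpenAI-compatible interface, different endpoint; needs a real OpenRouter key
    client = OpenAI(
        api_key="YOUR_OPENROUTER_KEY",
        base_url="https://openrouter.ai/api/v1",
    )
    completion = client.chat.completions.create(
        model="deepseek/deepseek-coder",  # illustrative model id
        messages=[{"role": "user", "content": "Write fizzbuzz in Python."}],
    )
    print(completion.choices[0].message.content)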

  • emgh Member

    @AXYZE said:
    I prefer to pay per use without any rate limiting so I use openrouter to access all models, including DeepSeek. GPT-4o became 33% cheaper via API (so on openrouter too) yesterday.

    I didn't try the chat variant as I use LLMs only for coding.

    I meant their chat. In it, they offer deepseek coder. I didn’t see any info about rate limits, but if you push it hard enough I guess you’re getting limited.

  • lentro Member, Host Rep

    @AXYZE said: @lentro did you try Gemma 2 9B too? It's interesting because it's really capable multilingually, even though it's just a 9B model. Works very nicely for basic, very fast chatbots.

    Will have to look into it! Seems like a standard open license as well [a first for Google!].

  • seenu Member
    edited August 8

    I started using Claude Sonnet, it is absolutely fantastic.

    Atm, I prefer it over ChatGPT.

  • Levi Member

    What do those 7B, 70B etc. mean? Is this the brain level of the AI?

  • AXYZE Member
    edited August 8

    @Levi said:
    What do those 7B, 70B etc. mean? Is this the brain level of the AI?

    Billions of parameters

    More parameters = more knowledge

    More knowledge =/= smarter or more accurate, though

    Treat it as "GBs of data", but data can be good or junk.
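
    (Rough sense of scale, assuming plain FP16 weights at 2 bytes per parameter: an 8B model is about 16 GB of weights, a 70B model about 140 GB.)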

    Thanked by 1lentro
  • Levi Member
    edited August 8

    @AXYZE said:
    Billions of parameters

    More parameters = more knowledge

    More knowledge =/= smarter or more accurate, though

    Treat it as "GBs of data", but data can be good or junk.

    How many B's does ChatGPT 4.5 have?

  • Thank you so much!!!

  • AXYZE Member

    @Levi said:
    How many B's does ChatGPT 4.5 have?

    You meant GPT-4o? It's not public, so we don't know.
    I think it's around 1 trillion, so 1000B.

  • emgh Member

    @AXYZE said:
    You meant GPT-4o? It's not public, so we don't know.
    I think it's around 1 trillion, so 1000B.

    I love how OPENai is the most secretive and least open.

    And the most open (of the big tech boys) is Facebook.

    What a world.

  • I am stupid. How do I connect this API to https://github.com/whitead/paper-qa? GitHub Copilot is not helping :(
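
    A rough sketch of one way that might work, assuming paper-qa goes through the standard openai Python client (which reads these environment variables when nothing explicit is passed); not verified against paper-qa itself:

    import os

    # Hypothetical: point any OpenAI-client-based tool at the YAP endpoint.
    # Whether paper-qa picks these up depends on its version.
    os.environ["OPENAI_API_KEY"] = "dummy"
    os.environ["OPENAI_BASE_URL"] = "https://yap.tensordock.com"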

  • vpsam Member

    Thanks for this. Out of interest what level of quantisation (if any) does it have?

  • very good!

  • lentro Member, Host Rep

    @vpsam said: level of quantisation

    No quantization! Full FP16 :)

    Thanked by 1vpsam
  • seenu Member

    What are the use-cases for it?

    I have a project with more than 100 HTML files and I want to change plugin A to plugin B, so the procedure would be like:

    • remove the CSS file
    • remove A's JS file
    • add B's JS file
    • adjust initialization syntax from A to B

    Currently Cursor/Claude is able to do that, but I have to go to each page and instruct it, and since it has to send a request to the server... every time there is a delay.

    Can these local models do this kind of repetitive task quickly?

    This is just one use case... but I am wondering if this kind of repetitive task can be faster with a locally hosted Llama.

    Thanks

  • mrTom Member

    @seenu said: Can these local models do this kind of repetitive task quickly?

    The models can write you a script to do it in a second.
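
    A rough sketch of what such a script might look like (the file names pluginA.css, pluginA.js, pluginB.js and the init calls are placeholders for illustration):

    import pathlib
    import re

    PROJECT = pathlib.Path("./site")  # hypothetical project root

    for page in PROJECT.rglob("*.html"):
        html = page.read_text(encoding="utf-8")
        # drop plugin A's stylesheet and script tags (placeholder file names)
        html = re.sub(r'\s*<link[^>]*pluginA\.css[^>]*>\n?', "", html)
        html = re.sub(r'\s*<script[^>]*pluginA\.js[^>]*></script>\n?', "", html)
        # add plugin B's script just before </body>
        html = html.replace("</body>", '  <script src="js/pluginB.js"></script>\n</body>')
        # swap the initialization call (placeholder syntax)
        html = html.replace("pluginA.init(", "pluginB.setup(")
        page.write_text(html, encoding="utf-8")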

  • seenu Member

    @mrTom said: The models can write you a script to do it in a second.

    like below?

    @seenu said: adjust initialization syntax from A to B

  • vpsam Member

    @lentro said:

    @vpsam said: level of quantisation

    No quantization! Full FP16 :)

    That's great! I was just looking at your site and you guys have a great selection of GPUs, though there are so many it might be nice to have a page with a table comparing them somewhere?

    Thanked by 1lentro