New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
Google Cloud preemptible GPUs are the cheapest I have found. Google Colab also provides one K80 for free in an IPython notebook. Regardless, no matter how I did the math, it always came down to the same conclusion for me: it is much cheaper to just buy the hardware.
Have AMD GPUs caught up?
Hetzner 1080 is cheapest at 116€/m
At that price you could finance a 2080 Ti or a Titan with monthly payments to a store.
By asking this question you evidently don't know much about machine learning. The problem is not that AMD GPUs are lousy; it's that the machine learning libraries are heavily dependent on CUDA, and rewriting those massive libraries is too much trouble.
You are correct. Has AMD spent money to catch up on the software side?
It is not up to them. AMD has the libraries that can enable developers to build their algorithms, but it cannot force developers to migrate. A large part of the reason is that nobody writes from scratch; everyone builds on the work of others, which has mostly been developed for Nvidia. Sorry, but AMD GPUs are unlikely to be widely adopted for machine learning for a long, long time.
There are several factors that make the major cloud vendors expensive.
First, AWS, Google Cloud, and Azure all have agreements with NVIDIA (and probably AMD) under which they can only offer the expensive datacenter products like the NVIDIA Tesla V100, and not cheaper hardware like the Titan X (which is marketed to consumers but performs quite well for ML too).
On top of that, though, these cloud platforms focus on allowing you to scale to huge workloads (like hundreds or thousands of GPUs). At this scale it can become costly for companies to maintain their own hardware. The cost of redundant power systems, leaving GPUs idle outside of peak demand, etc. all add up.
So yeah, if you just want a small cluster of 1 or even 8 GPUs, and you will have reasonable utilization of the hardware (say, >25%), then as Jun said, it'll be much cheaper to purchase the hardware yourself and set it up, despite the upfront cost. If you use it enough, and your electricity price is $0.2/kWh, you pay at most something like $0.1/hr per GPU after the upfront cost (roughly 10x less than cloud).
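To make that buy-vs-rent math concrete, here is a minimal break-even sketch. All the numbers are assumptions for illustration (hypothetical hardware price, cloud rate, and power draw, not quotes from any provider):

```python
# Rough buy-vs-rent break-even, with assumed numbers.
def electricity_cost_per_hour(watts, price_per_kwh):
    """Electricity-only running cost of an owned machine, in $/hr."""
    return watts / 1000.0 * price_per_kwh

def breakeven_hours(hardware_cost, cloud_rate, watts, price_per_kwh):
    """Hours of GPU use after which buying beats renting."""
    saving_per_hour = cloud_rate - electricity_cost_per_hour(watts, price_per_kwh)
    return hardware_cost / saving_per_hour

# Assumptions: $700 for a used GPU plus its share of the box,
# $0.90/hr for a spot/preemptible datacenter GPU,
# 350 W at the wall, $0.20/kWh electricity.
hours = breakeven_hours(700, 0.90, 350, 0.20)
print(f"break-even after ~{hours:.0f} hours")  # ~843 hours, about 35 days of 24/7 use
```

So under these assumptions the hardware pays for itself after a month or two of steady use, which is why utilization is the deciding factor.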
If your utilization is low, though (say you only need to train a model for maybe 24 hours per month), cloud may be cheaper, and here are some options:
Make sure you look at spot / preemptible pricing because otherwise it is 3x more expensive.
AMD has a TensorFlow fork with support for their GPUs, and people have reported success with it, with varying performance. https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
There is PyTorch support as well (https://rocm.github.io/pytorch.html).
I haven't used it personally, and I suspect it's not super straightforward to install. But I disagree with what you said about developers needing to migrate -- the vast majority (but not all) of ML models that people release are built entirely on top of platforms like TensorFlow and pytorch, and so as long as the platform is supported you should be able to run both training and inference on AMD GPUs.
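The reason model code carries over is that a framework-level script never references CUDA or ROCm directly; the backend is chosen when you install the package (e.g. `tensorflow` vs `tensorflow-rocm`). A minimal sketch, assuming a TensorFlow 2.x build is present (wrapped in try/except so it degrades gracefully when none is installed):

```python
# Framework code is backend-agnostic: the same call enumerates GPUs
# whether TensorFlow was built against CUDA or ROCm.
def available_gpus():
    try:
        import tensorflow as tf
        return [d.name for d in tf.config.list_physical_devices("GPU")]
    except ImportError:
        # No TensorFlow build installed; nothing to enumerate.
        return []

print(available_gpus())  # e.g. ['/physical_device:GPU:0'] on either vendor
```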
There are some cases where people extend the platform with their own operations to get better performance for special models (like differentiable warping layers), or ambitious people like the YOLO researcher (Joseph Redmon) who wrote his own deep learning library in CUDA (!), but that is rare.
Anyway, the savings from migrating from NVIDIA's datacenter GPUs to NVIDIA's consumer GPUs are much bigger than the savings from migrating from NVIDIA to AMD.
Edit: in other words, it really is a self-inflicted problem -- AMD's hardware is fine but their software is sorely lacking. They have made progress recently, but it's still more of a pain to set up with AMD than with NVIDIA.
Are you looking for something to use hourly? Or do you want a system for a month or a year?
We have built out and provided clients with quite a few custom GPU servers in the past; however, they're not really cheap systems, and they use a boatload of power.
I'm looking for sporadic use; otherwise, purchasing the hardware would be the best option, as others said. I'm doing this mainly for learning and fun.
For fun and learning purposes, use Google Colab. It gives you an online IPython notebook with a K80 GPU for free. Otherwise, I would recommend that you just buy a "gaming PC" and use it for fun in both ways (gaming and learning). I have a friend who managed to install two GPUs in a mATX NAS (I don't even know how the heck that is possible) as his personal GPU cluster.
Nocix has RTX 2070 for $99/mo
Thanks for your perspective. I mostly agree, especially about the self-inflicted problem. I am not entirely sure it is that straightforward for TensorFlow and PyTorch to support AMD. I am not an expert on this front (I do more applied ML and AI), but what I hear from colleagues who understand the GPU internals of TensorFlow and PyTorch is that Nvidia support is so far ahead that AMD is probably not going to catch up to that level.
Depending on the workload, it might be cheaper to just roll your own GPU. I decided to buy a second-hand GTX 1070 for this reason.
That's a feat on a mATX. I suppose you have to choose the cards wisely.