New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
Google Cloud preemptible GPUs are the cheapest I have found. Google Colab also provides one K80 for free in an IPython notebook. Regardless, no matter how I did the math, it always came down to the same conclusion for me: it is much cheaper to just buy the hardware.
Have AMD GPUs caught up?
Hetzner 1080 is cheapest at 116€/m
At that price you could finance a 2080 Ti or a Titan with monthly payments to a store.
By asking this question you evidently don't know much about machine learning. The problem is not that AMD GPUs are lousy; it's that the machine learning libraries are heavily dependent on CUDA, and rewriting those massive libraries is too much trouble.
You are correct. Has AMD spent money to catch up on the software side?
It is not up to them. AMD has the libraries that can enable developers to build their algorithms, but it cannot force developers to migrate. A large part of the reason is that nobody writes from scratch; everyone builds on the work of others, which has mostly been developed for Nvidia. Sorry, but AMD GPUs are unlikely to be widely adopted for machine learning for a long, long time.
There are several factors that make the major cloud vendors expensive.
First, AWS, Google Cloud, and Azure all have agreements with NVIDIA (and probably AMD) under which they can only offer the expensive datacenter products like the NVIDIA Tesla V100, and not cheaper hardware like the Titan X (which is marketed to consumers but performs quite well for ML too).
On top of that, though, these cloud platforms focus on allowing you to scale to huge workloads (like hundreds or thousands of GPUs). At this scale it can become costly for companies to maintain their own hardware. The cost of redundant power systems, leaving GPUs idle outside of peak demand, etc. all add up.
So yeah, if you just want a small cluster of 1 or even 8 GPUs, and you will have reasonable utilization of the hardware (say, >25%), then as Jun said, it'll be much cheaper to purchase the hardware yourself and set it up, despite the upfront cost. If you use it enough, and your electricity price is $0.2/kWh, you pay at most something like $0.1/hr per GPU after the upfront cost (roughly 10x less than cloud).
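To make that buy-vs-rent math concrete, here is a minimal break-even sketch. All the numbers are assumptions for illustration (hypothetical hardware price, cloud rate, and power draw, not quotes from any provider):

```python
# Rough buy-vs-rent break-even, with assumed numbers.
def electricity_cost_per_hour(watts, price_per_kwh):
    """Electricity-only running cost of an owned machine, in $/hr."""
    return watts / 1000.0 * price_per_kwh

def breakeven_hours(hardware_cost, cloud_rate, watts, price_per_kwh):
    """Hours of GPU use after which buying beats renting."""
    saving_per_hour = cloud_rate - electricity_cost_per_hour(watts, price_per_kwh)
    return hardware_cost / saving_per_hour

# Assumptions: $700 for a used GPU plus its share of the box,
# $0.90/hr for a spot/preemptible datacenter GPU,
# 350 W at the wall, $0.20/kWh electricity.
hours = breakeven_hours(700, 0.90, 350, 0.20)
print(f"break-even after ~{hours:.0f} hours")  # ~843 hours, about 35 days of 24/7 use
```

So under these assumptions the hardware pays for itself after a month or two of steady use, which is why utilization is the deciding factor.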
If your utilization is low, though (say you only need to train a model for maybe 24 hours per month), cloud may be cheaper, and here are some options:
Make sure you look at spot / preemptible pricing because otherwise it is 3x more expensive.
AMD has a TensorFlow fork with support for their GPUs, and people have reported success with it, with varying performance. https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
There is PyTorch support as well (https://rocm.github.io/pytorch.html).
I haven't used it personally, and I suspect it's not super straightforward to install. But I disagree with what you said about developers needing to migrate -- the vast majority (but not all) of ML models that people release are built entirely on top of platforms like TensorFlow and pytorch, and so as long as the platform is supported you should be able to run both training and inference on AMD GPUs.
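The reason model code carries over is that a framework-level script never references CUDA or ROCm directly; the backend is chosen when you install the package (e.g. `tensorflow` vs `tensorflow-rocm`). A minimal sketch, assuming a TensorFlow 2.x build is present (wrapped in try/except so it degrades gracefully when none is installed):

```python
# Framework code is backend-agnostic: the same call enumerates GPUs
# whether TensorFlow was built against CUDA or ROCm.
def available_gpus():
    try:
        import tensorflow as tf
        return [d.name for d in tf.config.list_physical_devices("GPU")]
    except ImportError:
        # No TensorFlow build installed; nothing to enumerate.
        return []

print(available_gpus())  # e.g. ['/physical_device:GPU:0'] on either vendor
```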
There are some cases where people extend the platform with their own operations to get better performance for special models (like differentiable warping layers), or ambitious people like the YOLO researcher (Joseph Redmon) who wrote his own deep learning library in CUDA (!), but that is rare.
Anyway, the savings from migrating from NVIDIA's datacenter GPUs to NVIDIA's consumer GPUs are much bigger than the savings from migrating from NVIDIA to AMD.
Edit: in other words, it really is a self-inflicted problem -- AMD's hardware is fine but their software is sorely lacking. They have made progress recently, but it's still more of a pain to set up with AMD than with NVIDIA.
Are you looking for something to use hourly? Or do you want a system for a month or a year?
We have built out and provided clients with quite a few custom GPU servers in the past; however, they're not really cheap systems, and they use a boatload of power.
I'm looking for sporadic use; otherwise, purchasing the hardware would be the best option, as others said. I'm doing this mainly for learning and fun.
For fun and learning purposes, use Google Colab. It gives you an online IPython notebook with a K80 GPU for free. Otherwise, I would recommend that you just buy a "gaming PC" and use it for fun in both ways (gaming and learning). I have a friend who managed to install two GPUs in a mATX NAS (I don't even know how the heck that is possible) as his personal GPU cluster.
Nocix has RTX 2070 for $99/mo
Thanks for your perspective. I mostly agree, especially about the self-inflicted problem. I am not entirely sure it is that straightforward for TensorFlow and PyTorch to support AMD. I am not an expert on this front (I do more applied ML and AI), but what I hear from colleagues who understand the GPU internals of TensorFlow and PyTorch is that Nvidia support is so far ahead that AMD is probably not going to catch up to that level.
Depending on the workload, it might be cheaper to just roll your own GPU. I decided to buy a second-hand GTX 1070 for this reason.
That's a feat on a mATX. I suppose you have to choose the cards wisely.