What is the best technology choice for building a large-scale local storage solution?
I am working on a hobby project that needs to store a large amount of video. The estimated scale is 500TB to 1PB, and I want to keep maintenance costs as low as possible, so a single-node solution is acceptable and a cluster is not necessary.
I have a few candidate solutions. Which one is better? Can you share your experience with me?
Solution 1: RAID-Z2/Z3 on TrueNAS, with SSDs as a metadata cache.
Solution 2: Build on bare metal using MinIO.
Solution 3: Ceph or similar.
Solution 4: Traditional software RAID with bcache.
I lack experience in managing large local storage, so it is difficult for me to choose. I'd appreciate hearing from anyone in the community who has done this.
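For a rough sense of what Solution 1 means in hardware, here is a minimal sizing sketch. The 20TB drive size and 12-wide vdevs are assumptions picked for illustration, not figures from this thread, and the math ignores ZFS metadata, padding, and free-space headroom, so real usable capacity will be lower.

```python
import math

def raidz_drives_needed(target_tb, drive_tb, vdev_width, parity):
    """Return (vdev_count, total_drives, raw_usable_tb) for identical RAID-Z vdevs.

    parity is 2 for RAID-Z2 and 3 for RAID-Z3. This is a back-of-envelope
    estimate: it counts data disks only and ignores ZFS overhead.
    """
    data_disks = vdev_width - parity
    if data_disks <= 0:
        raise ValueError("vdev_width must be larger than parity")
    usable_per_vdev = data_disks * drive_tb
    vdevs = math.ceil(target_tb / usable_per_vdev)
    return vdevs, vdevs * vdev_width, vdevs * usable_per_vdev

# Hypothetical layout: 20TB drives in 12-wide vdevs (assumed values).
for target in (500, 1000):
    for name, parity in (("RAID-Z2", 2), ("RAID-Z3", 3)):
        vdevs, drives, usable = raidz_drives_needed(target, 20, 12, parity)
        print(f"{target}TB target, {name}: {vdevs} vdevs, {drives} drives, {usable}TB raw usable")
```

Changing the vdev width trades capacity efficiency against how many drives are involved in each rebuild, so it is worth trying a few values before buying hardware.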
SFTP to some Hetzner dedis
What is your budget? Would using a NAS like Synology be cheaper?
I guess that simply depends on the longevity.
No. Synology's hardware is known for being expensive. I have an Unraid-based NAS at home with a capacity of 100TB. Here, though, I am talking about hosting with or renting from an IDC (data center) provider. Maintaining large local storage in a home environment is not cost-effective.
We don't know how long the videos need to be kept, but I think renting storage dedis or using AWS Glacier Deep Archive (or a similar S3-compatible service) at about $1/TB per month will be cheaper if you don't need to access the data continuously.
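To put that figure against the 500TB to 1PB range, a quick calculation helps. This is only a sketch: it takes the roughly $1/TB per month rate quoted above as an assumption and covers storage alone. Retrieval, request, and egress fees are not included, and archive tiers usually have minimum storage durations.

```python
def monthly_storage_cost(capacity_tb, usd_per_tb_month=1.0):
    """Storage-only cost per month; the default rate is the thread's rough figure."""
    return capacity_tb * usd_per_tb_month

for tb in (500, 1000):
    monthly = monthly_storage_cost(tb)
    print(f"{tb}TB: ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year (storage only)")
```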
Yes, but considering redundancy, I need to further build a ZFS array or something like Minio erasure coding on top of it. This is quite a headache.
you should ask @atmwebhost!
He had ideas about setting up servers on the moon, and taking down all datacentres with cyber attacks from a high-spec machine with AI robots in a DC
Or storage in the city of Atlantis from @yoursunny
@danblaze 3.7PB array managed by single head server, all connected over local network
An ultimate solution. I watched the video and concluded that I probably can't get my hands on one.
Lolz, it was meant to give you some idea of what others are doing to set up custom local-network NAS-type storage. Since you feel this would be too much (which I fully agree with), I think it is best not to keep this type of storage locally and instead have a third party store your data. AWS, Google Cloud, and Oracle Cloud all offer large-scale storage.
Also check out JuiceFS https://juicefs.com/docs/community/introduction/ with Cloudflare R2 storage. Did a write up and benchmarks at https://github.com/centminmod/centminmod-juicefs which can give you an idea
Really? QNAP is generally cheaper than Synology but far inferior. Making a hot swap equivalent of a Synology for less is very difficult.
Plus, you'd save the labour of rolling your own, which can save both time and money.
FYI, I set up JuiceFS with Cloudflare R2 S3 object storage on my other server, which has 2x 960GB NVMe in RAID 1.
JuiceFS lets you shard storage across multiple R2 buckets for better performance, which seems to have helped a bit for big file reads and, at least, for 1MB big file writes. PUT/GET object latencies from my Dallas server are still relatively slow because of the R2 locations available, but it is adequate for my needs so far.
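The general idea behind sharding across buckets can be sketched like this: hash each object key and use the result to pick one of N buckets, so requests spread out instead of hitting a single bucket. This is an illustration of hash-based sharding in general, not JuiceFS's actual implementation. The CRC32 hash, the `juicefs-shard-{}` bucket names, and the key format are all assumptions made up for the example.

```python
import zlib
from collections import Counter

def shard_for_key(key: str, shards: int) -> int:
    """Map an object key to a shard index in [0, shards) using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shards

def bucket_for_key(key: str, shards: int, pattern: str = "juicefs-shard-{}") -> str:
    """Return the (hypothetical) bucket name that would hold this key."""
    return pattern.format(shard_for_key(key, shards))

# Spread 10,000 made-up object keys over 10 buckets and count per bucket.
keys = [f"chunks/{i}_0_4194304" for i in range(10000)]
counts = Counter(bucket_for_key(k, 10) for k in keys)
for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

Because the mapping depends only on the key and the shard count, any client can find an object without a lookup table, but changing the shard count later would move most keys to different buckets.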
The table below shows comparison between 10x Cloudflare R2 sharded JuiceFS mount vs 5x Cloudflare R2 sharded JuiceFS mount vs 1x Cloudflare JuiceFS mount (default). All R2 storage locations are with location hint North American East.
For 1024MB big file size
For 1MB big file size
Personally, I think multiple servers will keep the costs down, and accessing them from a few high-powered head servers is more ideal.
Especially if you plan on scaling this up over time.
Scaling across multiple servers also lets you go with smaller drives. You get less space per disk, but it can reduce costs in the long run.
Wow, a very meaningful test, thank you.
You're welcome. Been using JuiceFS with Cloudflare R2 for over 1 year now and loving it
I also added JuiceFS Benchmarks 10x R2 Sharded Mount + Redis Metadata Caching
Default 1MB big file.