
50-100TB video storage cluster

AXYZE Member

Hey.

I need a cost-effective & reliable solution.
50-100TB of storage.
Only large (10-25GB) video files that will be streamed.
100% legit stuff, no "DMCA ignored" needed.
No need for high availability, but it would be a nice benefit.
It needs to have a good network to Poland.

My current idea is to get 4x Hetzner Auction servers (i7-3770, 32GB RAM, 4x 8TB) within the same DC.
That will give me 32TB per server, 128TB total.
80-90TB usable would be great, that's why I'm looking at erasure coding instead of RAID10. RAID5 isn't a great idea for such an array because of rebuild time, right?

Now, which would be the best way to build the cheapest, but still reliable, cluster?

  1. Proxmox cluster + Ceph + erasure coding
  2. Proxmox cluster + GlusterFS + erasure coding
  3. Proxmox + simple ZFS master/slave (only 50% usable :( )
  4. One Proxmox + connected MinIO cluster + erasure coding
  5. Something else?

These servers at Hetzner will have just a 1GbE link, so I'm not sure if a distributed FS is the right thing to do.

I don't have a strict budget, but let's say it's 100-200 euro/mo. It's more of a fun project for me - I want to check if it's possible to build a cheap-ass storage cluster that is highly reliable and provides good enough performance.
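
Rough capacity math I did (a quick Python sketch; the replication / erasure-coding profiles below are just illustrative examples I picked, the safe choice depends on how the chosen system spreads chunks across the 4 hosts):

# Back-of-the-envelope usable capacity for 4 nodes x 4x 8TB = 128TB raw.
# The k+m profiles below are illustrative only, not recommendations.
RAW_TB = 4 * 4 * 8  # 128 TB raw

layouts = {
    "replica-2 / RAID10-like": 1 / 2,
    "erasure coding k=2, m=1": 2 / 3,
    "erasure coding k=3, m=1": 3 / 4,
    "erasure coding k=8, m=2": 8 / 10,
}

for name, efficiency in layouts.items():
    print(f"{name:26s} -> ~{RAW_TB * efficiency:.0f} TB usable")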


Comments

  • Astro Member

    There are 40 euro 40TB servers as well!

    Thanked by 1AXYZE
  • AXYZE Member

    @Astro said:
    There are 40 euro 40TB servers as well!

    So 160TB for 160 euro. Perfect - now I just need a solution to get 100TB of usable, reliable storage out of it :)

    There's no SSD cache and just 1Gbit, that's why I don't know if these solutions are good.

  • servarica_hani Member, Patron Provider
    edited July 2022

    For reliability and even usability it is much easier to have 1 server with many disks.
    Another thing: you need at least 25% parity overhead to be safe.
    So if you go with ZFS RAIDZ2 (the RAID6 equivalent) you will need 8-disk groups with 2 disks for parity (we do 6 disks with 2 disks for parity, but you can live with 8).

    Management of a single server is always much easier than a Ceph cluster (honestly, 100TB usable is too low to justify a Ceph cluster).

    From the tests we posted on Reddit, ZFS was much better than Ceph in terms of performance (we tested 72 disks in a single server as ZFS and 72 disks across 5 servers as a Ceph cluster).
    Note: we have years of experience with ZFS and its optimization, while Ceph is new to us, so I assume Ceph can perform better with more tuning, but I don't think it can reach ZFS levels.

    Ceph makes sense once you reach a certain level. For example, with more than 1k disks it started to be painful for us to manage everything as individual servers, and that's why we are interested in Ceph.

  • ManishPant Member, Host Rep

    If you check out the Hetzner Server Auction there are servers at €140.70/month and €144.70/month with 160TB of HDD and 2x 960GB NVMe drives in a single server.

    https://www.hetzner.com/sb?hddcount_from=0&hdd_from=16000&hdd_to=17000

    Thanked by 1AXYZE
  • AXYZE Member

    @servarica_hani said:
    Management of a single server is always much easier than a Ceph cluster (honestly, 100TB usable is too low to justify a Ceph cluster).

    Hmmm... maybe going distributed with just 32TB per node is not worth it, like you say...
    Any experience with RAID-Z2 vs erasure coding?

  • servarica_hani Member, Patron Provider

    @AXYZE said:

    Hmmm... maybe going distributed with just 32TB per node is not worth it, like you say...
    Any experience with RAID-Z2 vs erasure coding?

    We did it.
    We compared against mirrors, which are faster, and the results were highly in favour of ZFS:
    https://www.reddit.com/r/DataHoarder/comments/up9tiu/servarica_distributed_storage_test_series/

    Check the first 2 parts as well if you want to see the full process:
    https://www.reddit.com/r/DataHoarder/comments/tb69gv/servarica_distributed_storage_test_series_ceph/
    https://www.reddit.com/r/DataHoarder/comments/tf73wt/servarica_distributed_storage_test_series_part2/

    Thanked by 2AXYZE tjn
  • @Astro said:
    There are 40 euro 40TB servers as well!

    Gone from Hetzner auction already. Now $50 per 4x10TB.

  • letlover Member
    edited July 2022

    If it is a 100TB cluster for streaming >10GB video clips, the limits on "unlimited" bandwidth may be a potential issue.

  • afn Member
    edited July 2022

    @AXYZE said: Only large (10-25GB) video files that will be streamed.

    Stream? What are your traffic requirements though? You need to be sure of that if you are going with Hetzner; there appears to be some undefined FUP, and with such large files being streamed you may hit that FUP if you have a lot of traffic.

    I would also suggest Andy10gbit and Walkerservers for large storage solutions. Andy appears expensive on the surface, but PMing him on Discord and asking nicely may - depending on stock - get you a good discounted deal (from my own humble experience with him).

    For reliability and even usability it is much easier to have 1 server with many disks

    Euh... but having 4 servers ==> 4x 1Gbps ports, vs 1 huge server with 1Gbps. Better network capacity with multiple servers.

    But if that 1 server has 5/10 Gbps, that is a different story.

    Thanked by 1AXYZE
  • dfroe Member, Host Rep

    Depending on your knowledge and experience regarding distributed file systems I would also consider a simple setup without Ceph, GlusterFS etc.

    Instead you could simply store the files across all your servers, maintain a simple index of which file is stored on which server, and use this to redirect requests to the correct server.
    For example srv.example.com/foo redirects to srv1.example.com/foo, srv.example.com/bar to srv42.example.com/bar, and so on.
    Avoiding distributed file systems could make maintenance easier and the whole setup more flexible, as servers could be placed in any datacenter.
    You might also abstain from any local redundancy like RAID or RAIDz if instead you save each file on two (or three) servers in different datacenters.
    Then you could also use the redirector service as a load balancer.

    In my personal opinion this sounds more robust, reliable, flexible and scalable.
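
    A minimal sketch of such a redirector (standard-library Python only; the hostnames and the file-to-server index are made-up examples, in practice you would keep the index in a small database):

# Minimal redirect service: look up which storage server holds a file and
# send the client there with an HTTP 302. Index and hostnames are examples.
from http.server import BaseHTTPRequestHandler, HTTPServer

FILE_INDEX = {
    "/foo.mp4": "https://srv1.example.com",
    "/bar.mp4": "https://srv42.example.com",
}

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        target = FILE_INDEX.get(self.path)
        if target is None:
            self.send_error(404, "unknown file")
            return
        self.send_response(302)
        self.send_header("Location", target + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Redirector).serve_forever()

    The same index can then double as the load balancer: store two (or three) locations per file and redirect to the least busy server.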

  • servarica_hani Member, Patron Provider

    @afn said:

    For reliability and even usability it is much easier to have 1 server with many disks

    Euh... but having 4 servers ==> 4x 1Gbps ports, vs 1 huge server with 1Gbps. Better network capacity with multiple servers.

    But if that 1 server has 5/10 Gbps, that is a different story.

    Ceph will use part of that 1Gbps for its internal communication,
    and during rebalancing it will use all of it.

    Also, if you can max out the 1Gbps that's not bad for 100TB of usable data, but more is better.

    Thanked by 1AXYZE
  • afn Member

    @servarica_hani said: Also, if you can max out the 1Gbps that's not bad for 100TB of usable data, but more is better.

    I can assure you that you can max out 1Gbps 24/7 with only 512GB of content.

    Which is why I'm asking the OP to think about his traffic as well, not only storage (not a priority, but worth thinking about).

  • Levi Member

    Is it IPTV stuff with VOD? That is not innocent...

  • AXYZE Member

    @LTniger said:
    Is it IPTV stuff with VOD? That is not innocent...

    A stream archive platform for several top streamers & project archiving for YouTubers.
    Access will be available to patrons & editors who are making "shorts" for YouTube, TikTok etc.

    That's why I say it's more of a "fun project" than something that needs to be available 24/7. It's not business critical. Some downtime is fine; losing all the data, not so much xD

    @afn said:
    Which is why I'm asking the OP to think about his traffic as well, not only storage (not a priority, but worth thinking about).

    I know about the Hetzner drama; if I get near that 250TB limit I will just add a Hetzner Cloud VPS (internal traffic is not counted).

    50-100TB of storage is not needed at this moment (eventually it will be, but idk how fast it will fill up). The sponsor (a streamer) of this project wants to pay for a whole year and not think about limits or scaling - just use it (for his editors & supporters) and give upload access to his Twitch & YouTube friends. A lot of people are pissed off about some recent events on Twitch & YouTube; long story, and even I don't know it fully as they have NDAs etc.

  • AXYZE Member
    edited July 2022

    Based on suggestions from here... What about this:

    2 Big servers
    Xeon E3-1275V6
    64GB ECC
    4x 16TB (64TB per server)
    1Gbps, 250TB FUP

    And then set up a filesystem with erasure coding (afaik ZFS doesn't support it)?
    I don't think going RAID-5 on 16TB drives is a good idea - rebuild times are going to be looong and the risk of losing data too high.
    I would have two pools of data, one per server, as suggested by @dfroe . That would theoretically give me 2Gbps and 500TB of traffic - of course scaling won't be that perfect, but it will still be a major improvement over a single 1Gbps server.
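
    Quick back-of-the-envelope rebuild math (Python; the sustained rebuild speeds are my own guesses, real resilver/rebuild speed depends on load and layout):

# Rough rebuild-time estimate for one failed 16TB drive.
# 100-150 MB/s sustained rebuild speed is an assumption, not a measurement.
DRIVE_TB = 16
for mb_per_s in (100, 150):
    seconds = DRIVE_TB * 1e12 / (mb_per_s * 1e6)
    print(f"at {mb_per_s} MB/s: ~{seconds / 3600:.0f} hours ({seconds / 86400:.1f} days)")

    Almost two days degraded per failed 16TB drive is exactly why single parity scares me here.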

    Or... maybe... a single server is a better idea? Then I would get:

    1 Mega big server
    Intel Xeon W-2145
    128GB RAM
    10x 16TB (160TB total) + 2x 960GB NVMe
    1Gbps, 250TB FUP.

    I think two servers is the better way - happy to hear your suggestions!

    @servarica_hani thank you very much for your input, your reddit posts gave me nice info :)

  • yoursunny Member, IPv6 Advocate

    YOLORAID

    Mentally strong people use YOLORAID.
    Buy top-tier hard drives and they won't fail.

  • Hxxx Member
    edited July 2022

    I'm glad @dfroe said that (assuming no edits to his post).

    Anyway, I was thinking: why ZFS? Are you skipping HW RAID? (Which people normally skip, or they put the disks behind an HBA, when using ZFS.)

    Don't know if you heard about Linus (the media content creator, I think) who lost data while using ZFS; worth a look so you don't make the same mistake.

    Then I was reading that ZFS needs scrubbing (periodically / recurring every X time) to make sure all files are intact and not sitting in bad sectors / corrupted. It also seems that ECC RAM is crucial for this to work properly and avoid potential issues.

    I'm a fan of simplicity. HW RAID or mdadm + monitoring. But I understand simplicity is not enough for all scenarios, and there are cases where ZFS, Ceph, etc. are the way to go due to customizability (if that's a word lol).

    Would love to hear more from @servarica_hani as it seems they had proper experience with ZFS.

    Thanked by 1AXYZE
  • dfroe Member, Host Rep

    I think we have had some threads about ZFS experience here on LET; and of course you'll find plenty on Reddit (DataHoarder).

    I have personally run ZFS on FreeBSD and Linux for more than a decade, and it's one piece of code that I am really happy exists. You need to know a bit about how it works in order to use it right, but it's a real pleasure to handle and maintain ZFS pools - speaking from my personal experience, having started with ZFS on FreeBSD and meanwhile also using it on all my Linux servers.

    However, ZFS is something to be used locally on one machine. Of course you can do certain helpful jobs with zfs send/receive and snapshots, but it is no distributed file system. ZFS is something you use on a particular machine.

    And regarding simplicity: yes, ZFS has certain features like RAIDz(2), integrated checksums, scrubbing, advanced RAM caching etc., and relying on ECC RAM is a must if you care about your data. But that's no disadvantage of ZFS at all. With other file systems you simply do not know when such bit rot occurs. ;)

  • risharde Patron Provider, Veteran
    edited July 2022

    @Astro said:
    There are 40 euro 40TB servers as well!

    Where, if you don't mind me asking? I just checked the Hetzner auctions and didn't see that - don't think I've ever noticed one.

  • servarica_hani Member, Patron Provider

    @Hxxx said:

    Then I was reading that ZFS needs scrubbing (periodically / recurring every X time) to make sure all files are intact and not sitting in bad sectors / corrupted. It also seems that ECC RAM is crucial for this to work properly and avoid potential issues.

    I'm a fan of simplicity. HW RAID or mdadm + monitoring. But I understand simplicity is not enough for all scenarios, and there are cases where ZFS, Ceph, etc. are the way to go due to customizability (if that's a word lol).

    Would love to hear more from @servarica_hani as it seems they had proper experience with ZFS.

    There is a lot of misinformation about scrubbing out there.
    Scrubbing matters most when data is not accessed regularly. For example, on a backup server that you write data to once and maybe restore from a few years later, scrubbing is really important, as data can rot at rest.

    But for active data that is read regularly (at least once per year) you usually don't need scrubbing, as the same checking happens whenever you read the data.

    Another thing that lowers the need for scrubbing is having RAIDZ2: even with bit rot you would need 3 blocks in the same position on 3 disks in the group to be corrupted before you lose data, which is statistically very unlikely.

    Finally, proper enterprise disks make a huge difference, as their BER rating is usually 10x less likely to produce bit errors than consumer disks, which adds to the reliability of the system.

    Note: ECC RAM is a must with ZFS; don't do ZFS without ECC.

    From our experience ZFS is years ahead of any hardware RAID solution. It is open source, with whole communities to support you in case of issues, and it does everything it can to maintain your data.
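
    For what it's worth, a tiny health-check sketch (Python around the stock zpool tools; the pool name and how you schedule or alert are placeholders):

# Minimal ZFS health check: 'zpool status -x' prints "all pools are healthy"
# when nothing is wrong. Pool name is a placeholder.
import subprocess

def pool_health() -> str:
    out = subprocess.run(["zpool", "status", "-x"],
                         capture_output=True, text=True)
    return out.stdout.strip()

def start_scrub(pool: str = "tank") -> None:
    # e.g. run monthly from cron for rarely-read datasets
    subprocess.run(["zpool", "scrub", pool], check=True)

if __name__ == "__main__":
    status = pool_health()
    if status != "all pools are healthy":
        print("ZFS problem detected:\n" + status)  # hook your alerting in here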

    Thanked by 2Hxxx AXYZE
  • Hxxx Member
    edited July 2022

    @servarica_hani thanks for the reply.

    Do you have time to elaborate on this: "it is years ahead of any HW RAID solution"?

    What advantages do you see in comparison to just having a proper (modern) HW RAID solution?

    Edit:Edit:Edit... typos.

  • servarica_hani Member, Patron Provider

    @Hxxx said:
    @servarica_hani thanks for the reply.

    Do you have time to elaborate on this: "it is years ahead of any HW RAID solution"?

    What advantages do you see in comparison to just having a proper (modern) HW RAID solution?

    Edit:Edit:Edit... typos.

    I will use @dfroe's answer here:

    @dfroe said:
    And regarding simplicity: yes, ZFS has certain features like RAIDz(2), integrated checksums, scrubbing, advanced RAM caching etc., and relying on ECC RAM is a must if you care about your data. But that's no disadvantage of ZFS at all. With other file systems you simply do not know when such bit rot occurs. ;)

    I will add the following example.
    Say you need to build a big array of 60 disks.
    Using hardware RAID you are forced into RAID 5, 6, 10, 50 or 60, or into multiple arrays glued together with LVM, which is a band-aid solution because you get 2 layers of overhead between you and your disks.
    Those options either waste too much space (RAID 10, 50, 60)
    or give you too little parity (RAID 5 and 6).

    With ZFS you can divide your disks into groups, each with 2 parity disks, so you get 1 file system made of 6 groups of 10 disks, and each group has 2 of its 10 disks as parity.
    Add to that that when one disk fails, the rebuild only covers that group, not the whole array, while in RAID 5, 6, 50, 60 either the whole array or half of it has to be re-read.

    ZFS has integrated checksums, which means it has a native way to find and fix bit rot: it checks the checksum of every block it reads and will immediately repair it if something is wrong.

    Add to that layered caching, which has only just started to appear in some hardware RAID solutions.

    Another aspect is the excellent metrics we get out of each disk in the system and the ability to monitor the smallest details of the disks' operation, all of which would be hidden behind the HW RAID device if we went the hardware RAID route.

    I mean, this is just what I remembered right now, but the list goes on.
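
    To make the 60-disk example concrete, this is roughly the pool layout as a single command (assembled in Python here; the pool name and device names are placeholders):

# One pool made of 6 RAIDZ2 vdevs of 10 disks each (8 data + 2 parity per
# group). Device names are placeholders.
disks = [f"/dev/disk/by-id/ata-DISK{i:02d}" for i in range(60)]

cmd = ["zpool", "create", "tank"]
for start in range(0, 60, 10):
    cmd += ["raidz2"] + disks[start:start + 10]

print(" ".join(cmd))
# When a disk fails, only its own 10-disk vdev resilvers,
# not the whole 60-disk pool.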

  • AXYZE Member
    edited July 2022

    Thanks for the help, guys!

    In the end we decided to go with 3 servers.

    One AX41 (Ryzen 5 3600, 64GB RAM, 2x 512GB NVMe)
    Two auction servers (Xeon E5-1650V3, 128GB, 10x 10TB Enterprise HDD)

    The AX41 will act as the main server (taking in streams from streamers, hosting the website, Discord bot etc.)
    The two auction servers will be set up as cdn1.* and cdn2.* and will just hold the video files.

    With that solution we can stream videos at up to 2Gbps (if we get perfect load balancing between servers) and have a separate server which makes sure that navigation / stream downloading stays responsive even if the two storage servers are under heavy load.

    I'd still like your opinions on RAID-6 on ZFS vs. erasure coding on another filesystem. Share your experiences!

    Edit: I just added a "480GB SSD Datacenter Edition" to these storage servers to have a separate boot drive. There's plenty of space left - maybe I can use ~300GB of it to help the HDDs? I know ZFS can cache on SSD and RAM (two layers) - good idea?

  • dfroe Member, Host Rep

    Choosing between RAID-6 and RAIDz2 (ZFS) is probably a question of personal taste.
    Building a software RAID-6 follows a strict layered approach and each layer is quite easy to understand. The RAID-6 itself is built with mdadm, and on top of it you put a filesystem of your trust (e.g. ext4 or XFS). You may add LVM in between if it adds some benefit.
    ZFS on the other hand combines these three layers (RAID, LVM, filesystem). There are advantages and disadvantages, and I learned to love the advantages. :)

    Another advantage of RAIDz, besides what's already mentioned, is that for a successful rebuild each individual block only needs to be readable from at least one place, while with traditional RAID the whole of every remaining disk must be readable - which reduces the chance of losing the entire array due to a hardware failure during rebuild.

    Speaking for ZFS (which I personally would use), a RAIDz2 with 10 disks will perform pretty well.
    Best performance with RAIDz (most efficient distribution across disks) is achieved with 2^n data disks plus parity (which is true for 8+2 disks in RAIDz2).

    In theory you could get a read performance of 8x that of each disk, and a write performance equal to that of the weakest disk. That should be fine for your use case, as you will read much more data than you write.

    Having a large amount of RAM will be beneficial for the ARC (ZFS's RAM cache), which works out of the box.
    There is also the L2ARC, which can optionally be configured on an SSD as a second-level cache, but intuitively I doubt it would significantly boost your actual end-user performance.
    You can add an L2ARC without any real disadvantages, but it might not make anything better either. You will have to test and benchmark under real-life conditions.

    A separate log device (SLOG) on redundant (not single) SSDs can improve ZFS's weak performance on synchronous writes, but in your use case you will mostly have asynchronous writes, which always go to RAM first, so I wouldn't configure a separate SLOG.

    TL;DR: RAIDz2 with 10 disks and a large amount of RAM should be a good choice for delivering video files.
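
    If you do want to try the spare SSD space as L2ARC, it is a single command to add (and it can be removed again), and on Linux you can judge from the ARC statistics whether caching helps at all. A small sketch (pool and device names are placeholders):

# Add an L2ARC cache device with:  zpool add tank cache /dev/nvme0n1p4
# (removable again with 'zpool remove'). Below: compute the ARC hit ratio
# from the OpenZFS kstats on Linux to see whether caching helps at all.
ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

def arc_hit_ratio() -> float:
    stats = {}
    with open(ARCSTATS) as f:
        for line in list(f)[2:]:            # first two lines are headers
            name, _kind, value = line.split()
            stats[name] = int(value)
    hits, misses = stats["hits"], stats["misses"]
    return hits / (hits + misses) if hits + misses else 0.0

if __name__ == "__main__":
    print(f"ARC hit ratio: {arc_hit_ratio():.1%}")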

    Thanked by 1AXYZE
  • AXYZE Member

    @dfroe said:
    TL;DR: RAIDz2 with 10 disks and a large amount of RAM should be a good choice for delivering video files.

    For now I've gone with BTRFS as it's supported out of the box by the Hetzner images, with no extra hassle.

    I'll try to install ZFS on the second box and compare them. Could you share best practices for setting it up?
    On the BTRFS box I've chosen Ubuntu :)

  • dfroe Member, Host Rep

    I wouldn't call it a 'best practice', but in Linux environments it can be easier, for example, to use the first 100 GB of each disk traditionally for the operating system (with mdadm RAID, LVM, etc., whatever you are used to) and then create a partition with the remaining 9+ TB for the ZFS pool. It makes no difference whether you create the ZFS pool on whole disks or on partitions. This way you are more flexible: you can reinstall the Linux OS while keeping the zpool untouched and simply import it again afterwards.

    If you are looking for documentation, there is plenty to be found on the internet.
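
    As a concrete illustration of that layout (a sketch only; the device names, pool name and /home mountpoint are placeholders, and the OS partitions themselves would be created by the installer or mdadm as usual):

# Create the RAIDZ2 pool on partition 2 of each disk (partition 1 holds the
# OS, e.g. in an mdadm RAID). All names below are placeholders.
import subprocess

parts = [f"/dev/disk/by-id/ata-DISK{i:02d}-part2" for i in range(10)]

cmd = ["zpool", "create",
       "-o", "ashift=12",        # 4K-sector drives
       "-m", "/home",            # mountpoint
       "tank", "raidz2"] + parts

print(" ".join(cmd))
# subprocess.run(cmd, check=True)   # uncomment on the real machine
# After an OS reinstall the pool survives: 'zpool import tank'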

    Thanked by 1AXYZE
  • I have a few Hetzner servers with a private 10Gb switch; you could use that for the Proxmox idea.
    I don't remember exactly, but the switch was ~40 euro and it was about 8 euro per server for the 10Gb card, I think.

    Thanked by 2k4zz AXYZE
  • AXYZE Member

    @dfroe said:
    I wouldn't call it a 'best practice', but in Linux environments it can be easier, for example, to use the first 100 GB of each disk traditionally for the operating system (with mdadm RAID, LVM, etc., whatever you are used to) and then create a partition with the remaining 9+ TB for the ZFS pool. It makes no difference whether you create the ZFS pool on whole disks or on partitions. This way you are more flexible: you can reinstall the Linux OS while keeping the zpool untouched and simply import it again afterwards.

    I've already set up btrfs like this:
    200GB for the OS on ext4 (ext4 should give better performance, from what I saw online),
    the rest (around 74TB after RAID-6) is BTRFS mounted as /home/ :)

    I'll try to do the same thing with ZFS.
    I need to study how LVM/ZFS works, or try to do it with Proxmox.

  • AXYZE Member
    edited July 2022

    I've tried to set up ZFS on it, but failed.

    Tried everything I could think of.
    QEMU (with virtio to enable more than 4 disks) + Proxmox VNC installation got me "almost working" - it's stuck at 99% installation progress on "make bootable drive". htop shows there is still around 8% CPU usage by qemu and RAM usage is going up and down, but it stays stuck at 99% installation progress.
    I saw online that it is a bug with the floppy drive, so I'll try Q35 QEMU now; maybe that will fix it.

    Do you guys have a method for setting up ZFS with RAID-Z2 on 10 disks where the system is on that ZFS array?
    Debian, Proxmox, Ubuntu... I don't care.

    I've thought about adding an SSD for boot+OS, but Hetzner's upgrade options for the SX132 don't work for me - I can't add a SATA SSD or a normal NVMe SSD, only an "NVMe Datacenter" drive, and only in high capacities (wtf?); I can't choose a small capacity like 480GB. It becomes unnecessarily expensive, and the point of this project is to be as cheap as possible without compromising reliability; the speed is already good enough. :)
    Maybe someone has an idea how to do it. I've wasted 8 hours trying xD
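
    For reference, this is roughly the rescue-system QEMU invocation I've been experimenting with (assembled in Python; the ISO path and disk names are placeholders, and this is exactly the part that still gets stuck, so treat it as a work in progress, not a recipe):

# Rescue-system QEMU: virtio disks (so all 10 show up), q35 machine type,
# installer ISO, VNC for the graphical installer. Paths are placeholders.
qemu = ["qemu-system-x86_64", "-enable-kvm",
        "-machine", "q35",
        "-cpu", "host", "-smp", "4", "-m", "8192",
        "-cdrom", "proxmox-ve.iso", "-boot", "d",
        "-vnc", "127.0.0.1:0"]

for i in range(10):                       # /dev/sda .. /dev/sdj
    dev = "/dev/sd" + chr(ord("a") + i)
    qemu += ["-drive", f"file={dev},format=raw,if=virtio"]

print(" ".join(qemu))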

  • @AXYZE said: QEMU (with virtio to enable more than 4 disks) + Proxmox VNC installation got me "almost working" - it's stuck at 99% installation progress on "make bootable drive". htop shows there is still around 8% CPU usage by qemu and RAM usage is going up and down, but it stays stuck at 99%.

    I saw online that it is a bug with the floppy drive, so I'll try Q35 QEMU now; maybe that will fix it.

    Why?
    Ask them to connect IPMI to your server - you will have it for 3 hours - and you can mount real ISOs and get real VNC directly from UEFI on down. Why complicate it with QEMU?

    Thanked by 1AXYZE