How best to store large list of mp4/webms

trycatchthis Member
edited December 2022 in General

I need to reliably store a potentially large collection of mp4/webm/other video files.

Is object storage the correct tool for this? An S3 bucket?

If not, what else?

Comments

  • Deep Antarctic storage by @yoursunny
    You cannot go wrong there

  • Can you give us more info?
    How many TB do you want to store, how accessible do they need to be, do you have a tight budget, etc.?

  • @tjn said:
    Can you give us more info?
    How many TB do you want to store, how accessible do they need to be, do you have a tight budget, etc.?

    1. They won't be accessed regularly but need to be available in the event there is a problem.
    2. How many TB depends on how many customers we get and how large the average file is.
    3. At this stage budget is less of a concern as the data does not exist yet. I think object storage is the right choice as it's scalable, but I'm not sure.

    For all I know, a simple storage VPS with FTP or some web-facing app like Nextcloud might be all that's really necessary.

  • PUSHR_Victor Member, Host Rep
    edited December 2022

    Object storage seems to be the thing you need, yes. Check the TOS of the providers you shortlist to make sure there are no hidden charges you wouldn't notice right away (minimum retention periods, per-request costs if any, storage-to-egress ratios that might be a problem when you need to pull the data out, etc.).
    If the provider offers a pay-as-you-go plan so that you pay as your usage grows, that would be good. But also check in what increments the storage is billed, because some providers that offer pay-as-you-go list their price per GB but bill in TB increments.

  • rm_ IPv6 Advocate, Veteran
    edited December 2022

    @trycatchthis said: For all I know, a simple storage VPS with FTP or some web-facing app like Nextcloud might be all that's really necessary.

    This is a good hunch, no need to overpay for fancy "object" storage, unless you need it to be scalable to beyond what's realistically available on a single VPS or dedi (hundreds of TB). Really, no need to go "object" or "cloud" if the only reason is "I heard that's what you're supposed to use".

    AWS S3: storing 1 (one) TB: $23/month
    retrieving even just 250GB/month from it to the Internet: $45/month
    = $68/month

    OVH SYS-1-SAT-32:
    Intel Xeon D1520 4c / 8t 2.2GHz
    32GB DDR3 ECC 2133MHz
    SoftRAID 4x2TB SATA
    unmetered bandwidth
    = $30/month

  • If you might go down the NextCloud route, check out FileRun.
    Much lighter, easier to manage/install and just as powerful.

  • AXYZE Member
    edited December 2022

    @trycatchthis said:

    @tjn said:
    Can you give us more info?
    How many TB do you want to store, how accessible do they need to be, do you have a tight budget, etc.?

    1. They won't be accessed regularly but need to be available in the event there is a problem.
    2. How many TB depends on how many customers we get and how large the average file is.
    3. At this stage budget is less of a concern as the data does not exist yet. I think object storage is the right choice as it's scalable, but I'm not sure.

    For all I know, a simple storage VPS with FTP or some web-facing app like Nextcloud might be all that's really necessary.

    Wasabi will cost you $5.99/TB per month for this usage.
    There are two gotchas to their offer:

    • The minimum file retention period is 90 days. If you upload something and delete it after a couple of days, you will still be billed as if it had been stored for 90 days.
    • You can't egress more than you store, for example transferring out 2TB when you store 1TB. Ingress is not counted.

    From what you wrote, you won't run into these gotchas, so Wasabi is a great choice for you.
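
A minimal sketch of how the two gotchas above play out, using only the figures quoted in this post ($5.99/TB per month, 90-day minimum retention, egress capped at the amount stored). The function names are hypothetical, and the provider's real billing rules may differ in detail, so treat this as an illustration and check the current terms.

```python
# Illustration of the two gotchas described above.
# All function names are hypothetical; the figures come from the post.

def billed_storage_days(days_stored: int, min_retention_days: int = 90) -> int:
    """Days of storage billed for one object under a minimum-retention policy."""
    # Deleting early does not help: you are billed for at least the minimum.
    return max(days_stored, min_retention_days)


def egress_within_policy(stored_tb: float, egress_tb: float) -> bool:
    """True if monthly egress does not exceed the amount stored."""
    return egress_tb <= stored_tb


def monthly_storage_cost(stored_tb: float, price_per_tb: float = 5.99) -> float:
    """Monthly storage cost in USD at a flat per-TB price."""
    return round(stored_tb * price_per_tb, 2)


if __name__ == "__main__":
    print(billed_storage_days(3))         # file deleted after 3 days
    print(egress_within_policy(1.0, 2.0)) # pulling 2TB out while storing 1TB
    print(monthly_storage_cost(3.0))      # 3TB stored for a month
```

For rarely accessed files that are kept for a long time, both checks pass trivially, which is the point being made above.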

    Two dedis with replication between them and NO RAID will be more cost effective (around $2.5 per TB if you go with Hetzner SX auction servers)... if you fill them up.

    If you don't know how many TB you need to store and it can change at any time, start with Wasabi, then transfer the files to dedis once you can fill them up, to save cost.

    It's way better to have two dedis with no RAID than a single one with RAID10, and the cost per TB will be the same, as RAID10 eats half of the storage per server.
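
The claim that both layouts cost the same per usable TB can be checked with a little arithmetic. The sketch below assumes a hypothetical server with 40TB of raw disk at $50/month; those numbers are placeholders chosen for illustration, not prices from the thread.

```python
# Cost per usable TB for two layouts that both keep two copies of the data.
# Server price and raw capacity are hypothetical placeholders.

def cost_per_usable_tb(servers: int, price_per_server: float,
                       raw_tb_per_server: float, copies_of_data: int) -> float:
    """Monthly cost divided by the capacity left after storing every copy."""
    usable_tb = servers * raw_tb_per_server / copies_of_data
    return servers * price_per_server / usable_tb


# One server with RAID10: half of the raw disk holds the mirror copy.
single_raid10 = cost_per_usable_tb(servers=1, price_per_server=50.0,
                                   raw_tb_per_server=40.0, copies_of_data=2)

# Two servers without RAID, each holding a full copy of the data.
two_no_raid = cost_per_usable_tb(servers=2, price_per_server=50.0,
                                 raw_tb_per_server=40.0, copies_of_data=2)

if __name__ == "__main__":
    print(single_raid10, two_no_raid)
```

Both layouts come out at the same price per usable TB, but the second one keeps the copies on separate machines.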

  • AXYZE Member
    edited December 2022

    @rm_ said:

    @trycatchthis said: For all I know, a simple storage VPS with FTP or some web-facing app like Nextcloud might be all that's really necessary.

    This is a good hunch, no need to overpay for fancy "object" storage, unless you need it to be scalable to beyond what's realistically available on a single VPS or dedi (hundreds of TB). Really, no need to go "object" or "cloud" if the only reason is "I heard that's what you're supposed to use".

    AWS S3: storing 1 (one) TB: $23/month
    retrieving even just 250GB/month from it to the Internet: $45/month
    = $68/month

    OVH SYS-1-SAT-32:
    Intel Xeon D1520 4c / 8t 2.2GHz
    32GB DDR3 ECC 2133MHz
    SoftRAID 4x2TB SATA
    unmetered bandwidth
    = $30/month

    AWS S3 vs OVH SYS

    Multiple Gb/s VS 250Mbps
    Replicated on different racks VS one server.

    Getting "even just 250GB" will take more than 2 hours on SYS.
    No wonder AWS is more than 2x more expensive when it's multiple times better, but idk why you would even compare them.
    If OP says "I need to reliably store", it's a bad idea to get a single dedi with RAID.
    RAID does not protect against PSU failure, FS corruption or even a successful hack of the server.
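
The "more than 2 hours" figure above follows from dividing the data size by the link speed. A minimal sketch, assuming decimal units (1GB = 10^9 bytes, 1Mbps = 10^6 bits/s) and a link that is fully saturated with no protocol overhead, so real transfers will be somewhat slower:

```python
# Best-case transfer time over a fixed-speed link.
# Assumes decimal units and zero protocol overhead.

def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link_mbps link."""
    bits = size_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1000**2)
    return seconds / 3600


if __name__ == "__main__":
    # 250GB over the 250Mbps port: 8000 seconds, about 2.2 hours.
    print(round(transfer_hours(250, 250), 2))
```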

    If he wants to set up dedis then it's a lot better to get two of them, YOLORAID and then he gets 250Mbps per server (so 500Mbps max), real replication in a different rack (or a different building / DC), and if one server is attacked the second one may still have the data or a chance to restore it.

    But it's even better to go with Hetzner 32-100TB servers: even cheaper per TB, with a 1Gbps network per server.
    With 1-6TB of data it's more cost effective to use Wasabi, because you pay per GB stored, not for free TBs you may use in the future.

    Once you reach 30-100TB, transfer that to dedis and start over fresh on Wasabi.
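
The "start on Wasabi, move to dedis later" advice amounts to a break-even calculation: pay-per-TB storage is cheaper until the monthly bill reaches the fixed price of the servers. The sketch below uses the $5.99/TB figure from this thread; the $100/month price for a replicated pair of dedis is a hypothetical placeholder.

```python
# Break-even point between pay-per-TB object storage and fixed-price servers.
# The dedicated server price used in the example is a hypothetical placeholder.

def break_even_tb(dedi_monthly_cost: float, object_price_per_tb: float = 5.99) -> float:
    """Stored TB at which object storage costs as much as the servers."""
    return dedi_monthly_cost / object_price_per_tb


def cheaper_option(stored_tb: float, dedi_monthly_cost: float,
                   object_price_per_tb: float = 5.99) -> str:
    """Name of the cheaper option for a given amount of stored data."""
    object_cost = stored_tb * object_price_per_tb
    return "object storage" if object_cost < dedi_monthly_cost else "dedicated servers"


if __name__ == "__main__":
    print(round(break_even_tb(100.0), 2))
    print(cheaper_option(3.0, 100.0))
    print(cheaper_option(30.0, 100.0))
```

This ignores migration effort and the capacity limit of the servers, so it only gives a rough threshold.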

  • rm_ IPv6 Advocate, Veteran
    edited December 2022

    @AXYZE said: Multiple Gb/s VS 250Mbps

    Irrelevant unless 250 Mbps becomes the bottleneck. So far there is no indication (from the scarce info that we have) that it is likely to become one.

    @AXYZE said: No wonder AWS is more than 2x more expensive when it's multiple times better

    It's only 2x more expensive if you stick to the low 250 GB bandwidth estimate. But if you transfer even 30TB, which is roughly 100 Mbit running 24/7 (not to mention the 250 Mbit on SYS), that alone would cost $2662/month on AWS.
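
A sketch of where numbers like these come from. The tier sizes and rates below ($0.09/GB for the first 10TB, $0.085/GB for the next 40TB, with 1TB counted as 1024GB) are assumptions that happen to reproduce the $2662 figure; they are not stated in the thread, so check the provider's current pricing before relying on them.

```python
# Tiered egress pricing and the monthly volume of a constantly busy link.
# Tier sizes and rates are assumptions, not figures from the thread.

EGRESS_TIERS = [
    (10 * 1024, 0.09),   # first 10TB: (tier size in GB, USD per GB)
    (40 * 1024, 0.085),  # next 40TB
]


def egress_cost(total_gb: float) -> float:
    """Monthly egress cost in USD, charging each tier at its own rate."""
    remaining = total_gb
    cost = 0.0
    for tier_gb, rate in EGRESS_TIERS:
        used = min(remaining, tier_gb)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("volume exceeds the tiers modelled in this sketch")
    return round(cost, 2)


def monthly_volume_tb(link_mbps: float, days: int = 30) -> float:
    """Decimal TB moved by a link running flat out for the given number of days."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 * days / 1e12


if __name__ == "__main__":
    print(monthly_volume_tb(100))   # about 32TB for 100 Mbit running 24/7
    print(egress_cost(30 * 1024))   # 30TB of egress under the assumed tiers
```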

    @AXYZE said: If OP says "I need to reliably store"

    I would humbly suggest to just "store", and then also back up.
    Not pay 10x-100x for a single storage location with S3 because it is "reliable".
    And yes, even if S3 replicates your data across continents, for the purpose of backup management the entire thing still counts as "I have all my data stored in one place".

    @AXYZE said: a lot better to get two of them, YOLORAID and then he gets 250Mbps per server (so 500Mbps max)

    Yeah, that's a good point! And could just keep adding servers to add storage and b/w. Running something like GlusterFS across all of those might also allow for a more efficient replication scheme than all the same data on all of them.
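
To make the "more efficient replication scheme" point concrete: if every server holds all the data, adding servers adds copies but no capacity, whereas keeping a fixed number of replicas per file lets capacity grow with the server count. The sketch below is plain arithmetic with hypothetical server sizes; it does not model how GlusterFS or any other system actually places data.

```python
# Usable capacity under two replication schemes.
# Server counts and sizes in the example are hypothetical.

def usable_tb_full_mirror(servers: int, tb_per_server: float) -> float:
    """Every server stores everything, so capacity equals one server."""
    if servers < 1:
        raise ValueError("need at least one server")
    return tb_per_server


def usable_tb_fixed_replicas(servers: int, tb_per_server: float,
                             replicas: int = 2) -> float:
    """Each file is stored on `replicas` servers; the rest is extra capacity."""
    if replicas > servers:
        raise ValueError("cannot keep more replicas than there are servers")
    return servers * tb_per_server / replicas


if __name__ == "__main__":
    # Four 8TB servers: full mirroring vs two replicas per file.
    print(usable_tb_full_mirror(4, 8.0))
    print(usable_tb_fixed_replicas(4, 8.0, replicas=2))
```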

  • AXYZE Member
    edited December 2022

    @rm_ said:

    @AXYZE said: Multiple Gb/s VS 250Mbps

    Irrelevant unless 250 Mbps becomes the bottleneck. So far there is no indication (from the scarce info that we have) that it is likely to become one.

    Yes, there's not enough info available from OP, but I'm showing the difference between them. OVH is not just cheaper for the same thing. AWS has more capabilities and you pay for that.
    S3 is up to 100Gbps (but good luck finding another end that can download/upload that fast).

    If you need to restore a backup as fast as possible and you lose money or clients because you need to wait 2 hours, then the AWS bill may be pennies in comparison. I'm just stating why OVH is so much cheaper. For example, I have a client that stores 3TB, all video files for his paid video site that generates at least $2000 daily (it's surely more than that). Retrieving this data from SYS would take more than a day. I'm just showing the difference so people won't say "SYS is as good as S3".

    24/7 of 100 Mbit (not to mention 250 on SYS), that would cost $2662/month.

    Why are you doing calculations for 24/7 usage? OP clearly stated that the data will be rarely accessed. Even this 250GB/mo may be a stretch. I host around 40TB right now for a similar project (judging by the info OP provided) and I'm transferring 50GB-1TB per month for that exact project, so 2.5% in the worst-case scenario. I use Hetzner SX162 for that, though. I filled these with other projects, so I have around 85% disk usage right now.

    Yeah, that's a good point! And could just keep adding servers to add storage or b/w. Running something like GlusterFS across all of those might also allow for a more efficient replication scheme than all the same data on all of them.

    Yup, a distributed filesystem or even simple rcloning will be the best choice for this, but only if he has enough data. If he will use 10% of a 30TB server then it's better to stick to Wasabi; it will be cheaper to host 3TB there than to have a 30TB server "just in case".

    You can be sooo flexible then: switch providers, select the regions you like, choose 2-3-4 server replication per file or directory... and network capacity doubles/triples/quadruples!

    But it will come with a lot more thinking about infrastructure, because replication will also eat network speed. For example, with Hetzner you can add a 10Gbit private network cheaply if the servers are in the same rack, which may be a good idea.

  • rm_ IPv6 Advocate, Veteran
    edited December 2022

    @AXYZE said: Why are you doing calculations for 24/7 usage?

    Because you said AWS is "multiple times better", which is either supposed to read as "for the same thing" or is just a weird statement such as "oranges are multiple times better than apples". Going with the former, it is not the same thing when you consider what you actually get on both, such as the amount of data transfer included in the price on SYS, how rapidly the cost increases if you start using even a bit more of that on AWS, and the insane price of using anything close to "the same" amount.
