HostHatch Amsterdam Node Failure | All Data Lost | Still Reliable? - Page 4


124 Comments

  • Mumbly Member

    @NoComment said:

    @Samael said:

    @LiliLabs said: Expecting your provider to do detective work on your service that doesn't give them any margin is insulting, and it's no wonder those types of tickets take longer.

    How do you know that all those people (and there are really a lot of them complaining recently) expected the provider to do detective work for them?
    Just because you personally have had a good experience SO FAR (you may change your opinion at any time) doesn't mean that other ticket requests are less legitimate or less helpful in describing a problem than yours, so please stop making things up about other people's support tickets ...

    And somehow you can ignore good experiences with HostHatch support simply because some people have been complaining? The reality is that the most vocal people tend to be the ones who want to complain about something.

    I am not ignoring them; it's the exact opposite. Just because he has had some good experience doesn't mean that all the other people complaining about their experience with support are wrong.

    Thanked by 1: foitin
  • @Samael said: In very short time 3 different locations failed. Not to mention numerous complains all over the forum about unanswered tickets for weeks.

    @SpeedTest said: one failure? they can't fix the network in Chicago for many months. It takes them at least six months to go from ignoring to realizing the problem

    Didn't know; I just saw the screenshot the OP provided and didn't see anything else. Thanks for correcting me.

    Also, what program is that in the screenshot? @SpeedTest :open_mouth:

  • DP Administrator, The Domain Guy

    @duckeeyuck said: Also, what program is that in the screenshot? @SpeedTest :open_mouth:

    https://ping.pe

  • default Veteran

    @NoComment said:

    @Samael said:

    @LiliLabs said: Expecting your provider to do detective work on your service that doesn't give them any margin is insulting, and it's no wonder those types of tickets take longer.

    How do you know that all those people (and there are really a lot of them complaining recently) expected the provider to do detective work for them?
    Just because you personally have had a good experience SO FAR (you may change your opinion at any time) doesn't mean that other ticket requests are less legitimate or less helpful in describing a problem than yours, so please stop making things up about other people's support tickets ...

    And somehow you can ignore good experiences with HostHatch support simply because some people have been complaining? The reality is that the most vocal people tend to be the ones who want to complain about something.

    You have been here long enough to remember how HostSolutions had problems, people complained, and still I thought (just like you do now) that the vocal people were too negative. In the end the provider got into trouble, and the business died.

    Now with HostHatch we have 3 locations going down in just a couple of months, data lost, low support for quite a few months, people complain, and we blame the "vocal" people again?

    Maybe it's time for this provider to stop focusing on the low-end market for a while, focus on repairing and preventing the issues that drive customers to complain (in order to reduce negative online exposure), and then come back with better services and better support.

    HostHatch had a great reputation on LET. Maybe it's time to take a break and reorganise the business and its offers. Everybody needs a break from time to time; we're all human, after all.

    Thanked by 3: Mumbly, foitin, webcraft
  • In the end, if you care about your data you will keep multiple redundant copies of it in multiple places, in the cloud and elsewhere.

    If hardware is going to fail, it will happen even with name-brand providers.

    I agree with the sentiment that users expect providers to fix issues from vague reports (I did the same a long time ago), but I have learned some skills that make things easier for me and for the provider when I seek help.

    @hosthatch has shown considerable progress with support, but it still needs improvement, and they acknowledge as much.

    Just don't expect zero failures or issues even if you pay big bucks... just saying again, shit is gonna hit the fan when you least expect it.

    Thanked by 1: bdl
  • NoComment Member
    edited June 2022

    @default said:

    @NoComment said:

    @Samael said:

    @LiliLabs said: Expecting your provider to do detective work on your service that doesn't give them any margin is insulting, and it's no wonder those types of tickets take longer.

    How do you know that all those people (and there are really a lot of them complaining recently) expected the provider to do detective work for them?
    Just because you personally have had a good experience SO FAR (you may change your opinion at any time) doesn't mean that other ticket requests are less legitimate or less helpful in describing a problem than yours, so please stop making things up about other people's support tickets ...

    And somehow you can ignore good experiences with HostHatch support simply because some people have been complaining? The reality is that the most vocal people tend to be the ones who want to complain about something.

    You have been here long enough to remember how HostSolutions had problems, people complained, and still I thought (just like you do now) that the vocal people were too negative. In the end the provider got into trouble, and the business died.

    So, are you correlating complaints on LET with a business dying?

    @default said: Now with HostHatch we have 3 locations going down in just a couple of months, data lost

    That's 3 instances of RAID failure. It could be a bad batch of RAID controllers, cheap drives, or simply bad luck. Or perhaps their disks are just too old now. I am not blaming the vocal people, but it's best to have realistic expectations. If you paid much, much more with the big-boy providers, you could expect them to use nice enterprise disks and to EOL their servers every couple of years. That kind of thing simply does not happen in this low-end market. Even with the big boys you would have needed multiple backups, so obviously you need backups even more in the low-end market.

    Also, it's 3 single servers going down, not 3 entire locations going down. I believe they said it only affected 5% of their storage clients in Los Angeles, so something like 1 storage server out of 20.

    @default said: low support for quite a few months

    I remember HostHatch as the one provider whose Black Friday thread gets complaints pretty much every year. They constantly get complaints, and they always get the most complaints each year about their support. It's honestly nothing new: there are always some people complaining that their orders didn't get processed, that support didn't reply, that pre-orders got delayed, or about poor experiences with support in general. Despite this, HostHatch seems to get a lot of sales every year.

    @default said: Maybe it's time for this provider to stop focusing on the low-end market for a while, focus on repairing and preventing the issues that drive customers to complain (in order to reduce negative online exposure), and then come back with better services and better support.

    I think you're right on this point, though as HostHatch have mentioned, they are using smaller disk arrays now, which helps. If the real problem was cheap RAID controllers or cheap disks, it's really not something they can fix. That's just how business works in the low-end market.

    Overall, I get where the complainers are coming from. No one likes data loss or downtime, but it happens even with the big boys when you pay 10x (or more) the price. Speedy support happens to be included in that 10x price tag. If you can't afford that 10x price tag but care about your data, set up redundancy across multiple low-end services.

    People who require fast support (and the like) simply bought the wrong type of product. Let's not pretend the services you get on LET on a budget are really comparable to the big boys.

  • NoComment Member
    edited June 2022

    -

  • default Veteran
    edited June 2022

    @NoComment said:

    @default said:

    @NoComment said:

    @Samael said:

    @LiliLabs said: Expecting your provider to do detective work on your service that doesn't give them any margin is insulting, and it's no wonder those types of tickets take longer.

    How do you know that all those people (and there are really a lot of them complaining recently) expected the provider to do detective work for them?
    Just because you personally have had a good experience SO FAR (you may change your opinion at any time) doesn't mean that other ticket requests are less legitimate or less helpful in describing a problem than yours, so please stop making things up about other people's support tickets ...

    And somehow you can ignore good experiences with HostHatch support simply because some people have been complaining? The reality is that the most vocal people tend to be the ones who want to complain about something.

    You have been here long enough to remember how HostSolutions had problems, people complained, and still I thought (just like you do now) that the vocal people were too negative. In the end the provider got into trouble, and the business died.

    So, are you correlating complaints on LET with a business dying?

    @default said: Now with HostHatch we have 3 locations going down in just a couple of months, data lost

    That's 3 instances of RAID failure. It could be a bad batch of RAID controllers, cheap drives, or simply bad luck. Or perhaps their disks are just too old now. I am not blaming the vocal people, but it's best to have realistic expectations. If you paid much, much more with the big-boy providers, you could expect them to use nice enterprise disks and to EOL their servers every couple of years. That kind of thing simply does not happen in this low-end market. Even with the big boys you would have needed multiple backups, so obviously you need backups even more in the low-end market.

    Also, it's 3 single servers going down, not 3 entire locations going down. I believe they said it only affected 5% of their storage clients in Los Angeles, so something like 1 storage server out of 20.

    @default said: low support for quite a few months

    I remember HostHatch as the one provider whose Black Friday thread gets complaints pretty much every year. They constantly get complaints, and they always get the most complaints each year about their support. It's honestly nothing new: there are always some people complaining that their orders didn't get processed, that support didn't reply, that pre-orders got delayed, or about poor experiences with support in general. Despite this, HostHatch seems to get a lot of sales every year.

    @default said: Maybe it's time for this provider to stop focusing on the low-end market for a while, focus on repairing and preventing the issues that drive customers to complain (in order to reduce negative online exposure), and then come back with better services and better support.

    I think you're right on this point, though as HostHatch have mentioned, they are using smaller disk arrays now, which helps. If the real problem was cheap RAID controllers or cheap disks, it's really not something they can fix. That's just how business works in the low-end market.

    Overall, I get where the complainers are coming from. No one likes data loss or downtime, but it happens even with the big boys when you pay 10x (or more) the price. Speedy support happens to be included in that 10x price tag. If you can't afford that 10x price tag but care about your data, set up redundancy across multiple low-end services.

    People who require fast support (and the like) simply bought the wrong type of product. Let's not pretend the services you get on LET on a budget are really comparable to the big boys.

    Cheap hardware is great for attracting low-end customers, who in turn are great for online exposure. But relying too much on the low-end market is not great either, because those customers become a pain later, when the hardware starts to create problems. This, combined with poor or overloaded support, creates a huge mess in the long run. So one cannot rely on the low-end market for too long; after a business has launched in the hosting industry and built some nice exposure (or SEO) through low-end communities, it can start focusing on higher-end plans for a more stable business (just my opinion).

    Surely, people here always want offers and cheap deals, but that does not mean providers are forced to give them what they want. Sometimes an extremely limited deal, in quantity and duration, is much better, just to let them know the provider is still here and has not forgotten its roots.

  • Rockster Member
    edited June 2022

    @NoComment said: That's 3 instances of raid failure.

    I am not affected by those 3 RAID failures, but as one of the oldest (if not the oldest) still-active HostHatch clients, I am also not too happy with recent HH support, although I don't really need their support per se. I would just appreciate it if they would work on reported issues caused by their own infrastructure failures instead of leaving clients hanging.
    I hope the upcoming new control panel will improve things, as it seems they have basically abandoned client issues rooted in the old control panel. How hard can it be to sync distro templates with the control panel, or to reproduce their network error with image reinstalls on a certain node, instead of leaving that to me?

  • HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

  • Rockster Member
    edited June 2022

    @invantamy said:
    HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

    Joined 2:21PM

    Sure, bro.

    (Although yes, RN respond to support tickets and solve issues within minutes; I'll give them that. But this is an HH thread and your comment is completely unrelated.)

  • @Rockster said:

    @invantamy said:
    HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

    Joined 2:21PM

    Sure, bro.

    How much does @jsg pay you, you fucking SHILL?

  • Who's jsg, and a shill for whom? HH? You obviously missed my previous post :)

  • dosai Member

    @invantamy said:
    HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

    Was this @tinyweasel?

    Here is my SG uptime,

    Thanked by 1: ariq01
  • @dosai said:

    @invantamy said:
    HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

    Was this @tinyweasel?

    Here is my SG uptime,

    That uptime doesn't include the network; there was some downtime a while ago.

    Thanked by 1: AXYZE
  • Void Member

    @dosai said:

    @invantamy said:
    HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

    Was this @tinyweasel?

    Yes, that was him. Someone mentioned him in the other thread and he immediately made another alt and joined.

  • miu Member

    @TimboJones said: Is the rebuild time for a 10TB drive the same in a 4 drive vs 8 drive RAID 10 array? It's still writing 10TB in both cases.

    @LiliLabs said:
    Smaller arrays do rebuild faster, yes.

    Apologies: I do not agree with you and your wisdom; your real experience with HW and RAID controllers is probably zero (but you still must comment on everything), and your claim is not generally applicable nor true for all RAID array types and situations:

    RAID 5/6 based arrays: YES
    A smaller array (with fewer disks) will rebuild faster. @TimboJones is right that, for example, a failed 12TB disk replaced with a new one still needs the same amount of data written to it (12TB), whether the array consists of only 4 disks (the minimum for RAID 6) or 24 disks in a 4U LFF chassis. But the HW RAID controller needs to recalculate the lost data from all the other data and parity drives, and with parity RAID the rebuild speed limit (the bottleneck) is exactly this recalculation, not the write speed of the rebuilt disk. So recalculating from fewer disks should be faster than from more disks.

    RAID 10: NO, absolutely not
    This is a different case from R5/6: the failed disk only needs its data copied from its mirror, and it can be written at full speed (no recalculation from all the other members of the array is needed). So a RAID 10 rebuild of one failed drive finishes just as fast in an array of 4 drives as in an array of 60 drives (on a SAS/SATA expander).

    RAID 0: NO
    The time to rebuild when one drive fails will always be exactly the same for a 2-disk array as for a 60-disk array: 0.000 seconds :wink:
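    The parity-vs-mirror distinction above can be put into a toy model. This is my own simplification with made-up throughput figures (the 200 MB/s disk speed and 12 Gbps controller budget are assumptions, not HostHatch's numbers); it only illustrates why the amount of data moved grows with drive count for parity RAID but not for RAID 10.

    ```python
    def parity_rebuild_hours(disk_tb, n_drives, controller_gbps):
        """RAID 5/6: every surviving member must be read in full to
        reconstruct the failed disk, so total I/O grows with n_drives."""
        tb_read = disk_tb * (n_drives - 1)       # read all survivors
        tb_total = tb_read + disk_tb             # plus write the new disk
        return tb_total * 8_000 / (controller_gbps * 3600)  # TB -> Gbit

    def mirror_rebuild_hours(disk_tb, disk_mbps=200):
        """RAID 10: the new disk is a straight copy of its mirror, so time
        depends only on disk size and single-disk throughput."""
        return disk_tb * 1e6 / disk_mbps / 3600  # TB -> MB

    # Rebuilding a 12 TB disk moves ~6x more data through the controller
    # in a 24-drive RAID 6 than in a 4-drive one...
    print(parity_rebuild_hours(12, 24, controller_gbps=12))  # ~53 hours
    print(parity_rebuild_hours(12, 4, controller_gbps=12))   # ~9 hours
    # ...while the mirror copy takes the same time whether the RAID 10
    # array has 4 drives or 60:
    print(mirror_rebuild_hours(12))                          # ~17 hours
    ```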

    Large RAID 5/6 based arrays and their striping derivatives using large numbers of drives:

    • The big (and only) advantage: you get some redundancy while losing the capacity of only 1 (R5) or 2 (R6) disks (per striped group in the case of R50/60). Say you have an array of 24 disks in RAID 6; the usable capacity is 22 disks' worth, with 2 used only for parity (by comparison, RAID 10 always loses 50% of capacity, so only 12 disks' worth would be available).
    • But also great disadvantages: a very real chance that another disk fails during the rebuild (beyond the array's fault tolerance) and you lose all data (exactly what happened to HH).
    • This probability only increases as the disks get older: they may not survive the rebuild and can start dying one after another, like dominoes. It is also extremely risky to rebuild such an array online: the already very slow R5/6 rebuild speed will be degraded several times further, and the chance that other disks start failing also increases several times.
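    The domino risk in the list above is easy to put in rough numbers. A sketch, assuming independent failures and an invented 1% chance of any given drive dying during a long rebuild window (real rates depend on drive age and model):

    ```python
    def p_another_failure(n_drives, p_per_drive=0.01):
        """P(at least one of the n-1 surviving drives fails during the
        rebuild), assuming independent failures."""
        return 1 - (1 - p_per_drive) ** (n_drives - 1)

    print(p_another_failure(4))   # ~3% in a small array
    print(p_another_failure(24))  # ~21% in a 24-drive array
    ```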

    With RAID 10, by contrast (even for large arrays), it is also quite feasible to keep the server online and rebuild on a running machine. Every better HW RAID controller can also use hot-spare disks for this and begin the rebuild automatically when it finds a disk offline/faulty, without waiting and losing more time (and without admin/technician intervention).

    More RAID controllers and more, smaller RAID arrays on one server/motherboard:
    I tried this on two setups: an older dual Xeon 2650 with 12 LFF bays in RAID 10 and 60, and the same 12 LFF configuration on AMD Ryzen 5950 machines. In both cases, under full I/O load the result (quite unexpected and surprising to me) was that the two RAID controllers began fighting each other for resources: for a few seconds (10-15) one would write at up to 2GB/s while the second wrote slowly (100-300MB/s), then the first would drop to 100-200MB/s and the second would start writing at up to 2GB/s, and so on.
    (Probably, again, the limit was not the potential performance of the RAID controllers and the attached HDDs, but the buses and chipset (southbridge) that must serve them both: the bottleneck of the whole system. I also rule out the CPU as the cause, because the Ryzen had low load throughout.)
    Another very interesting thing: the NVMe drive on the Ryzen machine did exactly the same thing at max I/O load, fighting with both HDD controllers (its write speed cycling between low and max).
    So from my experience, I think that having multiple controllers in one system is not a happy solution and will degrade peak performance across them (at moments of maximum I/O load), while a single controller seems able to sustain higher performance on its own.

    At the same time, I understand that HH (and similarly highly skilled providers) have 1000x more real experience with HW and RAID arrays than I do, and I have only shared my little experience in this field, so if I am wrong, please feel free to correct my claims and mistakes. Thanks.

    Thanked by 1: TimboJones
  • miu Member
    edited June 2022

    Maybe where peak-performance arrays and top I/O are required, it is better to create one or several arrays on one good HW RAID controller with caching than to use multiple controllers in one system; and where fewer points of failure are preferred over peak performance (for example, plain storage nodes), it is better to use multiple controllers with fully independent arrays, to lower the probability of losing all data and affecting all users (for example, data corruption caused by a faulty HW controller writing erroneous data, especially with R5/6).

    So the new HH storage node model seems to me the best possible solution for their purposes: very cheap, large storage servers are required, where the low price makes RAID 60 necessary instead of RAID 10, but at the same time they want the lowest possible probability that a RAID array failure and potential data loss affects all users and all data.

  • @dosai said:

    @invantamy said:
    HostHatch is dogshit, never host in Singapore. Glory to RackNerd, best network and price ever.

    Was this @tinyweasel?

    Here is my SG uptime,

    I don't know if that's him, but he's probably an MJJ. I don't think anyone would claim ColoCrossing has the best network, so he's probably referring to RackNerd's LA location with Chinese peers (i.e. a good network to China).

    @miu said: Maybe where peak-performance arrays and top I/O are required, it is better to create one or several arrays on one good HW RAID controller with caching than to use multiple controllers in one system

    I would consider other factors before making these claims. How old are your RAID controllers? How many PCIe lanes are they really getting? What PCIe version? Maybe it's not a big deal for HDDs, but for your NVMe it would matter. Also, this gets slightly more sketchy on the Ryzens, because if you don't check your motherboard carefully, you may not be able to use as many lanes as you think you are.

    Also, maybe it is worth considering software RAID for RAID 1/10.

    Thanked by 1: miu
  • TeoM Member

    @SpeedTest said:

    @duckeeyuck said: very unreliable lately because of one failure?

    one failure? they can't fix the network in Chicago for many months. It takes them at least six months to go from ignoring to realizing the problem

    this is how it looks right now

    Can you tell me the script name?

  • xTom Member, Patron Provider

    RAID is not backup; you always need to back up your important data.

  • @miu said:

    @TimboJones said: Is the rebuild time for a 10TB drive the same in a 4 drive vs 8 drive RAID 10 array? It's still writing 10TB in both cases.

    @LiliLabs said:
    Smaller arrays do rebuild faster, yes.

    Apologies: I do not agree with you and your wisdom; your real experience with HW and RAID controllers is probably zero (but you still must comment on everything), and your claim is not generally applicable nor true for all RAID array types and situations:

    Congrats! HH uses RAID 60 and RAID 6 on their machines, which is clearly what I was referring to, since we are in a thread about HostHatch's drive failure. Good writeup though; fantastic use of time there :) As for experience, I manage hundreds of petabytes of storage, but to be fair a lot of it is under ZFS and is not done with RAID on the controller. Not quite zero experience, but also not running insane amounts of storage off hardware RAID. I still think I run enough arrays on HW-backed RAID (2 arrays totaling around 160TB) to have a say in the matter, though.

    Thanked by 3: miu, skorous, ariq01
  • miu Member
    edited June 2022

    @LiliLabs said: I manage hundreds of petabytes of storage

    Sounds like you bought Google? :-O Then congrats, great purchase!

  • @miu said:

    @LiliLabs said: I manage hundreds of petabytes of storage

    Sounds like you bought Google? :-O Then congrats, great purchase!

    Oops, typo on my end; I meant terabytes. Sorry about that!

    Thanked by 1: miu
  • jsg Member, Resident Benchmarker

    @miu said:
    RAID 5/6 based arrays: YES
    A smaller array (with fewer disks) will rebuild faster: ... For example, a failed 12TB disk replaced with a new one still needs the same amount of data written to it (12TB), whether the array consists of only 4 disks (the minimum for RAID 6) or 24 disks in a 4U LFF chassis. But the HW RAID controller needs to recalculate the lost data from all the other data and parity drives, and with parity RAID the rebuild speed limit (the bottleneck) is exactly this recalculation, not the write speed of the rebuilt disk. So recalculating from fewer disks should be faster than from more disks.

    Sorry, no. For one, the redundancy data is not on one or two drives but spread over all drives, so to rebuild one drive in a RAID 5/6, all drives need to be read over and over again. Second, no, the calculation is not the bottleneck, or at least not a major one. Both XOR and, in the case of RAID 6, Galois-field arithmetic are very fast operations, as opposed to writing to a disk (even an SSD).
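    A minimal illustration of how cheap the RAID 5 XOR math is: reconstructing a lost block is just XOR-ing the surviving blocks with the parity block, which a CPU does far faster than any disk can write. (A toy sketch, not how a real controller is implemented.)

    ```python
    def xor_blocks(blocks):
        """Byte-wise XOR of equal-length blocks (RAID 5 parity)."""
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, byte in enumerate(blk):
                out[i] ^= byte
        return bytes(out)

    stripe = [b"disk0-data.", b"disk1-data.", b"disk2-data."]
    parity = xor_blocks(stripe)  # stored on the parity member

    # "Lose" disk 1 and rebuild it from the survivors plus parity:
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]
    ```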

    RAID 10: NO, absolutely not
    This is a different case from R5/6: the failed disk only needs its data copied from its mirror, and it can be written at full speed (no recalculation from all the other members of the array is needed). So a RAID 10 rebuild of one failed drive finishes just as fast in an array of 4 drives as in an array of 60 drives (on a SAS/SATA expander).

    Yes and no; it depends on the controlling algorithm, but at least for typical home-use (software) arrays you are right. But again, even if there were a RAID 5/6-style calculation to do, that wouldn't change much.

    Large RAID 5/6 based arrays and their striping derivatives using large numbers of drives:

    • The big (and only) advantage: you get some redundancy while losing the capacity of only 1 (R5) or 2 (R6) disks (per striped group in the case of R50/60). Say you have an array of 24 disks in RAID 6; the usable capacity is 22 disks' worth, with 2 used only for parity (by comparison, RAID 10 always loses 50% of capacity, so only 12 disks' worth would be available).
    • But also great disadvantages: a very real chance that another disk fails during the rebuild (beyond the array's fault tolerance) and you lose all data (exactly what happened to HH).
    • This probability only increases as the disks get older: they may not survive the rebuild and can start dying one after another, like dominoes. It is also extremely risky to rebuild such an array online: the already very slow R5/6 rebuild speed will be degraded several times further, and the chance that other disks start failing also increases several times.

    Again, yes and no. Almost no (as in very low probability) with regard to the other disks then being stressed and failing; I take that to be largely a horror story based on the past (as in "decades ago"). Nowadays better (hardware) controllers collect the redundancy info to be written to the other drives so as to minimize the risk, and they can do that because they have enough memory.
    But also yes, because more disks (in an array) also means less resilvering stress, and hence a lower risk of a second drive failing.

    With RAID 10, by contrast (even for large arrays), it is also quite feasible to keep the server online and rebuild on a running machine. Every better HW RAID controller can also use hot-spare disks for this and begin the rebuild automatically when it finds a disk offline/faulty, without waiting and losing more time (and without admin/technician intervention).

    Not really; it's always preferable to take the array offline for resilvering, no matter the RAID level.

    Thanked by 3: bdl, skorous, bulbasaur
  • miu Member
    edited June 2022

    @jsg said:

    @miu said:
    RAID 5/6 based arrays: YES
    A smaller array (with fewer disks) will rebuild faster: ... For example, a failed 12TB disk replaced with a new one still needs the same amount of data written to it (12TB), whether the array consists of only 4 disks (the minimum for RAID 6) or 24 disks in a 4U LFF chassis. But the HW RAID controller needs to recalculate the lost data from all the other data and parity drives, and with parity RAID the rebuild speed limit (the bottleneck) is exactly this recalculation, not the write speed of the rebuilt disk. So recalculating from fewer disks should be faster than from more disks.

    Sorry, no. For one, the redundancy data is not on one or two drives but spread over all drives, so to rebuild one drive in a RAID 5/6, all drives need to be read over and over again.

    Hello, thanks for the reaction and the correction, appreciated. On this point I wrote the same thing, that all disks in the array must be read during a rebuild (incl. the parity disk(s)): "But the HW RAID controller needs to recalculate the lost data from all the other data and parity drives" = it needs to read all disks in the affected array.

    With RAID 10, by contrast (even for large arrays), it is also quite feasible to keep the server online and rebuild on a running machine. Every better HW RAID controller can also use hot-spare disks for this and begin the rebuild automatically when it finds a disk offline/faulty, without waiting and losing more time (and without admin/technician intervention).

    Not really; it's always preferable to take the array offline for resilvering, no matter the RAID level.

    I know providers who use hot spares and have preconfigured automatic array rebuilds to start immediately when the controller finds a disk offline (no technician intervention needed). Also, when controller manufacturers and engineers include options for adding hot spares and auto-starting rebuilds (as standard in every better, more expensive RAID controller), there must be some sane reason and sense to it: they believe this can work and be used successfully. Additionally, when a RAID 10 array uses many disks (say 12, 24, etc.), overall performance is only partially degraded (unlike with parity RAID): only for the data placed on the failed disk, its mirror disk, and the corresponding mirrored pair on the opposite striping side. If there are 12 drives in an R10 array and 1 disk fails, the 8 remaining disks are not affected at all and can still run at full performance throughout the rebuild.

    For example, one provider (whom I personally consider one of the best ever), LiteServer, used hot spares with automatic rebuilds started immediately by the controller for RAID 10 arrays. So I still assume that on R10 nodes that are not extremely oversold, and whose HW is not very outdated, it is not a problem to use hot spares and let the controller initiate the rebuild immediately when a faulty disk is detected, and it is not necessary to take the server offline for this (for parity RAID 5/6 it would be insane, because the rebuild time would be extremely prolonged and the chance of further disk failures and total data loss increased many times over).

  • miu Member

    Or, if you are extremely worried: for most uses (e.g. shared hosting), you can configure, say, 3 smaller arrays of 4 HDDs each and split them across 3 VMs (each VM using just 1 array). Then you can stop the VM whose storage is on the degraded array and let that array focus only on its rebuild, while the other arrays and their VMs keep working without interruption and the whole server/node stays online the entire time.

  • adly Veteran

    @miu said:
    Or, if you are extremely worried: for most uses (e.g. shared hosting), you can configure, say, 3 smaller arrays of 4 HDDs each and split them across 3 VMs (each VM using just 1 array). Then you can stop the VM whose storage is on the degraded array and let that array focus only on its rebuild, while the other arrays and their VMs keep working without interruption and the whole server/node stays online the entire time.

    There’s a whole \o/. Maybe you can fill it. The Europeans can fill it. Perhaps with 100x Asian peens we can fill it one time?

  • @LiliLabs said:

    @TimboJones said:

    @hosthatch said:
    And on Storage nodes, we have mitigated this issue (or lowered the risk by a huge factor) by using multiple RAID cards and multiple RAID arrays per node, instead of single large arrays.

    LowEndStatisticians, is this true?

    Adding more hardware RAID cards increases the failure rate and reduces MTBF. It sounds like decreasing the array size is intended to speed up rebuild times, but is that actually true? Is the rebuild time for a 10TB drive the same in a 4-drive as in an 8-drive RAID 10 array? It's still writing 10TB in both cases.

    Or is this halving the data loss rate by reducing lost data by 50% when failures occur?

    I thought risk usually goes down as disk count is increased (probability or something spread over more units?). I never took statistics in high school and regret that.

    Smaller arrays do rebuild faster, yes.

    Why? https://www.memset.com/support/resources/raid-calculator/ showed the same rebuild time for a 4-drive array as for an 8-drive one in a quick check. I'm sure there are tons of scenarios, though.

    Smaller redundant arrays also give them a better chance of recovering from inevitable drive failures. Risk goes up as disk count increases because the likelihood of any one drive failing stays the same. This is why RAID 0 actually gives you negative redundancy, since the chance of one drive failing is now multiplied by however many drives are in the array.
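    The "negative redundancy" point quoted above can be sanity-checked with a quick sketch, assuming independent failures and an invented 3% annual failure rate per drive (illustrative only):

    ```python
    def raid0_loss_probability(n_drives, drive_afr=0.03):
        """RAID 0: losing ANY one drive loses the array, so
        P(array lost in a year) = P(at least one drive fails)."""
        return 1 - (1 - drive_afr) ** n_drives

    print(raid0_loss_probability(1))  # ~3%: a single drive
    print(raid0_loss_probability(8))  # ~22%: 8 striped drives, ~7x the risk
    ```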

    And the equivalent of your RAID 0 example is adding double the number of RAID cards that can fail: not expected disk failures, but weird data-loss failures.

    Multiple small arrays limit your exposure to one drive failing though, instead of rebuilding a 30 drive array you might just have to rebuild a 5 drive array.

    A rebuild means writing 100% of the new disk. How is that different in a 5-disk vs a 30-disk array?

    So with more arrays, do the odds of failure events go up or down? I think the failure rate goes up and the data-loss rate goes down.
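    That intuition can be made concrete with a back-of-the-envelope model (illustrative figures, not vendor specs): treating the RAID cards as independent components in series, their failure rates add, so failure events become more frequent, while splitting the storage into more arrays shrinks how much data any single event can take out.

    ```python
    def combined_mtbf(card_mtbf_hours, n_cards):
        """Independent components in series: failure rates add, so the
        MTBF of 'any card fails' is the per-card MTBF divided by count."""
        return card_mtbf_hours / n_cards

    def data_at_risk_per_event(total_tb, n_arrays):
        """With independent arrays, one array loss only takes out its
        own slice of the data."""
        return total_tb / n_arrays

    # Going from 1 card/array to 2 on a hypothetical 240 TB node:
    print(combined_mtbf(500_000, 2))       # card failures twice as frequent
    print(data_at_risk_per_event(240, 2))  # but each event risks half the data
    ```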

    Again, HH is acting on pretty good faith here, no need to jump down their throats :)

    I didn't. I gave my technical understanding, correct or not. Also, at this point in their failure analysis, I'm not prepared to give them the benefit of the doubt.

  • @miu said:
    For example, one provider (whom I personally consider one of the best ever), LiteServer, used hot spares with automatic rebuilds started immediately by the controller for RAID 10 arrays. So I still assume that on R10 nodes that are not extremely oversold, and whose HW is not very outdated, it is not a problem to use hot spares and let the controller initiate the rebuild immediately when a faulty disk is detected, and it is not necessary to take the server offline for this (for parity RAID 5/6 it would be insane, because the rebuild time would be extremely prolonged and the chance of further disk failures and total data loss increased many times over).

    The issue with hot spares for providers, other than the economics of having a spare sitting idle for a long time, is either a lack of hot-swap chassis slots or a lack of RAID ports. With 8 ports, you might go with 6 drives and a hot spare, wasting a RAID slot and likely paying for a bigger chassis for 8 drives vs 4-6.

    I doubt they have hot spares available on the small arrays, and likely not even on the bigger ones.
