[MR] What are your expectations regarding Storage VMs

Comments

  • angstrom Moderator

    @willie said:

    gisadik said:

    LMK what about the SpeedyKVM TOS you don't like and I'll try to make it less assholish.

    Just read it from the point of view of a low drama customer looking for hosts who are easy to deal with, and see how you think it comes across.

    @gisadik: Your TOS does sound rather militaristic. By the way, regarding:

    Account Email Change or Account Ownership Change Fee

    To change the email, name, or owner of an account will incur a $75 account change fee.

    $75 to change the email address of the account? Is this standard practice? (Name or owner, I get, though in the case of a legal name change of a customer, $75 seems harsh.)

    Thanked by gisadik
  • Nekki Veteran

    @gisadik said:

    LMK what about the SpeedyKVM TOS you don't like and I'll try to make it less assholish.

    Don't ever change. The patented Incero brand of no-nonsense complaint and abuse handling is a delight.

    Thanked by deadbeef, gisadik
  • angstrom Moderator
    edited June 2017

    @gisadik said: LMK what about the SpeedyKVM TOS you don't like and I'll try to make it less assholish.

    Here's a little exercise if you're interested. Do a side-by-side reading of your TOS + AUP ( https://speedykvm.com/tos.html ) and RamNode's TOS ( https://clientarea.ramnode.com/tos.php ) + AUP ( https://clientarea.ramnode.com/aup.php ). The tone is definitely different (sometimes the content as well).

    Thanked by vimalware
  • Stable, no VEHICLE ROLL OVER.

  • bsdguy Member
    edited June 2017

    @willie said:

    bsdguy said: R6 is nice in adding yet more resilience but is a pig when writing due to rather processor expensive Galois Field based calculations (while R5 gets away with xor).

    I wonder how bad that is. The GF product is a few table lookups plus the xor, with the tables small enough to all fit in L1 cache. Ceph and the like do network operations in addition to that, and they work ok. I'm generally skeptical of R5 and would want to use R6 when possible.

    It's bad enough to need ASICs or at the very least FPGAs. Keep in mind that math is an ideal realm but disks are not. You want, for instance, reasonable stripe sizes, which quickly make your GF calculations very burdensome on the CPU (hence you want it done in ASICs).
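
    To make the "table lookups plus xor" point concrete, here is a minimal, illustrative Python sketch of GF(2^8) multiplication via log/antilog tables, the kind of per-byte arithmetic a RAID 6 Q syndrome needs. The generator polynomial 0x11d is the usual choice; this is a toy, not any particular controller's implementation.

        # Toy GF(2^8) arithmetic with log/antilog tables (polynomial 0x11d).
        EXP = [0] * 512
        LOG = [0] * 256
        x = 1
        for i in range(255):
            EXP[i] = x
            LOG[x] = i
            x <<= 1
            if x & 0x100:          # reduce modulo the generator polynomial
                x ^= 0x11d
        for i in range(255, 512):  # duplicate so gf_mul can skip a modulo
            EXP[i] = EXP[i - 255]

        def gf_mul(a, b):
            # two table lookups plus one addition
            if a == 0 or b == 0:
                return 0
            return EXP[LOG[a] + LOG[b]]

        def q_syndrome(stripe):
            # RAID 5 parity is a plain xor; the RAID 6 Q syndrome also weights
            # each data byte by a power of the generator before xoring.
            q = 0
            for i, d in enumerate(stripe):
                q ^= gf_mul(EXP[i % 255], d)
            return q

        print(hex(q_syndrome([0x11, 0x22, 0x33, 0x44])))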

    As for R5 and R6: It's funny to see how well marketing works. Even most seasoned admins believe that R6 is much safer than R5. Well, I have bad news: It is not.

    The problem beneath that is that we actually have different variables. One is the beauty of having 2 syndromes (typ. plain xor and xor over GF), which promises to let you survive the loss of even 2 drives. The other one, however, is that the basic risk calculation still holds, namely "more disks -> higher risk of n disks failing".

    If you look at the equations you'll find that the failure calculations for R5 and R6 have a major part in common, with the only difference being that R6 has a higher power factor (for the same number of net/payload disks). So, for instance, for 4 payload disks R5 adds 1 disk (power factor 5) while R6 adds 2 (power factor 6).

    And, in fact, looking only at the R5 part of the equation, we find that the failure rate for R5 is lower for the above-mentioned reason ((1 - r)^n shrinks as n grows, so fewer total disks means a lower failure rate). But there is the other part of the equation, the R6-only part, which may (or may not, depending on the values) make up for the loss. That part basically comes down to a term we already know from the R5 part of the equation - but times a factor that depends on the number of disks!
    That is the funny part (n/2 * r² ...), that is where the music plays and where R6 gains over R5 - or not, depending on the number of disks.

    Now, keep in mind that our context here is storage VPS, i.e. lots and lots of reasonably safe storage at reasonable cost.

    All in all, it will usually be considerably less expensive, both in $ and in computing power, to have 2 R5 arrays rather than 1 R6 array - and still offer very similar (low) failure rates. As a positive side effect you gain better throughput in a very heterogeneous context like VPS nodes. Note: R5 and R6 failure rates are typically very similar and within 1%.

    There is, btw., yet another factor that is often overlooked: hot spares. If you have a hot spare (and a decent controller), R5 gives you all but de facto R6 quality with very simple means and at lower cost.

  • raindog308 Administrator, Veteran

    bsdguy said: Keep in mind that math is an ideal realm but disks are not.

    That summed up my entire sysadmin career.

  • willie Member

    bsdguy said: "more disks -> higher risk of n disks failing".

    That looks to be outweighed by the higher redundancy.

    N discs, each has a probability of failure p on a given day. So probability of 2 failures = p2 = N(N-1)p**2. With Raid-5, 2 failures means you lose all your data.

    Probability of 3 failures out of N+1 discs = p3 = (N+1)N(N-1)*p**3. That's what has to happen to raid-6 array to be lost.

    So with p=0.001, N=5, p2 = 2e-5, and p3=1.2e-7.

    With N=10, p2=9e-5, p3=9.9e-7.

    Raid-6 with 2 ECC drives looks way better. And of course you can do raid-6 with 3 or more ECC drives.
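
    For the curious, here is a minimal Python sketch of the estimate above, using the same rough ordered counts as the post (N(N-1)p**2 and (N+1)N(N-1)p**3, not exact binomial terms):

        # Reproduce the back-of-the-envelope numbers above: chance of losing a
        # RAID 5 array (any 2 of N disks fail) vs RAID 6 (any 3 of N+1 fail),
        # using the post's rough counting (no 1/2! or 1/3! factors).
        def p2(N, p):
            return N * (N - 1) * p**2            # ~ RAID 5 loss

        def p3(N, p):
            return (N + 1) * N * (N - 1) * p**3  # ~ RAID 6 loss, N+1 disks

        for N in (5, 10):
            print(N, p2(N, 0.001), p3(N, 0.001))
        # prints ~2e-05 / 1.2e-07 for N=5 and ~9e-05 / 9.9e-07 for N=10,
        # matching the figures above (up to float rounding).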

  • deadbeef Member
    edited June 2017

    @Nekki said:
    The state of the cunts who want 1TB or more for less than €5.

    $191.80 / 36 ~= $5.33 ~= 4.73E/month for 2TB.

    Which makes it ~2.37 Euro per month per TB (with KVM, 5GB RAM and 2x CPU, 10GB Xfer, 4Gbps, RAID 60).

    Add that we all know that @SpeedyKVM is a very decent host.

    So, 5E/TB sounds like double price, not dirt cheap.

    /cc @AnthonySmith ;)

    Thanked by gisadik, bersy
  • Nekki Veteran
    edited June 2017

    deadbeef said:

    $191.80 / 36 ~= $5.33 ~= 4.73E/month for 2TB.

    Which makes it ~2.37 Euro per month per TB (with KVM, 5GB RAM and 2x CPU, 10GB Xfer, 4Gbps, RAID 60).

    Add that we all know that @SpeedyKVM is a very decent host.

    You know as well as I do that those responding to the survey aren't prepared to pay up three years in advance to get the pricing they're asking for.

    Thanked by deadbeef, gisadik
  • @Nekki said:

    deadbeef said:

    $191.80 / 36 ~= $5.33 ~= 4.73E/month for 2TB.

    Which makes it ~2.37 Euro per month per TB (with KVM, 5GB RAM and 2x CPU, 10GB Xfer, 4Gbps, RAID 60).

    Add that we all know that @SpeedyKVM is a very decent host.

    You know as well as I do that those responding to the survey aren't prepared to pay up three years in advance to get the pricing they're asking for.

    Fair point :)

    Thanked by Nekki
  • AnthonySmith Member, Patron Provider

    Nice calculations :) so on that basis, with licenses and fees, he only needs to fill around 28 nodes to make minimum wage for 1 person. Not a business model I would like to pursue personally :p

    Thanked by deadbeef
  • Francisco Top Host, Host Rep, Veteran

    @AnthonySmith said:

    Nice calculations :) so on that basis, with licenses and fees, he only needs to fill around 28 nodes to make minimum wage for 1 person. Not a business model I would like to pursue personally :p

    I'll take "How to become VortexNode" for 100, Alex.

    Francisco

    Thanked by AnthonySmith
  • @AnthonySmith said:

    Nice calculations :) so on that basis, with licenses and fees, he only needs to fill around 28 nodes to make minimum wage for 1 person. Not a business model I would like to pursue personally :p

    I totally get your point and I'm not saying you're wrong in any way about your business strategy.

    My point is that sometimes there are edge cases that combine amazing value without significant "summerhost" danger. The whole fun of LET is finding them :)

    For example, in this case, if I remember correctly, they had these very few nodes already paid off from a project that had ended and wanted to fill them ASAP, hence they did a special.

  • willie Member
    edited June 2017

    AnthonySmith said: Nice calculations :) so on that basis, with licenses and fees, he only needs to fill around 28 nodes to make minimum wage for 1 person.

    Well he says he's using disks that were upgraded from other users' dedis, so they're already at least partly paid down; on the other hand, smaller disks = more of them, so more nodes.

    New raw disk space these days is (rounding off) $36/TB, or $1/month when spread over 3 years. If you can sell it for $3/month while getting the person to pay the full 3 years up front, you're getting $2/month on a hopefully low-activity, midrange VPS, not counting the disk. This seems kind of doable, (added:) especially with the capital outlay already mostly covered by the 3-year payment.

    Storage plans (non-VPS) at Hetzner and Online are around 5 euro/month/TB as regular products, and I like to think they're making money at those prices.

  • AnthonySmith Member, Patron Provider
    edited June 2017

    Fair enough. The worry now is that some 15-year-old (physically or mentally) reads this post and thinks, wow, I can make an extra $300 per month in pocket money; they do it for 6-9 months and are super happy because they now have the best scooter on the block.

    Meanwhile that price point becomes the norm and people start expecting actual stable businesses to compete with it; sadly, we just can't compete with pocket-money hosts.

    You are right though, there are edge cases like this which I am sure are completely stable and reliable, just perhaps not repeatable long term.

    I'm almost 37 now; I can't help getting more grumpy and cynical with age, it's genetic.

    Thanked by deadbeef, gisadik
  • bsdguy Member

    @willie said:

    bsdguy said: "more disks -> higher risk of n disks failing".

    That looks to be outweighed by the higher redundancy.

    N discs, each has a probability of failure p on a given day. So probability of 2 failures = p2 = N(N-1)p**2. With Raid-5, 2 failures means you lose all your data.

    Probability of 3 failures out of N+1 discs = p3 = (N+1)N(N-1)*p**3. That's what has to happen to raid-6 array to be lost.

    So with p=0.001, N=5, p2 = 2e-5, and p3=1.2e-7.

    With N=10, p2=9e-5, p3=9.9e-7.

    Raid-6 with 2 ECC drives looks way better. And of course you can do raid-6 with 3 or more ECC drives.

    Not really. R5 or R6 addresses the question of what happens if any disk fails. However, there is another question first, namely whether (and with what probability) a disk fails.
    That question is completely independent of R5 or R6. Btw., those calculations are typically (and reasonably) done per year or over the life cycle (typ. 30 - 36 months) and not per day.

    The other issue is that while the failure rate tends towards 0 with more elaborate - and more expensive in every regard - raid, it doesn't become 0, so there is always a risk remaining (which is why we need backups, no matter what raid we use).

    We can have many funny discussions but at the end of the day R6 is not considerably more safe/resilient than R5 in the usual scenarios (4-12 disks in a group) and with typical disk failure rates. In fact, using good disks increases overall resilience by far more than R6 vs R5.

    And again, while raid can help to mitigate the problem, the failure rate of the disks exists independently and increases with more disks (and, of course, over time).

    The error you and many others make is to assume that the risk of 3 disks failing at roughly the same time is much smaller than that of 2 disks failing at roughly the same time. That seems to be true (and theoretically is) but practically is not, because the disks' failure risk is distributed over the (reasonably) expected lifetime - unless there is an extra factor, such as a major power spike, which, however, then tends to fry all disks.

    In the end it comes down to the fact that neither R5 nor R6 offers 100% resilience, and to how much one is willing to spend to shrink the risk from, say, 4.5% to 3.8% - well noted, at rapidly increasing cost.

    On the other hand, one can with modest effort buy reliable disks for little or no extra cost in the first place, which can make a difference that by far outweighs the difference between R5 and R6. Plus, there are other levers to use; good SMART monitoring and replacing worn-out disks is but one example.

    Whatever, I merely did the math and kept concrete experience in mind. While I like to spoil marketing lies, I'm not interested in evangelizing, so if you prefer R6 I certainly won't try to draw you away.

  • willie Member

    bsdguy said: R6 is not considerably more safe/resilient than R5 in the usual scenarios (4-12 disks in a group) and with typical disk failure rates. In fact, using good disks increases overall resilience by far more than R6 vs R5.

    Have you got any actual observable numbers to back this up?

  • bsdguy Member

    For R6 vs R5 we have the math. Running Prolog over the tight domain gives clear answers.

    As for the disk reliability there are some studies out there. One better known one is from blaze something (sorry, forgot the name). They clearly show that within a given class, say, 2TB - 3TB disks, there are very considerable differences (> 100%).

    One caveat, though: usually they are based on numbers aggregated over one year, which can be misleading as the failure rate obviously grows over time, i.e. a new drive has a lower failure risk than the same model after 2 years of continuous usage.

  • lurch Member

    I've been using zxhost for a couple of years and it has served me well, with good communication from Ashley if things do go wrong. It's not the cheapest, but I prefer to pay a bit more for peace of mind.

  • Amitz Member

    @AnthonySmith said:
    I'm almost 37 now; I can't help getting more grumpy and cynical with age, it's genetic.

    We either share the same genetics or this is quite a normal development for everyone with a brain watching this world for more than 30 years.

  • AnthonySmith Member, Patron Provider

    Amitz said: We either share the same genetics or this is quite a normal development for everyone with a brain watching this world for more than 30 years.

    Probably a bit of both.

  • willie Member

    bsdguy said: As for the disk reliability there are some studies out there. One better known one is from blaze something (sorry, forgot the name). They clearly show that within a given class, say, 2TB - 3TB disks, there are very considerable differences (> 100%).

    You're talking about the Backblaze disk reports, which are eagerly read by storage nerds. They found that certain models of HDD made during the Thai flooding era absolutely sucked, and there are significant but smaller failure rate differences among other makes and models.

    They don't use raid 5. Sure, there are higher and lower probabilities of a drive failing, but it's always less likely that two drives will fail simultaneously than that just one drive will fail.

    Has anyone here ever heard of a raid-6 storage array being lost due to multiple drive failures, especially failure of more than 2 drives? I've heard of some lost from controller failures, but that's different. Raid-5 arrays have been lost from double failures often enough that raid-6 had to be invented.

    Online.net's enterprise C14 product spreads data across something like 50 drives with 20 of them redundant.

  • bsdguy Member
    edited June 2017
    • There is more and more recent data available. The situation was evidently not limited to the time of the flooding.

    • It's irrelevant whether they use Raid, as the drive failure rate is independent of Raid.

    • You just love R6? Great, no problem with me. Use R6 to your liking.
      My interest, however, isn't in favouring or disliking any Raid type but rather in making a well-informed decision based on data and math and in finding the best solution for any given situation. Sometimes that will be R6, but more often it will not.

    The math is clear. The data strongly suggest that choosing a good disk model by far outweighs R6 vs R5.

    So, let's look at the math and data:

    Using r = 0.01, array failure rate (in %):
    3 disks - Raid 5: 0.0298, Raid 6: 0.0149
    4 disks - Raid 5: 0.0592, Raid 6: 0.0396
    5 disks - Raid 5: 0.0980, Raid 6: 0.0738
    6 disks - Raid 5: 0.1460, Raid 6: 0.1172
    7 disks - Raid 5: 0.2031, Raid 6: 0.1698
    8 disks - Raid 5: 0.2690, Raid 6: 0.2313
    9 disks - Raid 5: 0.3436, Raid 6: 0.3016
    10 disks - Raid 5: 0.4266, Raid 6: 0.3805
    11 disks - Raid 5: 0.5180, Raid 6: 0.4677
    12 disks - Raid 5: 0.6175, Raid 6: 0.5632
    Using r = 0.015, array failure rate (in %):
    3 disks - Raid 5: 0.0668, Raid 6: 0.0336
    4 disks - Raid 5: 0.1323, Raid 6: 0.0887
    5 disks - Raid 5: 0.2183, Raid 6: 0.1646
    6 disks - Raid 5: 0.3242, Raid 6: 0.2607
    7 disks - Raid 5: 0.4494, Raid 6: 0.3764
    8 disks - Raid 5: 0.5932, Raid 6: 0.5110
    9 disks - Raid 5: 0.7552, Raid 6: 0.6641
    10 disks - Raid 5: 0.9346, Raid 6: 0.8349
    11 disks - Raid 5: 1.1310, Raid 6: 1.0230
    12 disks - Raid 5: 1.3438, Raid 6: 1.2277
    Using r = 0.02, array failure rate (in %):
    3 disks - Raid 5: 0.1184, Raid 6: 0.0596
    4 disks - Raid 5: 0.2336, Raid 6: 0.1568
    5 disks - Raid 5: 0.3842, Raid 6: 0.2901
    6 disks - Raid 5: 0.5687, Raid 6: 0.4580
    7 disks - Raid 5: 0.7857, Raid 6: 0.6591
    8 disks - Raid 5: 1.0337, Raid 6: 0.8920
    9 disks - Raid 5: 1.3115, Raid 6: 1.1552
    10 disks - Raid 5: 1.6178, Raid 6: 1.4476
    11 disks - Raid 5: 1.9513, Raid 6: 1.7678
    12 disks - Raid 5: 2.3108, Raid 6: 2.1147
    Using r = 0.025, array failure rate (in %):
    3 disks - Raid 5: 0.1844, Raid 6: 0.0930
    4 disks - Raid 5: 0.3626, Raid 6: 0.2438
    5 disks - Raid 5: 0.5943, Raid 6: 0.4495
    6 disks - Raid 5: 0.8767, Raid 6: 0.7073
    7 disks - Raid 5: 1.2071, Raid 6: 1.0144
    8 disks - Raid 5: 1.5830, Raid 6: 1.3682
    9 disks - Raid 5: 2.0018, Raid 6: 1.7662
    10 disks - Raid 5: 2.4612, Raid 6: 2.2059
    11 disks - Raid 5: 2.9588, Raid 6: 2.6851
    12 disks - Raid 5: 3.4925, Raid 6: 3.2014
    Using r = 0.03, array failure rate (in %):
    3 disks - Raid 5: 0.2646, Raid 6: 0.1337
    4 disks - Raid 5: 0.5186, Raid 6: 0.3493
    5 disks - Raid 5: 0.8472, Raid 6: 0.6419
    6 disks - Raid 5: 1.2456, Raid 6: 1.0066
    7 disks - Raid 5: 1.7093, Raid 6: 1.4388
    8 disks - Raid 5: 2.2341, Raid 6: 1.9342
    9 disks - Raid 5: 2.8158, Raid 6: 2.4886
    10 disks - Raid 5: 3.4507, Raid 6: 3.0980
    11 disks - Raid 5: 4.1349, Raid 6: 3.7585
    12 disks - Raid 5: 4.8649, Raid 6: 4.4667
    Using r = 0.035, array failure rate (in %):
    3 disks - Raid 5: 0.3589, Raid 6: 0.1816
    4 disks - Raid 5: 0.7012, Raid 6: 0.4730
    5 disks - Raid 5: 1.1415, Raid 6: 0.8663
    6 disks - Raid 5: 1.6726, Raid 6: 1.3539
    7 disks - Raid 5: 2.2877, Raid 6: 1.9289
    8 disks - Raid 5: 2.9802, Raid 6: 2.5845
    9 disks - Raid 5: 3.7439, Raid 6: 3.3143
    10 disks - Raid 5: 4.5729, Raid 6: 4.1123
    11 disks - Raid 5: 5.4619, Raid 6: 4.9730
    12 disks - Raid 5: 6.4055, Raid 6: 5.8908
    Using r = 0.04, array failure rate (in %):
    3 disks - Raid 5: 0.4672, Raid 6: 0.2368
    4 disks - Raid 5: 0.9096, Raid 6: 0.6147
    5 disks - Raid 5: 1.4758, Raid 6: 1.1219
    6 disks - Raid 5: 2.1553, Raid 6: 1.7476
    7 disks - Raid 5: 2.9380, Raid 6: 2.4814
    8 disks - Raid 5: 3.8147, Raid 6: 3.3138
    9 disks - Raid 5: 4.7766, Raid 6: 4.2355
    10 disks - Raid 5: 5.8154, Raid 6: 5.2383
    11 disks - Raid 5: 6.9234, Raid 6: 6.3140
    12 disks - Raid 5: 8.0935, Raid 6: 7.4553
    Using r = 0.045, array failure rate (in %):
    3 disks - Raid 5: 0.5893, Raid 6: 0.2992
    4 disks - Raid 5: 1.1433, Raid 6: 0.7740
    5 disks - Raid 5: 1.8488, Raid 6: 1.4079
    6 disks - Raid 5: 2.6910, Raid 6: 2.1857
    7 disks - Raid 5: 3.6562, Raid 6: 3.0932
    8 disks - Raid 5: 4.7315, Raid 6: 4.1170
    9 disks - Raid 5: 5.9051, Raid 6: 5.2450
    10 disks - Raid 5: 7.1661, Raid 6: 6.4656
    11 disks - Raid 5: 8.5041, Raid 6: 7.7682
    12 disks - Raid 5: 9.9096, Raid 6: 9.1430
    Using r = 0.05, array failure rate (in %):
    3 disks - Raid 5: 0.7250, Raid 6: 0.3688
    4 disks - Raid 5: 1.4019, Raid 6: 0.9506
    5 disks - Raid 5: 2.2593, Raid 6: 1.7234
    6 disks - Raid 5: 3.2774, Raid 6: 2.6665
    7 disks - Raid 5: 4.4381, Raid 6: 3.7610
    8 disks - Raid 5: 5.7245, Raid 6: 4.9894
    9 disks - Raid 5: 7.1211, Raid 6: 6.3355
    10 disks - Raid 5: 8.6138, Raid 6: 7.7846
    11 disks - Raid 5: 10.1895, Raid 6: 9.3229
    12 disks - Raid 5: 11.8360, Raid 6: 10.9379
    Using r = 0.055, array failure rate (in %):
    3 disks - Raid 5: 0.8742, Raid 6: 0.4454
    4 disks - Raid 5: 1.6846, Raid 6: 1.1444
    5 disks - Raid 5: 2.7058, Raid 6: 2.0676
    6 disks - Raid 5: 3.9120, Raid 6: 3.1883
    7 disks - Raid 5: 5.2798, Raid 6: 4.4819
    8 disks - Raid 5: 6.7879, Raid 6: 5.9261
    9 disks - Raid 5: 8.4166, Raid 6: 7.5004
    10 disks - Raid 5: 10.1481, Raid 6: 9.1861
    11 disks - Raid 5: 11.9661, Raid 6: 10.9662
    12 disks - Raid 5: 13.8560, Raid 6: 12.8252
    Using r = 0.06, array failure rate (in %):
    3 disks - Raid 5: 1.0368, Raid 6: 0.5292
    4 disks - Raid 5: 1.9911, Raid 6: 1.3549
    5 disks - Raid 5: 3.1871, Raid 6: 2.4396
    6 disks - Raid 5: 4.5925, Raid 6: 3.7493
    7 disks - Raid 5: 6.1777, Raid 6: 5.2530
    8 disks - Raid 5: 7.9162, Raid 6: 6.9228
    9 disks - Raid 5: 9.7838, Raid 6: 8.7333
    10 disks - Raid 5: 11.7588, Raid 6: 10.6616
    11 disks - Raid 5: 13.8216, Raid 6: 12.6871
    12 disks - Raid 5: 15.9545, Raid 6: 14.7911
    Using r = 0.065, array failure rate (in %):
    3 disks - Raid 5: 1.2126, Raid 6: 0.6200
    4 disks - Raid 5: 2.3207, Raid 6: 1.5819
    5 disks - Raid 5: 3.7021, Raid 6: 2.8387
    6 disks - Raid 5: 5.3166, Raid 6: 4.3479
    7 disks - Raid 5: 7.1281, Raid 6: 6.0714
    8 disks - Raid 5: 9.1041, Raid 6: 7.9749
    9 disks - Raid 5: 11.2156, Raid 6: 10.0279
    10 disks - Raid 5: 13.4367, Raid 6: 12.2028
    11 disks - Raid 5: 15.7442, Raid 6: 14.4751
    12 disks - Raid 5: 18.1174, Raid 6: 16.8229
    Using r = 0.07, array failure rate (in %):
    3 disks - Raid 5: 1.4014, Raid 6: 0.7179
    4 disks - Raid 5: 2.6728, Raid 6: 1.8252
    5 disks - Raid 5: 4.2493, Raid 6: 3.2640
    6 disks - Raid 5: 6.0821, Raid 6: 4.9824
    7 disks - Raid 5: 8.1274, Raid 6: 6.9343
    8 disks - Raid 5: 10.3466, Raid 6: 9.0785
    9 disks - Raid 5: 12.7052, Raid 6: 11.3785
    10 disks - Raid 5: 15.1730, Raid 6: 13.8020
    11 disks - Raid 5: 17.7230, Raid 6: 16.3205
    12 disks - Raid 5: 20.3317, Raid 6: 18.9088
    

    What we learn is:

    • The more disks in a raid array, the higher the failure rate. Raid can deal with failing disks but it can't change the physics of disk drives.

    • The other major factor, just as I said, is disk quality.

    • R5 vs R6 doesn't make all that much difference. Typically R6 gives you less than a 1 percentage point lower array failure rate.

    Ergo: The best (most resilient, least failure-prone) way to do things is to

    • use good quality disks
    • prefer multiple smaller arrays over large ones.

    R6 is better than R5 in terms of administration. While the failure rates over a lifetime are very similar, R6 offers more breathing room, i.e. less pressure when exchanging a failed disk and rebuilding the array. On the other hand, R5 can be rebuilt considerably quicker and is generally much cheaper. Hence, if you have tech hands available round the clock and are at the low end of the market, you might prefer R5 and achieve very comparable resilience.

    As a customer, particularly one with sensitive data, you'll want a hoster with support that is known to be good and quick, and R6 just in case. And you'll want a hot standby drive.

    P.S. The setlX code is available upon request. It uses well-established formulas.

  • willie Member
    edited June 2017

    bsdguy said: setlX code

    What's setlX? And what's "r" in that chart? I'll look at the rest of the post later.

  • bsdguy Member

    A modern and extended version of SETL. Strong set-theoretical support, good math support, etc.

    It's not that the computation couldn't be done in Python or whatever, too, but setlX (like Prolog variants) is widely used and comes in handy for "scientific spread-sheeting", domain exploration, formal modelling, etc.

  • willie Member

    Fine, what is r and what are those numbers of disks? Does the first row mean 3 data disks, plus 1 or 2 (depending on raid level) parity disks? The calculation looks mysterious.

  • bsdguy Member

    r is commonly used for the failure rate over a given period (commonly 1 year) and is assumed to be the same for each of the disks in an array.

    The number of disks is the total (incl. payload and syndrome(s)). So n=6 means a total of 6 drives of which 5 are payload w/R5, 4 are payload w/R6. That's also the standard way to calculate it.

    Mysterious? Nope. The calculation uses the standard formulas for R5 and R6.
    R5: 1 − (1 − r)^n − n·r·(1 − r)^(n−1)
    R6: same as R5 with the added term − (n/2) · r² · (1 − r)^(n−2)
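
    For anyone who wants to reproduce the table above without the setlX code, here is a minimal Python transcription of the two formulas exactly as quoted (including the n/2 factor in the extra R6 term, as written):

        # Transcription of the formulas quoted above. r = per-disk failure
        # rate for the period, n = total disks including parity. Output in %.
        def r5_fail(n, r):
            return 1 - (1 - r)**n - n * r * (1 - r)**(n - 1)

        def r6_fail(n, r):
            # as quoted: R5 minus (n/2) * r^2 * (1 - r)^(n - 2)
            # (an exact binomial count of two-failure states would use n*(n-1)/2)
            return r5_fail(n, r) - (n / 2) * r**2 * (1 - r)**(n - 2)

        for r in (0.01, 0.02, 0.05):
            print(f"Using r = {r}:")
            for n in range(3, 13):
                print(f"  {n} disks - Raid 5: {100 * r5_fail(n, r):.4f}, "
                      f"Raid 6: {100 * r6_fail(n, r):.4f}")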

  • willie Member
    edited June 2017

    So in the first row of the table, r=0.01 is the probability of any drive failing in the relevant period (1 year), raid-5 is 2 data disks and 1 parity disk, while raid-6 is 1 data disk essentially replicated 3 times. And you're saying raid-6 has only half the failure likelihood of raid-5? The raid-5 is killed by any 2-drive failure while killing raid-6 requires a 3-drive failure. So I think your math is off. r-cubed is much less than r-squared, and that swamps the effects of the small constants out front.

    Also, you can't treat all failures in 1 year as simultaneous. Drive 1 failing and being rebuilt/replaced, and then drive 2 failing later and being rebuilt/replaced, is much different from drive 1 failing and then drive 2 failing during the rebuild. So r is effectively even smaller.
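
    To put a rough number on the rebuild-window case: under the (strong) assumptions of independent drives and a constant failure rate, the chance of a second failure landing inside the rebuild is tiny compared to the annual rate. A sketch with made-up illustrative inputs:

        import math

        # Rough model only: independent drives, constant hazard rate.
        # Given one drive has already failed, how likely is it that at least
        # one of the remaining drives fails before the rebuild completes?
        # (Ignores correlated failures and rebuild stress, which push this up.)
        def second_failure_during_rebuild(annual_rate, remaining_drives, rebuild_hours):
            per_hour = annual_rate / (365 * 24)
            return 1 - math.exp(-per_hour * remaining_drives * rebuild_hours)

        # e.g. 3% annual failure rate, 9 surviving drives, 24 h rebuild:
        print(second_failure_during_rebuild(0.03, 9, 24))  # ~0.00074, i.e. ~0.07%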

    More of an issue is that (as you said elsewhere) failures tend to be correlated (drive age etc.), and also that a raid rebuild on an active server stresses drives a lot, increasing failure likelihood.

    I don't think the many large and careful users of raid-6 arrays are being stupid and only need to be shown their error by you, anonymous internet person. It still looks to me like you're the one in error.

    Thanked by vimalware
  • henkb Member
    edited June 2017

    @willie said:
    raid-5 is 2 data disks and 1 parity disk, while raid-6 is 1 data disk essentially replicated 3 times.

    Willie, I'm sorry, but you are wrong.
    RAID-5 = a number of drives, where the capacity of 1 drive is used for storing parity data (round robin across all of the drives).
    RAID-6 = a bit like RAID-5, except it uses the capacity of 2 drives, storing dual parity.

    Check: https://nl.wikipedia.org/wiki/Redundant_array_of_independent_disks#RAID-5
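
    A toy illustration of the capacity/fault-tolerance trade described above (my own summary, not taken from the linked page):

        # n equal drives of size_tb each: RAID-5 spends one drive's worth of
        # capacity on parity and survives 1 failure, RAID-6 spends two and
        # survives 2.
        def raid_summary(n, size_tb, level):
            parity = {"raid5": 1, "raid6": 2}[level]
            return {"usable_tb": (n - parity) * size_tb,
                    "survives_failures": parity}

        print(raid_summary(6, 4, "raid5"))  # {'usable_tb': 20, 'survives_failures': 1}
        print(raid_summary(6, 4, "raid6"))  # {'usable_tb': 16, 'survives_failures': 2}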

  • willie Member

    henkb said: 1 data disk essentially replicated 3 times.

    I'm specifically describing the 1st row of bsdguy's table, i.e. a 3-drive array.
