Comments
Yeah, that's 2 million files. For instance, lv-shared04 has ~22M inodes on it.
Francisco
Every node gets 3 backups a week. We stagger the schedule some, so LUX is like Tues/Thurs/Sat or something like that; that way each group of nodes doesn't have to fight the other nodes for inodes.
As of now, all but a handful of accounts we missed on the initial rounds have been imported. We've fixed whatever IPs were wrong.
This issue is resolved.
Francisco
I'm going to take a wild guess that BuyShared's boxes have an order of magnitude more accounts per server.
40,000 accounts per server.
Not that bad. lv-shared04 has a peak of around 1,000 IPs (a /22's worth).
We just have a lot of people who have never, ever cleaned their spam folders, so you end up with 50,000+ unread emails.
Francisco
2 million files per account for the top ones.
In my case, it's 403 accounts, 34.4 million inodes, and a total of 2,645 gigabytes of files.
Total backup time was 2 hours and 37 minutes last night.
If it takes 24+ hours for 22M inodes, then something must be very wrong - it means you're averaging only ~250 read IOPS on the webserver over that 24-hour span, which seems super low :-) But what do I know.
Another, smaller box is 228 gigabytes of data, 4.3 million inodes, and 125 accounts; it takes about 40 minutes, and that's on spinning rust.
Consider the mbox format - it would probably also greatly speed up the recovery process.
The issue isn't the reading on the shared side, it's that the destination is getting slammed by not only 7 other shared nodes looking to do the same work, but also BuyVM backups (though those are more stream heavy).
I thought mbox was deprecated. If not, I'll for sure consider that.
Francisco
Shlonged.
Francisco
Sorry, mdbox - not mbox - hate the fact their naming is close: https://documentation.cpanel.net/display/68Docs/Mailbox+Conversion
They even considered making mdbox the default at some point - that wasn't done, but support was added in recent releases - and cPanel migrated all of their internal email to mdbox themselves.
Neat
Will wait and see if there are any known issues, but I'd for sure love to have something like that instead of a metric crap ton of inodes.
Francisco
It was introduced in cPanel v56, so it's been there for over a year - and cPanel runs it internally with terabytes of email. If they switched their own @cpanel.net email to run on it, I'd assume it's "good enough" for the company to rely on.
Might be faster to just save a compressed tarball of each account as backup, rather than attempting differential or file by file backup. So you write just one file per account on the backup server.
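A per-account tarball pass is only a few lines of shell. This is illustrative only - the account layout and paths are made up (the demo builds a temp tree standing in for /home so it runs anywhere), not BuyShared's actual tooling:

```shell
#!/bin/sh
# Illustrative sketch: one compressed tarball per account, so the backup
# server stores a single large file per account instead of millions of
# small inodes. Paths are hypothetical.
set -e
SRC=$(mktemp -d)   # stands in for /home (one directory per account)
DEST=$(mktemp -d)  # stands in for the backup target
mkdir -p "$SRC/alice" "$SRC/bob"
echo "mail" > "$SRC/alice/inbox"
echo "mail" > "$SRC/bob/inbox"

for acct in "$SRC"/*/; do
    name=$(basename "$acct")
    # One compressed stream per account; the destination never sees the
    # individual files.
    tar -czf "$DEST/$name.tar.gz" -C "$SRC" "$name"
done
ls "$DEST"
```

The trade-off is exactly the one discussed below: each run rewrites every account in full, so the backup server's inode count drops but the transferred volume goes up.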
True, but very heavy on space usage.
In the case of a disaster we'd likely resort to the full drive snapshots which I'll be able to restore at full line rate since it's just streaming data.
Francisco
Some hours offline, a copy from a few days ago - but much better than losing everything.
Actually, dump/restore is still a thing and can do differential dumping into a single file on the backup device. That might be a decent alternative. Space consumption stays the same across all these approaches if you're not keeping multiple backups around, but the tarball approach increases traffic to the backup server, so that's not great.
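The level-0/level-1 idea can be shown without root: dump(8) wants a whole filesystem and elevated privileges, so as a runnable stand-in this sketch uses GNU tar's --listed-incremental mode, which gives the same single-file differential behaviour (a full archive, then small archives holding only what changed since):

```shell
#!/bin/sh
# Differential single-file backups, sketched with GNU tar's incremental
# mode as a stand-in for dump levels. Requires GNU tar (Linux default).
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
echo "v1" > "$WORK/data/a.txt"

# "Level 0": full backup, recording file state in snap.meta
tar --create --gzip --listed-incremental="$WORK/snap.meta" \
    --file="$WORK/full.tar.gz" -C "$WORK" data

# Simulate a day's changes
echo "v2" > "$WORK/data/b.txt"

# "Level 1": archive only what changed since level 0. Working on a copy
# of the metadata keeps the level-0 state intact for the next run.
cp "$WORK/snap.meta" "$WORK/snap.meta.1"
tar --create --gzip --listed-incremental="$WORK/snap.meta.1" \
    --file="$WORK/diff.tar.gz" -C "$WORK" data

# diff.tar.gz holds b.txt (new) but not a.txt (unchanged)
tar -tzf "$WORK/diff.tar.gz"
```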
I'm not so sure you can do that on an active filesystem, since you'd get an inconsistent snapshot as things change during the dump.
I can LVM snapshot the entire node so it'd be just fine.
Francisco
How does that work? The stuff inside the LVM partitions would still be changing during the snapshot, I would have thought. I could imagine setting up an overlay filesystem or similar to allow individual accounts to be safely snapshotted and that would be interesting, but I'm not aware of it having been done.
No. The 'changed data' gets stored in a different area: when you make the snapshot you tell LVM how much space it can use for that purpose, and any block that changes afterwards gets its old contents copied there first (copy-on-write), so the snapshot keeps a consistent point-in-time view.
https://www.thomas-krenn.com/en/wiki/LVM_Snapshots
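The snapshot flow boils down to a handful of commands. The volume group and LV names here are made up, and since lvcreate needs root and a real VG, this sketch only prints the plan unless you set RUN=1 on an actual LVM system:

```shell
#!/bin/sh
# Sketch of an LVM snapshot backup cycle (hypothetical vg0/shared names,
# not BuyVM's actual layout). Dry-run by default: set RUN=1 to execute.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

VG=vg0          # hypothetical volume group
LV=shared       # hypothetical LV holding the shared-hosting data

# Reserve 20G of copy-on-write space; writes to $LV after this point copy
# the old blocks into it, keeping the snapshot a consistent point in time.
run lvcreate --snapshot --size 20G --name "${LV}_snap" "/dev/$VG/$LV"
run mount -o ro "/dev/$VG/${LV}_snap" /mnt/snap
# ... stream /mnt/snap to the backup server (tar, rsync, dd, etc.) ...
run umount /mnt/snap
run lvremove -f "/dev/$VG/${LV}_snap"
```

If the COW area fills up before the snapshot is removed, the snapshot is invalidated (the origin LV is unaffected), so the reserved size has to cover the writes expected during the backup window.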
Francisco
Why doesn't anyone mention this might have something to do with the low TBW ratings consumer-series SSDs have?
If they're from the same batch and sitting in a RAID, isn't it expected for them to fail nearly simultaneously?
That was actually already mentioned.
My bad then, I missed that.
Fran mentioned he monitors the wear level on them, on the 1TB variants you're looking at like 2PB of write life per drive. Definitely a possibility, but that seems like a lot of available room for writes on a shared hosting node.
My site backs up online without issue. Great work, Fran!
I posted about this earlier, but I'll repeat it here.
The SSDs were all at 30-40% left on the official wear ratings for the drives, but Samsungs can go far beyond the rated limits. Still, we weren't anywhere near that point.
I know this for a fact because I SMART-checked all the drives around two weeks ago, when Karen & I decided we wanted to give Shared a nice upgrade with bigger CPUs and a move to NVMe drives.
Honestly, with what we've seen, I think it was just because the drives never got a firmware update. We bought those drives right when the 850s first hit the market, and pulling the node offline to flash the firmware and take a chance at losing everyone's data wasn't a happy thought to me.
For all I know there's some subsystem that was patched in a later firmware where the drives can go cockeyed if they hit a certain wear level. Samsung knows, I don't.
We do a lot to take as much strain off our drives as we can. We're close to an 85/15 read/write workload on our shared node SSDs. There's absolutely no swap on our nodes, and we do some other tricks to take high-thrash areas off the drives completely.
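The post doesn't say which paths BuyShared actually relocates, but a common version of that trick is mounting high-churn scratch directories as RAM-backed tmpfs and adding noatime, so that churn never touches the SSDs - an illustrative fstab fragment, not their actual config:

```
# /etc/fstab - illustrative only. tmpfs keeps scratch churn in RAM,
# off the SSDs; noatime avoids a metadata write on every file read.
tmpfs   /tmp       tmpfs   defaults,noatime,nosuid,size=2G   0 0
tmpfs   /var/tmp   tmpfs   defaults,noatime,nosuid,size=1G   0 0
```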
I'm not happy with it but I'm incredibly proud with how quickly Anthony & I were able to diagnose the problem, build completely new nodes, get everything installed, every backup we had packed and restored.
I think it was just over 24 hours for the whole episode. Given what i've seen from countless other hosts out there, they would be 5 - 10% into the restore by now.
This is resolved. Anyone with outstanding issues with their site, log a ticket, i'll personally sort you out!
Francisco
https://www.anandtech.com/show/8747/samsung-ssd-850-evo-review lists the rated endurance of the 1TB 850 EVO as 150TB.
Correct! Even the Pros aren't rated that much higher.
The enterprise stuff gets into the multi PB.
Still, this is shared hosting. shared04 already had a couple of years on it and still had a good bit left.
Francisco
Quick question from the official BuyVM LET help desk, when do you think the KVM slices will get NVMe installed in 'em? (srs)
Unlikely any time soon.
It's a huge cost and not enough people would care about it. I'd have to do full motherboard changes, or move to a 2U chassis, since our spare PCI slots are taken up by our InfiniBand networking.
Free nightly backups & snapshots will come out around summer or so, though. That should be fun.
Francisco
Ah, my bad. My quick google search brought this page up earlier: https://www.anandtech.com/show/8747/samsung-ssd-850-evo-review/4