Providers with High Availability VPS - Page 2

Comments

  • drserver Member, Host Rep
    edited May 2018

    raindog308 said: ...which is not implemented on a SAN, so...?

    Ok, I will explain :). EBS is the main type of storage at AWS these days. It is network storage. Each node has a live internet network and an EBS network, so it is not local in any way. They actually have multiple tiers of EBS storage.

    A real-world example would be the world's workhorse c4.whatever instance family. The lowest one in the family has a dedicated 500 Mbps for EBS.

    As for Google: https://cloud.google.com/persistent-disk/

    One good thing with Google is "multi-mount", so a few instances can use the same drive.

    Thanked by: FHR
  • raindog308 Administrator, Veteran

    Francisco said: You can move the volumes around, no? Unless you mean they're mapping the volumes that way too?

    drserver said: Ok, I will explain :). EBS is the main type of storage at AWS these days. It is network storage. Each node has a live internet network and an EBS network, so it is not local in any way. They actually have multiple tiers of EBS storage.

    Sorry if I wasn't clear:

    • Yes, EBS is a sort of SAN-like storage (from an AWS subscriber's perspective)

    • But EBS itself does not run on top of a SAN. Amazon does not use SANs.

    My point was that the major cloud vendors do not use SANs. They do provide SAN-like storage features for end users.

    Thanked by: trewq, Francisco
  • Francisco Top Host, Host Rep, Veteran

    raindog308 said: Sorry if I wasn't clear:

    Yes, EBS is a sort of SAN-like storage (from an AWS subscriber's perspective)

    But EBS itself does not run on top of a SAN. Amazon does not use SANs.

    My point was that the major cloud vendors do not use SANs. They do provide SAN-like storage features for end users.

    Yep, so it's what I figured, it's all hyper-converged it sounds like.

    Francisco

  • drserver Member, Host Rep

    raindog308 said: My point was that the major cloud vendors do not use SANs. They do provide SAN-like storage features for end users.

    Fair

  • zllovesuki Member
    edited May 2018

    ... sigh this conversation again.

    consumer grade vs "enterprise":

    Enterprise hardware has a supported life span. Supported as in: will the vendor give us new firmware when there's a security flaw, will the vendor fix our problem if it acts weird, will we have someone to complain to when the hardware dies. We frequently had problems (before, not anymore) with Arista and Brocade because of their EoL models, etc.

    Whereas with consumer grade, well, good luck. You can cry in your sleep when your hardware dies. It's fine if you run a one-man VPS provider and your clients don't run their next PayPal on it. Plus, for "HA" you need FC or FCoE to be reliable, which most consumer-grade hardware doesn't support. The same goes for LACP and (some) STP.

    iSCSI / NFS / whatever with HA:

    We have vMotion for exactly this reason: it just works, and if it doesn't, it's not our problem! We do have people running the cluster; in fact we only have one person for the entire VMware infrastructure (3 storage nodes, 3 hypervisor nodes), AND she can do other things as well. Why? Because we have a support contract, so she doesn't have to run around screaming in the middle of the night (of course she will be up in the middle of the night if something goes wrong, but she would be screaming at VMware, not at herself).

    In-house solutions don't always work. For example, we have a lot of Supermicro-based servers running ZFS on SmartOS. There was an incident where three of my coworkers had to stay up the entire night to figure out why a storage server died. It turned out a disk had silently failed, I/O was blocked for too long, and the kernel became unresponsive (not panicked, unresponsive).

    If it were a NetApp or EMC array, we would just call them and have someone down here within 4 hours to fix it (but of course they are expensive, so not ALL of our storage is with them).

    Now imagine that at a small company with at most 7 employees: how would you solve the HA problem? Don't assume things won't fail; they definitely will.

    Plus, say you do have enough people to run your ideal HA environment for VPS. An FTE for a Unix sysadmin is >$95,000/year (before benefits); with taxes, benefits, etc. you are looking at $120,000/year (numbers based on my region). How would you price your services? Who's going to be the PoC for your clients? If you don't offer some sort of support, then why bother with your "HA"? It takes much more than a simple "HA infrastructure" to make a reasonable business out of it.

    I recently found out that HC (a vendor's name) would charge us $230/node for 24x7x365 support (even though the service is nowhere near mission critical). We have 500+ nodes. For that amount, we could hire another FTE to do the same thing (because it's not mission critical) AND to do more besides.

    In conclusion, why bother providing "HA" when economies of scale kick in and you realize that your clients might as well run their next PayPal on AWS?

  • randvegeta Member, Host Rep

    @raindog308 said:

    randvegeta said: What's wrong with consumer gear or gbit switches?

    Consumer hardware is cheap, has high performance, and is no less reliable than 'enterprise' hardware.

    You are high as a kite with that statement.

    In general, SAN-type setups (which is what you're describing) for web hosting are too expensive to do right and worthless if not done right, which is why they're rarely done in this market.

    Most providers simply don't need a SAN to meet their customers' needs. And that's the end of the story, because SANs are much more complicated to manage.

    And large providers who have practically unlimited funds and a huge customer base and hundreds of engineers...nope! Neither Amazon nor Microsoft Azure nor Google Cloud use SANs or storage networks. All storage is local SSD.

    https://serverfault.com/questions/117745/what-makes-cloud-storage-amazon-aws-microsoft-azure-google-apps-different-fr

    SANs have a place - generally outside the cloud - but they are not a panacea for all.

    Where did you get the idea that I was talking about using a SAN? What do you mean by SAN? And I never said anyone should actually use consumer hardware for an HA cluster. I was only saying it could be done on the cheap, with practically any hardware, the compromise being that performance is limited by the network.

    I don't think any cloud provider uses fully local storage. You can't have true HA with purely local storage; the data needs to be replicated in real time for HA. Hyperconvergence is probably what most of the big players do, but that's not expensive either. It's just a little more complex. Separating storage from virtualization makes things a little easier to manage, at the cost of a little lost performance.

  • randvegeta Member, Host Rep

    zllovesuki said: consumer grade vs "enterprise":

    I don't think anyone is comparing the two... No one is saying consumer gear is a suitable alternative to enterprise gear in a production environment. But that doesn't mean HA cannot be achieved at relatively low cost, and for some situations it may be perfectly fine (e.g. test systems).

    zllovesuki said: In conclusion, why bother providing "HA" when economies of scale kick in and you realize that your clients might as well run their next PayPal on AWS?

    I think you're missing the point. Or maybe I wasn't clear. The idea is not to target clients looking for HA because they have mission critical requirements. The idea is to make life easier for the VPS provider.

    If the throughput achievable from a highly available distributed storage cluster can compete with local disks, and you can scale out disk I/O (which is often the case when you add more disks / storage nodes to a cluster), then you get a lot of benefits without sacrificing performance. 10G (or 20G with bonding) NICs can absolutely compete with SATA.
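    As a back-of-the-envelope check on that throughput claim (my own numbers, not from the thread; protocol overhead is ignored, so real-world figures will be lower):

    ```python
    # Raw line rates only; actual usable throughput is lower after
    # protocol overhead (TCP/iSCSI framing, SATA 8b/10b encoding, etc.).

    def gbps_to_mbytes(gbps: float) -> float:
        """Convert a line rate in gigabits/s to megabytes/s (1 GB = 1000 MB)."""
        return gbps * 1000 / 8

    print(f"SATA III  (6 Gbps): {gbps_to_mbytes(6):.0f} MB/s")
    print(f"10 GbE   (10 Gbps): {gbps_to_mbytes(10):.0f} MB/s")
    print(f"2x10 GbE (bonded) : {gbps_to_mbytes(20):.0f} MB/s")
    ```

    Even a single 10 GbE link has more raw headroom than one SATA III disk, which is the basis of the claim above.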

    So it's an issue of cost, right? Building a storage cluster need not be THAT expensive.

    I'll give an actual example.

    Virtuozzo Storage, meant to work with Virtuozzo virtualization in a hyper-converged environment, actually works as a standalone storage solution too. It supports iSCSI block storage and S3-compatible object storage in addition to the way it's meant to work with Virtuozzo's own solution. Hardware requirements for the cluster are very modest (you can literally build it on any commodity / consumer-grade hardware).

    Licensing is a little expensive, depending on how you look at it: $3 /100GB /month, but this is for space you actually use, meaning you can 'oversell' and only license what you need when you need it. You get support with this licensing too, so if you have a problem, you're supposed to be covered. You can add nodes to the cluster to increase total storage capacity, bandwidth and I/O, and adding more nodes does not cost more in licensing. So when it comes to scaling out, it's just a matter of hardware costs.

    In terms of how much hardware you need, it also starts looking expensive when you consider the level of redundancy. For every 1TB of usable disk space on the cluster, you need at least 3TB of physical disk space to accommodate the triplicate data copies. So in addition to requiring more nodes for the storage cluster and licensing (if you want a commercially supported solution), you also need 3x more disks than you would otherwise need for local storage. It starts looking a bit expensive, except that this solution means you don't need a massive up-front cost: you can grow your cluster as needed. And you get better utilisation of resources, since the vast majority of VMs don't even use half of their allocated disk space. And since most VPS providers are using RAID 10 anyway, they already lose 50% of their capacity to the RAID.
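    Putting the numbers from this post together (the 3x replication factor and the $3/100GB rate are from the post; the usable/used figures in the sketch are made-up examples):

    ```python
    # Cost sketch for a triplicated storage cluster licensed per used GB.

    REPLICAS = 3                 # physical copies kept of every block
    LICENSE_PER_100GB = 3.0      # dollars per month per 100 GB actually used

    def physical_tb_needed(usable_tb: float) -> float:
        """Raw disk required across the cluster for a given usable capacity."""
        return usable_tb * REPLICAS

    def monthly_license(used_gb: float) -> float:
        """Licensing tracks used space only, so unsold/unused space costs nothing."""
        return used_gb / 100 * LICENSE_PER_100GB

    print(physical_tb_needed(10))   # 10 TB usable needs 30 TB of raw disk
    print(monthly_license(2000))    # 2 TB actually used: $60/month in licences
    ```

    The per-used-GB licensing is what makes the overselling argument work: raw disk triples, but the licence bill only tracks what tenants actually write.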

    Forget the benefits to the end users; I'm thinking entirely about the VPS provider. What they get from network storage is dynamic capacity (grow/shrink as needed), fast migration between nodes, the possibility of HA, easier management, and potentially a performance boost if they were not already using local SSD storage.

  • zllovesuki Member

    Talk is cheap, show me the product. :)

  • raindog308 Administrator, Veteran

    Francisco said: hyper-converged

    I was already sick of that term the second time I heard it.

    I hope plaid-converged is next.

    Thanked by: Clouvider, FHR, Francisco
  • deank Member, Troll

    Um... 7?

  • Shazan Member, Host Rep

    I believe the real problem with HA on network storage is that it rarely provides better uptime and reliability, because it adds a level of complexity and a centralized source of problems.

    I've always preferred the "make it simple" approach, because a simple system is harder to break and easier to fix than an HA system.

  • Shazan Member, Host Rep

    Anyway, Leaseweb has implemented HA with SAN for their VPS line.

  • randvegeta Member, Host Rep

    zllovesuki said: Talk is cheap, show me the product. :)

    It's not LET pricing. But it's been around for a long time :-).

    Shazan said: I believe the real problem with HA with network storage is that it rarely provides a better uptime and reliability, because it adds a level of complexity and a centralized source of problems.

    I don't know about that. It may only provide a tiny, marginal improvement in uptime, considering hardware tends to last a long time before it fails. The question is how much that 0.01% difference is worth.

    Shazan said: I've always preferred the "make it simple" approach, because it is harder to break and easier to fix than a HA system.

    There is definitely an argument for that approach. If you want to go relatively cheap, keep it simple, and target the LET crowd, it would probably be cheaper to simply deploy single-disk nodes and take regular backups to a simple storage server. Then, so long as you can spin up a new node quickly and retrieve the backups quickly, that might be acceptable for some...

  • willie Member

    There are several providers with Ceph-backed VPS so that if the host node fails the VPS can migrate to another one in the cluster. I guess that adds reliability and I'd use it for a server where uptime was important. But I think real HA means truly independent servers, preferably in separate data centers. So in the case of a web site you'd need ongoing db replication. I'd love it if someone on LET offered that.

  • msg7086 Member
    edited May 2018

    HA VPS sounds like an enterprise thing, yet you are taking consumer hardware as an example.

    If I were going to use consumer hardware for HA, I'd do a cheap-as-hell multi-VPS setup with DNS failover / LB, not this kind of "true HA".
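    A minimal sketch of that DNS-failover idea (the TCP health check uses the Python standard library; `update_dns` is a hypothetical placeholder, since the real call depends on your DNS provider's API):

    ```python
    # Minimal DNS-failover logic: point the record at the primary VPS while
    # it is healthy, otherwise at the first healthy secondary.
    import socket

    def is_up(host: str, port: int = 80, timeout: float = 2.0) -> bool:
        """Crude reachability check: can we open a TCP connection?"""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def pick_active(primary: str, secondaries: list, up=is_up) -> str:
        """Return the host the DNS record should point at."""
        if up(primary):
            return primary
        for host in secondaries:
            if up(host):
                return host
        return primary  # nothing healthy: leave the record where it was

    def update_dns(record: str, target: str) -> None:
        """Placeholder: a real setup would call the DNS provider's API here."""
        print(f"would point {record} at {target}")
    ```

    In practice you would run `pick_active` from a small cron job on a third machine and call `update_dns` only when the answer changes; note that DNS TTLs put a floor on how fast this failover can be.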

  • subhojitdutta Member

    LET and HA don't mix well. $70 would be the true price of a good HA service, not $7. Since most companies that require HA already have tech people on board, they tend to do it themselves. Therefore it does not make sense for VPS providers.

  • willie Member

    subhojitdutta said: $70 would be the true price of a good HA service, not $7

    Meh, you need 2 VPSes on separate servers, plus a template that sets up LAMP with db replication and failover, and maybe an anycast proxy CDN. $7/month seems quite doable.

    I originally wanted it for shared hosting, so there would be a replicated database shared by multiple users, but that isn't so good, since one person's script could bog down the db and impair reliability for the other users.

  • zllovesuki Member
    edited May 2018

    willie said: replicated database

    SQL? NoSQL? NewSQL? Active-Active? Active-Passive? Geo-aware replication (GDPR)? OLAP? OLTP?

    Different database and use-case combos have different deployment strategies. There's absolutely no reason to use an "HA-capable provider."

  • BTW, for those who are looking at a "replicated database": make sure you understand what you are buying (and check out Jepsen) before betting your product/app on that backend.

  • willie Member

    Just regular replicated MySQL for the usual shared-hosting use case: someone with a low-traffic WordPress or similar site who wants more resilience against outages.

  • subhojitdutta Member
    edited May 2018

    @willie said:

    subhojitdutta said: $70 would be the true price of a good HA service, not $7

    Meh, you need 2 VPS on separate servers, plus a template that sets up LAMP with db replication and failover and maybe an anycast proxy CDN. $7/month seems quite doable.

    I originally wanted it for shared hosting so there would be a replicated database shared by multiple users, but that isn't so good since one person's script could bog down the db and impair reliability for other users.

    Exactly my point. If the client itself can do it much more cheaply, what is the point of the provider doing so? A lot of customers use VPSes for various other purposes, like development/storage/backup/relay, and don't really require HA on such VPSes. The VPS provider would have to implement HA for all its customers, not just the ones who specifically require it: two dedicated servers, preferably in two separate locations, with identical specifications, plus replication costs, plus CDN costs for every customer, would mark it up to... a lot. Hence my hypothesis of $70. This obviously comes down to the use case and a lot of additional factors like upstream providers, data center and/or location. I am only voicing my opinion from a provider's POV, as that is what the OP requested.

  • jsg Member, Resident Benchmarker

    @willie said:
    Just regular replicated mysql for the usual shared hosting use case of someone with a low traffic wordpress or similar site, except they want more resilience from outages.

    I've seen some companies try that route; it was painful and they didn't achieve what they wanted. I also co-worked on something for a quite massive operation that was designed the way I now consider best: they put the replication into the server application itself, and killed some other high-reliability/availability problems along the way.

    I'm confused, by the way, by the rather diverse things being discussed here under "HA", plus the mix of everything from "a couple of cheap 2nd-hand machines" to proper storage HA: putting a cheap and rather poor Ubiquiti switch next to an n+1 100 Gb/s backend solution, or mentioning operations like AWS, which even have their own specifically designed hardware.

  • FHR Member, Host Rep

    @willie said: Just regular replicated mysql for the usual shared hosting use case of someone with a low traffic wordpress or similar site, except they want more resilience from outages.

    I would like to point out that doing geo-replicated setups is definitely not easy, and the complexity increases exponentially with the distance between your nodes.

    If you think there's really nothing to it, try doing an HA WordPress setup with, for example, one server in Germany and a second in London.
    You need active/active MySQL replication and active/active file sync in near real time, over a high-latency WAN connection.
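    For reference, the active/active MySQL side of such a setup is typically done with Galera; a minimal `my.cnf` fragment for one node might look like the following (the hostnames and cluster name are made-up examples, and the file sync would still need a separate tool such as lsyncd or GlusterFS):

    ```ini
    # my.cnf fragment for one Galera node (hostnames/IPs are examples).
    # Every node accepts writes; certification-based replication resolves
    # conflicts, which is exactly what hurts over a high-latency WAN link.
    [mysqld]
    binlog_format            = ROW
    default_storage_engine   = InnoDB
    innodb_autoinc_lock_mode = 2

    wsrep_on                 = ON
    wsrep_provider           = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_name       = wp-cluster
    wsrep_cluster_address    = gcomm://de1.example.com,uk1.example.com
    wsrep_node_name          = de1
    wsrep_node_address       = de1.example.com
    ```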

  • Francisco Top Host, Host Rep, Veteran

    FHR said: If you think there's really nothing to it, try doing an HA WordPress setup with, for example, one server in Germany and a second in London.

    You need active/active MySQL replication and active/active file sync in near real time, over a high-latency WAN connection.

    this comment drifts a bit, but could be useful to some

    That's why all of the 'geo-replicated' SQL offerings you see on the market have their IOPS rated in the tens, not hundreds or thousands.

    If you have a 95%+ read-based database it'll be OK, since local reads will be nice and speedy, with the tiny bit of write work being slow. Anything lower than that and it's gonna drag ass.

    Even with Galera I've had plenty of issues where any sort of minor lag can cause a desync, requiring the out-of-sync server to be restarted so it'll rejoin the group.

    Honestly, in the 'geo-replicated WordPress' projects I've seen people want to host, I almost always recommend reverse proxy caching.

    Francisco
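    The reverse-proxy-caching approach Francisco recommends can be sketched as an nginx fragment like this (the upstream hostname and cache path are examples, not from the thread):

    ```nginx
    # http-context fragment: cache rendered WordPress pages at the edge
    # node so only cache misses ever cross the WAN to the origin.
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=wp:50m
                     max_size=1g inactive=10m use_temp_path=off;

    # Skip the cache for logged-in WordPress users (the real cookie name
    # carries a per-site hash suffix, hence the substring match).
    map $http_cookie $wp_skip_cache {
        default               0;
        ~wordpress_logged_in  1;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://origin.example.com;   # origin across the WAN
            proxy_cache           wp;
            proxy_cache_valid     200 301 5m;
            proxy_cache_use_stale error timeout updating;
            proxy_cache_bypass    $wp_skip_cache;
            proxy_no_cache        $wp_skip_cache;
            add_header X-Cache-Status $upstream_cache_status;
        }
    }
    ```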

  • willie Member
    edited May 2018

    Well, I guess there's HA and then there's HA ;). I'd still count it as HA if there's physical replication but no geo-replication. E.g. two DCs in separate but nearby buildings, with separate power systems, network feeds, etc., would count, with local fiber between the two buildings handling replication in the usual case where both systems are running. I think this is basically the situation at Hetzner's DC parks and at places like Google.

  • zllovesuki Member

    Francisco said: Even with Galera i've had plenty of issues where any sort of minor lag can cause a desync, requiring the out-of-sync server to be restarted so it'll regroup.

    You might want to turn evs.user_send_window and evs.send_window up to 11. Also, increase gcache.size to something more reasonable.
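    The knobs zllovesuki names live in `wsrep_provider_options`; a fragment like this shows where they go (the values are illustrative only, not recommendations):

    ```ini
    [mysqld]
    # Larger send windows let more write-sets stay in flight before flow
    # control kicks in, which helps on high-latency links; a bigger gcache
    # lets a briefly lagging node rejoin via incremental state transfer
    # (IST) instead of a full resync (SST). Values are examples only.
    wsrep_provider_options = "evs.user_send_window=512; evs.send_window=512; gcache.size=2G"
    ```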

  • Francisco Top Host, Host Rep, Veteran

    zllovesuki said: You might want to turn evs.user_send_window and evs.send_window up to 11. Also, increase the gcache.size to something more reasonable.

    Neat :)

    The user moved to a reverse proxy setup and has been happy since.

    Still, WAN syncing is gonna suck.

    Francisco

  • zllovesuki Member
    edited May 2018

    @Francisco said:

    zllovesuki said: You might want to turn evs.user_send_window and evs.send_window up to 11. Also, increase the gcache.size to something more reasonable.

    Neat :)

    The user moved to a reverse proxy setup and has been happy since.

    Still, WAN syncing is gonna suck.

    Francisco

    It's always use-case dependent. In this example, WordPress is terrible with Galera.

    However, say you are running LDAP; then Galera is perfect (free DR, why not?).

  • Francisco Top Host, Host Rep, Veteran

    zllovesuki said: However say if you are running LDAP then Galera is perfect (free DR why not).

    "If you have to write to it you've already lost" in other words :D

    Francisco

    Thanked by: FHR
  • @Francisco said:

    Honestly, in the 'geo replicated Wordpress' projects i've seen people want to host, I almost always recommend to do reverse proxy caching.

    I don't want to hijack this thread; this setup has always been a hypothetical for me and I've never quite gotten down and dirty to actually try it... but what's your specific setup for this? What software do you / did you use?
