Galera over WAN: how far apart is reasonable?

raindog308 · October 2016

I think big, so when I decided I wanted to setup a 3-node Galera cluster (MariaDB 10.1 on Deb 8), I thought, what the hell, let's be geo-redundant: Las Vegas, Frankfurt, Tokyo. From that you can probably name the providers (BuyVM, DO, Vultr if you can't).

Average pings:

from frk to tok 243.344 ms
from frk to vgs 142.656 ms
from vgs to frk 142.560 ms
from vgs to tok 116.430 ms
from tok to frk 243.291 ms
from tok to vgs 116.428 ms

I'm seeing some 1205 lock timeouts and such on rather elementary transactions, but before I go digging in to tuning or seeing if it's my code, I'm wondering what kind of cluster distances are reasonable. Assume everything is over public Internet WAN...

Three different continents?
Three different DCs on same continent (yes, I'm aware they're not all the same size)
Three different DCs in same region (e.g., "Eastern US")
You really want all three in the same DC talking over private network

yomero · October 2016

I would go no more than the same region. I think the cluster should do a lot of roundtrips for even the most simple operations, and could take several seconds for that kind of latency between continents.

akhfa · October 2016

I ever read somewhere that the latency should be under 10 ms, but I forgot the source.

exception0x876 · October 2016

Generally I don't think you want to spread the nodes over different locations if you care about performance. If you just want to be geo-redundant, I suggest to setup a standard MySQL replication to the remote nodes and limit them to read-only mode while your main cluster is active.

perennate · October 2016

It depends on what kind of workload you're running on the database. If it's mostly read-only, with a few inserts and updates every now and then, your current setup should be fine; I'm not sure why you're getting lock timeouts, is that because you're running conflicting transactions, or it even happens when you run one query in isolation (without anything running on the other nodes)? If you have a very write-heavy workload, synchronous replication might not be desirable at all, you could have asynchronous replication instead.

Francisco · October 2016

The question should be 'how many write queries am I doing and how many per page?' The latency in ping is how long each query will have added in run time. If you're 90%+ reads though this isn't an issue since reads are always done against the local install and not against global cluster.

Francisco

exception0x876 · October 2016

Francisco said: The latency in ping is how long each query will have added in run time.

Not entirely. If you do several queries in one transaction, Galera replicates entire transaction. You pay the same price whether the transaction has a single statements or several statements.

So if all your writes are auto-committed transactions, each write will trigger a commit that will need at least one RTT (Round-Trip Time) to complete. If RTT is 100ms, that means 10 writes/s.

But if your transactions have 10 writes, you will only need a commit every 10 writes: with 10 commits/s, you can now run 100 writes/s.

pbgben · October 2016

Unless your glued to galera, take a look at xtreemFS its meant to be for wan clustering

Francisco · October 2016

@exception0x876 said:

Francisco said: The latency in ping is how long each query will have added in run time.

Not entirely. If you do several queries in one transaction, Galera replicates entire transaction. You pay the same price whether the transaction has a single statements or several statements.

So if all your writes are auto-committed transactions, each write will trigger a commit that will need at least one RTT (Round-Trip Time) to complete. If RTT is 100ms, that means 10 writes/s.

But if your transactions have 10 writes, you will only need a commit every 10 writes: with 10 commits/s, you can now run 100 writes/s.

Very good point but most software doesn't use transactions for works loads, they just one query at a time.

Francisco

Neoon · October 2016

Frankfort wat.

raindog308 · October 2016

Francisco said: Very good point but most software doesn't use transactions for works loads, they just one query at a time.

I am a DBA by trade so I use transactions for everything :-) I also normalized the data to hell and back so most transactions are just passing integers around.

However, there are also some "stuff some JSON in the DB" statements and a single insert (which should be completely unlocking!) is causing some 1205s. I'm not sure if it's Galera's optimistic modeling + triggers + auto_increment over the WAN, or something in my code...but thought I'd ask about WAN before debugging.

Neoon said: Frankfort wat.

Oh you...I fixed it :-)

pbgben said: Unless your glued to galera, take a look at xtreemFS its meant to be for wan clustering

xtreemFS seems more like a competitor for Gluster. I don't think I want to run a database atop a distributed filesystem. Distributed volume in a LAN situation like DRBD is one thing, but running atop a distributed fs sounds like trouble. Unless they have some magic, but I didn't see mention of DBs on their web site.

PrincessOfCats · October 2016

What sst method are you using?

rsync will cause locks, use xtrabackup if possible

raindog308 · October 2016

In this case, I was amused to find the issue (1205 lock timeouts) appeared again even after I put all three nodes in the same DC talking over a private network. So geography played no role.

I found the cause:

https://www.percona.com/blog/2012/03/27/innodbs-gap-locks/

So my test case would insert, select for update/update, insert...and hang. The problem was that while the select for update was doing a sort and LIMIT 1, I think it was grabbing locks on more rows (and auto_increment values) than just 1. Combine that with the way galera does auto_increment interleaving and it was easy to get to a case where a lock was held on the next auto_increment value just when someone came by to use it...

Switching to READ-COMMITTED eliminated the problem.

Jacob · October 2016

Awesome, glad you got it sorted.

We don't use Galera just a MariaDB master + slave but it's useful info

oneilonline · October 2016

Interesting read. We have used galera clusters only within the same datacenter because of higher latency when in even different regions, seams less than <10ms is best. I'm curious to hear more successful uses between continents, if there are any?

vexilla · October 2016

Just set your flow control stupidly wide. :P

Howdy, Stranger!

Categories

In this Discussion

Galera over WAN: how far apart is reasonable?

Comments

Howdy, Stranger!

Quick Links

Categories

In this Discussion

Galera over WAN: how far apart is reasonable?

Comments