Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!


Galera over WAN: how far apart is reasonable?
New on LowEndTalk? Please Register and read our Community Rules.

All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.

Galera over WAN: how far apart is reasonable?

raindog308raindog308 Administrator, Veteran
edited October 2016 in Help

I think big, so when I decided I wanted to setup a 3-node Galera cluster (MariaDB 10.1 on Deb 8), I thought, what the hell, let's be geo-redundant: Las Vegas, Frankfurt, Tokyo. From that you can probably name the providers (BuyVM, DO, Vultr if you can't).

Average pings:

from frk to tok 243.344 ms
from frk to vgs 142.656 ms
from vgs to frk 142.560 ms
from vgs to tok 116.430 ms
from tok to frk 243.291 ms
from tok to vgs 116.428 ms

I'm seeing some 1205 lock timeouts and such on rather elementary transactions, but before I go digging in to tuning or seeing if it's my code, I'm wondering what kind of cluster distances are reasonable. Assume everything is over public Internet WAN...

  • Three different continents?
  • Three different DCs on same continent (yes, I'm aware they're not all the same size)
  • Three different DCs in same region (e.g., "Eastern US")
  • You really want all three in the same DC talking over private network
Thanked by 1yomero

Comments

  • I would go no more than the same region. I think the cluster should do a lot of roundtrips for even the most simple operations, and could take several seconds for that kind of latency between continents.

  • I ever read somewhere that the latency should be under 10 ms, but I forgot the source.

  • exception0x876exception0x876 Member, Host Rep, LIR

    Generally I don't think you want to spread the nodes over different locations if you care about performance. If you just want to be geo-redundant, I suggest to setup a standard MySQL replication to the remote nodes and limit them to read-only mode while your main cluster is active.

  • perennateperennate Member, Host Rep

    It depends on what kind of workload you're running on the database. If it's mostly read-only, with a few inserts and updates every now and then, your current setup should be fine; I'm not sure why you're getting lock timeouts, is that because you're running conflicting transactions, or it even happens when you run one query in isolation (without anything running on the other nodes)? If you have a very write-heavy workload, synchronous replication might not be desirable at all, you could have asynchronous replication instead.

    Thanked by 1vimalware
  • FranciscoFrancisco Top Host, Host Rep, Veteran
    edited October 2016

    The question should be 'how many write queries am I doing and how many per page?' The latency in ping is how long each query will have added in run time. If you're 90%+ reads though this isn't an issue since reads are always done against the local install and not against global cluster.

    Francisco

    Thanked by 1yomero
  • exception0x876exception0x876 Member, Host Rep, LIR

    Francisco said: The latency in ping is how long each query will have added in run time.

    Not entirely. If you do several queries in one transaction, Galera replicates entire transaction. You pay the same price whether the transaction has a single statements or several statements.

    So if all your writes are auto-committed transactions, each write will trigger a commit that will need at least one RTT (Round-Trip Time) to complete. If RTT is 100ms, that means 10 writes/s.

    But if your transactions have 10 writes, you will only need a commit every 10 writes: with 10 commits/s, you can now run 100 writes/s.

    Thanked by 1yomero
  • pbgbenpbgben Member, Host Rep

    Unless your glued to galera, take a look at xtreemFS its meant to be for wan clustering

  • FranciscoFrancisco Top Host, Host Rep, Veteran

    @exception0x876 said:

    Francisco said: The latency in ping is how long each query will have added in run time.

    Not entirely. If you do several queries in one transaction, Galera replicates entire transaction. You pay the same price whether the transaction has a single statements or several statements.

    So if all your writes are auto-committed transactions, each write will trigger a commit that will need at least one RTT (Round-Trip Time) to complete. If RTT is 100ms, that means 10 writes/s.

    But if your transactions have 10 writes, you will only need a commit every 10 writes: with 10 commits/s, you can now run 100 writes/s.

    Very good point but most software doesn't use transactions for works loads, they just one query at a time.

    Francisco

  • NeoonNeoon Community Contributor, Veteran

    Frankfort wat.

  • raindog308raindog308 Administrator, Veteran

    Francisco said: Very good point but most software doesn't use transactions for works loads, they just one query at a time.

    I am a DBA by trade so I use transactions for everything :-) I also normalized the data to hell and back so most transactions are just passing integers around.

    However, there are also some "stuff some JSON in the DB" statements and a single insert (which should be completely unlocking!) is causing some 1205s. I'm not sure if it's Galera's optimistic modeling + triggers + auto_increment over the WAN, or something in my code...but thought I'd ask about WAN before debugging.

    Neoon said: Frankfort wat.

    Oh you...I fixed it :-)

    pbgben said: Unless your glued to galera, take a look at xtreemFS its meant to be for wan clustering

    xtreemFS seems more like a competitor for Gluster. I don't think I want to run a database atop a distributed filesystem. Distributed volume in a LAN situation like DRBD is one thing, but running atop a distributed fs sounds like trouble. Unless they have some magic, but I didn't see mention of DBs on their web site.

  • edited October 2016

    What sst method are you using?

    rsync will cause locks, use xtrabackup if possible

  • raindog308raindog308 Administrator, Veteran

    In this case, I was amused to find the issue (1205 lock timeouts) appeared again even after I put all three nodes in the same DC talking over a private network. So geography played no role.

    I found the cause:

    https://www.percona.com/blog/2012/03/27/innodbs-gap-locks/

    So my test case would insert, select for update/update, insert...and hang. The problem was that while the select for update was doing a sort and LIMIT 1, I think it was grabbing locks on more rows (and auto_increment values) than just 1. Combine that with the way galera does auto_increment interleaving and it was easy to get to a case where a lock was held on the next auto_increment value just when someone came by to use it...

    Switching to READ-COMMITTED eliminated the problem.

    Thanked by 1vimalware
  • Awesome, glad you got it sorted.

    We don't use Galera just a MariaDB master + slave but it's useful info :)

  • oneilonlineoneilonline Member, Host Rep

    Interesting read. We have used galera clusters only within the same datacenter because of higher latency when in even different regions, seams less than <10ms is best. I'm curious to hear more successful uses between continents, if there are any?

  • Just set your flow control stupidly wide. :P

Sign In or Register to comment.