
How can I achieve this??

emdad Member
edited December 2016 in Help

The holiday season is coming up, and I'm expecting high traffic on my apps. The last time this happened (October), the server/MySQL stopped serving data no matter what I did. I got a big-ass server from OVH, implemented Memcache, tuned MySQL to accept more concurrent hits, and replaced Apache 2 with Nginx; none of it worked. The only thing left to try is HAProxy for MySQL load balancing plus API server load balancing, since the issue still seems to be mostly MySQL socket connections.

I've got a 64GB RAM OVH server. Here is what I want to do now:

  1. Get 10 additional IPs
  2. Install Proxmox/KVM virtualization
  3. Create 10 servers
  4. Set up HAProxy (load balancer) on one of the KVMs, plus one KVM as an API/file server
  5. Set up 4 KVMs on the OVH server as MySQL servers
  6. Set up another 4 KVMs on the OVH server as API/file servers
  7. Connect the KVMs to the master server to serve the API via the load balancer
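
A minimal sketch of what step 4's HAProxy front for MySQL could look like, in TCP mode. The IPs and backend names here are placeholders, and `mysql-check` assumes a dedicated check user already exists in MySQL:

```
listen mysql-cluster
    bind 10.0.0.1:3306
    mode tcp
    balance roundrobin
    # assumes a check user was created in MySQL beforehand
    option mysql-check user haproxy_check
    server db1 10.0.0.11:3306 check
    server db2 10.0.0.12:3306 check
```

The API/file servers would get a similar `listen` block, in `mode http` instead.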

I need your opinion on the best possible way to achieve this, and on how I can make the transition with the minimum possible downtime. It's a running app, and the IP can't change because not everyone updates to the latest version :(

Thanks in advance, everyone :)

Comments

  • trewq Administrator, Patron Provider

    In theory you're not even scaling, because you're just taking resources from one instance and giving them to another. You're actually worse off.

    Depending on the traffic you expect, it might be more worthwhile to focus on optimisation rather than scaling.

    Could you give us a rundown of the traffic levels you expect and what each request triggers?

  • exception0x876 Member, Host Rep, LIR

    If the problem is the number of MySQL concurrent connections, you might take a look at a thread pool. It will deliver more throughput at the cost of increased latency.
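
    For reference, on MariaDB or Percona Server the thread pool is a my.cnf switch; a sketch, where the size value is an example rather than a recommendation:

```
[mysqld]
# Replace the default one-thread-per-connection model
thread_handling = pool-of-threads
# Roughly the number of CPU cores is a common starting point
thread_pool_size = 16
```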

  • @trewq said:
    In theory you're not even scaling, because you're just taking resources from one instance and giving them to another. You're actually worse off.

    Depending on the traffic you expect, it might be more worthwhile to focus on optimisation rather than scaling.

    Could you give us a rundown of the traffic levels you expect and what each request triggers?

    Yes, that's what I'm doing; it seems resource usage isn't the issue. So I want to split things across multiple instances and have them serve data from one or the other. Please note, my problem is the number of MySQL socket connections, and something similar for API serving from Memcache (I'm not exactly sure what the error is in that case).

    As for a traffic estimate: 30K-50K hits to the API at the exact same time, and each hit is either served from Memcache or goes straight to the database.

    @exception0x876 said:
    If the problem is the number of MySQL concurrent connections, you might take a look at a thread pool. It will deliver more throughput at the cost of increased latency.

    Thanks. Will look into this.
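
    The serve-from-Memcache-or-query-the-database flow described above is the classic cache-aside pattern; a minimal Python sketch, with a plain dict standing in for the Memcache client and a hypothetical `query_db` placeholder for the real MySQL lookup:

```python
import time

cache = {}   # stands in for a Memcache client
TTL = 60     # seconds before a cached entry expires

def query_db(key):
    # Hypothetical placeholder for the real MySQL query.
    return f"row-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < TTL:
            return value            # cache hit: no DB connection used
    value = query_db(key)           # cache miss: one DB query
    cache[key] = (value, time.time())
    return value
```

    Every hit answered out of the cache is one less MySQL connection, which is exactly the resource that's running out here.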

  • trewq Administrator, Patron Provider
    edited December 2016

    @emdad said:

    @trewq said:
    In theory you're not even scaling, because you're just taking resources from one instance and giving them to another. You're actually worse off.

    Depending on the traffic you expect, it might be more worthwhile to focus on optimisation rather than scaling.

    Could you give us a rundown of the traffic levels you expect and what each request triggers?

    Yes, that's what I'm doing; it seems resource usage isn't the issue. So I want to split things across multiple instances and have them serve data from one or the other. Please note, my problem is the number of MySQL socket connections, and something similar for API serving from Memcache (I'm not exactly sure what the error is in that case).

    As for a traffic estimate: 30K-50K hits to the API at the exact same time, and each hit is either served from Memcache or goes straight to the database.

    @exception0x876 said:
    If the problem is the number of MySQL concurrent connections, you might take a look at a thread pool. It will deliver more throughput at the cost of increased latency.

    Thanks. Will look into this.

    50k requests a second is no small number. It's probably doable with the infrastructure you have, but it might be worth looking at other solutions, especially as you don't have any redundancy in place.

  • @trewq said:

    @emdad said:

    @trewq said:
    In theory you're not even scaling, because you're just taking resources from one instance and giving them to another. You're actually worse off.

    Depending on the traffic you expect, it might be more worthwhile to focus on optimisation rather than scaling.

    Could you give us a rundown of the traffic levels you expect and what each request triggers?

    Yes, that's what I'm doing; it seems resource usage isn't the issue. So I want to split things across multiple instances and have them serve data from one or the other. Please note, my problem is the number of MySQL socket connections, and something similar for API serving from Memcache (I'm not exactly sure what the error is in that case).

    As for a traffic estimate: 30K-50K hits to the API at the exact same time, and each hit is either served from Memcache or goes straight to the database.

    @exception0x876 said:
    If the problem is the number of MySQL concurrent connections, you might take a look at a thread pool. It will deliver more throughput at the cost of increased latency.

    Thanks. Will look into this.

    50k requests a second is no small number. It's probably doable with the infrastructure you have, but it might be worth looking at other solutions, especially as you don't have any redundancy in place.

    Can you suggest something? You mean something like DBaaS?

  • Here are a few comments:

    1. How many CPUs do you have, and do you have any data on how busy the server actually is? If the CPU is not pegged, you're IO-bound (disk and/or network, most likely disk), and that gives you a clue about where to optimize. Dividing into multiple VMs is not a good idea.

    2. Nginx and Apache are unlikely to be big culprits (their "work" is minimal, and both are excellent at delivering static content).

    3. For the "dynamic" part that connects to the DB, does its work, and returns the answer, you have the right approach: cache in Memcached and only go to the DB for new/expired requests. What are the memory load and hit rate on Memcached?

    4. Finally, MySQL: threads are only part of the problem (and more threads will cause a lot of contention unless you have a good backend model). Are you using a good storage engine? Can you minimize the logs, or move them to a separate disk (or even a ramdisk)? How big is your DB? Can it fit entirely into, say, 32G of RAM? Can you put the key contention part into RAM via a separate DB? That would go a LONG way toward solving your headaches. Also, what are the read/write ratios for the DB? Without knowing what you're doing with the DB, it's hard to comment more usefully.

    (All of this builds on @trewq's point that just creating VMs is not a good idea; it merely divides resources and hides the true problem.)

    Hope these ideas help.
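
    Point 1 (pegged vs IO-bound) can be checked without extra tools; a rough Linux-only sketch that samples /proc/stat twice to estimate busy and iowait percentages:

```python
import time

def cpu_sample():
    # First line of /proc/stat: cpu user nice system idle iowait irq softirq ...
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    idle, iowait = fields[3], fields[4]
    return sum(fields), idle, iowait

def cpu_usage(interval=1.0):
    t1, idle1, io1 = cpu_sample()
    time.sleep(interval)
    t2, idle2, io2 = cpu_sample()
    total = t2 - t1
    busy_pct = 100.0 * (total - (idle2 - idle1) - (io2 - io1)) / total
    iowait_pct = 100.0 * (io2 - io1) / total
    return busy_pct, iowait_pct
```

    A busy figure near 100% means CPU-bound; a large iowait share points at the disks.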

  • pbgben Member, Host Rep

    Why MySQL? Could you rewrite the DB class to connect to something like MongoDB?

    Or perhaps take a leap and move to AWS/Google to scale better.

  • @emdad said:
    Please note, my problem is the number of MySQL socket connections, and something similar for API serving from Memcache (I'm not exactly sure what the error is in that case).

    Until you know what the exact bottleneck is, it is pointless to do a mass reconfig to "fix" it. That's the epitome of premature optimization.

    You also don't seem to mention the application layer at all. For all we know it could be some database-intensive technology stack that also happens to poison the cache.

    You probably need someone to rearchitect your whole system top to bottom. But expecting to make a lot of changes in a short time period is a recipe for disaster. I'd focus on doing triage for the highest value customers.

    • Try Percona Server as a drop-in replacement for MySQL.

    • Tweak network settings: you might be running out of network sockets. https://www.percona.com/blog/2011/04/19/mysql-connection-timeouts/. Filesystem tweaks: if you haven't done it already, mount the database partition with noatime for a noticeable improvement.

    • Partition your tables, and store frequently accessed tables in a separate in-memory DB. I assume you have already done the usual MySQL tuning: creating indexes, defragmenting with OPTIMIZE TABLE, etc.

    • To reduce pressure on the TCP stack, use Unix sockets instead of TCP sockets for connections within the same server. Nginx can forward to Unix sockets, and MySQL listens on a Unix socket by default. In your current setup, each incoming network connection probably spawns another 3 or 4 TCP sockets for processing (Nginx -> cache/application -> DB), and you don't want that kind of amplification.
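
    As a sketch of the Unix-socket point (the paths are placeholders): Nginx can proxy to an application socket, and a MySQL client on the same host can use the server's socket file instead of TCP:

```
# nginx: reach the app over a Unix socket instead of 127.0.0.1
upstream app {
    server unix:/run/app/app.sock;
}

# my.cnf on the client side: force the local socket, not TCP
[client]
socket   = /var/run/mysqld/mysqld.sock
protocol = SOCKET
```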

  • OhMyMy Member
    edited December 2016

    Maybe get Amazon Lightsail instances for the front end in a round-robin DNS configuration, and use Amazon RDS on the back end. That way you can scale RDS up for holiday demand, then scale it back down. Depending on how many Lightsail VPSes you use, you can scale those up/down as needed during and after the holiday rush.

  • @emdad said:
    As for a traffic estimate: 30K-50K hits to the API at the exact same time, and each hit is either served from Memcache or goes straight to the database.

    Maybe optimize your application so it spreads the requests out a bit, instead of having them all hit at one moment?
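
    One cheap way to do that, assuming you control the client side: add random jitter before each scheduled call so clients don't all fire at the same instant. A sketch:

```python
import random
import time

def jittered_call(window_seconds=30):
    # Each client waits a random amount inside the window
    # instead of hitting the API at exactly the same moment.
    delay = random.uniform(0, window_seconds)
    time.sleep(delay)
    return delay  # then make the API request here
```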

  • @teamacc said:

    @emdad said:
    As for a traffic estimate: 30K-50K hits to the API at the exact same time, and each hit is either served from Memcache or goes straight to the database.

    Maybe optimize your application so it spreads the requests out a bit, instead of having them all hit at one moment?

    This is what I intended to achieve without using multiple IPs/domains on the application side. But now I'm thinking: what if the application hit the APIs on different random servers?

    @OhMyMy said:
    Maybe get Amazon Lightsail instances for the front end in a round-robin DNS configuration, and use Amazon RDS on the back end. That way you can scale RDS up for holiday demand, then scale it back down. Depending on how many Lightsail VPSes you use, you can scale those up/down as needed during and after the holiday rush.

    This is a simple idea, thank you very much. I think I'll try that first and see how it goes. Even if not AWS, maybe Google Cloud or someone else.

    @rincewind said:

    • Try Percona Server as a drop-in replacement for MySQL.

    • Tweak network settings: you might be running out of network sockets. https://www.percona.com/blog/2011/04/19/mysql-connection-timeouts/. Filesystem tweaks: if you haven't done it already, mount the database partition with noatime for a noticeable improvement.

    • Partition your tables, and store frequently accessed tables in a separate in-memory DB. I assume you have already done the usual MySQL tuning: creating indexes, defragmenting with OPTIMIZE TABLE, etc.

    • To reduce pressure on the TCP stack, use Unix sockets instead of TCP sockets for connections within the same server. Nginx can forward to Unix sockets, and MySQL listens on a Unix socket by default. In your current setup, each incoming network connection probably spawns another 3 or 4 TCP sockets for processing (Nginx -> cache/application -> DB), and you don't want that kind of amplification.

    I don't have experience with Percona, so I'll have to try it out on one of the smaller apps first. This looks really good. I'll have to figure out all the settings you just recommended, though :)

    I'll try an RDS-like solution first and see if it goes my way.

    Reducing pressure on TCP connections is my intention with all of this (and on MySQL socket connections, of course). Connections are surely piling up, I think. I'm not that good at this side of server management.

    Thanks a lot :)

  • @pbgben said:
    Why MySQL? Could you rewrite the DB class to connect to something like MongoDB?

    Or perhaps take a leap and move to AWS/Google to scale better.

    I wish I could move to MongoDB :( But I'm going to try AWS/Google Cloud. Thanks :)

  • emdad Member
    edited December 2016

    @impossiblystupid said:

    @emdad said:
    Please note, my problem is the number of MySQL socket connections, and something similar for API serving from Memcache (I'm not exactly sure what the error is in that case).

    Until you know what the exact bottleneck is, it is pointless to do a mass reconfig to "fix" it. That's the epitome of premature optimization.

    You also don't seem to mention the application layer at all. For all we know it could be some database-intensive technology stack that also happens to poison the cache.

    You probably need someone to rearchitect your whole system top to bottom. But expecting to make a lot of changes in a short time period is a recipe for disaster. I'd focus on doing triage for the highest value customers.

    You're right; for a normal application it would be pointless. But in my case it happens when it happens, for maybe an hour at most (in chunks of 5-6 minutes). And while it happens, I'm busy rebooting the server and MySQL so my users don't leave. :(

    What kind of information would be helpful here? I have no idea. It's actually a personal project that suddenly got much bigger. You're right, I probably need an expert to work with me, but I don't have the resources for that. If the application keeps growing, maybe six months from now I can think big and concentrate on just this one app.

  • @emdad Google Cloud gives a $300 trial credit for a couple of months, I think - so it may help you get over the holiday rush? Just a thought.

  • @emdad said:
    What kind of information would be helpful here? I have no idea. It's actually a personal project that suddenly got much bigger. You're right, I probably need an expert to work with me, but I don't have the resources for that. If the application keeps growing, maybe six months from now I can think big and concentrate on just this one app.

    That's not a rational approach, and it's exactly why I noted how important triage is. If you're looking simply to be sustainable, never mind plan for growth, you need to start gathering resources now. The way you're doing it doesn't appear to scale sustainably, so I don't know why you expect the situation to change by itself in six months.

    I have no idea what you're doing, but a burst in popularity that isn't tied to revenue is a net negative. Perhaps people are simply freeloading off you with no intention of ever returning the favor. Unless you have some sort of angel investor lined up that you're looking to sell out to, I'm not sure what your actual long term plan would be. Your problems don't really sound like technical problems so much as business/management problems. It sounds like it stopped being a "personal project" in October, so now you're going to have to ask people to pony up some resources if they want things to keep working smoothly.

  • Sometimes moving the database to SSD makes it fly.

    Order a new server with OVH SSD disks and put them in RAID 10.

    Keep MySQL on a different server.

  • Ordered one last week. It's free, but I can't create a DB right now - it shows an error. Maybe no stock? :)

  • As an affordable option, since you already have an OVH dedi, you could get a local SSD VPS at GRA from the cloud lineup. Put the DB server on the SSDs and keep the front end on the 64GB box.

  • Tweak network settings:

    I had to do this recently after building a relatively high-volume crawler. At 5,000 pages/second with out-of-the-box network settings I was getting 'nf_conntrack: table full' and had to tweak a few things. You mention having a lot of concurrent connections, so it's something to watch out for.
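
    If you hit the same wall, the usual knobs are conntrack sysctls along these lines (the sizes are examples, not recommendations); apply with `sysctl --system`:

```
# /etc/sysctl.d/90-conntrack.conf
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
```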

    I don't see much discussion about logs but obviously have a look at them and see what they're saying.

    Also consider putting web assets like JS and images onto a CDN.

    Potential errors aside, I generally agree you should focus on the DB.

  • @ricardo said:

    Tweak network settings:

    I had to do this recently after building a relatively high-volume crawler. At 5,000 pages/second with out-of-the-box network settings I was getting 'nf_conntrack: table full' and had to tweak a few things. You mention having a lot of concurrent connections, so it's something to watch out for.

    I don't see much discussion about logs but obviously have a look at them and see what they're saying.

    Also consider putting web assets like JS and images onto a CDN.

    Potential errors aside, I generally agree you should focus on the DB.

    Noted with thanks. Also, images are already hosted on a CDN, just FYI :)
