Nginx & PHP-FPM with Simple Wordpress Sites - Crashes Server
I used Minstall to set up a basic web server running three very basic WordPress sites. The VPS is a 1 GB server, so it should be more than enough for the job.
The issue I am finding is that, according to the monitors I have set up with uptimerobot, the websites suffer from downtime once or twice a day, reporting a Bad Gateway error. The downtime seems to last a few minutes then recovers.
In addition to this, I have found that the server itself is restarting once or twice a day (I have a monitor for each website, and a ping monitor for the server IP). I can only assume that my nginx / php-fpm configuration is causing the restarts. Can anyone help me pinpoint the issue and fix it?
Comments
1 GB of RAM is not that much.
The big problem with WordPress is MySQL load. You should check the MySQL load more closely and pay attention to all the crappy plugins you add. A single plugin could double the load on MySQL and nginx.
Nginx can do miracles for the RAM footprint, but PHP is PHP: it still executes through FPM and always hits MySQL.
So watch the server load live, together with RAM use.
M B
Install quickcache on your wordpress install.
Mun
Supercache is better, but first he should identify the bottleneck.
It's not a RAM issue. PHP-FPM uses a request queue. Something's causing requests to outstrip FPM's ability to process the queue. Nginx will shove a request into the queue, wait, and then time out if it doesn't hear back for that request. That gives the Bad Gateway error. I suggest turning on your FPM status page and just watching how it behaves during the day.
https://rtcamp.com/tutorials/php/fpm-status-page/
Note the requests and request duration metrics. If some pages have deep processing, you could always try adding more FPM child processes so that you stand a better chance of not having all pipes stuck in a long-running workflow. But without more CPU, or more of whatever resource you're currently running up against, it's not going to cure everything.
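If it helps, here is a minimal sketch for pulling single numbers out of the status page so they can be logged over the day. It assumes the plain-text status format (lines like "listen queue: 0") and that the page is reachable at http://127.0.0.1/status, which is only a guess at your setup; the function name is made up for this example.

```shell
#!/usr/bin/env bash
# Print one field from PHP-FPM's plain-text status output read on stdin.
# The field name must match the label on the status page exactly,
# e.g. "listen queue", "active processes", "max children reached".
fpm_status_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Example (the URL is an assumption; use whatever pm.status_path you configured):
#   curl -s http://127.0.0.1/status | fpm_status_field 'listen queue'
```

A "listen queue" value that stays above zero, or a "max children reached" counter that keeps climbing, would point at the pool running out of workers.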
Good advice there from @tchen !
Thanks for that, but now I am having issues getting the status page to load. I've added the code to the right section of the files but am still getting the 502 on the status page.
Post the relevant lines from (a) your www.conf php-fpm pool config, and (b) the nginx virtual host where you're trying to access the php-fpm status page.
Double check that your listen and fastcgi_pass are compatible (i.e. both point at the same TCP address and port, or both point at the same Unix socket file).
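As a rough illustration of what "compatible" means here, this sketch compares the two values as strings. The helper name is made up, and the socket path and port in the usage lines are just common examples, not values taken from your config.

```shell
#!/usr/bin/env bash
# Return 0 if a pool's `listen` value and nginx's `fastcgi_pass` value
# name the same endpoint, 1 otherwise. Plain string comparison only.
endpoints_match() {
    local listen="$1" pass="$2"
    pass="${pass%;}"        # drop a trailing semicolon copied from the nginx config
    pass="${pass#unix:}"    # nginx writes Unix socket paths as unix:/path/to.sock
    [ "$listen" = "$pass" ]
}

# Example values (assumptions, adjust to your own files):
#   endpoints_match "/var/run/php5-fpm.sock" "unix:/var/run/php5-fpm.sock;"  -> match
#   endpoints_match "127.0.0.1:9000"         "unix:/var/run/php5-fpm.sock;"  -> mismatch
```

Because it only compares strings, a pool that listens on a bare port such as 9000 would be reported as a mismatch against 127.0.0.1:9000 even though the two can work together.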
Bingo, that's sorted it. Thanks.
So I should keep an eye on this status page next time I get an email from uptimerobot reporting a 502 error?
Have you checked /var/log/php5-fpm.log (in Ubuntu)? It should give you some sense of php-fpm health status.
Still quite new to Debian/Ubuntu. Didn't know logs existed in that folder! I'll check it now...
Only thing that stood out in the logs is instances of this:
[09-Apr-2014 20:10:17] NOTICE: fpm is running, pid 843
[09-Apr-2014 20:10:17] NOTICE: ready to handle connections
[09-Apr-2014 20:16:12] WARNING: [pool steve] server reached max_children setting (4), consider raising it
[09-Apr-2014 21:18:59] WARNING: [pool steve] server reached max_children setting (4), consider raising it
[09-Apr-2014 22:22:46] WARNING: [pool steve] server reached max_children setting (4), consider raising it
[10-Apr-2014 10:04:59] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[10-Apr-2014 10:04:59] NOTICE: fpm is running, pid 810
[10-Apr-2014 10:04:59] NOTICE: ready to handle connections
[10-Apr-2014 10:10:18] WARNING: [pool steve] child 1247 exited on signal 11 (SIGSEGV) after 0.100318 seconds from start
[10-Apr-2014 10:10:18] NOTICE: [pool steve] child 1249 started
[10-Apr-2014 11:00:12] WARNING: [pool steve] child 1399 exited on signal 11 (SIGSEGV) after 0.230021 seconds from start
[10-Apr-2014 11:00:12] NOTICE: [pool steve] child 1401 started
[10-Apr-2014 11:42:42] WARNING: [pool steve] server reached max_children setting (4), consider raising it
[10-Apr-2014 13:13:35] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[10-Apr-2014 13:13:35] NOTICE: fpm is running, pid 809
[10-Apr-2014 13:13:35] NOTICE: ready to handle connections
[10-Apr-2014 13:47:00] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[10-Apr-2014 13:47:00] NOTICE: fpm is running, pid 807
[10-Apr-2014 13:47:00] NOTICE: ready to handle connections
[10-Apr-2014 14:16:40] WARNING: [pool steve] child 1306 exited on signal 11 (SIGSEGV) after 0.161891 seconds from start
[10-Apr-2014 14:16:40] NOTICE: [pool steve] child 1308 started
[10-Apr-2014 14:37:25] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
[10-Apr-2014 14:37:25] NOTICE: fpm is running, pid 810
[10-Apr-2014 14:37:25] NOTICE: ready to handle connections
[10-Apr-2014 14:45:23] WARNING: [pool steve] server reached max_children setting (4), consider raising it
[10-Apr-2014 14:47:51] WARNING: [pool steve] child 1263 exited on signal 11 (SIGSEGV) after 0.108312 seconds from start
[10-Apr-2014 14:47:51] NOTICE: [pool steve] child 1265 started
[10-Apr-2014 15:07:23] NOTICE: configuration file /etc/php5/fpm/php-fpm.conf test is successful
I assume the line "WARNING: [pool steve] server reached max_children setting (4), consider raising it" means it is reaching its limit and restarting. I've raised this to 9 in the config. Anything else I can gather from the logs?
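For what it's worth, a quick way to see how often each kind of event shows up is to count them. This is only a sketch: the function name is invented, and the patterns are taken from the log lines quoted above.

```shell
#!/usr/bin/env bash
# Summarise a PHP-FPM log read on stdin: how often the pool hit
# max_children, how many children died with signal 11 (SIGSEGV),
# and how many times the FPM master reported starting up.
fpm_log_summary() {
    awk '
        /reached max_children/   { maxed++ }
        /exited on signal 11/    { segv++ }
        /NOTICE: fpm is running/ { starts++ }
        END { printf "max_children=%d sigsegv=%d starts=%d\n", maxed, segv, starts }
    '
}

# Usage: fpm_log_summary < /var/log/php5-fpm.log
```

Run against the excerpt posted above, it would count 5 max_children warnings, 4 SIGSEGV exits and 5 FPM starts, so the segfaults and the repeated restarts may be worth a look as well as the max_children limit.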
With a 1 GB server, I would set max_children to a value around 50. I remember having read once that the value can be calculated by:
(Total RAM - RAM used by other process) / (Average amount of RAM used by a PHP process)
So 1024 MB - 250 MB (RAM used by other processes on my VPS) = 774 MB
774 MB / 15 MB (average RAM used by PHP process on my VPS, your mileage may vary) = 51
Just as a rule of thumb...
//Edit: Wait, I found the article about it - There are some more values to be set:
http://nls.io/optimize-nginx-and-php-fpm-max_children/
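The rule of thumb above can be sketched in shell arithmetic. The function name is made up for this example, and the numbers are the ones from this post (1024 MB total, 250 MB for other processes, 15 MB per PHP process), so measure your own before trusting the result.

```shell
#!/usr/bin/env bash
# Rule-of-thumb pm.max_children:
#   (total RAM - RAM used by everything else) / average RAM of one PHP-FPM process
# All three arguments are whole numbers of MB; the result is rounded down.
estimate_max_children() {
    local total_mb="$1" other_mb="$2" per_child_mb="$3"
    echo $(( (total_mb - other_mb) / per_child_mb ))
}

# Usage with the figures from this post:
#   estimate_max_children 1024 250 15    # prints 51
```

Rounding down is deliberate: it leaves a little headroom rather than letting the last child push the box into swap.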
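I had this problem before finding the best php-fpm settings. Here is how to find the correct php-fpm setting faster:
1) Get the average memory usage of a SINGLE php-fpm process (on Debian/Ubuntu with PHP 5 the process is named php5-fpm, so adjust the -C argument accordingly):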
ps --no-headers -o "rss,cmd" -C php-fpm | awk '{ sum+=$1 } END { printf ("%d%s\n", sum/NR/1024,"M") }'
2) pm.max_children = total RAM dedicated to PHP (I would say around 750 MB in your case) / average memory usage of a SINGLE php-fpm process
For the other variables it's pretty simple. Here is a complete conf
Brilliant, thanks guys, that has helped. I did read a config guide yesterday, but wasn't quite sure how to work out this variable. I changed the easy one, which was to match the value to the number of CPUs. I also changed the config from ondemand to dynamic.
Even with the changes I made to the config last night (9 children, dynamic processes) I have not had an email from uptimerobot today. 15 hours so far. I'll keep an eye on the load of the server. As I said before, I only have 3 amateur websites running, but I would like to run more in the future (family and friends hosting).
The amount of memory used by PHP is not the sum of all processes.
4080+3292+3292 = 10664 ~ 10 MB
PHP was consuming ~ 6 MB.
Some memory is shared between PHP processes.
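One way to see the effect of shared memory is to add up PSS (proportional set size) instead of RSS, since PSS splits each shared page between the processes that map it. The sketch below assumes a Linux /proc/<pid>/smaps with "Pss:" lines, that you can read it (usually as root), and that the process is named php5-fpm. The function name is made up.

```shell
#!/usr/bin/env bash
# Sum the "Pss:" lines (values in kB) from /proc/<pid>/smaps data read on stdin.
sum_pss_kb() {
    awk '/^Pss:/ { total += $2 } END { printf "%d\n", total }'
}

# Usage (process name is an assumption, adjust to your system):
#   for pid in $(pgrep php5-fpm); do cat "/proc/$pid/smaps"; done | sum_pss_kb
```

If the PSS total comes out well below the summed RSS, the per-process average from ps is overstating what each extra child really costs.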
OK, I spoke too soon: the server has just gone down according to my monitors. Should the PHP service be bringing the whole server down with it?
I suspect that is the case, check your /var/log/messages. A common scenario is that the host is oversold, the number of open file descriptors hits its limit, and any fork may then fail ...
Not much to see in the messages file. Are you saying the node may be at fault?
Applications typically do not crash a server.
Indeed. It seems as if there is a further problem...
I spoke to the provider, telling them that my server crashes daily (sometimes twice daily).
After 30 mins their response was 'it is fixed now'. As I want to learn why this has happened I asked them what the fix was. Their response was:
"Hi,
ip_conntrack: table full, dropping packet
We have notice the above message in syslog, it looks like the conntrack database doesn't have enough entries for your environment. Connection tracking by default handles up to a certain number of simultaneous connections. This number is dependent on you system's maximum memory size.
We have increased the number of maximal tracked connections now in the server and there won't be any downtime now.
Thanks,"
I can't work out where they spotted this message. He said it was in the syslog file, which I believe is in /var/log on Debian. Looking in this file, it only seems to show my cron jobs running and the services starting up after a restart, and that's it. The syslog.gz files (which I assume are daily rolled logs) also don't seem to mention the line that the provider mentions.
I am also a little sceptical that this is the fix. A few minutes after I got their reply my websites went down again and the server was unresponsive. When I did get back on, htop reported an uptime of 4 minutes.
I assume they are referring to the syslog on the host, not in your container. I am not a Linux administrator, so I cannot tell whether the fix will address your problem.
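To rule out the logs inside the VPS, something like this sketch could be used to search them, rotated copies included, for the message the provider quoted. The function name is made up, and the log paths in the usage line are the usual Debian locations rather than anything confirmed for this server.

```shell
#!/usr/bin/env bash
# Count "table full, dropping packet" conntrack messages in log text read
# on stdin. Matches both the ip_conntrack and nf_conntrack spellings.
conntrack_drops() {
    awk '/conntrack: table full, dropping packet/ { n++ } END { print n + 0 }'
}

# Usage (paths are assumptions; zcat -f passes uncompressed files through):
#   zcat -f /var/log/syslog* /var/log/kern.log* 2>/dev/null | conntrack_drops
#
# Where the kernel exposes them, the limit and current usage can be read with:
#   sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count
# Inside a container these keys may be missing or not readable.
```

A count of 0 from inside the VPS would fit the idea that the message was logged on the host node.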
Have you tried virtual web hosting? It seems to be a better solution for you.
I guess it could have been talking about the host node. Wouldn't other users also be complaining to them about the same issue though?
I suppose I could use hosting rather than VPS, but I like to learn. Until now the server has been running fine.