Server unstable after upgrade
There is one OpenVZ node that I upgraded from a Xeon 5420 with 16 GB RAM and 500 GB disks in software RAID 1 to an E3 1231 with 32 GB RAM and 2 TB disks in software RAID 1. Same data center, same data. I migrated the OpenVZ containers with an almost identical setup, but since the move I have had nothing but trouble. The machine only hosts a few moderate-traffic websites in 5 containers, and only 2 of them are really active (one with 12 GB RAM, one with 8 GB); the rest are just sitting there.
After the move, the SQL data in the 12 GB container (cPanel, nginx -> Apache -> php-fcgi) crashed within an hour on the new server. I restored the whole container from backup, as there was not much to lose. Next, FastCGI sometimes gave errors like:
"mod_fcgid: can't apply process slot for /usr/local/cpanel/cgi-sys/php5"
After I increased resources for that,
Apache started running out of max worker processes.
Later I found
"kernel: [228069.967073] Orphaned socket dropped", i.e. the numtcpsock limit was hit.
Then MySQL hit a sky-high max_user_limit (I only see about 100 connections, but it showed around 2000 requested).
And then SQL crashed again, for no apparent reason.
What puzzles me is that the same setup ran clean and stable on the inferior hardware with similar traffic. There was even room to handle triple the current traffic. On the new hardware, instead of improving, things have gone downhill.
Any advice on what the issue could be? The smartctl disk tests are fine. I have a feeling a disk problem is behind this, but even mdadm -D shows the array as clean.
I am out of ideas. Of course I can move to better hardware or optimize the software, but the point is that this server causes one problem after another, which never happened on the previous, inferior one.
You just got punished for not obeying the rule "if it works, don't touch it!"...
But seriously: this looks like a hardware problem to me. Try running a complete memtest...
Review /proc/user_beancounters for any failcnt.
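To make that check less error-prone than reading the table by eye, here is a rough Python sketch that lists every counter with a non-zero failcnt. It assumes the usual /proc/user_beancounters layout (a "Version:" line, a header row, then resource / held / maxheld / barrier / limit / failcnt columns, with the container ID prefixed on the first row of each section). The function name parse_failcnt and the SAMPLE values are made up for illustration and are not taken from this server.

```python
def parse_failcnt(text):
    """Return (ctid, resource, failcnt) for every counter with failcnt > 0."""
    failures = []
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # A row starting with "<id>:" opens a new container section.
        if fields[0].endswith(":"):
            ctid = fields[0].rstrip(":")
            fields = fields[1:]
        # Expected remainder: resource held maxheld barrier limit failcnt
        if len(fields) != 6:
            continue
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            failures.append((ctid, resource, failcnt))
    return failures


# Illustrative sample only (hypothetical numbers, not from the node above).
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2718091    2924544   14372700   14790164          0
            numtcpsock          360        360        360        360         17
            numproc              45         60        240        240          0
"""

if __name__ == "__main__":
    try:
        # Usually readable only by root on the hardware node.
        with open("/proc/user_beancounters") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample so the sketch still runs
    for ctid, resource, failcnt in parse_failcnt(text):
        print(f"CT {ctid}: {resource} failcnt={failcnt}")
```

Running it on the hardware node before and after a crash and comparing the output should show which limit is actually being hit, rather than raising limits one at a time.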
As I said, numtcpsock did hit its limit, and I increased it afterwards, but there are still new issues. It is one thing after another.
I will see into it, thanks.
Any other possibilities? Anything specific to look for in the logs? I can't find any hardware-related entries.
Did two passes, no errors.
I am still wondering what is causing the trouble. The database tables have been corrupted twice. Could it be a disk issue, even though I don't see any errors from mdadm or in mdstat?
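On the mdstat question, a small Python sketch like the one below can flag arrays whose status map shows a missing member. It assumes the common /proc/mdstat layout, where each array has a "mdN : ..." line followed by a line ending in a map such as [UU] or [U_]. The helper name degraded_arrays and the SAMPLE text are hypothetical. Note that [UU] only means both members are present; as far as I know it does not prove the two mirrors hold identical data. For that, the md driver offers a consistency check through /sys/block/mdN/md/sync_action and reports the result in mismatch_cnt.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status map contains a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status map looks like [UU] (healthy) or [U_] (one member missing).
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


# Illustrative sample only (hypothetical arrays, not from the node above).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      524224 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      1952989696 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample so the sketch still runs
    print("degraded arrays:", degraded_arrays(text) or "none")
```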