Production server | What the heck?
A new day, a new post.
I woke up this morning to 5 tickets reporting that the KVM servers were running deplorably slowly, and I had to find out why (see the screenshots).
The big problem is that I was at the office all day and am now traveling by train, so I will only get back home in the morning, in about 10 hours.
I know this is not ethical, professional, inspirational, or well regarded, but so that I can resolve the situation as quickly as possible tomorrow, here is what I have already checked without finding the source of the problem:
1. I tested the RAM (nothing abnormal).
2. I checked the "smartctl" values for each individual disk; there are no errors on any of them.
3. I checked the health status reported by Proxmox for each ZFS pool; nothing abnormal there either.
4. I checked the temperature of the processors; nothing abnormal (below 60 degrees Celsius).
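For anyone who wants to repeat checks 2 and 3 in one go, here is a rough sketch of how they could be scripted. It is only an illustration, not what I actually ran: the function names and the disk paths are made up, and it assumes "smartctl -H" and "zpool status -x" are available and run as root. It only looks at the overall health verdicts, so a clean result here says nothing about performance.

```python
import subprocess


def smart_passed(output: str) -> bool:
    """Return True if `smartctl -H` output carries a passing health verdict."""
    for line in output.splitlines():
        # ATA/NVMe disks usually print "... self-assessment test result: PASSED",
        # SCSI/SAS disks usually print "SMART Health Status: OK".
        if "self-assessment test result" in line or "SMART Health Status" in line:
            verdict = line.split(":", 1)[1].strip()
            return verdict in ("PASSED", "OK")
    # No verdict line found: treat as "needs a manual look".
    return False


def zpool_healthy(output: str) -> bool:
    """Return True if `zpool status -x` reports that every pool is healthy."""
    return output.strip() == "all pools are healthy"


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout as text (exit code is ignored)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def report(disks: list[str]) -> None:
    """Print one line per disk plus one line for the ZFS pools."""
    for disk in disks:
        ok = smart_passed(run(["smartctl", "-H", disk]))
        print(disk, "OK" if ok else "CHECK MANUALLY")
    pools_ok = zpool_healthy(run(["zpool", "status", "-x"]))
    print("zfs pools", "OK" if pools_ok else "CHECK MANUALLY")
```

Calling report(["/dev/sda", "/dev/sdb"]) (hypothetical device names, adjust to the real disks) prints a one-line verdict per disk and one for the pools.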
Any ideas are welcome. (I am trying to impact customer services as little as possible, since the situation is already critical.)