Any clues why my VPS went offline 3 times?

bustersg Member
edited May 23 in Help

The provider says the server node is OK and that I did not hit any resource threshold.
They asked me to check my own VPS processes instead.

The outage always happens at around 5:2x am or later. The nearest cron job, which I suspect, is a daily 5:10 am bash script that backs up 20 GB of data (via 5 rsync commands) to a mounted iDrive-e2 bucket (yes, I mounted it on the Linux VPS).
The odd thing is that the outage happens on alternate days (see Grafana), and my backup DOES sync to iDrive successfully: I logged in to iDrive and checked the bucket timestamp.

Furthermore, as an IT engineer myself, I don't believe a mounted drive on a VPS could cause it to go offline. Network I/O and the other metrics look healthy.

[Screenshot: Grafana graph showing the 3 offline periods]

Next steps: since this has now happened 3 times, I moved the cron job 2 hours earlier and will check whether the next outage coincides with the change. I will also consolidate the 5 rsync jobs (different directories) into 1 rsync command.
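A minimal sketch of the "5 rsync jobs into 1" idea, assuming the iDrive-e2 bucket is mounted at a local path. The mount point `/mnt/idrive-e2/backup`, the script name `backup.sh` and any source directories you pass in are hypothetical placeholders, not taken from the thread. `--relative` makes rsync recreate each source's full path under the destination, so a single command can cover several directories without name collisions.

```shell
#!/bin/sh
# Sketch: back up several directories with ONE rsync call instead of five.
# DEST is a hypothetical mount point for the iDrive-e2 bucket; adjust to your setup.
DEST=${DEST:-/mnt/idrive-e2/backup}

backup_all() {
  # -a         : archive mode (recursion, permissions, timestamps, symlinks)
  # --relative : keep each source's full path under $DEST, e.g.
  #              /var/www/site1 -> $DEST/var/www/site1
  rsync -a --relative "$@" "$DEST/"
}

# Run only when source directories are passed on the command line.
if [ "$#" -gt 0 ]; then
  backup_all "$@"
fi
```

Usage would look like `/root/backup.sh /var/www/site1 /var/www/site2 /etc/nginx` (hypothetical paths). Because `--relative` changes the layout under the destination compared with five per-directory commands, the first run after switching may re-upload data that already exists under the old layout.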

Comments

  • Lack of whipping.

  • thane Member
    edited May 23

    Sounds like it's the backup job for sure. I faced similar issues before; the fix for me was setting swappiness to 0 (because I noticed swapping during the backups that had load spikes) and reducing concurrent backups to 1. Also consider whether your VPS is underpowered for the backup; it may need more CPU if that is the bottleneck. It's hard to tell what's bottlenecking, since your monitoring blacks out during the crashes, probably because the system locked up on whatever is bottlenecking and maxed out the CPU.
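For reference, a minimal sketch of the swappiness change described in the comment above. The kernel exposes the value at `/proc/sys/vm/swappiness` (the kernel default is 60) and `sysctl -w` changes it at runtime; the helper names and the drop-in file `/etc/sysctl.d/99-swappiness.conf` are assumptions. A value of 0 tells the kernel to avoid swapping as far as possible; it does not remove the swap device (that is what `swapoff` does).

```shell
#!/bin/sh
# Sketch: inspect and change vm.swappiness (set_swappiness needs root).

# Print the current value; the path argument exists only so this can be tested.
get_swappiness() {
  cat "${1:-/proc/sys/vm/swappiness}"
}

# Apply a new value immediately, then persist it across reboots.
# The drop-in file name below is a hypothetical choice.
set_swappiness() {
  sysctl -w vm.swappiness="$1" &&
    echo "vm.swappiness = $1" > /etc/sysctl.d/99-swappiness.conf
}
```

Typical use: run `get_swappiness` to see the current value, then `set_swappiness 0` as root.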

  • bustersg Member

    I zoomed in on Grafana just before the outages and see no CPU spikes, and the outage only occurs every other day (2 days apart), so I don't think it is CPU/load.

    On a second look at Grafana, I suspect the "RAM cache/buffer" graph: it seems to take 2 days to reach a threshold, and then the VPS just gives up. There are ways to MANUALLY clear the cache/swap/buffer (https://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/),
    but I'm keeping that as a last resort, because I think it is BEST left to the kernel itself to decide when to flush.

  • thane Member

    I'm not talking about flushing swap; I'm saying functionally disable it. You can try it as a test, or do nothing and just live with the downtime, I guess.

  • bustersg Member
    edited May 23

    I created a cron job that runs every 2 minutes (*/2) and writes this output to a log:

    $ grep '^Swap' /proc/meminfo
    SwapCached:            0 kB
    SwapTotal:       2097148 kB
    SwapFree:        2097148 kB
    

    It only logs from 1 hour before to 1 hour after the backup cron.
    Hopefully it will show IF swap is the culprit, or buffer/cache, etc.
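One possible shape for such a logging job, as a sketch: it appends one timestamped line per run with the `Swap*` fields from `/proc/meminfo`. The script name `/root/swaplog.sh`, the log path, and the 04:00 to 06:58 window (roughly one hour either side of the 5:10 am backup in the original post) are assumptions; shift the hours if the backup cron has moved.

```shell
#!/bin/sh
# Sketch: append one line of swap statistics per run.
# Example root crontab entry (hypothetical paths), every 2 minutes from 04:00 to 06:58:
#   */2 4-6 * * * /root/swaplog.sh >> /root/swaplog.log 2>&1

log_swap() {
  # Optional argument: a meminfo-format file (defaults to the live /proc/meminfo).
  printf '%s' "$(date '+%F %T')"
  awk '/^Swap/ { sub(":", "", $1); printf " %s=%skB", $1, $2 } END { print "" }' \
    "${1:-/proc/meminfo}"
}

# Run only when executed as swaplog.sh, so the function can also be sourced.
case "$0" in
  *swaplog.sh) log_swap "$@" ;;
esac
```

Each run produces a line such as `2024-05-23 05:10:00 SwapCached=0kB SwapTotal=2097148kB SwapFree=2097148kB`, which is easy to grep around the outage time.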

  • rcy026 Member

    Check the timeouts for node_exporter. It sometimes takes a while to gather all the data, especially if you collect everything, since it can monitor a lot of things. On a heavily loaded system it might not complete in time and times out, leaving you with holes in the graphs.
    Set up a simple ping monitor to see whether the VPS is really offline or only your data gathering fails; you can use a free service like HetrixTools or similar.
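A bare-bones version of such a probe, as a sketch meant to run on a machine other than the VPS (a hosted service like HetrixTools does the same job and adds alerting). `203.0.113.10` is a documentation placeholder address, and `-W` is the reply timeout (in seconds) of the Linux iputils `ping`; other ping implementations may use a different flag.

```shell
#!/bin/sh
# Sketch: external reachability probe. Run it on ANOTHER host, not on the VPS itself.
# Example crontab entry on that host (placeholder address and paths):
#   * * * * * /root/probe.sh 203.0.113.10 >> /root/probe.log 2>&1

probe() {
  # One ICMP echo request, waiting at most 2 seconds for the reply.
  if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
    state=up
  else
    state=down
  fi
  echo "$(date '+%F %T') $1 $state"
}

# Probe every host given on the command line.
for host in "$@"; do
  probe "$host"
done
```

If this log shows `up` while Grafana has a gap, the problem is in metrics collection rather than the VPS itself.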

  • GoodLeaf-Cloud Member, Patron Provider

    Sounds like the backup job; I would start there.

  • bustersg Member

    OK, so I moved the cron to start at 3 am and logged swap usage before and after the cron.
    As the image shows, swap was NOT used at all.
    However, there is a JUMP in RAM Cache+Buffer (Mem Basic panel, dark blue)
    that was never released. If the cron runs again tomorrow morning, I expect this metric to hit the roof and knock my VPS into "offline" status.

    So what exactly is RAM Cache+Buffer in Grafana terms? Why doesn't my Debian 11 free it automatically?

    [Screenshot: Grafana, day 1 after moving the cron to 3 am]
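On the question of what "RAM Cache + Buffer" is: node_exporter reads these numbers from `/proc/meminfo` (exposed as metrics such as `node_memory_Buffers_bytes` and `node_memory_Cached_bytes`), and that dashboard series is essentially their sum (some dashboard versions may also add other reclaimable memory). Page cache is normally reclaimable, which is why `MemAvailable`, the kernel's estimate of memory usable by new programs without swapping, is usually much larger than `MemFree`. The sketch below, with a hypothetical helper name `cache_report`, prints the same figures straight from the kernel so they can be compared with the graph.

```shell
#!/bin/sh
# Sketch: print the /proc/meminfo fields behind Grafana's "RAM Cache + Buffer" series.
# All values are in kB, as reported by the kernel.

cache_report() {
  # Optional argument: a meminfo-format file (defaults to the live /proc/meminfo).
  awk '
    /^(MemTotal|MemFree|MemAvailable|Buffers|Cached):/ {
      name = substr($1, 1, length($1) - 1)   # strip the trailing colon
      kb[name] = $2
    }
    END {
      printf "MemTotal=%d MemFree=%d MemAvailable=%d Buffers+Cached=%d\n",
        kb["MemTotal"], kb["MemFree"], kb["MemAvailable"], kb["Buffers"] + kb["Cached"]
    }' "${1:-/proc/meminfo}"
}

# Run only when executed as cache_report.sh, so the function can also be sourced.
case "$0" in
  *cache_report.sh) cache_report "$@" ;;
esac
```

If `MemAvailable` stays high while `Buffers+Cached` grows, the kernel still considers that cache reclaimable.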

  • bustersg Member

    OK, I manually ran it just to crash my VPS.
    Below is the last state in MobaXterm before the VPS went offline.
    I had to go to the control panel to boot the VPS up again.

    [Screenshot: top output in MobaXterm just before the VPS went offline]

  • Is this problem fixed? Why are you deleting the full screenshot on imgbb?

  • bustersg Member

    The backup script consists of a few tar and rsync runs over production sites.
    It was fine until a few weeks ago, so my guess is that the backup size grew beyond what the RAM cache/buffer can hold.
    With the same backup script I never had issues on a 16 GB RAM VPS, but I downsized to 8 GB and this issue started after a few months of stability.

    I could break the backup script into multiple scripts, hoping the VPS would automatically free the cache/buffer in between, but I decided to keep it as 1 large script.

    My solution: add the cron job below under the root user to clear the cache/buffer manually (10 minutes after the backup cron):

    20 5 * * * sh -c 'sync; echo 1 > /proc/sys/vm/drop_caches' >> /root/drop_caches.log 2>&1

    Does it work? Yes. Using top, I can see free memory drop to <500 MiB while the backup script runs, and after the cache clear kicks in, free memory goes back up to about 6000 MiB. Of course, if I run the backup script a 2nd time without clearing the cache, the VPS goes offline and I have to boot it from the client control panel.

    @thane, I did not test disabling swap, because I think swap could be useful in some cases, and since this is a VPS, it might not be a good idea to edit the /etc/fstab file.
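A variant of that cron job as a standalone script, as a sketch: it runs `sync` before dropping the cache (the kernel documentation for `drop_caches` notes that dirty pages are not freed, so syncing first lets more be dropped) and logs `MemFree` before and after, so the effect shows up in the log without watching `top`. The script name `/root/drop_cache.sh` is hypothetical, and the two path variables are overridable only so the function can be exercised without root.

```shell
#!/bin/sh
# Sketch: flush dirty pages, drop the page cache, and log MemFree before/after.
# Example root crontab entry (hypothetical path), 10 minutes after a 05:10 backup:
#   20 5 * * * /root/drop_cache.sh >> /root/drop_caches.log 2>&1
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}  # overridable for testing only
MEMINFO=${MEMINFO:-/proc/meminfo}

mem_free_kb() {
  awk '/^MemFree:/ { print $2 }' "$MEMINFO"
}

drop_page_cache() {
  before=$(mem_free_kb)
  sync                      # write out dirty pages; drop_caches only frees clean ones
  echo 1 > "$DROP_CACHES"   # 1 = page cache; 2 = reclaimable slab (dentries, inodes); 3 = both
  after=$(mem_free_kb)
  echo "$(date '+%F %T') MemFree before=${before}kB after=${after}kB"
}

# Run only when executed as drop_cache.sh, so the functions can also be sourced.
case "$0" in
  *drop_cache.sh) drop_page_cache ;;
esac
```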

  • yoyek Member

    I had similar issues: my Ubuntu VPS was crashing in a very weird way while doing backups with tar. There was absolutely nothing in the logs; it just showed as Offline in the service panel, and I had to boot it manually.
    I tried a lot of different things, such as replacing tar with pigz.
    Finally, I installed linux-crashdump (https://ubuntu.com/server/docs/kernel-crash-dump) to debug further.
    The strangest bit is that after installing it, the system never crashed on the backup script again.
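For anyone trying this approach, one quick way to confirm that a crash kernel is actually armed after installing linux-crashdump (Ubuntu) or kdump-tools (Debian) is the kernel's `/sys/kernel/kexec_crash_loaded` flag, which reads `1` when a crash kernel is loaded. The helper below is a hypothetical sketch; `kdump-config show` from kdump-tools reports more detail.

```shell
#!/bin/sh
# Sketch: report whether a crash (kdump) kernel is loaded and ready to capture a dump.
# The path argument exists only so the check can be tested against a sample file.

crash_kernel_status() {
  flag=${1:-/sys/kernel/kexec_crash_loaded}
  if [ ! -r "$flag" ]; then
    echo "unknown: $flag not readable (kernel without kexec support?)"
  elif [ "$(cat "$flag")" = 1 ]; then
    echo "armed: a crash kernel is loaded"
  else
    echo "not armed: no crash kernel loaded (is crashkernel= set and the kdump service running?)"
  fi
}
```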

  • bustersg Member

    @yoyek said:
    The strangest bit is that after installing it the system never crashed on the backup script.

    Interesting. I'm sure both of us hit something related to kernel memory. I'm on Debian, which is closely related to your Ubuntu. In any case, I'll keep your solution in the back of my mind.
