Dedicated Server - Hardware Monitoring ?

nvidian · April 2018

Hello all,

I used to be a VPS/cloud user, but I now have one unmanaged dedicated server with a pretty old Xeon:

Xeon L5640 @ 2.27GHz (Westmere)
RAM 48 GB ECC
2x SSD Samsung MZHPU256HC (maybe M.2 SSD) - RAID 1
4x HDD 3TB Hitachi HUS724030ALE641 - RAID 10

I installed Proxmox and split the server into a couple of KVM guests, and it works very well.

What worries me most is that I don't have much knowledge of hardware issues. The provider doesn't provide hardware health monitoring in their panel. Again, forgive my ignorance: I was just a VPS user before, so I never learned how to maintain/monitor my hardware.

My questions:

1. What should I do to regularly check that my server is in healthy condition? (Hardware monitoring and troubleshooting)

2. Is there any dedicated server provider who provides easy hardware monitoring (auto notification if something goes wrong, hardware health information, etc.)? I would prefer a low-end provider if possible.

Thank you guys

BlaZe · April 2018

Been using Pinguzo.com and it works like a charm. It's free while in beta.

I have 30+ servers there being monitored for load, RAM, CPU, HDD and network usage.

YokedEgg · April 2018

A lot of hardware failure is random. Certain things like drives are easier to predict. That's the appeal of cloud over bare metal: in a worst-case scenario, transferring a KVM guest on Proxmox could be done by a 5th grader in about 3 minutes.

Gamma17 · April 2018

Hardware is still the provider's responsibility: if anything dies and the server goes offline, you just create a ticket and wait. I doubt low-end servers have any redundancy, so just monitoring that the server is online will be enough.

The only thing that is IMO worth doing is setting up smartd/mdmon (or whatever controller tool applies in the case of HW RAID) to send emails and, as usual, having offsite backups for the case when something goes really, really wrong.
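
As a rough sketch of what these checks boil down to (smartd and the md monitoring tools do this properly and should be preferred), the Python below parses `smartctl -A` style attribute output and `/proc/mdstat`. It assumes smartmontools is installed, Linux software RAID (md) rather than a hardware controller, and ATA-style attribute tables. The watched attribute list and all function names are illustrative choices, not something from this thread.

```python
import re
import subprocess

# Assumption: attributes whose non-zero raw value usually deserves a closer look.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def smart_warnings(text, watched=WATCHED):
    """List watched attributes that report a non-zero raw value."""
    attrs = parse_smart_attributes(text)
    return [f"{name} = {attrs[name]}" for name in watched if attrs.get(name, 0) > 0]


def degraded_md_arrays(mdstat_text):
    """Return md array names whose member map (e.g. [U_]) shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


def check_device(device):
    """Run smartctl on one device (needs root; device path is up to the caller)."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return smart_warnings(result.stdout)


def check_mdstat(path="/proc/mdstat"):
    """Read the kernel's md status file and report degraded arrays."""
    with open(path) as handle:
        return degraded_md_arrays(handle.read())
```

Calling `check_device("/dev/sda")` and `check_mdstat()` from a daily cron job and mailing any non-empty result would approximate what smartd and mdadm's monitor mode already do. Drives behind a hardware RAID controller typically need the controller's own tool or extra smartctl options instead.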

Clouvider · April 2018

@Gamma17 said:
Hardware is still the provider's responsibility: if anything dies and the server goes offline, you just create a ticket and wait. I doubt low-end servers have any redundancy, so just monitoring that the server is online will be enough.

By definition, bare metal is bare metal. You don’t get any redundancy; you’re free to build your own or pay for a solution to create a resilient environment for you.

The only thing that is IMO worth doing is setting up smartd/mdmon (or whatever controller tool applies in the case of HW RAID) to send emails and, as usual, having offsite backups for the case when something goes really, really wrong.

SMART monitoring + email is the way to go. Looking through the messages/dmesg log from time to time wouldn’t hurt either.
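
For the log-review part, a minimal Python sketch of the idea is below. It only filters kernel log text for a few substrings that often accompany hardware trouble. The pattern list and function names are illustrative assumptions and should be tuned to the actual hardware, and on some systems `dmesg` needs root.

```python
import re
import subprocess

# Assumption: a small starting list of patterns, not an exhaustive one.
PATTERNS = [
    r"I/O error",
    r"Machine check",
    r"Hardware Error",
    r"ata\d+.*(?:failed command|exception)",
]
HW_ERROR_RE = re.compile("|".join(PATTERNS), re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the log lines that match any of the hardware-error patterns."""
    return [line for line in log_text.splitlines() if HW_ERROR_RE.search(line)]


def scan_dmesg():
    """Run dmesg and return only the suspicious lines."""
    output = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    return suspicious_lines(output)
```

Running `scan_dmesg()` periodically and mailing a non-empty result is a crude substitute for reading the log by hand. It will miss anything not covered by the patterns, so an occasional manual look is still worthwhile.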

Gamma17 · April 2018

@Clouvider said:

By definition, bare metal is bare metal. You don’t get any redundancy; you’re free to build your own or pay for a solution to create a resilient environment for you.

What I meant are things like redundant PSUs, redundant fans, etc., which may be worth monitoring just to make sure the provider does not miss a failure and let the server run degraded until it fails.
But I highly doubt anything called a "low-end server" will have this.

Neoon · April 2018

My KS1, from 2013, had 2-3 network issues in 5 years, no hardware failure, nothing.
If you put your stuff on a dedi, you'll be fine with backups.

If something fails, you have a few hours of downtime once in 2-5 years, restore the backup and you are fine.

When this crap is not mission critical, it's fine.

If you need something mission critical, where you're gonna lose billions of dullahs, do not take a single machine. Simple.

I had 3 hardware failures in about 5 years on my dedis, I have about 10 dedis.

For monitoring I would recommend: https://github.com/firehol/netdata

If the server is dead, the provider checks it; you cannot do anything.

Clouvider · April 2018

@Gamma17 said:

@Clouvider said:

By definition, bare metal is bare metal. You don’t get any redundancy; you’re free to build your own or pay for a solution to create a resilient environment for you.

What I meant are things like redundant PSUs, redundant fans, etc., which may be worth monitoring just to make sure the provider does not miss a failure and let the server run degraded until it fails.
But I highly doubt anything called a "low-end server" will have this.

The 'low-end' E3-1270V5/V6 we sell on here have both N+1 PSU and N+1 Fans so they are quite resilient for the standard.

I presume the ancient Delimiter E55XX blades also have similar resiliency levels on PSUs and fans (which cannot be said about their DC, however...); I don't think HP was selling them without an N+1 config back then.

So always worth checking with the supplier.

And this reminds me, we should add it to our marketing

quadhost · April 2018

@Clouvider said:

SMART monitoring + email is the way to go. Looking through the messages/dmesg log from time to time wouldn’t hurt either.

This.

Zabbix is also useful for tracking historical stats, setting notifications for a given alert, and load/usage information.

Gamma17 · April 2018

@Clouvider said:

The 'low-end' E3-1270V5/V6 we sell on here have both N+1 PSU and N+1 Fans so they are quite resilient for the standard.

I presume the ancient Delimiter E55XX blades also have similar resiliency levels on PSUs and fans (which cannot be said about their DC, however...); I don't think HP was selling them without an N+1 config back then.

So always worth checking with the supplier.

And this reminds me, we should add it to our marketing

That's good to know.

And that is probably true for most blade/"cloud"/whatever systems where a single chassis provides all the power, cooling, etc. to all the systems inserted. But in this case the customer also has no access to the chassis and no way to monitor PSUs and fans. And I bet it is monitored and fixed as needed without the customer ever noticing a thing...

What I was thinking about are cheap 1-2U servers, which often come with a single PSU (with redundancy/hotplug being optional), and desktop-grade systems, which are used in a lot of cheap offers. And what made me think of a 1-2U server is the four 3.5" HDDs + 2 SSDs in the OP...

Clouvider · April 2018

In the MicroCloud or MicroBlade products, the hardware statuses for shared components are fed directly to each blade, so one can still watch them if they like. Customers also get IPMI access, which gives them this diagnostic information in a clear and easy form. But I agree, good operators would monitor these chassis remotely and likely respond before the customer ever notices.
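
To give an idea of what that IPMI sensor data looks like when read from a script, here is a hedged Python sketch that parses `ipmitool sdr` style output (`name | reading | status`). It assumes ipmitool is installed and the BMC is reachable from the host. Treating the statuses `ok` and `ns` (no reading) as healthy is an assumption, and the function names are made up for illustration.

```python
import subprocess


def parse_sdr(text):
    """Parse lines of the form 'name | reading | status' into tuples."""
    sensors = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and parts[0]:
            sensors.append((parts[0], parts[1], parts[2]))
    return sensors


def unhealthy(sensors, healthy_states=("ok", "ns")):
    """Return sensors whose status is not in the assumed-healthy set."""
    return [sensor for sensor in sensors if sensor[2].lower() not in healthy_states]


def check_ipmi():
    """Query the local BMC via ipmitool (usually needs root)."""
    output = subprocess.run(["ipmitool", "sdr"], capture_output=True, text=True).stdout
    return unhealthy(parse_sdr(output))
```

A non-empty result from `check_ipmi()` would point at a sensor (fan, PSU, temperature) worth looking at in the IPMI web interface before opening a ticket.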

andiklive · April 2018

@BlaZe said:
Been using Pinguzo.com and it works like a charm. It's free while in beta.

I have 30+ servers there being monitored for load, RAM, CPU, HDD and network usage.

I saw it a long time ago but never got a chance to try it. Another Softaculous product.

BlaZe · April 2018

@andiklive said:
I saw it a long time ago but never got a chance to try it. Another Softaculous product.

Yes, and it's getting better day by day. I had some issues with it earlier but they fixed them.

jetchirag · April 2018

@BlaZe said:

@andiklive said:
I saw it a long time ago but never got a chance to try it. Another Softaculous product.

Yes, and it's getting better day by day. I had some issues with it earlier but they fixed them.

I really expect its cost to be affordable for small VPSes.

wavecomas · April 2018

The cheapest and easiest option is SNMP monitoring.

Just install an SNMP agent on the host and your VMs.
Most branded servers have additional SNMP agents and/or remote management like HPE iLO or Dell iDRAC. These let you get more detailed info about the hardware.

Make another VM and install some free monitoring software. All the GPL-licensed ones demand some knowledge, experience and, most importantly, time. There is a really nice piece of software called ManageEngine OpManager. It's free for 10 devices, setup is really easy and the documentation is good for a beginner.

After that you are able to see and monitor pretty much everything that is going on with your host hardware, the VMs, the host OS and applications. You can also set up many different notifications.

On the second question: IMO there are a lot of providers who give access to server remote management like iLO, iDRAC, etc. With the better brands there are notification options.
But such options are really basic compared to a monitoring system. And notifications are quite a new feature with most brands, so I guess they are not present on your server.

nvidian · April 2018

Sorry for the late response. I really appreciate the help from you guys.

@BlaZe said:
Been using Pinguzo.com and it works like a charm. It's free while in beta.

I have 30+ servers there being monitored for load, RAM, CPU, HDD and network usage.

Thank you.

@wavecomas said:
The cheapest and easiest option is SNMP monitoring.

Just install an SNMP agent on the host and your VMs.
Most branded servers have additional SNMP agents and/or remote management like HPE iLO or Dell iDRAC. These let you get more detailed info about the hardware.

Thank you, I think I will try SNMP monitoring first.

On the second question: IMO there are a lot of providers who give access to server remote management like iLO, iDRAC, etc. With the better brands there are notification options.
But such options are really basic compared to a monitoring system. And notifications are quite a new feature with most brands, so I guess they are not present on your server.

@Neoon said:
For monitoring I would recommend: https://github.com/firehol/netdata

If the server is dead, the provider checks it; you cannot do anything.

Thank you for your suggestion. I will surely try SNMP, netdata and Pinguzo.

@Clouvider said:
In the MicroCloud or MicroBlade products, the hardware statuses for shared components are fed directly to each blade, so one can still watch them if they like. Customers also get IPMI access, which gives them this diagnostic information in a clear and easy form. But I agree, good operators would monitor these chassis remotely and likely respond before the customer ever notices.

It's a Supermicro server. I have already set the "Warning and Above" alert level to notify my email. Do you think that's sufficient?
