Providers or managers of large linux server fleets

SplitIce Member, Host Rep

Q. How often do you see non-fatal kernel oopses, warnings or bug alerts? Do you monitor for them (netconsole or otherwise)?

Recently we have increased the number of managed servers quite significantly. On all systems we watch, log and triage kernel messages.
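For anyone wanting to try the same kind of triage, here is a minimal sketch of how collected kernel log lines might be scanned for non-fatal reports. It is not the tooling described above. The marker list is an assumption and certainly incomplete, and the names `classify` and `triage` are made up for illustration. It works on lines from any source (a netconsole receiver, `dmesg` output, a log file).

```python
import re

# Markers that commonly open a kernel report. This list is an assumption
# for illustration and is not exhaustive.
PATTERNS = {
    "warning": re.compile(r"\bWARNING: CPU: \d+ PID: \d+ at (?P<where>\S+)"),
    "bug": re.compile(r"\bBUG: (?P<where>.+)"),
    "oops": re.compile(r"\bOops: (?P<where>.+)"),
}

# Leading "[seconds.microseconds]" timestamp as printed by the kernel.
TIMESTAMP = re.compile(r"^\s*\[\s*(?P<ts>\d+\.\d+)\]\s*")


def classify(line):
    """Return (kind, timestamp, detail) for a kernel log line, or None."""
    m = TIMESTAMP.match(line)
    ts = float(m.group("ts")) if m else None
    body = line[m.end():] if m else line.strip()
    for kind, pattern in PATTERNS.items():
        hit = pattern.search(body)
        if hit:
            return kind, ts, hit.group("where").strip()
    return None


def triage(lines):
    """Keep only the lines that look like the start of a kernel report."""
    return [found for found in map(classify, lines) if found is not None]
```

Feeding it the lines of a captured log gives one tuple per report, which can then be counted per host or per source location before a human looks at the full trace.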

One thing I've noticed is that the Linux kernel really isn't as defect-free as one might hope.

Q. Do you find there is significant benefit in high patch number releases of LTS branches? Do you find them significantly more stable than, say, low (i.e. <20) releases?
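To compare notes on this across a fleet, the patch number has to be pulled out of each host's release string first. A rough sketch, assuming release strings of the form `major.minor.patch` followed by an optional suffix; the function names are hypothetical and the default threshold of 20 is simply the figure from the question:

```python
import re

# Leading "major.minor[.patch]" part of a kernel release string.
VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def parse_release(release):
    """Split a release string such as '5.7.5+' into (major, minor, patch)."""
    m = VERSION.match(release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    # A missing patch component (e.g. '5.10') is treated as patch 0.
    return int(major), int(minor), int(patch or 0)


def is_low_patch(release, threshold=20):
    """True when the patch number is below the threshold."""
    return parse_release(release)[2] < threshold
```

On a live host the input could come from `platform.release()` in the standard library; grouping report counts by the result of `is_low_patch` would give a first, crude answer to the question.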

Comments

  • SplitIce Member, Host Rep
    edited June 2021

    For those curious as to the spark.

    Just today I found a netconsole (or virtio?) bug. Non-fatal, but a crash risk for sure (for example, if an IRQ occurred during the op).

    [...]
    [194051.326140] ------------[ cut here ]------------
    [194051.326271] netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (start_xmit+0x0/0x4b0 [virtio_net])
    [194051.327739] WARNING: CPU: 0 PID: 9 at net/core/netpoll.c:351 netpoll_send_skb_on_dev+0x231/0x240
    [194051.327740] Modules linked in: [...]
    [194051.327810] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G           O      5.7.5+ #22
    [194051.327810] Hardware name: Vultr VC2, BIOS
    [194051.327811] RIP: 0010:netpoll_send_skb_on_dev+0x231/0x240
    [...]
    [194051.327838] Call Trace:
    [194051.327838]  netpoll_send_udp+0x2c4/0x3e6
    [194051.327839]  write_msg+0xda/0xf0 [netconsole]
    [194051.327839]  console_unlock+0x33b/0x4b0
    [194051.327839]  vprintk_emit+0x17d/0x270
    [194051.327840]  printk+0x58/0x6f
    [...]
    

    An unsafe printk (or in this case net_warn_ratelimited) is a scary idea.

  • @SplitIce said:
    One thing I've noticed is that the Linux kernel really isn't as defect-free as one might hope.

    Correct. Those who expect otherwise either aren't that experienced or do too little with it to know where and how often it shits the bed.
