Comments
Yeah, the example above is from RHEL (tested on actual Red Hat, CentOS, and CloudLinux).
I've asked a bunch of other sysadmins working with RHEL-based systems, who see the same behaviour - so either we're all updating things incorrectly, or there's yet to be a new microcode release.
Online.net, which keeps its list up to date, also has most of them in "Pending" because they're waiting.
Sure - but those microcodes aren't really fixing Spectre; they don't enable IBPB and IBRS, which are required.
Correct - but if the microcode is not there yet, you'll still have to reboot to get it applied once it is available.
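Once a microcode does land, you can check which revision the CPU is actually running. A minimal sketch, assuming an x86 Linux box - the sample text below is stand-in data; on a real machine, point awk at /proc/cpuinfo itself:

```shell
# Stand-in /proc/cpuinfo excerpt (on a real box, read /proc/cpuinfo directly).
sample='processor       : 0
vendor_id       : GenuineIntel
microcode       : 0x3a
cpu MHz         : 2400.000'

# Extract the loaded microcode revision from the first matching line.
rev=$(printf '%s\n' "$sample" | awk -F' *: *' '/^microcode/ {print $2; exit}')
echo "microcode revision: $rev"   # -> microcode revision: 0x3a

# Kernels patched for Meltdown/Spectre (4.15+ and distro backports) also
# report per-vulnerability mitigation status under sysfs:
#   grep -r . /sys/devices/system/cpu/vulnerabilities/
```

Note the revision only changes after the new microcode is actually loaded - via reboot, or an early-load initramfs rebuild.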
Even when they reboot, it takes from 5 min to 1h+. Why the hell do they need so long?
GestionDBI, Virtmach & BandwagonHost needed a long time to bring the servers back up.
I just stay with dedis, too much hassle.
Intel's press release says most of the microcode updates are coming next week, AFAIK.
Maybe people run fsck as well :-D
Yeap, that's what I'm counting on as well - so we all just have to sit tight :-D
But the notion that everyone is 100% safe now is fake news.
D*uq. No nodes have been down for that long. I know you don't like us, but there's no need to bash us and post bulls**t on forums...
"Server Unity GestionDBI England went offline. Detected: 06.01.2018 19:50:10"
Just checked whether my monitoring went haywire - it did not:
All nodes are up and running, excluding LAX-03 that is currently rebooting for the last 5 minutes.
So you go reboot the nodes and check if they are back up, but you do not care if the customer VMs are back up? Well, OK then.
I had 10 restarts today. This question was asked in general; I did list just one provider, which was maybe a bit unfair, but I have updated it.
At least half of them went down for about 1 hour, so should I open a ticket for each of these? No. I expect a provider to bring the VMs back up, so I don't have to log in to each panel and reboot them by hand.
Everyone got it working, except gestionDBI.
Oh, look, it's time for some @Neoon rage. Which project are you going to abandon now?
I guess to safely switch off all VMs?
How many containers do you need on a single node that you need 60min+ to reboot it?
They're OVZ tho. If they're simfs, just reboot now and it'll work itself out if you've got a journaling filesystem.
1 Windows KVM that refuses to budge. And then you have a choice. Downtime or potential loss of data?
Well, I said containers; most of them were OVZ boxes.
Take a snapshot, shutdown, restart using snapshot?
Always an option. It depends on how many of those stubborn VMs you have; at scale, you may still reach the 60 minutes mentioned.
Funny fact, longest reboot time was MTL-02, with ~35min.
@davidgestiondbi Well, this is an outrage - I could (probably) be done pooping by then!
uh... ... ...... .. sudo (Holy shit that's half of it) apt-get (common sense part) update (omg that simple?) (hit enter) (only if I bought softlayer or theplanet!!)
Funny fact: Smokeping just sends emails to one address, yet I still got my VPS suspended for 20 monitored servers.
Well, last I checked neither Debian nor Ubuntu has kernel updates yet, and other updates (e.g. qemu) are likely missing too. Also, apt-get update only fetches the package index, heh.
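Right - and even after `apt-get upgrade` (or `dist-upgrade`, which new kernel packages usually need) installs something, the old kernel keeps running until you reboot. A rough check, assuming the usual Debian/Ubuntu layout where each installed kernel gets a directory under /lib/modules:

```shell
# Compare the running kernel to the newest installed one; a mismatch means
# an update is installed on disk but will not be active until a reboot.
running=$(uname -r)
newest=$(ls /lib/modules 2>/dev/null | sort -V | tail -n 1)
echo "running kernel:   $running"
echo "newest installed: ${newest:-unknown}"
if [ -n "$newest" ] && [ "$running" != "$newest" ]; then
    echo "reboot needed to run $newest"
fi
```

Just a sketch - version sorting of kernel directory names with `sort -V` is a heuristic, not something apt guarantees.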
You should really work on this rage mate. It’s not healthy...
I am calm, it's fine. I just like to bash something sometimes, like you do with OVH.
We had long planned to retire XenPower in its PV form and replace it with HVM; we thought this bug was a good opportunity, but it seems we may not be ready in time.
We are now rebooting the OVZ nodes one by one, so it will take 24 hours, possibly more. Expected downtime for every node is below 30 minutes if everything goes well.
Some will take as little as 10 minutes (the E3 ones); the largest, some 20, up to 30. Containers may take longer to come up in some cases, though. If you have been down for more than 30 minutes, please check the announcement, and if there is nothing about your node, please open a ticket.
We do not expect problems, but this is not an exact science.
One of my providers did live migrations of VMs and updated the hosts.
Eg, move running VMs to a new host live, patch/reboot the other host, then cycle machines back while patching the previous host.
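The rolling pattern above can be sketched as a loop - the host names here are hypothetical, and the echoes stand in for the real commands (e.g. libvirt's `virsh migrate --live`):

```shell
# Drain each host onto a freshly patched spare, patch it, then reuse it as
# the spare for the next host in line.
spare="spare-node"
for host in node1 node2 node3; do
    echo "live-migrating VMs: $host -> $spare"
    echo "patching microcode/kernel and rebooting $host"
    spare=$host   # the host we just patched receives the next batch
done
echo "all hosts patched; $spare holds the last batch"
```

The appeal of this rotation is that you only ever need one spare host's worth of free capacity, at the cost of each VM migrating once.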
I've not heard from DO/Vultr/Hosthatch/ZX about it to be honest - but at the same time I've not logged in to check either lol.
This can work and is a good opportunity to see how well this works in the event of a real node failure.
IWStack with SAN storage supports this; the nodes with local SSD storage do not, though. Since OVZ is the first line lacking a defense, though, we will do those nodes first.
Apparently one set of migrations went wrong and the VMs had to be rebooted. None of my services were affected, other than a slight loss of network to one VM for about 3 minutes.
We have an iwstack node down atm, but it is not one with SAN storage; it is one of the SSD nodes. It is also unrelated: 2 of the disks died and we are trying to recover the data now.
You know that guy Murphy - I expect a lot of unrelated failures exactly when the workload from planned work is at its highest.
Ramnode had a big reboot yesterday.