Providers, how do you audit bare metal security?

ValdikSS · January 24

Some dedicated server platforms can be reflashed to carry a malicious payload in the UEFI, in a RAID or network card option ROM (OPROM), or in the BMC firmware.

This means that one client could rent the server, install some kind of a backdoor, cancel it, and the next client will get the backdoored server.

I know that major providers both isolate access to sensitive functions (like firmware updates) where possible (BMC) and reset and reflash the UEFI on each provision. However, nobody publishes details of the exact procedure. The only document I've found is from OpenStack Ironic, but it includes a wide variety of cleaning functions and it's not clear which of them are really widely used.

Smaller providers, how do you validate server security in general, and how do you handle server cleaning before reprovisioning?

ehhthing · January 24

AWS does this by having fully custom firmware with their own root of trust; I suspect any other major cloud provider does the same.

If you're not that rich and have to buy your servers from a normal vendor, no solution will work for every hardware platform. There are often undocumented ways to flash system firmware, especially BIOS/UEFI-related stuff.

I'm not huge into hardware security, but I do believe that most UEFI systems have a root of trust that you can configure so that only signed firmware can be flashed. This extends to most hardware components that have flashable firmware, I believe. I'm sure there are still ways to infect a dedicated server, especially if you're MSI and have your UEFI keys leaked.

For smaller providers, I don't imagine this is something they deal with. I know of at least one enterprise dedicated server provider that doesn't do any kind of auditing of this...

nikio · January 24

Just a few months ago there was a thread about a LET provider giving out servers without even wiping the disks. And you're asking about UEFI chain of trust? Ha!

The way most Low End providers ensure hardware security is by having non-functioning IPMI/KVM functionality, followed closely by using firmware that's so old it cannot boot in the first place!

I am not even joking. A few weeks ago I ran into an issue with a 13-year-old Intel box that wouldn't boot in UEFI mode after 2021 because of a known bug. Had to set the hardware clock back to 2020 to get it to work; otherwise it just showed 'no signal' after selecting the boot medium.

That aside, BMCs tend to have audit logs. And the communication tends to be over unencrypted HTTP, so the provider can monitor those logs or the traffic you're sending to see what you're doing. But I highly doubt that UEFI backdoors are a threat model anyone is seriously considering (or exploiting).

forest · January 24

Audit with ChipSec, enforce integrity with SRTM. Assuming a read-only CRTM (which is the case for most systems) and no physical access, you can't get past SRTM. It's fragile to configuration changes, though.
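To illustrate why SRTM is hard to bypass yet fragile to configuration changes, here is a minimal sketch of the TPM-style extend operation that measured boot relies on (new PCR = hash of old PCR concatenated with the measurement digest). The component names are hypothetical placeholders, and a real TPM does this in hardware across several PCR banks; this only models the hash chain.

```python
import hashlib
from typing import Sequence

PCR_SIZE = 32  # size of a PCR in the SHA-256 bank


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_chain(components: Sequence[bytes]) -> bytes:
    """Fold an ordered boot chain into one PCR value, starting from all zeros."""
    pcr = bytes(PCR_SIZE)
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr


# Hypothetical blobs standing in for real firmware components.
clean = [b"crtm", b"uefi-v1.2", b"oprom-nic"]
tampered = [b"crtm", b"uefi-v1.2-implant", b"oprom-nic"]
reordered = [b"crtm", b"oprom-nic", b"uefi-v1.2"]

print(measure_chain(clean).hex())
print(measure_chain(tampered).hex())   # differs: one component changed
print(measure_chain(reordered).hex())  # differs: same components, new order
```

Because each value depends on everything measured before it, a modified component cannot restore the old PCR value afterwards. A legitimate firmware update or settings change also shifts the result, which is the fragility mentioned above.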

Levi · January 24

Just... host everything from your basement.

ValdikSS · January 24

@forest said: Audit with ChipSec, enforce integrity with SRTM

Well, that would allow you to detect that something has changed if your server goes down and then up, and the values don't match, but if you already received the backdoored server, you most probably won't be able to detect this — it's very rare for the manufacturers to publish golden PCR values.

It's useful if you order multiple servers of exactly the same model though, assuming not all of them are backdoored. This way you can assume golden values.
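A rough sketch of that idea, assuming you have already collected PCR values from each server (for example with a tool such as tpm2_pcrread) into a dictionary. The server names and hex values below are made up, and the majority rule only holds if most of the fleet is clean.

```python
from collections import Counter
from typing import Dict, List

Fleet = Dict[str, Dict[int, str]]  # server name -> {PCR index: hex value}


def infer_golden(fleet: Fleet) -> Dict[int, str]:
    """For each PCR index, keep the value reported by a strict majority of servers."""
    golden: Dict[int, str] = {}
    indices = set().union(*(pcrs.keys() for pcrs in fleet.values()))
    for idx in sorted(indices):
        counts = Counter(pcrs[idx] for pcrs in fleet.values() if idx in pcrs)
        value, votes = counts.most_common(1)[0]
        if votes * 2 > len(fleet):  # no majority means no golden value for this index
            golden[idx] = value
    return golden


def outliers(fleet: Fleet, golden: Dict[int, str]) -> List[str]:
    """Servers whose PCRs disagree with the inferred golden values."""
    return sorted(
        name for name, pcrs in fleet.items()
        if any(pcrs.get(idx) != value for idx, value in golden.items())
    )


# Hypothetical fleet of three identical-model servers.
fleet = {
    "srv1": {0: "aa11", 7: "cc33"},
    "srv2": {0: "aa11", 7: "cc33"},
    "srv3": {0: "ff00", 7: "cc33"},
}
golden = infer_golden(fleet)
print(golden)
print(outliers(fleet, golden))
```

This only works when firmware versions and settings are identical across the servers; otherwise legitimate differences show up as outliers too.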

ValdikSS · January 24

@ehhthing said: I'm not huge into hardware security, but I do believe that most UEFI systems have a root of trust that you can configure so that only signed firmware can be flashed. This extends to most hardware components that have flashable firmware, I believe.

Unfortunately, many servers can be downgraded to older, vulnerable firmware that lacks some firmware checks but is still signed with the same keys, because they have no anti-downgrade protection.
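The gap can be sketched as follows: a signature check alone accepts any image the vendor ever signed, while anti-downgrade protection adds a version floor. The function names, version strings and the `signature_ok` flag are hypothetical; real implementations typically keep the floor in a monotonic counter or fuses, not in a string comparison.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def may_flash_signature_only(signature_ok: bool) -> bool:
    """Policy without rollback protection: any validly signed image is accepted."""
    return signature_ok


def may_flash(candidate: str, signature_ok: bool, min_allowed: str) -> bool:
    """Policy with rollback protection: signed AND not older than the floor."""
    return signature_ok and parse_version(candidate) >= parse_version(min_allowed)


# Hypothetical: 2.1.0 is an old, vulnerable release signed with the same key.
print(may_flash_signature_only(True))               # old image is accepted
print(may_flash("2.1.0", True, min_allowed="2.3.1"))  # old image is rejected
print(may_flash("2.4.0", True, min_allowed="2.3.1"))  # newer image is accepted
```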

I recommend watching this video, that's the only decent source of information I'm aware of.

forest · January 24

@ValdikSS said:

@forest said: Audit with ChipSec, enforce integrity with SRTM

Well, that would allow you to detect that something has changed if your server goes down and then up, and the values don't match, but if you already received the backdoored server, you most probably won't be able to detect this — it's very rare for the manufacturers to publish golden PCR values.

It's useful if you order multiple servers of exactly the same model though, assuming not all of them are backdoored. This way you can assume golden values.

The manufacturers don't need to publish anything; only the provider does, through TPM 2.0's Direct Anonymous Attestation (DAA) functionality. Remote attestation works without the manufacturer publishing anything special.

ValdikSS · January 24

@forest said: TPM 2.0's Direct Anonymous Attestation

Never heard of it, thanks.

Kodomu · January 25

For anything newly deployed, we're only using our own hardware, where we flash fresh BMC/UEFI firmware before anything gets installed on it. There are just too many issues with renting or leasing other people's equipment; for us it's now just a handful of legacy customers who have been with us for years and whom we are slowly moving off.

As for if we ever did dedicated servers, we'd probably want to do similar between customers alongside zeroing/secure erasing all disks, and otherwise lock down the system as much as possible. Modern hardware from better vendors is thankfully more capable of this than hardware from lower tier, cheaper vendors. Still, I think dedicated servers are a lot more trouble than they are worth unless you have spare capacity you need to sell off in bulk, both in high capital input to low return, and for management reasons.

You can't really stop customers breaking the hardware in any way if you give them full control, but you can at least usually reset it back after they're done with it.
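As a small illustration of the disk-zeroing step, here is a sketch that checks whether a device or image file reads back as all zeros after a wipe. The path in the usage note is a placeholder. Reading zeros does not prove that an SSD's remapped or over-provisioned blocks were erased, and it says nothing about drive firmware.

```python
def is_zeroed(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if every byte readable from `path` is zero."""
    zero = bytes(chunk_size)
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # end of device or file reached without finding data
                return True
            if chunk != zero[:len(chunk)]:
                return False
```

For example, `is_zeroed("/dev/sdX")` (run with sufficient privileges, device name hypothetical) could be added to a reprovisioning script as a sanity check after the erase step.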

forest · January 25

@Kodomu said: You can't really stop customers breaking the hardware in any way if you give them full control, but you can at least usually reset it back after they're done with it.

I suspect there are some ways to cause harm that is very difficult to reset back. While you might set up a master ATA passphrase for storage devices so that you can't get locked out (although honestly, how many providers here actually do that?), a customer could still use ATA DOWNLOAD MICROCODE to brick the device, or mess with the DCO (Device Configuration Overlay) to cause trouble. I also suspect that repeatedly writing to UEFI variables or /dev/nvram would wear out the low-endurance flash used for non-volatile storage. And if the system has a dGPU, there are countless ways to permanently fry it, especially if it is pre-Ampere.

Oh, and you know those rowhammer tests? Those are actually damaging if run too long, and you could fairly easily modify a double-sided rowhammer test to cause permanent damage to memory modules after a matter of weeks.

It's not too hard for a provider to keep malicious code out of the dedis, but anyone knowledgeable and dedicated is going to be able to cause physical damage if they really want to.

Kodomu · January 25

@forest said:

@Kodomu said: You can't really stop customers breaking the hardware in any way if you give them full control, but you can at least usually reset it back after they're done with it.

I suspect there are some ways to cause harm that is very difficult to reset back.

Definitely, which is one of the reasons we're not planning to do them any time soon, it's just not worth it. It's usually not really a problem, though: I've never heard of anyone trying to damage hardware on a dedicated server. I guess a malicious competitor might try, but at that point it could go to a lawsuit if they are caught, and you could weed it out (along with other kinds of abuse) by requiring a big upfront commitment, like a 12-month minimum, to make it less worth doing. We'd probably do exactly this if we did do dedis.

But mainly it's because you're putting a few thousand dollars into buying, shipping and commissioning a node, and it can take 2-3+ years to pay off if you charge market rate, whereas running VMs on the same node might pay it back in well under 6-12 months, so you can reuse the capital you tied up in the server sooner. Larger hosts can be more competitive on price at scale than small hosts anyway.
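The payback comparison is simple arithmetic. The figures below are hypothetical round numbers chosen only to land in the ranges mentioned above, not real pricing.

```python
def payback_months(capital: float, monthly_net: float) -> float:
    """Months until the net monthly income has repaid the upfront capital."""
    if monthly_net <= 0:
        raise ValueError("monthly_net must be positive")
    return capital / monthly_net


# Hypothetical numbers: a 3000 USD node, rented whole or split into VMs.
node_cost = 3000.0
dedicated_net = 100.0  # assumed net income per month as one dedicated server
vm_net = 400.0         # assumed combined net income per month from VMs

print(payback_months(node_cost, dedicated_net))  # 30.0 months, about 2.5 years
print(payback_months(node_cost, vm_net))         # 7.5 months
```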

bozolover99 · January 26

I always secure my backdoor

meaton · January 26

Shakib · January 26

Flash the latest BIOS and firmware every time I build a server.

TimboJones · January 26

@Kodomu said:
Definitely, which is one of the reasons we're not planning to do them any time soon, it's just not worth it.

I think a hacker with that level of technical skill is going to spend their time on high-value targets, and would need to know which customer will use the server next. It's not impossible, but it's pretty low risk unless the IPMI was exposed to the internet and was found purely by lucky scanning.

forest · January 26

@Shakib said:
Flash the latest BIOS and firmware every time I build a server.

That's not sufficient. Anyone who is capable of compromising the BIOS is likely to be able to do the same to HDD/SSD/NVMe firmware, to the GSP on the GPU, etc. And if you compromise the GSP, then it can perform DMA attacks before the system firmware has initialized the IOMMU (there are ways around that, but they wouldn't be enough, since the GSP could simply attack the driver instead).

Shakib · January 26

@forest said:

@Shakib said:
Flash the latest BIOS and firmware every time I build a server.

That's not sufficient. Anyone who is capable of compromising the BIOS is likely to be able to do the same to HDD/SSD/NVMe firmware, to the GSP on the GPU, etc. And if you compromise the GSP, then it can perform DMA attacks before the system firmware has initialized the IOMMU (there are ways around that, but they wouldn't be enough, since the GSP could simply attack the driver instead).

I have used only brand-new servers since 2021.
