intel Ice Lake and successors crypto totally f_cked up

jsg Member, Resident Benchmarker
edited January 2023 in News

Since Gen 10, intel (and arm64 too, it seems) doesn't guarantee "Data Operand Independent Timing Mode" anymore. This is vital for most crypto code!

In simple terms, that means that even primitive instructions (like, in particular, XOR) no longer execute in known constant time; the timing depends on the operands (which might be immediates, in one of the caches, or in DRAM). This basically breaks many crypto algorithms, and it is all the graver because the data in crypto algorithms are practically bound to be in specific locations (e.g. L1d).

I'll mention a very simple (but nevertheless critical) example to make it clearer.

When checking a password or similar, one typically uses certain techniques that are (well, have been) constant time, so that an outside party (vulgo: a hacker) can gain no information on e.g. the length of the checked byte array.
With intel's clusterf_ck - or intentional backdoor, depending on one's perspective - those algorithms are now very much weakened.
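
A minimal sketch of that technique (mine, not from the thread): a constant-time comparison accumulates differences with XOR/OR instead of returning at the first mismatch, so the run time does not depend on where the inputs diverge - which only holds if XOR and OR themselves execute in data-operand-independent time, i.e. exactly the guarantee now in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Constant-time comparison: returns 0 iff a and b are equal over len bytes.
 * Unlike memcmp(), it never exits early and uses only XOR/OR, so its timing
 * does not reveal the position of the first mismatch -- provided the CPU
 * executes XOR/OR in data-operand-independent time. */
static int ct_compare(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc |= a[i] ^ b[i];   /* accumulate differences, no data-dependent branch */
    return acc;               /* 0 = equal, nonzero = different */
}
```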

There are patches proposed, but it's not yet clear whether they'll make it into the kernel mainline, and if so, when. As for Windows, next to nothing is known about Microsoft's plans (afaik). As for the BSDs, I guess they'll react, but when and how exactly isn't known yet (afaik).
As far as I know, AMD processors are not affected.

In my next post I'll copy the original post by the kernel developer (Eric Biggers) who discovered it and is proposing a patch.

Comments

  • jsg Member, Resident Benchmarker

    Here is Eric Biggers' post:

    According to documentation that Intel published recently [1], Intel CPUs
    based on the Ice Lake and later microarchitectures don't guarantee "data
    operand independent timing" by default. I.e., instruction execution
    times may depend on the values of data operated on. This is true for a
    wide variety of instructions, including many instructions that are
    heavily used in cryptography and have always been assumed to be
    constant-time, e.g. additions, XORs, and even the AES-NI instructions.

    Cryptography algorithms require constant-time instructions to prevent
    side-channel attacks that recover cryptographic keys based on execution
    times. Therefore, without this CPU vulnerability mitigated, it's
    generally impossible to safely do cryptography on the latest Intel CPUs.

    It's also plausible that this CPU vulnerability can expose privileged
    kernel data to unprivileged userspace processes more generally.

    To mitigate this CPU vulnerability, it's possible to enable "Data
    Operand Independent Timing Mode" (DOITM) by setting a bit in a MSR.
    While Intel's documentation suggests that this bit should only be set
    where "necessary", that is highly impractical, given the fact that
    cryptography can happen nearly anywhere in the kernel and userspace, and
    the fact that the entire kernel likely needs to be protected anyway.

    Therefore, let's simply enable DOITM globally by default to fix this
    vulnerability. At most this gives up an "optimization" on the very
    latest CPUs, restoring the correct behavior from previous CPUs.

    Note: this patch does not address the separate but related vulnerability
    of MXCSR Configuration Dependent Timing (MCDT) that the Intel document
    describes. A separate patch would need to address that.

    [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
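
    To make the mitigation concrete: a hedged userspace sketch (not Biggers' patch, which does this kernel-side on every CPU) that sets the DOITM bit via the Linux msr driver. The MSR index 0x1B01 (IA32_UARCH_MISC_CTL) and bit 0 are taken from Intel's DOIT guidance; treat them as assumptions to verify against the current documentation.

    ```c
    /* Sketch: enable DOITM on CPU 0 via /dev/cpu/0/msr (modprobe msr; run as root).
     * A real mitigation must set the bit on every logical CPU, as the proposed
     * kernel patch does globally by default. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    #define IA32_UARCH_MISC_CTL 0x1B01        /* MSR index per Intel's DOIT guidance */
    #define DOITM_BIT           (1ULL << 0)   /* Data Operand Independent Timing Mode */

    int main(void)
    {
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDWR);
        if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

        if (pread(fd, &val, sizeof val, IA32_UARCH_MISC_CTL) != sizeof val) {
            perror("pread"); return 1;
        }
        val |= DOITM_BIT;                     /* request data-operand-independent timing */
        if (pwrite(fd, &val, sizeof val, IA32_UARCH_MISC_CTL) != sizeof val) {
            perror("pwrite"); return 1;
        }

        printf("IA32_UARCH_MISC_CTL = %#llx\n", (unsigned long long)val);
        return 0;
    }
    ```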

  • Intel deepens its wounds. AMD doing all the cocaine they can atm. Though, I like Intel. Probably grew up with those CPUs. Only once owned an Opteron. Still nostalgic about Pentium 4 and Diablo 2 LOD...

  • jsg Member, Resident Benchmarker
    edited January 2023

    @LTniger said:
    Intel deepens its wounds. AMD doing all the cocaine they can atm. Though, I like Intel. Probably grew up with those CPUs. Only once owned an Opteron. Still nostalgic about Pentium 4 and Diablo 2 LOD...

    Yes, it seems that intel has indeed lacked prudence for quite a while. I'm btw not in any way an enemy of intel but prefer AMD simply for practical and cost reasons. I still have and like a couple of intel boxes, incl. some SFFs with an i5-6500T (a sweet spot in terms of performance, TDP, and price).

    But still, at the moment my advice would be to stay away from all newer (Gen 10++) processors.

  • TrK Member

    TL;DR: Intel fucked up big time with cryptography and all.

  • The Intel manual states to use the LDMXCSR instruction to load 0x1fbf into MXCSR before doing any timing-sensitive operation. I guess the workaround for now will be that crypto libraries detect the processor they're running on (CPUID?) and do the needful - sketched below.
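
    A hedged sketch of that workaround (the helper names are mine, hypothetical): _mm_setcsr() emits LDMXCSR, and 0x1FBF is the MXCSR value the post cites from the Intel manual. Whether a given CPU needs it would be gated on a CPUID family/model check, which is left out here.

    ```c
    #include <xmmintrin.h>   /* _mm_getcsr / _mm_setcsr wrap STMXCSR / LDMXCSR */

    static unsigned int saved_mxcsr;

    /* Call before timing-sensitive (e.g. crypto) code on MCDT-affected CPUs. */
    void enter_timing_sensitive_region(void)
    {
        saved_mxcsr = _mm_getcsr();
        _mm_setcsr(0x1FBF);        /* MCDT-safe MXCSR value, per the post above */
    }

    /* Call afterwards to restore the caller's floating-point environment. */
    void leave_timing_sensitive_region(void)
    {
        _mm_setcsr(saved_mxcsr);
    }
    ```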

  • Maounique Host Rep, Veteran

    @jsg said: I'm btw not in any way an enemy of intel but prefer AMD simply for practical and cost reasons.

    I remember Durons... They were top notch for the money. After that, Intel leaped ahead again in the top range, where it had lost its edge for a year or two, but in the lower range the APUs and Bobcats were still kings regarding power usage and bang for the buck.
    The problems AMD had with scale are no longer there today; Intel invested a lot in its fabs but still can't compete with the likes of ARM, for example.

    As for the security stuff, that is the consequence of rushing changes without thorough testing and without leaving time for the community to check for bugs.

    It's no wonder many giants are considering ARM, and the platform is increasingly popular even where Intel is still king. If AMD doesn't botch things badly, it can finally emerge as the winner in the x86 race.

  • How in the actual **** is it possible to make a XOR implementation that isn't constant time?

  • @ralf said:
    How in the actual **** is it possible to make a XOR implementation that isn't constant time?

    Parallel processing? Returning when the work is done instead of waiting for the end. You know they need to squeeze out single-digit improvements generation over generation, and they're reaching diminishing returns in the grand scheme.

    This isn't a bug, and nothing is broken. It's just that they chose to keep the performance of faster operations 24/7 and only reduce performance when needed. It's just a setting that defaults to disabled.

    It's just a difference of opinion between the security-conscious and the performance-conscious.

    Chicken Little stuff.

  • jsg Member, Resident Benchmarker

    I just read the stupidest, most clueless moron post ever on LET ...

  • @TimboJones said:

    @ralf said:
    How in the actual **** is it possible to make a XOR implementation that isn't constant time?

    Parallel processing? Returning when the work is done instead of waiting for the end. You know they need to squeeze out single-digit improvements generation over generation, and they're reaching diminishing returns in the grand scheme.

    The thing is, it would require the same amount of logic to do the XOR as it would to detect it's finished. I've also never seen anyone implement XOR any way other than parallel on the word size, because that would be pointless, and so there wouldn't be an opportunity to finish early.

    Looking further, the claim made isn't supported by the actual article: in the section called "Instructions That May Exhibit MCDT Behavior", the list only contains multiplies (expected) and population-count instructions (reasonable).

    Also it links to another page that lists all the instructions that are constant time, and they include all the simple arithmetic operations and all the XOR operations: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html

    So, I'm not sure what this E. Biggers chap is complaining about TBH.
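
    For what it's worth, the competing claims can be probed empirically. A crude sketch (mine, not from the thread) that times a XOR loop with RDTSCP for "easy" versus "dense" operands; frequency scaling, out-of-order execution, and measurement noise make the result indicative at best, not a proof either way.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtscp, _mm_lfence */

    /* Time `iters` XORs of a and b, returning elapsed TSC cycles. */
    static uint64_t time_xor(uint64_t a, uint64_t b, int iters)
    {
        unsigned int aux;
        volatile uint64_t sink = 0;   /* keep the loop from being optimized away */
        _mm_lfence();
        uint64_t start = __rdtscp(&aux);
        for (int i = 0; i < iters; i++)
            sink ^= a ^ b;            /* the operation under test */
        uint64_t end = __rdtscp(&aux);
        _mm_lfence();
        (void)sink;
        return end - start;
    }

    int main(void)
    {
        printf("zeros: %llu cycles\n",
               (unsigned long long)time_xor(0, 0, 1000000));
        printf("dense: %llu cycles\n",
               (unsigned long long)time_xor(~0ULL, 0x5555555555555555ULL, 1000000));
        return 0;
    }
    ```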

  • jsg Member, Resident Benchmarker
    edited January 2023

    @ralf said:
    The thing is, it would require the same amount of logic to do the XOR as it would to detect it's finished. I've also never seen anyone implement XOR any way other than parallel on the word size, because that would be pointless, and so there wouldn't be an opportunity to finish early.

    Looking further, the claim made isn't supported by the actual article: in the section called "Instructions That May Exhibit MCDT Behavior", the list only contains multiplies (expected) and population-count instructions (reasonable).

    Also it links to another page that lists all the instructions that are constant time, and they include all the simple arithmetic operations and all the XOR operations: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html

    So, I'm not sure what this E. Biggers chap is complaining about TBH.

    BS. The routines we're talking about gain virtually no performance from that sh_tfuckery. XOR is already single-cycle, and the data worked on is in L1d, if not even in a register.
    "Parallel"? Brilliant explanation - because we all know that servers always handle just a single connection, so e.g. checking a password in parallel is awfully smart ...

    I had my reasons (and I'm not alone in that) to hint at a not-at-all-innocent possible reason behind intel's clusterf_ck decision. I'll make it even clearer: intel got/will get tens of billions of $$ from the government for new fabs. It wouldn't surprise me (and not only me) if the government asked for a "favour" in return ...

    Turn it as you want: needing to set a flag/bit in order to keep certain (for decades, until now) constant-time operations constant time is pure sh_tfuckery and idiocy. And btw, what do you think people care more about: their keys, passwords, etc. being safe - or saving maybe, accumulated, 1 clock cycle in 50 or some 100?

  • ralf Member
    edited January 2023

    @jsg said:

    @ralf said:
    The thing is, it would require the same amount of logic to do the XOR as it would to detect it's finished. I've also never seen anyone implement XOR any way other than parallel on the word size, because that would be pointless, and so there wouldn't be an opportunity to finish early.

    Looking further, the claim made isn't supported by the actual article: in the section called "Instructions That May Exhibit MCDT Behavior", the list only contains multiplies (expected) and population-count instructions (reasonable).

    Also it links to another page that lists all the instructions that are constant time, and they include all the simple arithmetic operations and all the XOR operations: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/data-operand-independent-timing-instructions.html

    So, I'm not sure what this E. Biggers chap is complaining about TBH.

    BS. The routines we're talking about gain virtually no performance from that sh_tfuckery. XOR is already single-cycle, and the data worked on is in L1d, if not even in a register.

    So, you're agreeing with me. Well, except that XOR isn't necessarily single-cycle on every CPU design, but it is at least constant time on every CPU architecture I've ever encountered.

    "parallel"? Brillant explanation - because we all know that servers always handle just a single connection so e.g. checking a password in parallel is awfully smart ...

    Eh? What has this got to do with how you'd implement an ISA?

    I had my reasons (and I'm not alone in that) to hint at a not-at-all-innocent possible reason behind intel's clusterf_ck decision. I'll make it even clearer: intel got/will get tens of billions of $$ from the government for new fabs. It wouldn't surprise me (and not only me) if the government asked for a "favour" in return ...

    And again, read the docs in the link I provided. They make it quite clear that the instructions this guy claims aren't constant time are in fact constant time.

    Turn it as you want: needing to set a flag/bit in order to keep certain (for decades, until now) constant-time operations constant time is pure sh_tfuckery and idiocy. And btw, what do you think people care more about: their keys, passwords, etc. being safe - or saving maybe, accumulated, 1 clock cycle in 50 or some 100?

    That flag, again according to the docs, is specifically for making multiplies take constant time. On most CPU architectures, multiplies are not constant time, and for general use where side channel attacks aren't a consideration, saving a few cycles on an instruction is a worthwhile goal. And it seems that Intel gives you a choice, so for applications where constant time is more useful, it's possible.

    But yeah, bad Intel, government conspiracy, alien probes, yadda yadda.

  • Drv Member

    All crypto done in hardware is b a c k d o o r e d.
    Plus, there is no unbreakable/safe encryption. And there will never be.

  • @Drv said:
    All crypto done in hardware is b a c k d o o r e d.
    Plus, there is no unbreakable/safe encryption. And there will never be.

    Not a math person, eh?

  • jsg Member, Resident Benchmarker

    @TimboJones said:

    @Drv said:
    All crypto done in hardware is b a c k d o o r e d.
    Plus, there is no unbreakable/safe encryption. And there will never be.

    Not a math person, eh?

    Not an implementation person, eh?

  • jsg Member, Resident Benchmarker
    edited February 2023

    @ralf said:

    So, I'm not sure what this E. Biggers chap is complaining about TBH.

    I had my reasons (and I'm not alone in that) to hint at a not-at-all-innocent possible reason behind intel's clusterf_ck decision. I'll make it even clearer: intel got/will get tens of billions of $$ from the government for new fabs. It wouldn't surprise me (and not only me) if the government asked for a "favour" in return ...

    And again, read the docs in the link I provided. They make it quite clear that the instructions this guy claims aren't constant time are in fact constant time.

    Turn it as you want: needing to set a flag/bit in order to keep certain (for decades, until now) constant-time operations constant time is pure sh_tfuckery and idiocy. And btw, what do you think people care more about: their keys, passwords, etc. being safe - or saving maybe, accumulated, 1 clock cycle in 50 or some 100?

    That flag, again according to the docs, is specifically for making multiplies take constant time. On most CPU architectures, multiplies are not constant time, and for general use where side channel attacks aren't a consideration, saving a few cycles on an instruction is a worthwhile goal. And it seems that Intel gives you a choice, so for applications where constant time is more useful, it's possible.

    But yeah, bad Intel, government conspiracy, alien probes, yadda yadda.

    No, a simple case of "I don't like that guy, so I'll take whatever I find first to paint him as clueless".
    Unfortunately your attempt is a failure, in fact a double failure.

    (a) Quoting anything from the presumably bad party, intel in this case, isn't exactly smart. Of bloody course they try to make it look "oh, not really so bad".
    (b) What you linked only shows one thing: you didn't grasp the problem; you confused "data independent" and "data operand independent". And as your intention wasn't to actually discuss constructively or to contribute anything of value, but rather to paint me as clueless, you even failed to notice that intel themselves linked to their data-operand-independent guide - literally in the first paragraph of what you linked.
    And there, in the relevant document, intel says that

    "This means that the DOIT mode may have a performance impact ...

    This functionality is intended for use by software which has already applied other techniques to mitigate software timing side channels ..."

    So it seems that the kernel developer Eric Biggers had it right, while that ralf chap (afaik some guy at Facebook) didn't even understand the problem.

  • ralf Member
    edited February 2023

    @jsg said:
    No, a simple case of "I don't like that guy, so I'll take whatever I find first to paint him as clueless".

    You're doing a great job of painting yourself that way TBH.

    You have quoted some post (but not provided a link where it could be checked) that makes a claim, and in support of that claim links to something that states the complete opposite. From that you conclude that anybody who actually READ the link must be wrong.

    And as your intention wasn't to actually discuss constructively or to contribute anything of value, but rather to paint me as clueless

    Wow, you really are a narcissist. I was responding initially to the quoted text, and later to your idiotic reply to mine, where you said that what I said was BS whilst actually saying the same thing. Why are you making this about you rather than about the details of the claimed vulnerability?

    So it seems that the kernel developer Eric Biggers had it right, while that ralf chap (afaik some guy at Facebook) didn't even understand the problem.

    It appears your doxxing skills are on a par with your reasoning skills.

  • jsg Member, Resident Benchmarker

    @ralf said:

    @jsg said:
    No, a simple case of "I don't like that guy, so I'll take whatever I find first to paint him as clueless".

    You're doing a great job of painting yourself that way TBH.

    You have quoted some post (but not provided a link where it could be checked) that makes a claim, and in support of that claim links to something that states the complete opposite. From that you conclude that anybody who actually READ the link must be wrong.

    And as your intention wasn't to actually discuss constructively or to contribute anything of value, but rather to paint me as clueless

    Wow, you really are a narcissist. I was responding initially to the quoted text, and later to your idiotic reply to mine, where you said that what I said was BS whilst actually saying the same thing. Why are you making this about you rather than about the details of the claimed vulnerability?

    So it seems that the kernel developer Eric Biggers had it right, while that ralf chap (afaik some guy at Facebook) didn't even understand the problem.

    It appears your doxxing skills are on a par with your reasoning skills.

    Thanks for your capitulation ;)
