Spectre and Meltdown are major design flaws in modern CPUs. While they’re present in almost all recent processors, because Intel chips are so widely used, Intel is taking most of the heat for these bugs. Nowhere has the criticism been hotter than on the Linux Kernel Mailing List (LKML). That’s because unlike Apple and Microsoft operating system developers and OEMs like Dell and HP, Linux programmers do their work in the open. But, when Linux and Intel developers aren’t arguing, they are making progress.
It didn’t start well. As Linux’s creator Linus Torvalds said on the LKML when news of the problems broke, “I think somebody inside of Intel needs to really take a long hard look at their CPUs, and actually admit that they have issues.” Later, Greg Kroah-Hartman, maintainer of the Linux stable branch, wrote that this is “a textbook example of how not to interact with the Linux kernel community properly”.
Then, things heated up again when, annoyed by new Intel-suggested patches, Torvalds snarled, “Is Intel really planning on making this shit architectural? Anybody talked to them and told them they are f*cking insane?”
David Woodhouse, an Intel Linux kernel engineer, replied:
If the alternative was a two-decade product recall and giving everyone free CPUs, I’m not sure it was entirely insane.
Certainly it’s a nasty hack, but hey — the world was on fire and in the end we didn’t have to just turn the datacentres off and go back to goat farming, so it’s not all bad.
As a hack for existing CPUs, it’s just about tolerable — as long as it can die entirely by the next generation.
In the meantime, Intel’s attempts to fix these problems with microcode, which sits just above the chip’s hardware and below the operating system, have come to nothing. First, Intel recommended people stop using its current firmware updates. Since then, Dell and HP have pulled Intel’s buggy Meltdown and Spectre microcode fixes.
Torvalds hasn’t been impressed, conceding, “Intel actually seems to plan on doing the right thing for meltdown (the main question being _when_). Which is not a huge surprise, since it should be easy to fix, and it’s a really honking big hole to drive through. Not doing the right thing for meltdown would be completely unacceptable.” But, he continued, “Intel is _not_ planning on doing the right thing for the indirect branch speculation. Honestly, that’s completely unacceptable.”
And, besides, “As it is, the patches are COMPLETE AND UTTER GARBAGE.” You can always count on Torvalds to call them the way he sees them.
But Woodhouse replied that, while it’s a “nasty hack,” in the short term he could live with it.
In a later message, Woodhouse continued, “I think we’ve covered the technical part of this now, not that you like it — not that any of us *like* it.” He then explained the logic behind these “garbage” patches.
This is all about Spectre variant 2 [CVE-2017-5715], where the CPU can be tricked into mispredicting the target of an indirect branch. And I’m specifically looking at what we can do on *current* hardware, where we’re limited to the hacks they can manage to add in the microcode.
The new microcode from Intel and AMD adds three new features.
One new feature (IBPB) is a complete barrier for branch prediction. After frobbing this, no branch targets learned earlier are going to be used. It’s kind of expensive (order of magnitude ~4000 cycles).
The second (STIBP) protects a hyperthread sibling from following branch predictions which were learned on another sibling. You *might* want this when running unrelated processes in userspace, for example. Or different VM guests running on HT siblings.
The third feature (IBRS) is more complicated. It’s designed to be set when you enter a more privileged execution mode (i.e. the kernel). It prevents branch targets learned in a less-privileged execution mode, BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it’s not just a ‘set-and-forget’ feature, it also has barrier-like semantics and needs to be set on *each* entry into the kernel (from userspace or a VM guest). It’s *also* expensive. And a vile hack, but for a while it was the only option we had.
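On a Linux machine, one rough way to see whether updated microcode has exposed these three features is to look for the matching flags in /proc/cpuinfo. The sketch below assumes the flag names `ibpb`, `stibp` and `ibrs` that recent kernels print. The function name is made up for illustration, and older or vendor-patched kernels may not report these flags at all.

```python
from pathlib import Path

# Flag names as printed by recent Linux kernels in /proc/cpuinfo (assumption:
# older or vendor-patched kernels may use different names or none at all).
SPECTRE_V2_FLAGS = ("ibpb", "stibp", "ibrs")


def spectre_v2_features(cpuinfo_text):
    """Return {flag: bool} for the first 'flags' line found in cpuinfo_text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in SPECTRE_V2_FLAGS}
    # No 'flags' line (for example on a non-x86 machine): report nothing found.
    return {flag: False for flag in SPECTRE_V2_FLAGS}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    for flag, found in spectre_v2_features(text).items():
        print(f"{flag}: {'present' if found else 'not reported'}")
```

A flag showing up here only means the CPU and microcode offer the feature. Whether the kernel actually uses it is a separate decision.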
Besides being really messy, the shortcoming with all these patches is that they drastically slow down systems. Google’s Retpoline patch is a “massive performance win”, Woodhouse admits. Retpoline works by replacing indirect jumps and calls with a return-based sequence, a “return trampoline”, so that speculation can no longer be steered by the processor’s indirect branch predictor, which is where Spectre variant 2 lives.
But, Woodhouse continued, “Not everyone has a retpoline compiler yet,” and Intel’s “Skylake, and that generation of CPU cores,” would still be vulnerable. The “IBRS solution, ugly though it is, did address that”. As it is, using only Retpoline “opens a *little* bit of a security hole”.
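To find out which of these approaches a given Linux system has ended up with, kernels from 4.15 on report their Spectre variant 2 status in a sysfs file. What follows is a minimal sketch, assuming that file is present. The helper names are hypothetical, and the exact wording of the status string varies between kernel versions, so the checks are deliberately loose.

```python
from pathlib import Path

# sysfs file added in Linux 4.15; absent on older kernels and other systems.
STATUS_FILE = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def uses_retpoline(status):
    """True if the kernel's status string mentions a retpoline mitigation."""
    return "retpoline" in status.lower()


def is_vulnerable(status):
    """True if the kernel's status string starts with 'Vulnerable'."""
    return status.strip().lower().startswith("vulnerable")


if __name__ == "__main__":
    if STATUS_FILE.exists():
        status = STATUS_FILE.read_text().strip()
        print(status)
        print("retpoline in use:", uses_retpoline(status))
        print("reported vulnerable:", is_vulnerable(status))
    else:
        print("kernel does not expose a spectre_v2 status file")
```

Both checks can be true at once: some kernels report a partial retpoline build as still vulnerable, which matches Woodhouse’s point that not everyone has a retpoline compiler yet.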
The work continues on a way to avoid “garbage” patches, while still keeping Skylake, Intel’s sixth-generation processor family, safe. Ingo Molnar, a Red Hat Linux kernel developer, has suggested a method that appears to keep Skylake safe from Spectre.
Something has to be done. These holes enable hackers to get around system protections on almost all PCs, servers, and smartphones. So far, knock on silicon, no one’s managed to exploit them. But it’s only a matter of time. In the meantime, the fixes to date all slow down systems.
As the Linux discussions and Intel microcode news show, we’re still a long, long way from a complete fix.
Finally, just because we know what’s going on with Linux doesn’t mean that macOS and Windows aren’t facing the exact same problems. They are. We’re just not hearing about them.