Kaiser security holes will devastate Intel’s marketshare

Analysis: This one tips the balance toward AMD in a big way

This latest decade-long critical security hole in Intel CPUs is going to cost the company significant market share. SemiAccurate thinks it is not only consequential but will shift the balance of power away from Intel CPUs for at least the next several years.

Today’s latest crop of gaping security flaws comprises three sets of holes across Intel, AMD, and ARM processors, along with a slew of official statements and detailed analyses. On top of that, the statements from vendors range from detailed and direct to intentionally misleading and slimy. Let’s take a look at what the problems are, whom they affect, and what the outcome will be. Those outcomes range from trivial patching to destroying the market share of Intel servers, and no, we are not joking.

(Author’s Note 1: For the technical readers we are simplifying a lot, sorry, we know this hurts. The full disclosure docs are linked, read them for the details.)

(Author’s Note 2: For the financially oriented subscribers out there, the parts relevant to you are at the very end, the section is titled Rubber Meet Road.)

The Problem(s):

As we said earlier there are three distinct security flaws that all fall somewhat under the same umbrella. All are ‘new’ in the sense that the class of attacks hasn’t been publicly described before, and all are very obscure CPU speculative execution and timing related problems. The impact of the fixes also varies by architecture, ranging from minor to near-crippling slowdowns. Worse yet, none of the three flaws are bugs or errors; they exploit correct CPU behavior to allow the systems to be hacked.

The three problems are cleverly labeled Variant One, Variant Two, and Variant Three. Google Project Zero was the original discoverer of them and has labeled the classes as Bounds Check Bypass, Branch Target Injection, and Rogue Data Cache Load respectively. You can read up on the extensive and gory details here if you wish.

If you are the TLDR type the very simplified summary is that modern CPUs will speculatively execute operations ahead of the one they are currently running. Some architectures will allow these executions to start even when they violate privilege levels, but those instructions are killed or rolled back hopefully before they actually complete running.

Another feature of modern CPUs is virtual memory which can allow memory from two or more processes to occupy the same physical page. This is a good thing because if you have memory from the kernel and a bit of user code in the same physical page but different virtual pages, changing from kernel to userspace execution doesn’t require a page fault. This saves massive amounts of time and overhead giving modern CPUs a huge speed boost. (For the really technical out there, I know you are cringing at this simplification, sorry).

These two things together allow you to do some interesting things and along with timing attacks add new weapons to your hacking arsenal. If you have code executing on one side of a virtual memory page boundary, it can speculatively execute the next few instructions on the physical page that cross the virtual page boundary. This isn’t a big deal unless the two virtual pages are mapped to processes that are from different users or different privilege levels. Then you have a problem. (Again painfully simplified and liberties taken with the explanation, read the Google paper for the full detail.)

This speculative execution allows you to get a few short (low latency) instructions in before the speculation ends. Under certain circumstances you can read memory from different threads or privilege levels, write those things somewhere, and figure out what addresses other bits of code are using. The latter bit has the nasty effect of potentially blowing through address space randomization defenses which are a keystone of modern security efforts. It is ugly.

Who Gets Hit:

So we have three attack vectors and three affected companies, Intel, AMD, and ARM. Each has a different set of vulnerabilities to the different attacks due to differences in underlying architectures. AMD put out a pretty clear statement of what is affected, ARM put out by far the best and most comprehensive description, and Intel obfuscated, denied, blamed others, and downplayed the problem. If this were a contest for misleading with doublespeak and misdirection, Intel won with a gold star, the others weren’t even in the game. Let’s look at who said what and why.

ARM:

ARM has a page up listing vulnerable processor cores, descriptions of the attacks, and plenty of links to more information. They also put up a very comprehensive white paper that rivals Google’s original writeup, complete with code examples and a new 3a variant. You can find it here. Just for completeness we are putting up ARM’s excellent table of affected processors, enjoy.

Affected ARM cores

AMD:

AMD gave us the following table which lays out their position pretty clearly. The short version is that architecturally speaking they are vulnerable to variants 1 and 2, but variant 3 is not possible due to their microarchitecture. More on this in a bit, it is very important. AMD also went on to describe some of the issues and mitigations to SemiAccurate, but again, more in a bit.

AMD’s response matrix

Intel:

Intel is continuing to be the running joke of the industry as far as messaging is concerned. Their statement is a pretty awe-inspiring example of saying nothing while desperately trying to minimize the problem. You can find it here but it contains zero useful information. SemiAccurate is getting tired of saying this but Intel should be ashamed of how their messaging is done, not saying anything would do less damage than their current course of action.

You will notice the line in the second paragraph, “Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect.” This is technically true and pretty damning. They are directly saying that the problem is not a bug but is due to misuse of correct processor behavior. This is a critical problem because it can’t be ‘patched’ or ‘updated’ like a bug or flaw without breaking the CPU. In short you can’t fix it, and this will be important later. Intel mentions this but others don’t for a good reason, again later.

Then Intel goes on to say, “Intel is committed to the industry best practice of responsible disclosure of potential security issues, which is why Intel and other vendors had planned to disclose this issue next week when more software and firmware updates will be available. However, Intel is making this statement today because of the current inaccurate media reports.” This is simply not true, or at least the part about industry best practices of responsible disclosure. Intel sat on the last critical security flaw affecting 10+ years of CPUs which SemiAccurate exclusively disclosed for 6+ weeks after a patch was released. Why? PR reasons.

SemiAccurate feels that Intel holding back knowledge of what we believe were flaws being actively exploited in the field even though there were simple mitigation steps available is not responsible. Or best practices. Or ethical. Or anything even intoning goodness. It is simply unethical, but only that good if you are feeling kind. Intel does not do the right thing for security breaches and has not even attempted to do so in the 15+ years this reporter has been tracking them on the topic. They are by far the worst major company in this regard, and getting worse.

Mitigation:

As is described by Google, ARM, and AMD, but not Intel, there are workarounds for the three new vulnerabilities. Since Google first discovered these holes in June 2017, there have been patches pushed up to various Linux kernel and related repositories. The first one SemiAccurate can find was dated October 2017 and the industry coordinated announcement was set for Tuesday, January 9, 2018, so you can be pretty sure that the patches are in place and ready to be pushed out if not on your systems already. Microsoft and Apple are said to be at a similar state of readiness too. In short by the time you read this, it will likely be fixed.

That said the fixes do have consequences, and all are heavily workload dependent. For variants 1 and 2 the performance hit is pretty minor with reports of ~1% performance hits under certain circumstances but for the most part you won’t notice anything if you patch, and you should patch. Basically 1 and 2 are irrelevant from any performance perspective as long as your system is patched.

The big problem is with variant 3, which ARM claims has a similar effect on devices like phones or tablets, i.e. low single digit performance hits if that. Given the way ARM CPUs are used in the majority of devices, they don’t tend to have the multi-user, multi-tenant, heavily virtualized workloads that servers do. For the few ARM cores that are affected, their users will see a minor, likely unnoticeable performance hit when patched.

User x86 systems will likely be closer to the ARM model for performance hits. Why? Because while they can run heavily virtualized, multi-user, multi-tenant workloads, most desktop users don’t. Even if they do, it is pretty rare that these users are CPU bound for performance, memory and storage bandwidth will hammer performance on these workloads long before the CPU becomes a bottleneck. Why do we bring this up?

Because in those heavily virtualized, multi-tenant, multi-user workloads that most servers run in the modern world, the patches for 3 are painful. How painful? SemiAccurate’s research has found reports of slowdowns between 5% and 50%, again workload and software dependent, with the average being around 30%. This stands to reason because the fixes we have found essentially force a demapping of kernel code on a context switch.

The Pain:

This may sound like techno-babble but it isn’t, and it happens many thousands of times a second on modern machines if not more. Because as Intel pointed out, the CPU is operating correctly and the exploit uses correct behavior, it can’t be patched or ‘fixed’ without breaking the CPU itself. Instead what you have to do is make sure the circumstances that can be exploited don’t happen. Consider this a software workaround or avoidance mechanism, not a patch or bug fix: the underlying problem is still there and exploitable, there is just nothing to exploit.

Since the root cause of 3 is a mechanism that results in a huge performance benefit by not having to take a few thousand or perhaps millions of page faults a second, at the very least you now have to take the hit of those page faults. Worse yet the fix, from what SemiAccurate has gathered so far, has to unload the kernel pages from virtual memory maps on a context switch. So with the patch not only do you have to take the hit you previously avoided, but you also have to do a lot of work copying/scrubbing virtual memory every time you do. This explains the hit of ~1/3rd of your total CPU performance quite nicely.

Going back to user x86 machines and ARM devices, they aren’t doing nearly as many context switches as the servers are but likely have to do the same work when doing a switch. In short if you do a theoretical 5% of the switches, you take 5% of that 30% hit. It isn’t this simple but you get the idea, it is unlikely to cripple a consumer desktop PC or phone but will probably cripple a server. Workload dependent, we meant it.
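The back-of-envelope scaling above can be written out as a tiny sketch. The linear model, the function name `scaled_hit`, and the default 30% figure (the server average quoted earlier) are illustrative assumptions rather than measurements, and as noted the real relationship isn’t this simple.

```python
def scaled_hit(switch_fraction, server_hit=0.30):
    """Estimate slowdown for a machine doing a fraction of a server's context switches.

    Assumption (hypothetical): the slowdown scales linearly with the number
    of context switches, which is a deliberate oversimplification.
    """
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must be between 0 and 1")
    return switch_fraction * server_hit


# A desktop doing a theoretical 5% of a server's context switches.
desktop_hit = scaled_hit(0.05)
print(f"Estimated desktop slowdown: {desktop_hit:.1%}")
```

Under these assumptions the desktop lands at roughly 1.5%, which is in line with the low single digit hits mentioned for consumer devices.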

The Knife Goes In:

So x86 servers are in deep trouble, what was doable on two racks of machines now needs three if you apply the patch for 3. If not, well customers have lawyers, will you risk it? Worse yet would you buy cloud services from someone who didn’t apply the patch? Think about this for the economics of the megadatacenters, if you are buying 100K+ servers a month, you now need closer to 150K, not a trivial added outlay for even the big guys.
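For the arithmetic behind those figures, here is a rough sketch. It assumes the ~30% average slowdown quoted above applies uniformly and that capacity scales linearly with server count; `servers_needed` is a hypothetical helper name used only for illustration.

```python
def servers_needed(baseline, slowdown=0.30):
    """Servers required to match pre-patch throughput.

    Each patched server delivers (1 - slowdown) of its old throughput,
    so the fleet must grow by a factor of 1 / (1 - slowdown).
    """
    if not 0.0 <= slowdown < 1.0:
        raise ValueError("slowdown must be in [0, 1)")
    return baseline / (1.0 - slowdown)


racks = servers_needed(2)          # two racks of work before the patch
monthly = servers_needed(100_000)  # a 100K-server monthly purchase

print(f"Racks needed: {racks:.2f}")
print(f"Monthly servers needed: {monthly:,.0f}")
```

With a 30% hit, two racks become roughly 2.9, which rounds up to three, and 100K servers become roughly 143K, in the neighborhood of the 150K figure above.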

But there is one big caveat and it comes down to the part we said we would get to later. Later is now. Go back and look at that AMD chart near the top of the article, specifically their vulnerability for Variant 3 attacks. Note the bit about, “Zero AMD vulnerability or risk because of AMD architecture differences.” See an issue here?

What AMD didn’t spell out in detail is a minor difference in microarchitecture between Intel and AMD CPUs. When a CPU speculatively executes and crosses a privilege level boundary, any idiot would probably say that the CPU should see this crossing and not execute the following instructions that are out of its privilege level. This isn’t rocket science, just basic common sense.

AMD’s microarchitecture sees this privilege level change and throws the microarchitectural equivalent of a hissy fit and doesn’t execute the code. Common sense wins out. Intel’s implementation does execute the following code across privilege levels which sounds on the surface like a bit of a face-palm implementation but it really isn’t.

What saves Intel is that the speculative execution goes on but, to the best of our knowledge, is unwound when the privilege level changes a few instructions later. Since Intel CPUs in the wild don’t crash or violate privilege levels, it looks like that mechanism works properly in practice. What these new exploits do is slip a few very short instructions in that can read data from the other user or privilege level before the context change happens. If crafted correctly the instructions are unwound but the data can be stashed in a place that is persistent.
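To make the “unwound but persistent” idea concrete, here is a deliberately toy model in Python. It is not an exploit and not how any real CPU is implemented: the `ToyCPU` class, its methods, and the sample data are all invented for illustration, and a simple set lookup stands in for the cache timing measurement a real attack would use.

```python
class ToyCPU:
    """Toy model: speculation is rolled back, but the cache footprint is not."""

    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory  # privileged data, off-limits to user code
        self.cached_lines = set()           # microarchitectural state, survives rollback

    def user_read_kernel(self, address):
        # Speculative phase: the load and one dependent access run before
        # the privilege check is resolved.
        secret = self.kernel_memory[address]
        self.cached_lines.add(secret)       # touches probe line number `secret`
        # Privilege check resolves: architectural results are discarded,
        # so the caller never receives `secret` directly.
        raise PermissionError("privilege violation, speculation unwound")

    def is_cached(self, line):
        # Stand-in for timing a memory access: cached lines load "fast".
        return line in self.cached_lines

    def flush_cache(self):
        self.cached_lines.clear()


def leak_byte(cpu, address):
    """Recover one byte by checking which of 256 probe lines became cached."""
    cpu.flush_cache()
    try:
        cpu.user_read_kernel(address)
    except PermissionError:
        pass  # the fault is expected; the cache side effect remains
    for line in range(256):
        if cpu.is_cached(line):
            return line
    return None


cpu = ToyCPU(kernel_memory=b"hunter2")  # made-up "secret" for the demo
leaked = bytes(leak_byte(cpu, i) for i in range(7))
print(leaked)
```

The point of the sketch is only that the read itself fails every time, yet the attacker still reconstructs the data from what was left behind in the simulated cache.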

Intel probably gets a slight performance gain from doing this ‘sloppy’ method but AMD seems to have done the right thing for the right reasons. That extra bounds check probably takes a bit of time but in retrospect, doing the right thing was worth it. Since both are fundamental ‘correct’ behaviors for their respective microarchitectures, there is no possible fix, just code that avoids scenarios where it can be abused.

For Intel this avoidance comes with a 30% performance hit on server type workloads, less on desktop workloads. For AMD the problem was avoided by design and the performance hit is zero. Doing the right thing for the right reasons even if it is marginally slower seems to have paid off in this circumstance. Mother was right, AMD listened, Intel didn’t.

Weasel Words:

Now you have a bit more context about why Intel’s response was, well, a non-response. They blamed others, correctly, for having the same problem but their blanket statement avoided the obvious issue that the others aren’t crippled by the effects of the patches like Intel is. Intel screwed up, badly, and is facing a 30% performance hit going forward for it. AMD did right and is probably breaking out the champagne at HQ about now.

Intel also tried to deflect lawyers by saying they follow industry best practices. They don’t and the AMT hole was a shining example of them putting PR above customer security. Similarly their sitting on the fix for the TXT flaw for *THREE*YEARS* because they didn’t want to admit to architectural security blunders and reveal publicly embarrassing policies until forced to disclose by a governmental agency being exploited by a foreign power is another example that shines a harsh light on their ‘best practices’ line. There are many more like this. Intel isn’t to be trusted for security practices or disclosures because PR takes precedence over customer security.

Rubber Meet Road:

Unfortunately security doesn’t sell and rarely affects marketshare. This time however is different and will hit Intel where it hurts, in the wallet. SemiAccurate thinks this exploit is going to devastate Intel’s marketshare. Why? Read on subscribers.

Note: The following is analysis for professional level subscribers only.

Disclosures: Charlie Demerjian and Stone Arch Networking Services, Inc. have no consulting relationships, investment relationships, or hold any investment positions with any of the companies mentioned in this report.

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.