Today ARM is releasing a new microcontroller architecture called v8-M, a version of Trustzone to go with it, and the AMBA 5 AHB5 to connect it all. SemiAccurate thinks the interesting bits are not the high-level features but how ARM pulls it off with so few resources.
In case you aren’t familiar with the ARM M-series, it is a relatively low-end microcontroller line for IoT devices and embedded controllers. At the low end you have the Cortex-M0 line, which hardly qualifies as a microprocessor; think motes and sensors that can scavenge power from the environment, not 10-core phone chips. The high end of the M-series is the Cortex-M7, with early examples running at a heady 180MHz. For the M-series, think dishwasher controllers rather than supercomputers.
Now the previous range-topping M7 cores are being supplanted by the more efficient and much more capable v8-M architecture. There are no cores bearing the architecture disclosed at the moment, just the general capabilities it will allow, most notably the new security architecture called Trustzone-M, a ‘lightweight’ version of the full A-series Trustzone. While they both do roughly the same things, they go about it in very different ways, mainly because the resources available to the two architectures differ by many orders of magnitude. Let’s take a look at the new v8-M and Trustzone-M in as much detail as we have for now.
The first new concept is that there isn’t one v8-M architecture but two sub-profiles called Baseline and Mainstream. Baseline is the bare minimum you need to implement a v8-M core; Mainstream has all the extras like the DSP operations first seen on the M7 line. In the past these sub-profiles had their own names: Baseline was v6-M and Mainstream was roughly equivalent to v7-M. This new delineation of a bare minimum plus lots of optional extras, all from a base bearing architectural improvements, seems to be a much cleaner and saner way to name things.
Even the v8-M baseline architecture adds a lot to the older v7-M moving it from a microcontroller to something a lot closer to what people expect as a CPU. The big items added are a hardware divide unit, compare and branch, long branches, wide immediate moves, exclusive accesses, and active interrupt bits. The branches and moves are useful for the new memory model that is needed for Trustzone-M, exclusive accesses add rudimentary support for multi-core SoCs, and active interrupt bits allow a lot of secure/non-secure mode tricks.
These Baseline additions alone make a v8-M core much more capable than a v7-M core but the Mainstream additions go much farther. In addition to the optional DSP units that bring quite a bit of performance to the M7 class cores, v8-M can also have a full FP unit. It is IEEE754 compatible and does SP and DP math; how fast will likely depend on the individual implementation. Either way it will leave the software implementations on earlier cores in its dusty wake. The rest of the Mainstream ISA is not optional; ARM adds a much richer set of 32-bit instructions good for a claimed 40% performance boost over the v7-M cores.
All of this is nice and dandy but some of the hurdles needed to program the older M-class devices made it a place for dedicated microcontroller hackers rather than mainstream coders. One big limit here was the memory model, which the older ISAs restricted to regions with power-of-two sizes. This was inefficient and made memory management a pain, not to mention debugging and optimizing code. It was yestertech that few if any liked.
The new v8-M architecture does away with all of that nonsense and lets a programmer define memory regions arbitrarily with a base address and limit configuration. Actually it isn’t fully arbitrary, it needs to be on 32-byte boundaries but that is granular enough for all but the most OCD coders out there and it enforces good coding practices too. This new memory management style simplifies almost every aspect of coding and debugging, plus debug watchpoints have been updated to recognize this new layout. ARM’s new memory model for v8-M is just a good thing for all.
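To put a rough number on why the old scheme was a pain, here is a minimal Python sketch that covers the same buffer both ways. It assumes the old-style regions must also be aligned to their own size, which the text above does not spell out, and the function names and the example addresses are made up for illustration.

```python
GRANULE = 32  # bytes; the v8-M boundary granularity quoted in the article


def _check(base, end):
    if base % GRANULE or end % GRANULE or end <= base:
        raise ValueError("range must be non-empty and 32-byte aligned")


def pow2_regions(base, end):
    """Split [base, end) into power-of-two blocks, each aligned to its own size.

    Models the older scheme under the size-alignment assumption noted above.
    """
    _check(base, end)
    regions = []
    while base < end:
        remaining = end - base
        # Largest power of two that fits in what is left of the range.
        size = 1 << (remaining.bit_length() - 1)
        # Shrink it until the current base is aligned to the block size.
        while base % size:
            size >>= 1
        regions.append((base, size))
        base += size
    return regions


def base_limit_region(base, end):
    """Describe [base, end) as one (base, inclusive limit) pair, v8-M style."""
    _check(base, end)
    return (base, end - 1)


if __name__ == "__main__":
    start, stop = 0x20000060, 0x20000300  # hypothetical 672-byte buffer
    old = pow2_regions(start, stop)
    new = base_limit_region(start, stop)
    print(len(old), "power-of-two regions:", [(hex(b), s) for b, s in old])
    print("1 base/limit region:", tuple(hex(x) for x in new))
```

For this particular buffer the power-of-two approach burns four regions where base and limit need one, and region slots are a scarce resource on this class of part.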
One of the main reasons for the change was the introduction of Trustzone-M, the new security architecture for v8-M class microcontrollers. As you probably realize by now, traditional security models with execution layers, hypervisors running the show, complex memory models, and tortuous secure/non-secure communication paths won’t work well on this class of microcontroller. Setting aside the fact that the hypervisor itself probably needs more CPU resources and memory than a v8-M core has, it would just be too slow.
One of the main features of the M-class cores is their very deterministic interrupt latency mainly derived from the fact that the cores process interrupts in hardware. On ‘fat’ cores you have a lot of the interrupts being driven by software with extremely variable results. Throw in a hypervisor and you add the VMEnter and VMExit latencies, processing time, and all the rest for an unacceptable delay in interrupt handling. On a big PC core these added cycles are mostly irrelevant, on a microcontroller it is life and death. Actually given some of the use cases for M-class cores, it could quite literally be life and death.
If you need very tightly specified latencies for interrupt handling, how do you implement a secure and non-secure zone without breaking the timings? The key to doing it would have to be a very fast context switch between secure and non-secure operating modes which means it has to stay in the hardware domain like the v7-M architecture. It is a tricky problem and conventional solutions like hypervisors and other things seen on Cortex-A and x86 CPUs are right out. That said ARM’s solution called Trustzone-M does it, and does it in a really ingenious way.
Some banked, some not, all color coördinated
The first architectural change is selective banking of registers, and we do mean selective. As you can see above, the green registers are common between secure and non-secure states but the blue and red are banked. This means only some registers need to be saved and/or cleared on a secure/non-secure mode switch, and the contents of some can be shared if appropriate. Banking every register would of course be easier but it adds a lot of area, complexity, and potentially power use. Partial banking is a clever compromise for the use cases seen in the microcontroller world. Also note that the stacks for each mode are physically separate as are the priority masking registers.
ARM v7-M had two modes called Handler Mode and Thread Mode, you can probably guess what they were used for by the name. With v8-M there are now secure and non-secure versions of each, four modes instead of two. With physically different stacks and control registers, this is a fairly robust way of keeping the two instruction flows from interacting.
Remember those active interrupt bits we mentioned earlier? There are two kinds, secure (S) and non-secure (NS), and both can be re-prioritized in the v8-M architecture. For obvious reasons the secure side can re-prioritize the NS interrupts but the NS side can’t touch the S interrupt priorities. This allows secure code to put itself above NS code but not the other way around.
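The one-way nature of that rule is easy to model. The following is a toy Python sketch, not ARM’s register interface: the class, its method names, and the interrupt names are all hypothetical, and it assumes the usual Cortex-M convention that a lower number means a more urgent interrupt.

```python
class InterruptPriorities:
    """Toy model of who may change the priority of which interrupt."""

    def __init__(self):
        # irq name -> {"world": "S" or "NS", "priority": int}
        self.entries = {}

    def add(self, irq, world, priority):
        self.entries[irq] = {"world": world, "priority": priority}

    def set_priority(self, caller_world, irq, priority):
        entry = self.entries[irq]
        # Secure code may touch anything; non-secure code only its own side.
        if caller_world == "NS" and entry["world"] == "S":
            raise PermissionError("NS code cannot change a secure priority")
        entry["priority"] = priority

    def most_urgent(self):
        # Assumption: lower number = more urgent.
        return min(self.entries, key=lambda irq: self.entries[irq]["priority"])


if __name__ == "__main__":
    nvic = InterruptPriorities()
    nvic.add("crypto_timer", "S", 2)   # hypothetical secure interrupt
    nvic.add("radio", "NS", 1)         # hypothetical non-secure interrupt
    nvic.set_priority("S", "radio", 5)  # secure side demotes the NS interrupt
    print(nvic.most_urgent())
```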
This doesn’t mean an NS interrupt can’t interrupt S code, it can. If it does, all registers are pushed, all registers zeroed, and the mode is switched to NS. The interrupt is then serviced, the mode is set back to secure, all registers are reloaded, and the secure code keeps running. You might have noticed that the registers are zeroed on an S->NS transition but not necessarily on an NS->S one, a feature that can be handy for clever hacks.
Round and round go the interrupts, secure, non-secure, secure….
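The push, zero, switch, service, switch, reload sequence for an NS interrupt landing on secure code can be walked through in a few lines. This is a software sketch of what the text says the hardware does. The `Core` class, the four-register file, and the handler are invented for the example.

```python
class Core:
    """Toy core: a security state, a few registers, and a secure stack."""

    def __init__(self):
        self.state = "S"
        self.regs = {f"r{i}": 0 for i in range(4)}
        self.secure_stack = []


def ns_interrupt_during_secure(core, ns_handler):
    """Model an NS interrupt preempting secure code."""
    if core.state != "S":
        raise RuntimeError("this sketch only models preemption of secure code")
    core.secure_stack.append(dict(core.regs))  # push all registers
    for name in core.regs:                     # zero them so nothing leaks
        core.regs[name] = 0
    core.state = "NS"                          # switch to non-secure
    ns_handler(core)                           # service the interrupt
    core.state = "S"                           # back to secure
    core.regs = core.secure_stack.pop()        # reload the saved registers


if __name__ == "__main__":
    core = Core()
    core.regs["r0"] = 0x5EC2E7  # pretend this is secret key material

    def nosy_handler(c):
        print("handler sees:", c.state, c.regs)
        c.regs["r0"] = 0xBAD    # scribbling here must not reach secure code

    ns_interrupt_during_secure(core, nosy_handler)
    print("secure code resumes with r0 =", hex(core.regs["r0"]))
```

The handler only ever sees zeros, and whatever it writes is thrown away when the secure context is reloaded.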
All of this leads to the obvious question of what defines a secure area and a non-secure area for these secure and non-secure operating modes; they need to be distinct for obvious reasons. This all starts with two units called the Security Attribution Unit (SAU) and the MPU, or in this case MPUs. The SAU is what you use to set which regions of memory are S and NS, something made actually useful by the new memory allocation model discussed earlier. The so-called old way would have fallen over if you needed to mark arbitrary code areas, mostly because v7-M and earlier architectures simply could not set up those areas in the first place.
If a core makes a memory request, the SAU determines if it is from a secure or non-secure bit of code, then sends it off to an S or NS MPU. Actually it is a banked MPU but it effectively works as two discrete units, not that a programmer can explicitly call either one. If a request through the NS MPU targets a memory region marked secure, you get a memory fault. An NS MPU request to an NS memory region works like it should; the NS MPU just sees S memory as holes in the memory space, i.e. they are not there. The memory fault should act just like a normal access to non-existent memory addresses, so you shouldn’t need any specific handling.
If you think your memory is full of holes…
The secure side is a lot simpler: an S MPU request to an S memory space just works like it should. Unlike on the NS side, an S MPU request to an NS memory space also works like you would expect. Secure can see non-secure but not the other way around for obvious reasons, so NS space is not a hole to S MPUs.
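Put together, the four access cases make a very small truth table. Here is a hedged Python sketch of it. The region map, the addresses, and the function names are hypothetical, and raising a fault for addresses outside every listed region is a simplification of ours, not something ARM has stated.

```python
class MemoryFault(Exception):
    """Raised when an access is not permitted or hits nothing."""


# Hypothetical SAU-style map: (base, inclusive limit, attribute).
REGIONS = [
    (0x00000000, 0x0003FFFF, "S"),
    (0x00040000, 0x0007FFFF, "NS"),
]


def attribute_of(address, regions=REGIONS):
    """Return "S" or "NS" for an address, as the SAU would decide."""
    for base, limit, attr in regions:
        if base <= address <= limit:
            return attr
    raise MemoryFault(f"nothing mapped at {address:#x}")


def check_access(core_state, address, regions=REGIONS):
    """Apply the rule: secure sees everything, non-secure sees only NS."""
    attr = attribute_of(address, regions)
    if core_state == "NS" and attr == "S":
        # From the non-secure side, secure memory behaves like a hole.
        raise MemoryFault(f"NS access to secure address {address:#x}")
    return attr


if __name__ == "__main__":
    for state, addr in [("S", 0x1000), ("S", 0x40000), ("NS", 0x40000)]:
        print(state, hex(addr), "->", check_access(state, addr))
    try:
        check_access("NS", 0x1000)
    except MemoryFault as fault:
        print("fault:", fault)
```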
This is all fine and dandy but as many failed security architectures have shown, you need to have a sane, reliable, and high performance way to make calls from secure to non-secure operating modes or your architecture won’t be all that useful. The author recalls talking to the point man for Microsoft’s vaunted NGSCB at an IDF in 2004 or so and asking him a few simple questions about this topic. He turned a nice shade of white, got very nervous, and couldn’t answer the questions.
NGSCB failed miserably, it didn’t just not do very much like all Microsoft ‘security’ measures, but went down in embarrassing volumes of flame. The moral of this story is secure areas are useless if they can’t talk to the outside world and vice-versa, but those methods need to be very secure too. The last bit was what Microsoft didn’t bother with before shouting from the rooftops about their bulletproof security concept.
ARM is the polar opposite of this, they actually thought out the details and did it in a very clever way. There are three types of memory, S, NS, and NSC. The first two are obvious, the last is Non-Secure Callable which means it is secure but it is an area that can be called by non-secure code. Sort of. Very sort of with very tight constraints that effectively neutralize many types of attacks with essentially no overhead or resources.
If you are calling a bit of code marked S from an NS instruction, it will fail, but the other way around works fine. NSC by its name is secure but callable from NS code with the caveats mentioned earlier. NSC code only works if the first instruction called from NS code is a special instruction called SG or Secure Gateway. If the landing point is any other opcode, the system will fault and the call will fail. This means the most common way of poking holes in security, overflowing a stack, will fail unless the address the attacker plants happens to point at an SG instruction. If your secure code is 1000 instructions long, this SG landing point lowers the attack surface by three orders of magnitude for the cost of a single instruction. It is an amazingly clever bit of architecture.
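The landing rule and the 1000-to-1 arithmetic can both be checked with a toy model. This Python sketch assumes fixed four-byte instruction slots and a made-up memory layout. Real cores mix instruction widths, and none of these names come from ARM.

```python
class SecureFault(Exception):
    """Raised when a non-secure branch lands somewhere it must not."""


def call_from_nonsecure(target, attribute_of, opcode_at):
    """Return the security state after NS code branches to `target`."""
    attr = attribute_of(target)
    if attr == "NS":
        return "NS"                       # ordinary non-secure call
    if attr == "NSC" and opcode_at(target) == "SG":
        return "S"                        # legal gateway into secure code
    raise SecureFault(f"illegal entry at {target:#x}")


# Hypothetical layout: 1000 instruction slots of NSC code, SG only at the top.
NSC_BASE = 0x10000000
SLOTS = 1000
CODE = {NSC_BASE + 4 * i: "OTHER" for i in range(SLOTS)}
CODE[NSC_BASE] = "SG"


def attribute_of(address):
    if address in CODE:
        return "NSC"
    return "NS" if address >= 0x20000000 else "S"


def count_valid_entries():
    """Try every slot as a landing point and count the ones that work."""
    valid = 0
    for address in CODE:
        try:
            if call_from_nonsecure(address, attribute_of, CODE.get) == "S":
                valid += 1
        except SecureFault:
            pass
    return valid


if __name__ == "__main__":
    print(count_valid_entries(), "valid entry point out of", SLOTS)
```

Every one of the other 999 slots faults, which is the three orders of magnitude the text refers to.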
Three instructions for a state change is pretty light
Better yet it means with the simple marking of code as S, NS, or NSC at coding or compile time, you lock things down very securely. On top of that, the SG op means you can swap states with one instruction that carries almost no overhead. When ARM said it was a ‘light’ way of implementing security, they weren’t kidding; you can’t get much lighter than SG but it is still a very secure method. No monitoring, no overhead, no state keeping, just one hardware based compare and off you go. Brilliant.
Actually there are two more instructions needed to do this. You can’t just do an arbitrary jump to an NSC region and still maintain a semblance of security; it wouldn’t take long for a malicious program to test every memory location and map out all the SG op locations. To call an NSC region you need to specifically use the BLSF or BLSecureFunction opcode, which requires an SG to land on; anything else should fail. If you are returning you need to use the BXNS opcode. Other than that, secure and non-secure code is identical. It may not be foolproof but SG is a very powerful security paradigm.
That is the basis of Trustzone-M, a truly lightweight way of implementing secure and non-secure code and memory. It all depends on some of the changes made to the new v8-M architecture, mainly the new memory model and attendant controllers, plus the added registers. As with most ARM architectures, v8-M is also quite modular and security is no exception to this rule. If you don’t need security and Trustzone-M, you don’t have to pay the gates to implement it, it is optional. That said with IoT security being as woeful as it currently is, we don’t expect many licensees to skip this feature.
All of this core security is nice but what if you need something more than a secure core, for example a secure SoC? A bunch of peripherals that don’t see secure and non-secure zones make for a fairly painful programming exercise. Worse yet many of the basic tasks that need security tend to use some peripherals, secure networking for example. Locking down content on an SoC is rather pointless if you have to send it across a wire unsecured.
AMBA 5 AHB5 needs a better name
Luckily ARM has a solution to this issue as well, it is the new AMBA 5 AHB5 interconnect. On a basic level it is similar to AMBA 3 AHB3 but with more features. It supports more memory types but not as many as AXI, can do semaphore based multi-CPU operations, and now supports user defined sideband signaling. Better yet it adds multiple logical interfaces for a single slave interface so you can address multiple peripherals over one bus. This may sound silly but in the gate, size, power, and cost constrained world of microcontrollers, it is a very big win. Last but not least AMBA 5 AHB5 supports the passing of explicitly marked secure and non-secure transactions so peripherals can keep state correctly.
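What a peripheral can do with that marking is easiest to see in a small model. The sketch below is purely illustrative Python and not the bus protocol itself: the `Transaction` fields, the key-store peripheral, its register offsets, and the response strings are all assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One bus transfer carrying an explicit secure/non-secure marking."""
    address: int
    write: bool
    nonsecure: bool
    data: int = 0


class KeyStorePeripheral:
    """Hypothetical peripheral with one secure-only register."""

    KEY_REG = 0x00     # readable and writable by secure transfers only
    STATUS_REG = 0x04  # open to both worlds

    def __init__(self):
        self.regs = {self.KEY_REG: 0x1234, self.STATUS_REG: 0}

    def handle(self, txn):
        """Return (response, read_data) for a transfer."""
        if txn.address not in self.regs:
            return ("ERROR", 0)
        if txn.nonsecure and txn.address == self.KEY_REG:
            return ("ERROR", 0)          # NS transfers never see the key
        if txn.write:
            self.regs[txn.address] = txn.data
            return ("OKAY", 0)
        return ("OKAY", self.regs[txn.address])


if __name__ == "__main__":
    periph = KeyStorePeripheral()
    print(periph.handle(Transaction(0x00, write=False, nonsecure=False)))
    print(periph.handle(Transaction(0x00, write=False, nonsecure=True)))
    print(periph.handle(Transaction(0x04, write=False, nonsecure=True)))
```

Because the marking travels with every transfer, the peripheral does not have to guess which world is talking to it.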
It is almost like the three new pieces, the v8-M architecture, Trustzone-M, and AMBA 5 AHB5 were all designed to work together synergistically. Actually they were and what you get is a really lightweight and fast way to secure memory and running programs. The most impressive part is that with some very clever ideas like the SG instruction, interrupt latency doesn’t suffer as a result. This combination may not offer all the bells and whistles of the full Trustzone with a hypervisor but for the IoT space, it is far better than anything else out there and should be more than enough for the task at hand. S|A