STMicro was the lead partner on ARM’s new M7 core and was the first to launch with it, in the form of the STM32 F7 MCU. If you are not familiar with the core, take a look at what SemiAccurate wrote about it, because the F7 builds on that.
The STM32 F7 is the new top of the line for ST’s microcontrollers. It sits above the F4 and offers a claimed 2x the performance. Luckily for customers who want more from their controller line, the F7 is pin compatible with the F4, code compatible with its M4 core, and runs quite a bit faster. If you are using an F4 now, getting this new part up and running should be a piece of cake.
Other than the speed bump that comes with the new core, there were two main goals for the F7: power efficiency and I/O. If you haven’t guessed by now, this new part does both quite well, which is quite a trick since the F7 is built on the same 90nm embedded flash process as the F4. In the end the F7 running at 200MHz will deliver 1000 CoreMarks compared to the 608 from a 180MHz F4, not quite double overall but not bad.
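For a quick sanity check on those figures, the short Python sketch below works out the overall ratio and the CoreMarks per MHz for each part, using only the two scores and clock speeds quoted above. The function name is ours for illustration, not anything from ST.

```python
def coremark_per_mhz(score: float, mhz: float) -> float:
    """Return a CoreMark score normalized to clock speed (CoreMark/MHz)."""
    return score / mhz

# Figures as quoted: F7 at 200MHz vs. F4 at 180MHz
f7_score, f7_mhz = 1000, 200
f4_score, f4_mhz = 608, 180

# Overall speedup, each part at its own top clock
speedup = f7_score / f4_score
f7_eff = coremark_per_mhz(f7_score, f7_mhz)
f4_eff = coremark_per_mhz(f4_score, f4_mhz)

print(f"Overall speedup: {speedup:.2f}x")   # 1.64x
print(f"F7: {f7_eff:.2f} CoreMark/MHz")     # 5.00
print(f"F4: {f4_eff:.2f} CoreMark/MHz")     # 3.38
```

Splitting the result this way shows how much of the gain comes from the core doing more work per clock and how much from the higher top clock.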
For the record, an F7 at 180MHz will deliver 1.64x the performance of the F4 and will deliver the same CoreMarks/mW as the older device. This may not sound like much of an accomplishment, but a significantly more complex device on the same process delivering the same performance per Watt is a tough goal to hit. Since the F7 scales well beyond where the F4 tops out, think of it as offering the same net performance per Watt with a lot more headroom if needed.
How does the F7 deliver this performance? A lot of the upside comes from the I/O subsystem, which is pretty complex for such a device. One of the keys is zero-wait-state execution from local memory, and we don’t just mean from ARM’s optional Tightly Coupled Memory (TCM). ST adds its ART accelerator and an L1 cache to the mix.
ART is essentially a flash accelerator implemented as a “sophisticated” branch cache. ART clearly does its job, but ST would not elaborate on any of the details, for obvious reasons. On top of this, ST added a 4KB L1 data cache and a 4KB L1 instruction cache to the architecture. Between the caches, the ART, and the TCMs, the F7 is said to meet its zero-wait-state goal.
Then there is the tiny problem of feeding this beast, with “beast” being a relative term because this is the embedded microcontroller world after all. In addition to the L1 caches, there is a lot of SRAM scattered across the die in various places: 320KB of it. Of this, 240KB + 16KB sits on the bus matrix and 64KB is used as data TCM. On top of that there is 16KB of instruction TCM and 4KB of backup SRAM. In short, there are a lot of memory islands everywhere to keep things flowing.
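The memory figures only add up if the 320KB headline number covers the bus-matrix SRAM plus the data TCM, with the instruction TCM and backup SRAM counted separately. The Python sketch below tallies the blocks under that assumption; the dictionary labels are ours, not ST’s naming.

```python
# Assumed breakdown of the F7's on-chip RAM, in KB (labels are illustrative)
system_sram_kb = {
    "bus-matrix SRAM (large block)": 240,
    "bus-matrix SRAM (small block)": 16,
    "data TCM": 64,
}
extra_ram_kb = {
    "instruction TCM": 16,
    "backup SRAM": 4,
}

# Headline figure: bus-matrix SRAM plus data TCM
system_total = sum(system_sram_kb.values())
# Everything on the die, including ITCM and backup SRAM
grand_total = system_total + sum(extra_ram_kb.values())

print(f"Headline SRAM figure: {system_total}KB")  # 320KB
print(f"All on-chip RAM:      {grand_total}KB")   # 340KB
```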
On the I/O side there is a lot to talk about, starting with the AXI bus that the F4 lacked. Since it is pin compatible with the F4, the F7 obviously needs to have an AHB bus in there somewhere, and it has a bunch of them. By that we mean an AXI to multi-AHB bridge that supports four AHB buses. All of this is connected to the functional units and external devices via an 8-layer multi-AHB bus matrix. It looks like this.
Simple, easy, and a lot more than the STM32 F4
With luck and careful planning (more on the planning side in this case), there should be little to no blocking for I/O. Where one path previously existed, there are now four in total, with multiple paths to critical units. This goes a long way toward the goal of zero wait states, even with 12 bus masters all needing to be fed.
Some of those bus-mastering devices are quite sensitive to I/O blockages and would not fare well without the bus matrix in the F7. They include dual DMA controllers, a dedicated DMA for Ethernet, USB OTG HS ports, and a graphics accelerator. Please do realize that this is a 200MHz microcontroller; it will not run Crysis (thanks for asking), but it does blow an F4 away.
Any guess which one is the STM32 F7?
At TechCon, where STMicro was the first to show off M7-based silicon, the company had several demos running. All of them compared the F7 to the F4, and most used DSP- and math-heavy code to show off the core’s performance. The one pictured above was the classic raytraced balls demo, but there was also a fractal zoom demo. Repeated runs all came impressively close to the claimed times of 40s for the F4 vs. 22s for the F7, almost exactly the claimed doubling once a little overhead is accounted for. In short, the F7 does deliver on the promises of the M7 core.
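Since both boards ran the same workload, the speedup is simply the old run time divided by the new one. A minimal Python check using the two demo times quoted above:

```python
# Raytraced balls demo run times, as claimed at the show
f4_seconds = 40.0
f7_seconds = 22.0

# Same workload on both parts, so speedup = old time / new time
speedup = f4_seconds / f7_seconds
# Fraction of the F4's run time that the F7 eliminates
time_saved = 1 - f7_seconds / f4_seconds

print(f"Speedup: {speedup:.2f}x")       # 1.82x
print(f"Time saved: {time_saved:.0%}")  # 45%
```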
The STM32 F7 is the first M7-based part to come from STMicro, but others are going to follow in short order. Near the end of the briefing deck, ST claimed the next variant will run at ~400MHz and deliver 2000 CoreMarks once the next eFlash node comes out. This linear scaling bodes well for both the M7 core and the F7 MCU; more than a 3x jump over the F4 line is a big deal. If ST can keep power at the same level or lower with the shrink, the F7 will really up the game for embedded controllers. S|A
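To see why ST’s roadmap figure reads as linear scaling, the Python sketch below normalizes each quoted score to CoreMarks per MHz and compares the claimed next part against the top F4. It uses only the numbers ST gave; whether shipping silicon actually hits them is another matter.

```python
# (clock in MHz, CoreMark score) pairs as quoted by ST
f4 = (180, 608)
f7_now = (200, 1000)
f7_next = (400, 2000)  # claimed for the next eFlash node

def per_mhz(part):
    """Normalize a (MHz, CoreMark) pair to CoreMark/MHz."""
    mhz, score = part
    return score / mhz

# Linear scaling: the per-MHz figure stays constant as the clock doubles
print(per_mhz(f7_now), per_mhz(f7_next))  # 5.0 5.0

# Gain of the claimed next F7 over today's top F4
gain = f7_next[1] / f4[1]
print(f"{gain:.2f}x over the F4")  # 3.29x
```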