The word of the day at ARM is derivative, something SemiAccurate means in a good way. This is because today we see the launch of the Ethos-N57, Ethos-N37, Mali-G57, and Mali-D37, all of which are derivatives.
Of the four IP blocks launching today, the Mali not-twins are the easiest to talk about. Mali-G57 is based on the same Valhall architecture as its big brother, the Mali-G77, but is smaller. It replaces the older Mali-G52 and is claimed to be ~30% faster than that block on the same process node. The G57 is also said to have 30% higher performance density which, coupled with the performance increase, means it is about the same size as the G52. A claimed 60% improvement in machine learning is likely due to the formats supported and a few new instructions.
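As a quick sanity check on that "about the same size" conclusion, the arithmetic can be sketched as below. It takes the two 30% claims at face value and treats the Mali-G52 as a baseline of 1.0. The variable and function names are ours, purely for illustration, and nothing here comes from ARM.

```python
# Relative figures from the claims above; the Mali-G52 is the 1.0 baseline.
perf_gain = 1.30      # claimed ~30% more performance
density_gain = 1.30   # claimed 30% more performance per unit of die area


def relative_area(perf: float, density: float) -> float:
    """Area relative to the baseline = performance / (performance per area)."""
    return perf / density


area = relative_area(perf_gain, density_gain)
print(f"New block area relative to G52: {area:.2f}x")
```

If both gains really are 30%, they cancel and the block ends up the same size as its predecessor, with the extra performance coming for free in area terms.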
On the display side is the new Mali-D37, a cut-down Mali-D77. This new DPU is aimed at 2K resolutions, which it can support in <1mm^2 on a 16nm process. While there is little to nothing a DPU can do that a GPU can’t, a DPU is dedicated hardware that can do it more efficiently. Whether or not that tradeoff is worth it for your design is an open question, but the option is there for lower resolution screens now too.
Where do they fit?
Now we come to the Ethos duo of AI accelerators, the N57 and N37, both smaller versions of the N77. As you can see from the slide above, the three IP blocks all do the same jobs but tend to be segmented based on screen area/camera resolution. The biggest, the N77, is good for up to 4 TOP/s at 1GHz, while the N57 only hits 2 TOP/s and the N37 tops, pun intended, out at 1 TOP/s. Both of the new devices have 512KB of internal SRAM, while the N77 can be configured with 1MB to 4MB.
That SRAM is key to the functioning of the Ethos units. AI is a lot of repetition and loading of weights, values, and data. The less data movement you can do, the more efficient your unit will be. ARM sized the SRAM caches for the most common workloads and claims >90% of data accesses will happen in that SRAM, a boon for speed and energy use.
The architecture of the Ethos blocks
In the above simplified version of the Ethos architecture you can see there are four main parts: the MCE, PLE, Network Control Unit, and DMA. The latter two are the I/O for the block and are basically fixed. The MCE and PLE together are called the CE, or compute engine. What differentiates the three Ethos units is the number of CEs: N77 has 16, N57 has 8, and N37 has 4. The MCE is the sea of math units that crunch the numbers, while the PLE is the more programmable, higher level control block that runs less frequently used operations.
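To see how cleanly the lineup scales, here is a small sketch that divides the quoted peak TOP/s by the CE count for each part. The numbers are the rounded figures from the article. It assumes the N57 and N37 ratings are quoted at the same 1GHz clock as the N77, which the article does not spell out, and the dictionary layout is just our own illustration.

```python
# Peak TOP/s and compute engine (CE) counts as quoted in the article.
# Assumption: all three ratings are at the same 1GHz clock.
ETHOS = {
    "N77": {"tops": 4.0, "ces": 16},
    "N57": {"tops": 2.0, "ces": 8},
    "N37": {"tops": 1.0, "ces": 4},
}


def tops_per_ce(tops: float, ces: int) -> float:
    """Peak throughput contributed by a single compute engine."""
    return tops / ces


per_ce = {name: tops_per_ce(s["tops"], s["ces"]) for name, s in ETHOS.items()}

for name, value in per_ce.items():
    print(f"Ethos-{name}: {value:.2f} TOP/s per CE")
```

On these figures every part works out to the same throughput per CE, which fits the idea that the three blocks differ mainly in how many CEs they carry.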
All of this pretty huge number crunching ability is rather pointless if you have to write the software for it from scratch. That is where ARM NN comes in. It is an inference engine/middleware solution that interfaces with common frameworks like TensorFlow, Caffe, and PyTorch, among many others.
If your software supports ARM NN, it should work with the Ethos line pretty easily. Similarly, if your device doesn’t have an Ethos block and the software you want to run supports ARM NN, the work is likely to be passed to the CPU, GPU, or something else where appropriate. In short, ARM is trying to take the pain out of the software side the same way everyone else is, but their solution is already on a claimed 350M devices, so it isn’t exactly uncommon.
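The fallback idea described above can be illustrated with a toy sketch. This is not the ARM NN API. The `pick_backend` function and the `"npu"`, `"gpu"`, and `"cpu"` labels are hypothetical names made up for this example, and a real inference engine makes a far more detailed decision than a simple list walk.

```python
from typing import Iterable, Set


def pick_backend(preferred: Iterable[str], available: Set[str]) -> str:
    """Return the first preferred backend that the device actually has.

    Hypothetical helper for illustration only; not part of any real library.
    """
    for backend in preferred:
        if backend in available:
            return backend
    raise RuntimeError("no usable backend on this device")


# The application states its preference order once...
preference = ["npu", "gpu", "cpu"]

# ...and the same code runs on a device with or without an NPU block.
with_npu = pick_backend(preference, {"npu", "gpu", "cpu"})
without_npu = pick_backend(preference, {"gpu", "cpu"})

print(f"Device with NPU runs on: {with_npu}")
print(f"Device without NPU runs on: {without_npu}")
```

The point is that the application code does not change between devices. Only the set of available hardware does.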
In the end, all four of the new IP blocks from ARM, Mali-G57, Mali-D37, Ethos-N57, and Ethos-N37, are nothing ground-up new, just derivatives of their bigger brethren. This isn’t a slight; all four save expensive die area because they deliver just enough performance to do the job at hand, 2K video for the D37 as an example. If your workload is fixed, why pay more for performance you will never use? That is the point of these four: they may not breathe fire, but they do what they are intended to do. S|A