ARM is releasing two new GPUs today, the Mali-T720 and Mali-T760, both based on the Midgard architecture. They should be seen as evolutions of the Mali-T600 line with some notable additions rather than a whole new paradigm.
The smaller of the two is the Mali-T720, aimed at the mid-range market where it replaces the Mali-4xx line. Since the T600 line did not target this segment, the T720 brings OpenGL ES 3.0 and real GPU compute capabilities to it. In short, this part brings the mid-range device right up to date with modern specs.
Mali-T760 is the new top GPU at ARM and is obviously placed above the Mali-T678. The bullet point list includes scaling to 16 cores, much lower bandwidth use, and much higher energy efficiency. Two of these are implementation dependent; one is something you will undoubtedly see in the near future.
The lineup according to ARM
Officially the T760 is rated at 326 GFLOPS, has up to two 512KB L2 caches, and runs at up to 600MHz. For the stat geek, the pixel fill rate is 9.6 GPixels/s, the triangle rate is 1066.6 MTriangles/s, and it is claimed to do it all for far less power than a T604. The block diagram looks like this.
A rough overview of the Mali-T760
On the capabilities front, the bandwidth reduction is likely a product of AFBC, or ARM Frame Buffer Compression. This technique does exactly what its name implies: it compresses textures before they move to and fro between the CPU, GPU, and display controller. ARM is claiming a large bandwidth reduction with almost no discernible quality reduction. Based on the demos SemiAccurate saw at GDC it looks to work really well, far better than existing implementations.
More interesting though is compositing functionality that should save even more bandwidth. We went into great detail about what a compositor does in another article so we won’t rehash that here. The bandwidth reduction in the T700 line does not come from a full compositing engine; it is clever software that tracks layer changes on a per-tile basis and doesn’t re-send unchanged layer data. It is a neat and useful hack that can work with a hardware compositing engine as well. Suffice it to say that, given the way modern UIs and graphics embedding work, a GPU without some sort of bandwidth-reduction assist is at a disadvantage. The Mali-T760 has one, and it is claimed to drop bandwidth usage by 50%. This is a really good thing.
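To make the per-tile tracking idea concrete, here is a minimal sketch in Python. It is not ARM’s implementation, which the article does not describe. The tile size, the one-byte-per-pixel layer format, the use of a hash as the tile signature, and the function names `tile_signatures` and `changed_tiles` are all assumptions made for illustration.

```python
import hashlib

TILE = 16  # tile edge in pixels; an assumed value, not an ARM figure


def tile_signatures(layer, width, height):
    """Map (tile_x, tile_y) -> digest of that tile's pixel bytes.

    `layer` is a bytes-like object holding width*height one-byte pixels.
    """
    sigs = {}
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            h = hashlib.sha1()
            # Feed the tile to the hash one row at a time.
            for y in range(ty, min(ty + TILE, height)):
                row = y * width
                h.update(layer[row + tx : row + min(tx + TILE, width)])
            sigs[(tx // TILE, ty // TILE)] = h.digest()
    return sigs


def changed_tiles(prev_sigs, layer, width, height):
    """Return (sorted tiles whose contents changed, new signatures)."""
    new_sigs = tile_signatures(layer, width, height)
    dirty = sorted(t for t, s in new_sigs.items() if prev_sigs.get(t) != s)
    return dirty, new_sigs


# Usage: a 64x32 layer split into 4x2 tiles.
w, h = 64, 32
frame = bytearray(w * h)
_, sigs = changed_tiles({}, frame, w, h)  # first frame: every tile is sent
frame[5 * w + 40] = 255                   # touch one pixel inside tile (2, 0)
dirty, sigs = changed_tiles(sigs, frame, w, h)
print(dirty, f"{len(dirty)}/{len(sigs)} tiles re-sent")
# prints: [(2, 0)] 1/8 tiles re-sent
```

In this toy case a one-pixel change means only one tile in eight has to be moved again, which is the kind of saving the tracking is after. Real UI layers will obviously vary.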
The smaller T720 GPU will sport up to 8 cores, has two 128KB caches, and runs at 600MHz. To save you from doing the math, that is a pixel fill rate of 4.8 GPixels/s, a triangle rate of 533.2 MTriangles/s, and a total of 81.6 GFLOPS. Compared to the T760 that would be half, half, and a quarter respectively. The lower FLOPS figure could very well be a product of decreased bandwidth, but the exact reasons weren’t explicitly stated.
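For anyone who does want to do the math, the short Python sketch below recomputes the ratios from the figures quoted above and also divides each figure by the maximum core count. The per-core split is our own arithmetic on the headline numbers, not something ARM published, and the dictionary layout and function names are purely illustrative.

```python
# Headline figures quoted in the article for the maximum configurations.
SPECS = {
    "Mali-T760": {"cores": 16, "gpixels": 9.6, "mtris": 1066.6, "gflops": 326.0},
    "Mali-T720": {"cores": 8, "gpixels": 4.8, "mtris": 533.2, "gflops": 81.6},
}


def ratio(metric):
    """T720 figure as a fraction of the T760 figure."""
    return SPECS["Mali-T720"][metric] / SPECS["Mali-T760"][metric]


def per_core(gpu, metric):
    """Headline figure divided evenly across the maximum core count."""
    return SPECS[gpu][metric] / SPECS[gpu]["cores"]


for metric in ("gpixels", "mtris", "gflops"):
    print(
        f"{metric}: T720/T760 = {ratio(metric):.2f}, "
        f"per core T760 = {per_core('Mali-T760', metric):.2f}, "
        f"T720 = {per_core('Mali-T720', metric):.2f}"
    )
```

Run on these numbers, the per-core pixel and triangle rates of the two parts come out essentially equal, while the per-core GFLOPS of the T720 is about half that of the T760. In other words the quarter is half the cores times roughly half the per-core rate, though the quoted figures alone do not say why.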
One last thing to mention is that both the Mali-T760 and Mali-T720 are explicitly called out as “Optimized for Android”. This of course means OpenGL ES 3.0, Renderscript, and Filterscript. Given how close Renderscript and Filterscript are to the related non-Google versions of the same specs, this is more likely to be a driver tweak than a new chunk of silicon. In any case it is a very interesting window on the relevance of various OSes in the market.
In the end, ARM has two new GPUs, the 8 core Mali-T720 and the 16 core Mali-T760. The high end is an evolution of the previous T600 line, but the mid-range part is a real revolution. It brings the volume-device GPU up to the OpenGL ES 3.0 and GPU compute world by replacing the Mali-4xx line and its aging Utgard architecture. It is quite rare for the mid-range part to be the more significant of the two, but this time around it is. S|A