Haswell has three variants for consumers, GT1, GT2, and GT3, with 10, 20, and 40 shaders respectively. The shaders are derivatives of the Ivy Bridge shaders, so clock for clock, they should be a little faster than the current parts. The highest-end GT3, currently slated for laptops only, has optional memory on the package called Crystalwell. When we broke the news about the 40 shaders in Haswell, we called it a “graphics monster” because of the massive shader count.
GT3 is a graphics monster, or at least it could be, but Intel is not going to use it in that fashion. Instead of really fast graphics with marginally functional drivers (sorry, broken and unfixable due to Intel’s moronic internal policies), they are going to use the added shaders to save power. If you downclock a CPU, you save power. How much slower versus how much energy is saved depends entirely on the part in question, the starting point, the ending point, and a lot of black magic. Let’s just say it can be better than a linear relationship, worse than linear, or both depending on where you stop. Performance is proportional to the clocks the chip runs at.
With GPUs, things are a little easier because you have multiple copies of the same units. Performance is proportional to the clocks times the shader count, something that is not true with CPUs. In a GPU, if you double the shader count, you can halve the clocks and end up with the same performance. Pick your counts right, and you can save a lot of energy.
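To make the trade concrete, here is a rough sketch of the wide-and-slow idea. It assumes the textbook approximation that dynamic power scales with switched capacitance times voltage squared times frequency, and that capacitance scales with shader count. The two configurations and their voltages are hypothetical numbers picked for illustration, not Intel figures, and leakage is ignored entirely.

```python
def relative_performance(shaders, clock_mhz):
    # The article's model: performance ~ shader count * clock.
    return shaders * clock_mhz


def relative_dynamic_power(shaders, clock_mhz, volts):
    # Textbook CMOS approximation: P ~ C * V^2 * f, with switched
    # capacitance C assumed proportional to shader count. Leakage ignored.
    return shaders * volts ** 2 * clock_mhz


# Hypothetical configurations; the voltages are assumptions for illustration.
narrow = {"shaders": 20, "clock_mhz": 1200, "volts": 1.0}
wide = {"shaders": 40, "clock_mhz": 600, "volts": 0.8}

for label, cfg in (("narrow", narrow), ("wide", wide)):
    perf = relative_performance(cfg["shaders"], cfg["clock_mhz"])
    power = relative_dynamic_power(**cfg)
    print(f"{label}: perf={perf}, relative dynamic power={power:.0f}")

power_ratio = relative_dynamic_power(**wide) / relative_dynamic_power(**narrow)
print(f"wide/narrow power ratio: {power_ratio:.2f}")
```

Under these assumptions both configurations deliver the same work, and any saving comes entirely from the lower voltage the slower clock allows. If the voltage could not drop, the two would draw the same dynamic power in this model.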
This is exactly what Intel is doing with Haswell GT3. GT1 and GT2 run at ~1200MHz for the top turbo clock, while SemiAccurate’s sources tell us that GT3 will peak at 800MHz or so. GT1’s 10 shaders at 1200MHz equals 12K ‘GPU work units’ (GWU, a mythical term we just made up), GT2 would be 24K GWU, and GT3 would be 32K GWU. Note the almost linear progression of 1, 2, 2.67, a number that rounds to three.
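For anyone who wants to check the napkin math, the GWU figures above can be reproduced in a few lines. The shader counts and clocks are the ones quoted in this article; the `gwu` helper and the `variants` table are just names made up for this sketch.

```python
# Shader count and peak clock (MHz) per variant, as quoted in the article.
variants = {
    "GT1": (10, 1200),
    "GT2": (20, 1200),
    "GT3": (40, 800),
}


def gwu(shaders, clock_mhz):
    # 'GPU work units': shader count times clock, the article's made-up metric.
    return shaders * clock_mhz


baseline = gwu(*variants["GT1"])

for name, (shaders, clock_mhz) in variants.items():
    units = gwu(shaders, clock_mhz)
    print(f"{name}: {units // 1000}K GWU, {units / baseline:.2f}x GT1")
```

The same helper shows what a hypothetical full-speed part would look like: `gwu(40, 1200)` works out to 48K GWU, or four times GT1.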
If Intel ran the numbers right, GT3 will probably burn fewer watts than GT2 while handily outperforming it. Since the GT3 variant is not currently slated for desktop use, we will never see the full 40 shaders at 1200MHz, but that is simply a marketing choice, not a technical problem. If anyone was concerned about how die area would be put to good use over the next few process shrinks, here is a really big pointer to the direction Intel is taking. The radically different Sky Lake and Skymont architectures will fit very well with this paradigm, very well indeed. S|A