Physics hardware makes Kepler/GK104 fast

That is the marketing claim anyway

Nvidia’s Kepler/GK104 chip has an interesting secret, a claimed Ageia PhysX hardware block that really isn’t. If you were wondering why Nvidia has been beating the dead horse called PhysX for so long, now you know, but it only gets more interesting from there.

Sources tell SemiAccurate that the ‘big secret’ lurking in the Kepler chips is optimisations for physics calculations. Some are calling this PhysX block a dedicated chunk of hardware, but more sources have been saying that it is simply shaders, optimisations, and likely a few new dedicated ops. In short, marketing may say it is, but under the heat spreader, it is simply shaders and optimisations.

The market has treated hardware PhysX like an unexplained sore that shows up a week after a night you can’t remember through a tequila induced haze. Numbers vary about the absolute magnitude of PhysX’s overwhelming success, but counts of 2011 game releases supporting hardware acceleration range from a low of two to a high of six. The snowball has pretty much stopped rolling or, to be more accurate, it never started. All the developers SemiAccurate spoke with indicate that their use of PhysX hardware acceleration was a cash flow positive experience, but we didn’t talk to all six listed.

With this new bit of information, one big question is answered, but specific hardware implementation details are a bit murky. Is the ‘hardware block’ dedicated to physics calculations when there are some being issued, or is it an AMD/GCN-like multiple instruction issue? Is it just shaders with an added op or two that speed up math routines heavily used by physics simulations? How much die area is spent on this functionality? This isn’t very clear, and given the marketing materials SemiAccurate has seen, explanations will only serve to impede the impending hype.

That said, SemiAccurate is told Kepler/GK104 will be marketed as having a dedicated block, and this will undoubtedly be repeated everywhere, truth notwithstanding. Luckily, since most of the target audience isn’t technically literate, it may “become fact” through the VIECOORDF (Vast Internet Echo Chamber Of Often Repeated Dubious Facts). Lowering the collective intelligence can be profitable if not ethical. Luckily, the story doesn’t end here, it gets much worse.

This part ties in to the story SemiAccurate published a few weeks ago saying that Nvidia would win this generation. A lot of people have been asking about Kepler/GK104 performance and if it is really that good. The short story is yes and no, depending on your views on some very creative ‘optimisations’ around physics.

As we stated earlier, Kepler wins in most ways vs the current AMD video cards. How does Nvidia do it with a $299 card? Is it raw performance? Massive die size? Performance per metric? The PhysX ‘hardware block’? Cheating? The easy answer is yes, but let’s go into a lot more detail.

GK104 is the mid-range GPU in Nvidia’s Kepler family, has a very small die, and the power consumption is far lower than the reported 225W. How low depends on what is released and what clock bins are supported by the final silicon. A1 stepping cards seen by SemiAccurate had much larger heatsinks than the A2 versions, and recent rumours suggest there may be an A3 to fix persistent PCIe3 headaches.

To date, an A3 spin has not been confirmed, but if it is necessary, it will likely push out the late March/early April release date by at least a month. One other possibility is for Nvidia to pull an Intel and release cards without the official PCI SIG stamp, adding it when A3 silicon is available. In any case, the number of PCIe3 supporting computers on the market is minimal, so functionally speaking, it doesn’t matter. You may lose a small bit of theoretical performance, but for a mid-range part, it is unlikely to be noticeable. Marketing is a completely different story though, one not closely tied to the reality most of us live in.

The architecture itself is very different from Fermi; SemiAccurate’s sources point to a near 3TF card with a 256-bit memory bus. Kepler is said to have a very different shader architecture from Fermi, going to much more AMD-like units, caches optimised for physics/computation, and clocks said to be close to the Cayman/Tahiti chips. The initial target floating among the informed is in the 900-1000MHz range. Rumours have it running anywhere from about 800MHz in early silicon to 1.1+GHz later on, with early steppings not far off later ones. Contrary to some floating rumours, yields are not a problem for either GK104 or TSMC’s 28nm process in general.

Performance is likewise said to be a tiny bit under 3TF from a much larger shader count than previous architectures. This is comparable to the 3.79TF and 2048 shaders on AMD’s Tahiti; GK104 isn’t far off either number. With the loss of the so-called “Hot Clocked” shaders, this leaves two main paths to go down: two CUs plus a hardware PhysX unit, or three. Since there is no dedicated hardware physics block, the math says each shader unit will probably do two SP FLOPs per clock or one DP FLOP.
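For readers who want to check the arithmetic, theoretical peak throughput is simply shader count times FLOPs per shader per clock times clock speed. The sketch below is a rough back-of-the-envelope calculation, not a leaked spec: the function names are made up for illustration, the two FLOPs per clock figure for Tahiti is an assumption, and the GK104 shader counts it prints are only what the rumoured ~3TF and 900-1000MHz numbers would imply if each shader does two SP FLOPs per clock.

```python
def peak_tflops(shaders: int, flops_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak = shaders x FLOPs per shader per clock x clock rate."""
    return shaders * flops_per_clock * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(tflops: float, shaders: int, flops_per_clock: int) -> float:
    """Clock speed (MHz) needed to hit a given peak with a given shader count."""
    return tflops * 1e12 / (shaders * flops_per_clock) / 1e6


def implied_shaders(tflops: float, flops_per_clock: int, clock_mhz: float) -> float:
    """Shader count needed to hit a given peak at a given clock."""
    return tflops * 1e12 / (flops_per_clock * clock_mhz * 1e6)


# Tahiti: 3.79TF and 2048 shaders are from the article.
# Two SP FLOPs per shader per clock is an assumption of this sketch.
print(f"Tahiti implied clock: {implied_clock_mhz(3.79, 2048, 2):.0f} MHz")

# GK104: ~3TF and the 900-1000MHz target are rumours quoted in the article.
# The shader counts printed here are derived arithmetic, not sourced figures.
for clock in (900, 1000):
    count = implied_shaders(3.0, 2, clock)
    print(f"GK104 at {clock} MHz: about {count:.0f} shaders for 3TF")
```

Since the rumoured figure is "a tiny bit under" 3TF, the printed shader counts should be read as rough upper bounds for each clock, nothing more.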

This would be in line with the company’s earlier claims of a large jump in compute capabilities, but also leads to questions of how those shaders will be fed with only a 256-bit memory path. Given the small die sizes floating around, it is unlikely to be Itanium-esque brute forcing through large caches. The net result is that shader utilisation is likely to fall dramatically, with a commensurate loss of real world performance compared to theoretical peak.
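To put a rough number on the feeding problem, one can compare memory bandwidth per theoretical FLOP. The sketch below is illustrative only: the 5.5Gbps per-pin data rate is a placeholder assumption applied to both cards, Tahiti’s 384-bit bus width is not from this article, and the helper names are hypothetical. Only the 256-bit bus, the ~3TF figure, and Tahiti’s 3.79TF come from the text above.

```python
def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes x per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps


def bytes_per_flop(bw_gbs: float, peak_tflops: float) -> float:
    """Bytes of memory bandwidth available per theoretical FLOP."""
    return bw_gbs * 1e9 / (peak_tflops * 1e12)


ASSUMED_DATA_RATE_GBPS = 5.5  # placeholder assumption, not a sourced figure

# 256-bit bus and ~3TF are the rumoured GK104 numbers from the article.
gk104_bw = bandwidth_gbs(256, ASSUMED_DATA_RATE_GBPS)
# 384-bit bus for Tahiti is an outside assumption; 3.79TF is from the article.
tahiti_bw = bandwidth_gbs(384, ASSUMED_DATA_RATE_GBPS)

print(f"GK104:  {gk104_bw:.0f} GB/s, {bytes_per_flop(gk104_bw, 3.0):.4f} bytes/FLOP")
print(f"Tahiti: {tahiti_bw:.0f} GB/s, {bytes_per_flop(tahiti_bw, 3.79):.4f} bytes/FLOP")
```

Change the assumed data rate and the absolute numbers move, but the point stands: a narrower bus has to be made up for with faster memory, bigger caches, or lower shader utilisation.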

In the same way that AMD’s Fusion chips count GPU FLOPS the same way they do CPU FLOPS in some marketing materials, Kepler’s 3TF won’t measure up close to AMD’s 3TF parts. Benchmarks for GK104 shown to SemiAccurate have the card running about 10-20% slower than Tahiti. On games that both heavily use physics related number crunching and have the code paths to do so on Kepler hardware, performance should seem to be well above what is expected from a generic 3TF card. That brings up the fundamental question of whether the card is really performing to that level.

This is where the plot gets interesting. How applicable is the “PhysX block”/shader optimisations to the general case? If physics code is the bottleneck in your app, a goal Nvidia appears to actively code for, then uncorking that artificial impediment should make an app positively fly. On applications that are written correctly without artificial performance limits, Kepler’s performance should be much more marginal. Since Nvidia is pricing GK104 against AMD’s mid-range Pitcairn ASIC, you can reasonably conclude that the performance will line up against that card, possibly a bit higher. If it could reasonably defeat everything on the market in a non-stacked deck comparison, it would be priced accordingly, at least until the high end part is released.

All of the benchmark numbers shown by Nvidia, and later to SemiAccurate, were overwhelmingly positive. How overwhelmingly positive? Far faster than an AMD HD7970/Tahiti for a chip with far less die area and power use, and it blew an overclocked GTX580 out of the water by unbelievable margins. That is why we wrote this article. Before you take that as a backpedal, we still think those numbers are real; the card will achieve that level of performance in the real world on some programs.

The problem for Nvidia is that once you venture outside of that narrow list of tailored programs, performance is likely to fall off a cliff, with peaky performance the likes of which hasn’t been seen in a long time. On some games, GK104 will handily trounce a 7970, on others, it will probably lose to a Pitcairn. Does this mean it won’t actually do what is promised? No, it will. Is this a problem? Depends on how far review sites dare to step outside of the ‘recommended’ list of games to benchmark in the reviewer’s guide.

Ethically, this could go either way, and in a vacuum, we would be more than willing to say that the cards are capable of very high performance. The problem is that the numbers that Nvidia will likely show off at the launch are not in a vacuum, nor are they very real, even considering the above caveats. Nvidia is going out of their way to have patches coded for games that tend to be used as benchmarks by popular sites.

Once again, this is nothing new, and has been done many times before. One example that is often mentioned is Starcraft II’s use of stencil buffers. People with inside knowledge of that game’s development have said that Nvidia gave Blizzard help in coding some parts of the game during the final ‘crunch’ period. The code is said to heavily use stencil buffers to fix some issues and patch over minor glitches. Again nothing unusual, AMD, Intel, and almost everyone else does this on a case by case basis, especially for AAA titles released in conjunction with new hardware.

Since Nvidia’s Fermi generation GPUs are very good at handling stencil buffers, they perform very well on this code. Again, this is normal practice, Nvidia put in the effort and now reaps the benefits, good for them. What is odd about this case is that several knowledgeable sources have said that the code actually net decreases performance on both vendors’ cards. The above tale may be anecdotal, but Starcraft II’s release code sure seemed to use stencil buffers a lot more than you would expect, unreasonably so according to many coders. This however doesn’t constitute proof in any way, but it fits what SemiAccurate has seen Nvidia do in prior cases.

More to the point is antialiasing (AA) in Batman: Arkham Asylum. If you recall, AMD started complaining about that game’s AA routines upon release. They directly stated that if AMD cards were detected, the game would disable AA for non-technical reasons. (Note: The original post that TechPowerUp refers to has the pertinent sections in the comments, not on the front page. It takes a little searching to find the post that also talks about several other games having similar ‘bugs’.) It goes on to state that if the card IDs were changed, the AA in the game functioned correctly on ATI hardware.

Short story, this turned into the proverbial “epic pissing match”, with Nvidia claiming that it was Eidos that owned the code, and they were free to do with it as they saw fit. This is technically true. Unfortunately, emails seen by SemiAccurate directly contradict this. Those emails state unequivocally that Eidos should not change code written by Nvidia and provided to Eidos as a part of Batman: Arkham Asylum. When they were questioned on why, Eidos said they could not do anything on the advice of their attorneys.

Since it was the attorneys objecting, not the coders, we can only speculate that this was due to Nvidia’s financial sponsorship of the game, not any technical reason. Since sources tell SemiAccurate that Batman: Arkham Asylum only uses standard DirectX calls to implement AA, and it appears to function if the graphics card IDs are changed, this seems to be nothing other than Nvidia directly sabotaging their competition and not allowing AMD to remove the lockout. Go and re-read the statements from AMD/ATI, Nvidia, and Eidos, then draw your own conclusions.

Why do we bring these two cases up in a Kepler article? Well, we hear that it is happening again. Both AMD and Nvidia have developers that they can and do ‘embed’ at game companies. This is an old and quite legitimate practice for GPU and non-GPU hardware companies. Everyone does it. It can be done ethically or not, with net performance gains for the end user or not, and with the intent to help or harm. In general, the more marketing money involved, the more most developers are willing to go out on shaky ethical limbs.

One last really good example, tessellation. High end Fermi cards, GF100/110/GTX480/GTX580, are heavily biased toward geometry performance. Since most modern GPUs can compute multiple triangles per displayable pixel on any currently available monitor, usually multiple monitors, doubling that performance is a rather dubious win. Doubling it again makes you wonder why so much die area was wasted.

Since Nvidia did waste that die area, helping games show that prowess off is a good thing for users, right? Look at Crysis 2, a AAA title that is heavily promoted by Nvidia, it positively flies on Fermi based cards, but performance on AMD GPUs is far less impressive. Why? The amazing detail in things like the concrete blocks, brick walls, and vast expanses of realistically modelled water. Breathtaking isn’t it? All thanks to Nvidia’s efforts to make the game experience better on their hardware. How could this be interpreted as anything but a win for users by a reasonable observer?

Nvidia is said to have around 15 developers they can embed at companies to help ‘optimise’ their code, ‘fix bugs’, and work out ‘performance problems’, even if those problems are not on Nvidia hardware. The count for other companies is less clear, but unlikely to be much different. Sources tell SemiAccurate that about half of them are currently working at Eidos on, wait for it, a patch for the recently released Batman: Arkham City game. Since both the original and the new Batman games are flag bearers for Nvidia’s hardware/GPU PhysX acceleration, it doesn’t take a genius to connect the dots. Since neither the patch nor Kepler based video cards are out yet, we can only wait to see what the end result is.

If the purported patch does change performance radically on specific cards, is this legitimate GPU performance? Yes. How about if it raises performance on Kepler cards while decreasing performance on non-Kepler cards to a point lower than pre-patch levels? How about if it raises performance on Kepler cards while decreasing performance only on non-Nvidia cards? Which scenario will it be? Time will tell.

How many other games have had this level of attention and optimisation gifted upon them is another open question. One thing we can say is that the list of benchmarks shown off by Nvidia where Kepler has an overwhelming advantage all support PhysX. This is not to say that they are all hardware/GPU PhysX accelerated, they are not, most use the software API.

This is important because it strongly suggests that Nvidia is accelerating their own software APIs on Kepler without pointing it out explicitly. Since Kepler is a new card with new drivers, there is no foul play here, and it is a quite legitimate use of the available hardware. Then again, they have been proven to degrade the performance of the competition through either passive or active methods. Since Nvidia controls the APIs and middleware used, the competition can not ‘fix’ these ‘problems with the performance of their hardware’.

Going back to Kepler, we see that this happy and completely ethical game is going to be starting round 3, or round 17, depending on how you count. Nvidia appears to be stacking the playing field to both cripple the competition and raise their own performance. Is the performance of Kepler cards legitimate? Yes. Is it the general case? No. If you look at the most comprehensive list of supported titles we can find, it is long, but the number of titles released per year isn’t all that impressive, and anecdotally speaking, appears to be slowing.

When Kepler is released, you can reasonably expect extremely peaky performance. For some games, specifically those running Nvidia middleware, it should fly. For the rest, performance is likely to fall off the proverbial cliff. Hard. So hard that it will likely be hard pressed to beat AMD’s mid-range card.

What does this mean in the end? Is it cheating? Is it ethical? Is Kepler/GK104 going to be worth the money? Will it beat AMD’s 7970? These are all subjective decisions for you to make. What software will Nvidia show off as benchmarks to promote Kepler’s performance? That list is a little narrower. What will happen to sites that dare to test software that is not ‘legitimately accelerated’? No idea, but history offers some clues. One thing you can say for sure is that the information released prior to and with the card is unlikely to be the whole story. Legitimacy, performance, honesty, and ethics are unlikely to resemble the official talking points, and the whole truth is likely to be hidden from prying eyes for very partisan reasons. Big grains of salt around this one, people: be very skeptical of everything you hear, and take nothing at face value. S|A

Charlie Demerjian
