Nvidia purposefully hobbles PhysX on the CPU

Real World Tech proves Nvidia's de-optimizations

by Charlie Demerjian

July 7, 2010

NVIDIA JUST HAD one of their most sleazy marketing tactics exposed, that PhysX is faster on a GPU than a CPU. As David Kanter at Real World Tech proves, the only reason that PhysX is faster on a GPU is because Nvidia purposely hobbles it on the CPU. If they didn't, PhysX would run faster on a modern CPU.

The article itself can be found here, and be forewarned, it is highly technical. In it, Kanter watched the execution of two PhysX enabled programs, a game/tech demo called Cryostasis, and an Nvidia program called PhysX Soft Body Demo. Both use PhysX, and are heavily promoted by Nvidia to 'prove' how much better their GPUs are.

The rationale behind using PhysX in this way is that Nvidia artificially blocks any other GPU from using PhysX, going so far as to disable the functionality on their own GPUs if an ATI GPU is simply present in the system but completely unused. The only way to compare is to use PhysX on the CPU, and compare it to the Nvidia GPU version.

If you can imagine the coincidence, it runs really well on Nvidia cards, but chokes if there is an ATI card in the system. Frame rates tend to go from more than 50 to the single digits even when you have an overclocked i7 and an ATI HD5970. Since this setup is vastly faster than an i7 and a GTX480 in almost every objective test, you might suspect foul play if the inclusion of PhysX drops performance by an order of magnitude. As Real World Tech proved, those suspicions would be absolutely correct.

How do they do it? It is easy: a combination of optimization for the GPU and de-optimization for the CPU. Nvidia has long claimed a 2-4x advantage for GPU physics, using their own PhysX APIs, over anything a CPU can do, no matter what it is or how many there are. And they can back it up with hard benchmarks, but only ones where the Nvidia API is used. For the sake of argument, let's assume that the PhysX implementations are indeed 4x faster on an Nvidia GPU than on the fastest quad core Intel iSomethingMeaningless.

If you look at Page 3 of the article, you will see the code traces of two programs that use PhysX. There is one thing you should pay attention to: PhysX uses x87 for FP math almost exclusively, not SSE. For those not versed in processor arcana, Intel introduced SSE with the Pentium 3, a 450MHz CPU that debuted in February of 1999. Every Intel processor since has had SSE. The Pentium 4, which debuted in November of 2000, had SSE2, and the later variants had SSE3. How many gamers use a CPU slower than 450MHz?

Of the SSE variants, the one that matters here is SSE, but SSE2 could also be quite relevant. In any case, Intel hasn't introduced a CPU without SSE or SSE2 in almost a decade, 9 years and a few days short of 8 months to be precise. For the sake of brevity, we will lump SSE, SSE2, and later revisions into one basket called SSE.

AMD had a similar instruction set extension called 3DNow!, but the mainstream K8/Athlon64/Opteron lines had full SSE and SSE2 support since May of 2004. Some variants of the K7 had SSE under a different name, 3DNow! Professional, for years prior to that.

Basically, anything that runs at 1GHz or faster has SSE; even the Atom variants aimed at phones and widgets support full SSE/SSE2. Nothing on the market, and nothing that was on the market for years prior to the founding of Ageia, the originator of PhysX that Nvidia later bought, lacked SSE.

To make matters worse, x87, the 'old way' of doing FP math, has been deprecated by Intel, AMD and Microsoft. x64 extensions write it off the spec, although you still can make it work if you are determined to, but it won't necessarily be there in the future. If you don't have a damn good reason to use it, you really should avoid it.

What's more, x87 is vastly slower than using SSE. x87 is stack based, meaning that to do an operation with x87, you need to push things to the stack, use instructions like FXCH to manipulate it, and push a lot to memory needlessly. Simply using the equivalent SSE instruction instead of x87 will net you about 20% more speed. You can design a pathological case where SSE is slower than x87, but you would have to go out of your way to make it happen. I am pretty sure Nvidia will demo this kind of 'valid benchmark' in the near future, a purposefully designed pathological case that proves their point. In the real world, the multiple game developers, assembly experts, and chip designers spoken to for this article can't think of a situation where SSE is slower.

As Real World Tech pointed out, the Ageia PhysX chip used 32 bit math, and the now Nvidia PhysX programs likely do as well. The code runs on G80 based GPUs, and they did not have DP FP capabilities. This means that you can pack 4 of those data points into a 128 bit number.

Why is this important? SSE has scalar and vector variants for instructions. Scalar basically means one piece of data per instruction, and that is where you get the ~20% speedup over x87. Vector allows you to do the math on one 128-bit value, two 64-bit values, or four 32-bit values simultaneously in a single instruction. Since PhysX uses 32-bit numbers, you can do four of them in one SSE instruction, so four per clock. Plus 20%. Let's be nice to Nvidia and assume only a 4x speedup with the use of vector SSE.
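The lane arithmetic in that paragraph can be sketched in a few lines of Python. This is only a model of the article's numbers, not a benchmark: the 128-bit register width is a property of SSE, the 20% scalar gain is the article's own estimate taken as given, and the function names are invented for illustration.

```python
# Model of the article's SSE arithmetic; this is not a measurement.
SSE_REGISTER_BITS = 128        # width of one SSE (XMM) register
SCALAR_GAIN_OVER_X87 = 1.20    # the article's ~20% estimate, taken as given


def lanes(element_bits: int) -> int:
    """Number of values of the given width that fit in one SSE register."""
    return SSE_REGISTER_BITS // element_bits


def ideal_vector_speedup(element_bits: int) -> float:
    """Best-case gain over x87 if every lane did useful work (an assumption)."""
    return lanes(element_bits) * SCALAR_GAIN_OVER_X87


if __name__ == "__main__":
    print("32-bit lanes:", lanes(32))
    print("64-bit lanes:", lanes(64))
    print("ideal 32-bit speedup over x87:", ideal_vector_speedup(32))
```

Real code rarely keeps every lane busy, which is one reason to round the best case down to the flat 4x used in the rest of the argument.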

What does this mean? Well, to not use SSE on any modern compiler, you have to explicitly tell the compiler to avoid it. The fact that it has been in every Intel chip released for a decade means it is assumed everywhere. Nvidia had to go out of their way to make it x87 only, and that wasn't by accident, it could not have been.

If they hadn't gone out of their way, the GPU's claimed 2-4x advantage would shrink to somewhere between half the speed of a modern CPU and about equal to it. Even if you use numbers that are generous to Nvidia, PhysX would be SLOWER ON THE GPU IN EVERY CASE. To top it off, there is NO technical reason for Nvidia to use x87; SSE is faster in every case we could find.

But it gets worse. Far worse. The two programs that Real World Tech looked at, and others looked at by SemiAccurate, are single threaded. That means the CPU can only run PhysX on one core at a time. Multi-threading an API like PhysX is hard work, but Nvidia has already done that.

GPUs have lots of 'cores', the GTX285 for example has 240 of them, ATI's HD5870 has 1600, and Northern Islands has...nah, that would be telling. Without quibbling over the definition of 'core', we will just take Nvidia's statement at face value and assume 240 cores. If they can get 4x the performance of a single Intel core with 240 of their cores, each Nvidia core is worth 1/60th of an Intel core, or less. If the PhysX code didn't thread well, and it does, GPU physics would run slower than a dishwasher controller on heavy painkillers.

So the PhysX code is threaded when run on the GPU, but not on the CPU. On consoles, the Xbox 360 and PS3 specifically, the code is threaded just fine. (Note: The Wii only has a single core without any kind of SMT, so threading won't help that platform.) None of the consoles have a CUDA capable GPU, something that Nvidia claims is necessary for GPU physics. The PS3 uses a variant of the G70 or G71 for its GPU; the first Nvidia product that supported PhysX is the G80.

All the consoles run PhysX just fine, and the frame rate doesn't suffer the same order of magnitude performance decrease as a CPU would. Why? Because Nvidia allowed the code running on console CPUs to use multiple threads to do the work, and ported it to AltiVec, the PowerPC vector instruction set. With that, a console that barely has the power of a low end P4 will run PhysX, the game, and everything else just fine. Gosh, what might that imply?

Most modern games fully use less than 2 CPU cores, and most gaming PCs now have 4 cores, the newest ones have 6. Nvidia will not allow PhysX to run on the other 2-4 cores that are basically idle when gaming. If they only allowed a second thread to run PhysX, you would double the speed at a minimum.

Since everything is in one thread for the programs that Real World Tech looked at, PhysX isn't even fully utilizing a single core, so adding more threads would almost assuredly mean far more than a 2x speedup. On a 4 core CPU, you could easily get a 4x speedup from even basic threading, far less than Nvidia has done for consoles.
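How close threading gets to that 4x depends on how much of the physics step can actually run in parallel. A rough way to see this is Amdahl's law, a standard textbook model that is brought in here as an assumption; it is not something Real World Tech measured. The function name and the sample parallel fractions below are hypothetical.

```python
# Amdahl's law sketch: speedup from n threads when only part of the
# work can run in parallel. The fractions below are illustrative guesses,
# not measured properties of PhysX.

def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup for a workload with the given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 1.0):
        for threads in (2, 4, 6):
            gain = amdahl_speedup(fraction, threads)
            print(f"parallel={fraction:.0%} threads={threads}: {gain:.2f}x")
```

Under this model the full 4x on four cores only appears when essentially all of the work parallelizes; the model also ignores synchronization and cache effects, so it is an optimistic bound rather than a prediction.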

The problem is that the 4x speedup from threading would once again erase the 2-4x 'advantage' from running PhysX on the GPU. Threading would once again relegate GPU PhysX to somewhere between half as fast and barely equal to a modern Intel CPU. See the problem? To 'fix' this 'problem', Nvidia won't thread PhysX on the PC CPU, something that they do on every other platform the API is available for.

We are told that Nvidia claims that threading PhysX on the CPU is not their problem, it is up to the game developers to implement. Only on the PC though, for the rest they are happy to make the effort. Like the "no one wants SSE, game developers clamor for x87 code" line of bull they spew, this is nothing more than plausible deniability for the technically unaware. Then again, after years of hype, the number of games released that use PhysX on the GPU for anything more than trivial eye candy can be counted on one hand. Make of that what you will.

Imagine if instead of purposefully de-optimizing PhysX for the CPU, Nvidia just did what they do for every other platform, i.e. not restrict the instruction set use for PR purposes, and thread the code. On a modern 4 core CPU, you would get a 4x speed increase from SSE, and a 4x increase from threading. Math says that would get you a 16x increase in speed, more than the decrease that you see going from GPU PhysX to CPU PhysX today.

The 2-4x advantage that Nvidia claims for the GPU is only when they hobble the CPU. If they didn't, the CPU would have a 4-8x performance advantage on Nvidia's own API. Havok and Bullet physics APIs seem to do just fine, better than PhysX actually, when running on the CPU. For some unknown reason, it is only the physics API by the GPU-only vendor that has problems on modern CPUs. Anyone have a clue why this is the case?
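The 16x and 4-8x figures follow directly from multiplying and dividing the numbers already assumed in this article. A short Python sketch of that arithmetic, using only those assumed inputs (the constant and function names are made up for illustration, and nothing here is measured):

```python
# Arithmetic behind the 16x and 4-8x figures. All inputs are the
# article's assumptions, not benchmark results.
SSE_VECTOR_GAIN = 4        # assumed gain from vectorized SSE
THREADING_GAIN = 4         # assumed gain from threading on a 4 core CPU
GPU_CLAIMS = (2, 4)        # Nvidia's claimed GPU advantage over today's CPU path

# Gains are treated as independent, so they multiply.
CPU_GAIN = SSE_VECTOR_GAIN * THREADING_GAIN


def cpu_advantage(gpu_claim: int) -> float:
    """Optimized-CPU speed relative to the GPU, for a given GPU claim."""
    return CPU_GAIN / gpu_claim


if __name__ == "__main__":
    print("combined CPU gain:", CPU_GAIN)
    for claim in GPU_CLAIMS:
        print(f"GPU claim {claim}x -> CPU ahead by {cpu_advantage(claim):.0f}x")
```

The result is only as good as the two 4x inputs; if either gain comes in lower in practice, the CPU's lead shrinks by the same factor.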

To take this a step further, if you de-optimized the GPU version of PhysX in the same way that Nvidia does to the CPU version, imagine what would happen? To start with, on a GTX285, executing one instruction per clock would mean going from a '2-4x advantage' over the CPU to a 60-120x disadvantage against de-optimized CPU code. With the simple threading and SSE optimizations above, the CPU would run it 960-1920x faster than single threaded GPU code. Even a lowly Atom CPU would probably be 100x faster than single threaded GPU PhysX code. If you take away vectorization as well, the GPU performance drops yet further.
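For readers who want to check the 60-120x and 960-1920x figures, here is the same reasoning as a small Python sketch. It assumes, as the text does, that the GPU's claimed advantage is spread evenly over all 240 cores and that restricting it to one core divides throughput by 240; the names are hypothetical and exact fractions are used to avoid rounding.

```python
# De-optimized GPU arithmetic, following the article's assumptions.
from fractions import Fraction

GTX285_CORES = 240         # Nvidia's core count, taken at face value
OPTIMIZED_CPU_GAIN = 16    # 4x SSE times 4x threading, as assumed earlier


def single_core_gpu_disadvantage(gpu_claim: int) -> Fraction:
    """How many times slower a one-core GPU is than today's CPU path."""
    return Fraction(GTX285_CORES, gpu_claim)


def optimized_cpu_lead(gpu_claim: int) -> Fraction:
    """Lead of a threaded, SSE-optimized CPU over the one-core GPU."""
    return single_core_gpu_disadvantage(gpu_claim) * OPTIMIZED_CPU_GAIN


if __name__ == "__main__":
    for claim in (4, 2):
        print(f"GPU claim {claim}x: "
              f"{single_core_gpu_disadvantage(claim)}x slower than hobbled CPU, "
              f"{optimized_cpu_lead(claim)}x slower than optimized CPU")
```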

In the end, there is one thing that is unquestionably clear: if you remove the de-optimizations that Nvidia inflicts only on the PC CPU version of PhysX, the GPU version would unquestionably be slower than a modern CPU. If you de-optimize the GPU version in the same way that Nvidia hobbles the CPU code, it would likely be 1000x slower than the CPU. If Nvidia didn't cripple CPU PhysX, they would lose to the CPU every time.

One thing you can be sure about, Nvidia will react to the Real World Tech article with FUD and tame attack sites. The official drums about no developer wanting SSE and how threading is up to game developers will have a few more technically devoid talking points added, and Nvidia innocence will be proclaimed. It doesn't matter, the GPU is the wrong thing to run physics on, it is slower than the CPU at that task. Period. This won't stop Nvidia from saying the exact opposite though, facts don't seem to get in the way of their company's PR statements.

If Nvidia wants to prove that PhysX is actually faster on the GPU, I will offer them a fair test. Give me the code tree for PhysX and the related DLL, and I will have them re-compiled for GPUs and then have the CPU version optimized with some minor threading and vectorized SSE. Then I will run the released PhysX supporting games on both DLLs as a benchmark. How about it guys? If your PR claims are anything close to true, what do you have to lose? S|A


56 Comments

  1. FOS3 July 7, 2010, 1:47 p.m.

    Oh Nvidia, I would love for Ati to eat you up. Months late, 100watts short, and still selling your crap to the ignorant fools who buy your junk. Loved the article Charlie! I can always count on your thorough details to hammer a few more nails in Nvidia's coffin. Maybe Ati will buy Nvidia? Oh the irony of 3dfx. Long live 3dfx!


  2. Hi July 7, 2010, 2:22 p.m.

    How long did you spend on writing that?


  3. Farkus Man July 7, 2010, 2:37 p.m.

    I can clearly see that a person with 0 knowledge base about hardware has written this article. God please forgive people who gave websites to noobs...


  4. Frank July 7, 2010, 3:08 p.m.

    To use SSE efficiently, the code needs to be crafted specially for it. The compiler can't exploit the vector units in SSE by itself; for that it needs human input, someone who knows what the code should do.

    That is not an easy job (trust me, I know what I'm talking about), and I totally understand why NVidia won't put energy into that, especially considering they have no interest in making the PhysX software engine faster on x86...

    So NVidia did not intentionally disable SSE in PhysX, they simply did not put the time and money to optimize it for SSE, that's a big difference.


  5. Stalker July 7, 2010, 3:08 p.m.

    @Farkus, that noob being you?

    Charlie is just pointing out the findings in the main article. Even in the comments, the author has seen a heavy increase in performance in the HPC medium while using properly optimized code and the right instruction set to go along with it, not just pissx. And get this... AVX, the new set coming out with Sandy Bridge, is (theoretically) supposed to run the same algorithms 8x faster than x87.


  6. Boble Max July 7, 2010, 3:09 p.m.

    Instead of stupid irrelevant fan boy comments coming from both sides, can't we have some good objective based discussion? I think that's what the comments are meant for.

    P.S. read the source material while you're at it.


  7. Boble July 7, 2010, 3:11 p.m.

    previous post directed @ farkus and FOS3


  8. Xentropy July 7, 2010, 3:27 p.m.

    "GPUs have lots of 'cores', the GTX285 for example has 240 of them, ATI's HD5870 has 1600, and Northern Islands has...nah, that would be telling."

    We hates you! We wants it, we needs it. Must have the precious.


  9. rich wargo July 7, 2010, 3:40 p.m.

    Farkus, until you list your real name and real qualifications, you have no credibility for making statements of that ilk. Just another bloviating monkey-butt.

    Bottom line, not providing PhysX that can run on any platform has only hurt nVidia's pocketbook. Just look at the "vast" number of games that support it. "Vast" being, in the case of nVidia PR, a number greater than zero, as in, "we shipped a vast number of Fermi cards this quarter." Yeah, right.

    Who cares? If it made a real difference, there might be a call, but with Havok, and a big "ho, hum, who cares?" in the market; all nVidia's tactics have done is prove the stupidity of their "executives".


  10. thomasxstewart July 7, 2010, 3:45 p.m.

    Both Towelee' & Eye like Depth of Charlies' Articles, Oftem takes several Redin', yet Something unstated & waiting to come out, worth intrique.

    Next, Rocket towelee' going to Dishwasher Controler counter, in case of Unexploited Holiday controlers, Here:

    http://www.123greetings.com/send/view...

    Super America Time, So Bring Along Nice Towel.

    vondrashek


  11. Brandon July 7, 2010, 3:51 p.m.

    Yup, a dirty but sadly typical and effective move by Nvidia engineers and PR... if you can't beat your opponent... cut his legs


  12. stifter July 7, 2010, 4:22 p.m.

    Just another proof, that nVidia is the devil... :)

    BTW: Very nice article!


  13. fingerbob69 July 7, 2010, 4:47 p.m.

    A nice summation of an original article which despite some technical galloping rabbits is actually not so hard to get around.

    To surmise the summation:

    Nvidia, in order to keep PhysX proprietary, kept PhysX optimised in a code that gpu's, their gpu's, can read easily but cpu's can't... not because cpu's couldn't but because cpu's stopped reading that sort of code over five years ago!

    To wit:

    Nvidia want to sell gpu's. It helps Nvidia not one iota for games developers to be able to instigate physX on the cpu (more efficient, faster) with SSE (Simple Succinct Easy) because Nvidia don't make cpu's! Doh!


  14. Frozen Space Turd July 7, 2010, 5:05 p.m.

    Ageia, the father of PhysX, was started up in 2002... it's a safe bet that the originating partners were working on PhysX quite some time prior to filing their articles of organization for the corporation. This would put the origination time closer to the 1999 x87-to-SSE changeover, giving a possible explanation as to the use of the older floating point instructions. (Especially since, as you remember, SSE wasn't exactly welcomed with open arms upon introduction. It took a couple of years for enough SSE supporting chips to flood the market that it was worth it for developers to start using the new instruction set without fear of alienating a large portion of the non-SSE supporting user base.)

    I find the use of x87 to be completely plausible from Ageia's standpoint... back then it was much more familiar and less time consuming to code for than SSE (and time was something Ageia was racing against to get its first add-on cards out the door... I seriously doubt they had the financial or development resources needed to retool their design for SSE even if they wanted to...)

    Now, since nVidia purchased them 2.5 years ago, and allegedly ported their technology over to CUDA, I would go as far as to call them lazy for not updating the instructions used for the task, but then again, I am not an electrical engineer, and have no idea which types of instructions "flow" better through the GPU/CPU pipelines... Maybe nVidia's engineers were being rushed by JHH to get PhysX working on the GPU ASAP, so they cut corners and stuck with x87 to simplify things and keep dear leader happy.


  15. NeelyCam July 7, 2010, 5:14 p.m.

    David Kanter rocks! All Real World Technologies articles are good stuff.


  16. David Kanter July 7, 2010, 5:25 p.m.

    Frank: Using vectorized SSE may take some work, although Intel's auto-parallelizing compiler is pretty good. However, it takes 0 work to use scalar SSE, which is still an improvement over the blight that is x87.

    Thanks to everyone for the kind comments!


  17. aberkae July 7, 2010, 5:26 p.m.

    Tell you the truth, I own 2 GTX 480s in SLI and I don't give a hoot about PhysX, because most good game developers, such as Crytek, make their own physics API in the engine. And I don't have a dedicated PhysX card. I went with the GTX 480 for the performance and scalability in SLI; most benchmarks show up to 95% performance gains (something ATI can't show). I wish ATI's Northern Islands would blow Nvidia out of the water. Can't wait for Sandy Bridge either. Love your articles. Facts are facts.


  18. Mike, CO July 7, 2010, 5:32 p.m.

    Charlie,

    There are some good things and bad things about this article.

    First, the bad things. PhysX operations are not easily implemented in multiple threads on a CPU. Adding multi-core, multi-thread support for PhysX may actually reduce performance on a CPU due to caching issues, and the extra overhead (context switching, etc...). I would not expect there to be any gains if PhysX were implemented in a multi-threaded fashion on CPUs.

    Now, the good things, which requires some background. Overall performance is always determined by a single bottleneck -- where some procedure in a pipeline of operations cannot proceed any faster, and the performance of operations before and after that bottleneck have no effect on overall performance.

    An important observation is that PhysX implemented on the GPU can add to the usual bottleneck for graphics operations -- the GPU. GPUs are made to be fast so that they can attempt (in vain) to balance the processing load with the CPU, i.e. the goal of the GPU is to try to move the bottleneck in the pipeline closer to the CPU. Ironically, PhysX can add to the GPU bottleneck by increasing the responsibility of the GPU when the processing power could be shared by the CPU instead. This makes the balance worse, not better.

    So yes, it is practical and desirable to offload PhysX computations on the CPU when possible. Offloading these computations onto the CPU saves threads on the GPU that can be used for other purposes. It also saves potentially expensive context switches on the GPU that can add substantial latencies to the overall graphics pipeline.

    The decision to use the GPU or CPU for PhysX should be based on the amount of dynamic 'geometry' or vertex data that is used by the application. In most cases, the amount of geometry that is processed by PhysX is relatively limited, making it more efficient to implement the PhysX transformation operations on the CPU.

    Because of these 'inconvenient truths', I agree that nVIDIA has crippled PhysX on the CPU. Otherwise, on average, a CPU should almost always outperform a GPU implementation of PhysX. It is very likely that nVIDIA marketing needed a tangible difference in performance between CPU and GPU, and convinced engineering to manufacture that difference. I would expect this to be only one of nVIDIA's many tricks.


  19. Dr. Kenneth Noisewater July 7, 2010, 5:38 p.m.

    http://www.ode.org/

    I wonder if THAT is optimized for SSE et al?


  20. shiggz July 7, 2010, 6:50 p.m.

    This sort of behavior from Nvidia is why, this upgrade, I went from a gtx260 and hopped over to a 5870 that has been even better than expected. Side note: I thought eyefinity was a gimmick but at night I sometimes grab all the monitors in the house for some 3-screen gaming. It is awesome! Like being a horse with the blinders finally taken off.

    I have been reading Charlie since the early Inquirer days but I was still buying Nvidia most of the time. The Assassin's Creed 10.1 deal really pissed me off and the Batman AA issue was the last straw. This sort of thing only reinforces for me that for now Nvidia is not a business I want to deal with. The TWIMTBP logo has now taken on a negative connotation for me. If I am deciding between two games to buy I will choose the non-TWIMTBP one, since I don't want to get screwed over again like with Assassin's Creed.


  21. shiggz July 7, 2010, 6:52 p.m.

    BTW I was playing Assassin's Creed on my TVPC, which then had an ATI 10.1 card. My main rig had the gtx260. The TVPC now has a gt220, which has been OK other than some freezing and booting up with a very bright pink tint over the screen.


  22. Bob Saget July 7, 2010, 7:01 p.m.

    @Dr. Kenneth Noisewater :
    http://bulletphysics.org/wordpress/

    It has Intel and AMD supporting it, with nvidia now being forced to like it too and even supplying CUDA support for it. SSEx codepaths have been there from the start, and it's starting to pick up commercial use, though the libraries are not as verbose as PhysX.

    Also, re : Ageia
    They were not the fathers of PhysX, NovodeX were the real bunch behind it, when it was first shown (there was no public fan fare etc), it was NovodeX branding on one of the ugliest sites ever. Downloading the demos which at the time didn't need any fancy drivers (my memory) or anything they were incredibly fast - the same demos are still in the nvidia driver pack today, much slower. Pretty sure you can still find this if you look really hard.


  23. robbielee July 7, 2010, 7:57 p.m.

    Mike, CO:" Overall performance is always determined by a single bottleneck -- where some procedure in a pipeline of operations cannot proceed any faster, and the performance of operations before and after that bottleneck have no effect on overall performance."

    That type of thinking only works in purely parallel computations, in processes that happen in sequential order, each step of the process takes time, and each step increases the total time of the computation to take place, with no real bottleneck.


  24. rich July 8, 2010, 12:03 a.m.

    okay but intel has just stated their 1000$ i7 is only 2.5x to 14x slower then the gtx 280. this is by their account on an arsenal of test that may have actually been poorly optimized for the nvidia gpu. Intel admits openly this up to 14x percent increase in raw power this two yr old gpu has over their fastest i7. But chuck claims that that in physx the average cpu would be more efficient and even able to outperform the gpu if it had sse physx code. This is another attempt in smearing from a guy who never gives up his hatred.

    Also so as to make clear to all the ppl who think they cant run physx without a physx card from nvidia;to all who think that there is something wrong with the fact they have purchased this technology to make a profit and not to give away their investment for free. Well the truth is they do give you all, even ati owners, physx for free. Physx runs on every PC in every game that has been developed with it. It is given to developers for free and is running on ati systems as well as every other. The advanced gpu accelerated physx is available to nvidia GPU owners as it uses cuda. This advance visuals improve the physx look and expereience but if its not on high the PC is still using physx code for the basic physics processing engine and it is brought to you by nvidia even if you choose to dislike them for it. ATI turned down physx, its not nvidia. Of course it would be nice for nvidia to program gpu accelerated physx to run on ati hardware for free but it does seem more like ati should step up more then nvidia do this hand out. BTW ati has repetitively put down physx technology and claimed it was pretty much worthless.

    It seems nvidia is evil because they give physx away to all for free. But the ppl who actually buy nvidia cards get to see even more of the advancing gpu accelerated physx for free too. So they give some more free stuff then others so lets hate them for it. Also they didnt recompile the cpu code to run in SSE so that makes them even worse. I didnt know that these games were struggling to play with the basic physx on the cpu? Even with the nvidia gpu accelerated systems the basic physx is still ran on the CPU. Its retarded but maybe nvidia isnt gonna improve onthe basic cpu side because it is more then capable to function on cpus today no problem. Nvidia is never gonna let advanced physx run on the cpu as they have vested many dollars in this technology on a per game basis. They are already giving it away for free to all for all. PPl who own nvidia gpu get to see the technology at its highest capability, ppl who dont still get to use their technology but complain.

    Think of it as a software instruction set, like an OS. Mc$ doesnt give windows to anyone for free. It (physx) is expensive technology that we are talking about. And we all do get to use it for free.


  25. Prof X July 8, 2010, 1:34 a.m.

    Check here to understand why Nvidia owns this mad mans every thought.
    http://charliedemerjianisadouchebag.b...


  26. Stalker July 8, 2010, 2:59 a.m.

    LOL rich... you made one of the dumbest arguments in this article. Pissx is not free! You pay for it in the price of the game. Every label that you see on the box as a wow-look-@-how-cooler-this-gamez-izzz allows the publisher to scam the customer for more money. End of story!

    Corrections: physics processing can be run without an add-in card or a GPU for that matter; ATI never claimed Pissx is "worthless"... but closed (source) to nvda only, unlike open source OpenCL, which they support; M$ doesn't give away their OS for free any more than nvda gives away their pissx for free... not even with co-processing cards when the main card is ATI.


  27. psolord July 8, 2010, 4:20 a.m.

    Excellent article Charlie!

    Regarding the multithreading part of PhysX, every one with a quad core should really check out Fluidmark 1.2.0. It is the only app that uses multithreading PhysX.

    I have even made a video of it.
    http://www.youtube.com/watch?v=2DwOtw...

    This clearly proves that you can gain a 4.6X performance from going to multithreading alone!

    On the other hand PhysX brings 50% performance hit even when nothing great is drawn on screen. I have also uploaded a video that proves this. There are two runs in this video
    http://www.youtube.com/watch?v=lzIX-T...

    PhysX must be stopped. It is only bad for PC gaming. Spread the word.


  28. Pippo July 8, 2010, 5:36 a.m.

    It is because of this kind of bull nvidia pulls constantly, although I really like Physx, that I will continue to buy ATI (DAAMIT) graphics cards. That is of course if they continue to provide the best price/performance ratio of the market! Go on and cry, freaking fanboys!!!


  29. Frozen Space Turd July 8, 2010, 5:46 a.m.

    @ProfX

    Whoever operates that blog must lead a sad existence to put that much effort into spinning everything Charlie says into douchbaggery... In an unrelated note, check out my upcoming blog:

    lameasswhothinkscharliedemerjianisado...


  30. kadir July 8, 2010, 6:46 a.m.

    As a hard-core gamer, I can say that I am pretty happy with Havok while playing Bad Company 2. I don't think that I need to pay extra money to get physics from nvidia. It's all about fun, all right? And if the game is exciting enough, who cares? Nearly three year old consoles still have perfect graphics and good game play... why do they always want to suck the money out of PC gamers?


  31. Icinix July 8, 2010, 8:33 a.m.

    Crippled or no, you can still pick up a 50 - 100 dollar card to run beside your 5870 / 480 / whatever and have outstanding gameplay / eyefinity / physx.
    As much as I dislike NVIDIA, how Physx works is entirely up to them. I wouldn't call it crippling anymore than I would KFC not giving McDonalds its secret herbs and spices (Cough * Paprika* Cough)


  32. Adam July 8, 2010, 11:04 a.m.

    So Nvidia puts out a physics API that runs purposefully slower on CPUs and non-Nvidia hardware, and this is somehow not crippling software? Granted, the game programmers ultimately decide whether or not to use it, but Nvidia advertises it as a performance boost. Clearly, physics can be done well on hardware that isn't based on CUDA cores (as Havok (and PhysX for non-computer platforms) easily demonstrates). It seems Nvidia has turned PhysX into nothing more than a marketing label to try to get enthusiasts to demand it in their games, which in turn would sell more Nvidia hardware thanks to the fact that it runs so poorly on everything else.

    Granted, it may have been turned into nothing more than a marketing label by Ageia long before Nvidia ever touched it.

    My point is there's no "secret sauce" to PhysX. The same effects can be done with other methods on other hardware to equal or better effect. PhysX is being pushed solely for marketing purposes.


  33. Newtstein July 8, 2010, 11:26 a.m.

    There are a few flaws with both CPU and GPU that throttle both, depending on the type of Physics being performed, e.g. computational fluid dynamics, gravity etc. The ideal world would see both CPU and GPU being used for Physics calculations depending on the type in use.

    When performing Physics on a GPU, the bulk of the time is spent transferring the data to/from the CPU. The delays are so high, in fact, that GPU Physics is only feasible for applications where the computed data stays local to the GPU, i.e. "eye candy".

    The GPU isn't really ideal for Physics due to its inability to implement Brownian motion efficiently. That would require interaction between the cores at computation time, and the only method to do this currently on a GPU is to multipass and use textures. More than a few passes and the GPU becomes very inefficient.

    Once those delays in transferring data are sorted and GPU cores can communicate efficiently with each other, GPUs will be quicker at Physics simply by the fact they can out-thread a CPU. As far as I'm aware, ATI/nVidia/Intel/Microsoft don't have any plans in the pipeline to resolve these issues.

    One thing I'm surprised hasn't been mentioned is the DMM Physics engine, which implements finite element physics quite happily on a CPU. Are there any GPU based PhysX games that do this? I'm not aware of any.

    The last time I looked, nVidia couldn't implement all the Physics models on the GPU that Ageia implemented within the PPU. I believe it still falls back to the CPU for some of the computation and only uses the GPU for tasks that can be vectored - oh the irony.
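
Editor's sketch of the transfer-cost argument in the comment above. This is a toy model, not a measurement: the function name, the state size, the bandwidth, the latency and both compute times are made-up assumptions chosen only to show the shape of the trade-off.

```python
def gpu_round_trip_ms(bytes_moved, bandwidth_gb_s, latency_ms, compute_ms):
    """Time to copy data to the GPU, compute, and copy results back (toy model)."""
    one_way_ms = latency_ms + bytes_moved / (bandwidth_gb_s * 1e9) * 1e3
    return 2 * one_way_ms + compute_ms


# All numbers below are hypothetical.
STATE_BYTES = 8e6        # physics state that must cross the bus each frame
BUS_GB_S = 4.0           # assumed effective bus bandwidth
BUS_LATENCY_MS = 0.05    # assumed fixed cost per transfer
GPU_COMPUTE_MS = 1.0     # assumed GPU compute time
CPU_COMPUTE_MS = 4.0     # assumed CPU compute time for the same work

round_trip = gpu_round_trip_ms(STATE_BYTES, BUS_GB_S, BUS_LATENCY_MS, GPU_COMPUTE_MS)
gpu_local = GPU_COMPUTE_MS  # results stay on the GPU ("eye candy" case)

print(f"GPU with round trip: {round_trip:.2f} ms")
print(f"GPU, data stays local: {gpu_local:.2f} ms")
print(f"CPU: {CPU_COMPUTE_MS:.2f} ms")
```

With these assumed numbers the GPU computes four times faster than the CPU but still loses once the results have to come back to the CPU, which is the case the comment describes.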


  34. Mike, CO July 8, 2010, 12:16 p.m.

    robbielee:

    "That type of thinking only works in purely parallel computations, in processes that happen in sequential order, each step of the process takes time, and each step increases the total time of the computation to take place, with no real bottleneck."

    Wrong. There is always a single bottleneck in a pipeline of operations, even if you employ sequential or parallel processes in parts of the pipeline.

    Of course, if you alter the characteristics of the pipeline, the location of a bottleneck is likely to change. But for any given pipeline, by definition, throughput is limited by one specific set of operations in the pipeline. This is the reason to use the term 'pipeline' -- there is effectively one input side and one output side and everything goes through the pipe.

    Anyone who deals with performance issues must first identify and address the single source of the bottleneck so they can start the process of optimization. Then they can move on to the next bottleneck.
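
Editor's sketch of the bottleneck argument above: in a pipeline, throughput is capped by the slowest stage, so speeding up any other stage changes nothing, and fixing the slowest stage moves the bottleneck elsewhere. The stage names and rates are hypothetical.

```python
def bottleneck(stage_rates):
    """Return (name, rate) of the slowest stage; it caps pipeline throughput."""
    return min(stage_rates.items(), key=lambda item: item[1])


# Hypothetical stage throughputs in frames per second.
stages = {"input": 240.0, "physics": 45.0, "render": 90.0}
baseline = bottleneck(stages)

# Doubling a stage that is not the bottleneck leaves throughput unchanged.
faster_render = dict(stages, render=180.0)
after_render_fix = bottleneck(faster_render)

# Speeding up the bottleneck stage helps, and the bottleneck moves.
faster_physics = dict(stages, physics=120.0)
after_physics_fix = bottleneck(faster_physics)

print(baseline, after_render_fix, after_physics_fix)
```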


  35. BillehBawb July 8, 2010, 1:28 p.m.

    This will cause flaming. I know it, but here's how I see it.

    AMD cpus: Budget, low end, single GPU gaming builds.
    ATI cards: Single GPU systems, sips power compared to nvidia.

    AMD+ATI = great bang for buck gaming machines. Something like an Athlon X3 with a 5850.

    Intel cpus: High end, good for anything you throw at them. Multi gpu fares better on this platform, but the only platform worth getting is X58, because it can accommodate multi gpu and USB 3.0/SATA 6
    nVidia cards: SLI still scales better than Crossfire, that's how SLI 470s can keep up with CFX 5870s (say +/- 10% of 5870). Chugs power like a b*tch, but in a high end system, who cares about power consumption? Also very hot. Watercooling recommended.

    But this leads to a problem. Now I just so happen to be a HUGE Gigabyte fanboy, ever since they came out with 3x USB power and USB on/off whatever the heck it does, I don't really know what it does, I just know I want it. The Gigabyte P55 + USB3 = meh. Not gonna take any chances. Going X58 Gigabyte brings yet ANOTHER issue to light, however, as the slot layout on the boards available is RUBBISH. Or is it? Perhaps it would be best with watercooling (cost up again). So in the end there are two options. One option is the economically sensible AMD/ATI option covering the sensible range of $500 to $1200 depending on how you set it up (gotta make sure it's Gigabyte though), and the other option is to spend well over $3000 on a system with 3 watercooled 470s and a Core i7 processor powered by a nuclear powerplant, and if you're gonna spend that much money on a system you might as well have 3 480s, which drives the cost up to $4000 give or take. *inhales deeply*

    Not to mention that by the time you've regained your sanity after doing all of this, something better comes out. Damn.

    Now, about le situation at hand? Figures. So what? Nobody uses PhysX anyway. It's just like red food coloring. Looks pretty but tastes horrible.


  36. grndzro July 8, 2010, 2:02 p.m.

    Nah, SLI doesn't scale better than Crossfire, it is simply more prevalent in older titles. Almost all new games have excellent scaling for Crossfire and SLI. But the real question is whether the electricity bill is going to skyrocket if you get a top SLI setup.

    I'd rather get Crossfire and save 300 watts idle power. Nvidia cards are space heaters. You take their price and add $20 a month or more overhead.


  37. stewox July 8, 2010, 2:10 p.m.

    Yeah, I use ATI cards. ATI cards were always my choice because of the reliability, stability, price and screen quality, not to mention ATI was always ahead of Nvidia when releasing a new product. What I see from Nvidia is a lot of PR, fanboys, fake demos, bribed journalists, and "meant to be played" branding on most of the games to optimize them for Nvidia GPUs. A lot of dirty work from Nvidia, same as Intel against AMD; that's why AMD and ATI agreed to join forces.

    Never got into physics. Developers these days are crap: crap games, no support. Who needs this tech if it's only used in demos and benches?


  38. Tim July 8, 2010, 3:25 p.m.

    Look, bottom line, marketing strategy requires that you be able to differentiate your product from that of your competitors. That's how you give your product an edge, and even if it's slight, it can make the difference between success and failure. It's a battle that takes place between all market competitors, whether it's Intel and AMD, Adidas and Nike, Ford and Toyota, Nvidia and AMD, that's business.

    Nvidia differentiate themselves by bolting on Cuda, PhysX, TWIMTBP, SLI, 3D surround etc. AMD bolt on Stream, eyefinity, crossfire etc.

    From a purely business marketing perspective I have to say that in my mind Nvidia have done a good job on differentiating their products by bolting on all of these extras. That's probably the main reason they have survived in recent times.

    Anything you can do that your competitor can't gives you an advantage, being first to the post with a technology gives you an advantage, AMD have proven this with the 5000 series being first to market with DX11 capable cards and it has cost Nvidia dearly.

    To slate Nvidia for PhysX or Cuda is to slate AMD for its Stream technology or Eyefinity technology. Are we to suggest that all technologies should work on rival competitors' products?

    Then how exactly would any business survive with no edge? The 5000 series would probably have spelt the end of Nvidia if they could not use what in the eyes of many are seemingly unfair marketing strategies. If AMD also had PhysX and Cuda they would almost certainly have killed Nvidia, and seeing as Nvidia invested in these technologies, they would have paid for their own demise.

    Anybody who thinks that all should be fair in the world of business should wake up, if it were so then there would be no rival competitors, no price competition and for that we would all suffer.

    In the business world the strategy is known as "differentiate or die" and I see the death of either Nvidia or AMD as not being in the best interest of any one of us.

    I've never commented on one of your articles before Charlie, although in this case I feel compelled to do so. In its current product state PhysX is faster on an Nvidia PhysX-capable graphics card than on any competitor's product, be that a competing graphics card or a CPU. In that case Nvidia surely have the right to claim it is so and use it to differentiate their product in the marketplace. To alter PhysX so that it can perform better on a competitor's product would be business suicide. If what you wish to see is the death of Nvidia then I can see how this might be in your interest, but unless you are on the payroll or a shareholder of one of their competitors then I can't see how that would benefit any of us.

    Let both companies thrive and we will all benefit in the long run.


  39. Techno July 8, 2010, 3:42 p.m.

    So if physics is so much better on the CPU than the GPU, why is everyone getting so excited about OpenCL and GPU Bullet physics? Additionally, why have we not already seen fantastic game-changing physics from the CPU if this is the best and preferred hardware to use for it?


  40. dahouze July 8, 2010, 5:28 p.m.

    So what you are saying is that a company made some software that works best on their hardware. I'm so shocked. Oh no, what is the world coming to.


  41. steph7832 July 8, 2010, 6:29 p.m.

    @Tim: "Let both companies thrive and we will all benefit in the long run."
    Yes you are right, we need competition, fair competition, not one company cheating on benches, false advertising, crippling benefits to users etc. etc. The moment one cheats there is a possibility they would win and therefore the loser might disappear. Let's expose the cheaters and move on, that's what's happening here. We don't want Nvidia to disappear, nor of course ATI, but cheaters should be exposed and their market share should be reflected accordingly, i.e. Nvidia's discrete GPU share should be less than ATI's... spread the word!


  42. thomasxstewart July 8, 2010, 7:49 p.m.

    OpenCL Stands for Open Command Line. Like early Windows & still in Windows, Command Line allows Direct interaction into system by entering in Command Line from library of commands. As For Physics, what do you expect from people who claim fastest switch in world & made it by short circuiting wires while strung around table & arbitrarily flipping on & off switches into measured mess.

    People should feel lucky to have such depth to articles by Msr. Demerjian.

    vondrashek


  43. comprehension July 8, 2010, 9:24 p.m.

    Nobody is saying NV should compile their physics stack for ATI.

    Everybody is saying that NV physX should not switch off as soon as an ATI product is detected in the system.

    Charlie is adding that, by no accident, NV's physX for CPUs has been hamstrung to make their cards look more powerful than they really are.


  44. rich July 8, 2010, 9:48 p.m.

    duh, Stalker
    that was my point!!!!!
    M$ doesn't give their OS away for free and you aren't upset with that. Everyone gets to use PhysX just as anyone else on the basic CPU operations. It is important to bring this up because there are ppl who think Nvidia has prevented them from PhysX completely.
    The only difference is Nvidia has taken an extra step to add to the gaming experiences for their customers so they can enjoy a more advanced GPU-accelerated PhysX at no extra cost to them. This is all to push their GPU into new areas. With all the power of a GPU these days it only seems logical to harness them in an out-of-the-box way. Remember Nvidia doesn't sell anything but GPUs and they are actually pushing the boundaries....which I can't see anything wrong with at all. Really there are super narrow-minded ppl who can't see this for what it is, for whatever reason.

    So (1) Nvidia doesn't make CPUs,
    (2) they want to take advantage of the product they do sell. They want to offer their customers more than a GPU as they have realized there is much more potential in these ultra powerful add-in cards' raw power.
    (3) PhysX is an example of Nvidia pushing their existing product into new areas
    (4) it has nothing to do with them punishing ATI owners. Absolutely nothing

    How can anyone believe it is anything other. How can ppl not see their strategy. It is important for nvidia to make more uses for their GPU, why would anyone be against this. It gives end users more purpose for their GPUs as well, and all on nvidias tab.

    So If you choose to not buy Nvidia cards then you have nothing to hate nvidia for. Just as if i choose to not buy a black edition cpu, i have no right to be upset with amd when i have a cpu with a locked multiplier.

    These rants are mindless and it seems that very few actually think these days at all. Just the same thing repeated "thats why i buy ati" and then instead of the truth being stated "so i can be upset with nvidia for physx not running on my gpu", ppl seem mindless and are hateful to nvidia. If you choose to not buy into physX then you arent gonna get all the bells and whistles of advanced gpu acceleration in your games. Its not gonna happen. And the only way you are gonna have this is simply buy a gpu that nvidia makes in which they support having this option for the user for free. Kinda like a car which runs on electric. If you want one buy one. Or hate ford for making them and not letting you have the great gas mileage when you didnt even buy a car from them that could.

    How freaken simple a thing it is???? How can ppl mix this up?????? Dear mother earth please bring understanding!!!


  45. TheDude_temp July 8, 2010, 11:01 p.m.

    to: dahouze...

    No. They made the software run badly on both -their- hardware and others, but also run much much worse on anything else but theirs. And it proves that Physics software (not Nvidia's) can easily be better than anything we have today.

    *SO* in order to control this "feature" which is still baby technology, everyone suffers.

    There is a chance that AMD is doing something in the background (I hope so), because Nvidia is holding back the technology more than anyone else... the same way that 3Dfx had control of 3D market with Glide API.

    With DirectX & OpenGL, 3Dfx lost its hold on the 3D market. When there is a Physics API that anyone can use, including Nvidia... things will change quickly.
    Who knows, MS could be adding Physics to DX12, but since MS doesn't support PC gaming... that is most likely a fantasy.

    I'd like to have PhysX.... but I'm not willing to buy an Nvidia card at this time... so Nvidia loses software market for something they paid MILLIONS for. I'll gladly buy an ATI 5830 over a GTX 260.

    Nvidia is stupid... they even prevent people from using ATI & GeForce cards in the same computer. Yes, they lose sales because there would be people who'll have two 5850 cards and then buy a GTX-260 for Physics... but they'd need a special motherboard, etc.


  46. Techno July 9, 2010, 3:07 a.m.

    This is not about Nvidia using its product to try and support its hardware sales, that is quite understandable and acceptable.
    This is about Nvidia being completely disingenuous. They would gain a lot more kudos in the eyes of many if they just came out and said yes, we are keeping PhysX to our own hardware on the PC and have disabled it on other hardware. It has been proven that PhysX can work perfectly well on a secondary Nvidia graphics card while an ATI card is used for primary graphics, so it is not a quality assurance issue, which was the lie that Nvidia were using to justify their actions.
    Now it's been proven that PhysX will run better on the CPU than the GPU and only doesn't because Nvidia have gone out of their way to make it that way and then claim a x4 superiority.

    Really Nvidia all this deceit is just putting many people off you and your products, I would now only grudgingly buy your hardware if it was unequivocally the best and without equal.......which it is not.


  47. nicola July 9, 2010, 9:54 a.m.

    So NFail fanboys and hardware newbies read S|A? Since when?
    My gosh.


  48. nicola July 9, 2010, 10:13 a.m.

    I mean, Farkus Man, come on, share all your great hardware knowledge with us.
    You write compilers, huh?
    Or maybe you won a recent IOCCC contest?
    Do you perhaps work at NSA?
    Are you proficient in Perl, Python, C#?
    Do you design GDDR5?
    Do you even know what a f***in' IRQ is?
    Do you even know basic binary algebra? Show us!
    OTHERWISE SHUT THE F*** UP LAME NFAIL FANBOY.


  49. Anonymous July 9, 2010, 10:33 a.m.

    If they say "threading Physx on the CPU is not their problem, it is up to the game developers to implement." they shouldn't have any problem or fear to give you the code tree for PhysX and the related DLL, to do what you said in the last paragraph and make a simple comparison, with a benchmark.

    Because if it is up to the developer to implement it, as they say; they will have to hand that data to work with anyway.


  50. godrilla July 9, 2010, 1:10 p.m.

    Here is the latest game to support physX recommended requirements:

    RECOMMENDED SYSTEM REQUIREMENTS
    Operating System: Microsoft Windows XP (SP2 or later) / Windows Vista / Windows 7
    Processor: 2.4 GHz Quad Core processor
    RAM: 2 GB
    Video Card: nVidia GeForce 9800 GTX / ATI Radeon HD 3870 or better
    Hard Disc: 10 GB
    Sound Card: 100% DirectX 9.0c compliant card
    Peripherals: Keyboard and mouse or Windows compatible gamepad

    PHYSX/APEX ENHANCEMENTS SYSTEM REQUIREMENTS
    Operating System: Microsoft Windows XP (SP2 or later) / Windows Vista / Windows 7
    Minimum Processor: 2.4 GHz Quad Core processor
    Recommended Processor: 2.66 GHz Core i7-920
    RAM: 2 GB

    Video Cards and resolution: APEX medium settings
    Minimum: NVIDIA GeForce GTX 260 (or better) for Graphics and a dedicated NVIDIA 9800GTX (or better) for PhysX
    Recommended: NVIDIA GeForce GTX 470 (or better)

    Video Cards and resolution: APEX High settings
    Minimum: NVIDIA GeForce GTX 470 (or better) and a dedicated NVIDIA 9800GTX (or better) for PhysX
    Recommended: NVIDIA GeForce GTX 480 for Graphics and a dedicated NVIDIA GTX 285 (or better) for PhysX

    NVIDIA GPU driver: 197.13 or later.
    NVIDIA PhysX driver: 10.04.02_9.10.0522. Included and automatically installed with the game.


  51. godrilla July 9, 2010, 1:11 p.m.

    ^^^^Mafia II ^^^


  52. Tim July 9, 2010, 1:40 p.m.

    @steph7832
    What I'm trying to say is that it's clever marketing. I'm not overly concerned with the technicalities of whether the claims they make stack up when put under close scrutiny, in truth very few claims manufacturers make actually do stack up.

    Like yourself, seizing on one line from a post that I made and then twisting it to serve your argument, the article is seizing on one selling point and putting it under the microscope, then twisting it to favour the required outcome. Yourself and the author, it would seem, think alike.

    It's an informative experiment that provides interesting results, but all it really proves is that the product works as intended, but could work better if altered to work as not intended by the manufacturer. I would certainly not describe this as cheating as you put it.

    Let's take a look at what AMD says about its Radeon 5870 on its website:

    Features & Benefits

    Expand your visual real estate with ATI Eyefinity Technology, with revolutionary multi-display capabilities that let you see more and get more done.
    Accelerate the most demanding applications with ATI Stream technology and do more with your PC.
    With full support for DirectX 11, these GPUs enable rich, realistic visuals and explosive HD gaming performance so you can dominate the competition.

    Let's take a closer look at this statement: "Accelerate the most demanding applications with ATI Stream technology and do more with your PC."

    The most demanding applications, they say. What exactly are the most demanding applications? That's anyone's guess, right? Do you actually think that if we looked at a list of the most demanding applications, ATI Stream technology would accelerate them all? Secondly, what exactly does "and do more with your PC" mean? Again, anyone's guess, right? Can I fry an egg? No, for that I need Fermi, right?

    Then let's take a look at another statement: "With full support for DirectX 11, these GPUs enable rich, realistic visuals and explosive HD gaming performance so you can dominate the competition."

    Explosive HD gaming (will my monitor explode?), dominate the competition (hell, each to their own I guess), realistic visuals ? Open your eyes steph, not much that you read here can actually stand up under close scrutiny can it. This is the world of marketing.

    You obviously have a preference for ATI, but if you're smart, it will not be because you have read an advert written by AMD's or Nvidia's marketing departments.


  53. pogsnet July 9, 2010, 4:19 p.m.

    Nvidia is capitalizing on the point that PhysX is faster on the GPU than on the CPU, while in fact they purposely make it slow on the CPU. That is the point of this topic.

    So stop whining, this is not about AMD or Intel.


  54. krumme July 10, 2010, 11:49 a.m.

    von drashek, you made my day. Still rofl. Keep up the good work.
    I guess you still own an xp3000, win xp sp3, your amp is an NAD 3020i, you play cds, but still keep the vinyl records in a safe place.


  55. Dentad July 11, 2010, 6:05 p.m.

    The next step is to benchmark PhysX, Havok and several other physics APIs, each on several GPUs and CPUs. The pattern will be shown, and the truth will out.


  56. fvdbergh July 13, 2010, 10 a.m.

    Having actually written programs using many of the technologies mentioned here (x87 vs. SSE, ODE, Novodex/PhysX), I can add the following:
    1. In my early testing of CUDA/PhysX (on a G92 GPU) about a year ago, only particle effects appeared to be accelerated on the GPU. Rigid body physics applied to convex polyhedra did not appear to be accelerated at that time. This choice makes sense: particle effects (e.g., water) involve very large numbers of individual particles, hence benefit more from appropriate acceleration (e.g., via CUDA), compared to a handful of more complex objects.
    2. Implementing physics code in a multi-threaded way is not as simple as the author of this article assumes. Neither is writing good SSE code. While I agree that you could potentially do both (multi-threading + vectorization) to improve PhysX performance, this costs developer time. Taking my point 1) into account, you have to wonder how much performance will be gained by optimizing/multi-threading the remainder of the physics effects.
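
Editor's sketch of the question raised in point 2, using Amdahl's law: if only part of the frame's physics time is sped up, the overall gain is limited by the part that is not. The fractions and the 4x figure below are hypothetical inputs, not measurements of PhysX.

```python
def overall_speedup(fraction_improved, local_speedup):
    """Amdahl's law: total speedup when only a fraction of the work gets faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical: the non-particle physics is 20% of the time and is made 4x faster.
small_share = overall_speedup(0.20, 4.0)

# Hypothetical: the same 4x speedup applied to 80% of the time.
large_share = overall_speedup(0.80, 4.0)

print(f"20% of the work, 4x faster: {small_share:.2f}x overall")
print(f"80% of the work, 4x faster: {large_share:.2f}x overall")
```

How much the remaining effects are worth optimizing therefore depends on what share of the time they take, which is exactly the number the comment says is missing.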

    It seems that Charlie (and the author of the original RWT article) is convinced that there are large performance gains to be had by "simply recompiling with SSE optimization enabled", but they lack the evidence to back it up. Sure, PhysX is using x87 code exclusively. But do yourself a favour: implement a simple program that could (potentially) benefit from automatic SSE-style vectorization. I chose a 4-tap FIR filter. Take an array of 1 million 32-bit floats. Each output value is the sum of four consecutive input values, each multiplied by the corresponding filter weight. This should be a slam-dunk for SSE, since the innermost loop of four consecutive multiplies could be done in a single SSE instruction (the summation following that, however, will be sequential even on SSE, assuming SSE1 instructions). Compile this with gcc-4.4.3 -O3, using the -mfpmath=387 versus the -mfpmath=sse -msse options. Verify (with -S) that the compiler is generating 387 or SSE code. On a Core Duo, applying this filter 500 times took 5.75 seconds on SSE, and 6.2 seconds on 387. (Intel icc 10.1 -aXP compile took 5.9 seconds).

    (**I know that this way of implementing an FIR filter is not the most efficient, and hand-coded SSE can be used to get close to 4x speedup, but the argument is that a "simple recompile" of the code will magically make SSE "work").

    My point: sure, you can turn on the compiler flags to generate SSE code, but this does not mean that it will run 4x faster. To get that kind of speed-up requires more effort from the developer.
    This article, and the original RWT, appear to be based on the speculation that performance *could* be improved, but they have absolutely no data to back this up.
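
Editor's sketch of the 4-tap FIR filter described in the comment above. It is written in Python purely to show the shape of the computation; it generates no x87 or SSE code and reproduces none of the timings quoted. The function names and the sample data are assumptions. The second function mirrors the structure the comment describes for SSE1: one 4-wide multiply followed by a summation that stays sequential.

```python
def fir4_scalar(x, w):
    """4-tap FIR: y[i] = x[i]*w[0] + x[i+1]*w[1] + x[i+2]*w[2] + x[i+3]*w[3]."""
    return [
        x[i] * w[0] + x[i + 1] * w[1] + x[i + 2] * w[2] + x[i + 3] * w[3]
        for i in range(len(x) - 3)
    ]


def fir4_packed(x, w):
    """Same filter, arranged as one 4-wide multiply plus a serial horizontal sum."""
    out = []
    for i in range(len(x) - 3):
        # Stands in for a single packed multiply of four floats.
        products = [a * b for a, b in zip(x[i:i + 4], w)]
        # The additions still happen one after another.
        acc = 0.0
        for p in products:
            acc += p
        out.append(acc)
    return out


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical input
weights = [0.5, 0.25, 0.125, 0.125]        # hypothetical filter weights

y_scalar = fir4_scalar(samples, weights)
y_packed = fir4_packed(samples, weights)
print(y_scalar)
```

Only the four multiplies collapse into one operation here; the loads, the additions and the store remain, which is consistent with the comment's point that flipping a compiler flag need not give anything close to 4x.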


Comments are un-moderated except for automatic spam-reduction services, these services are not related to liposuction or any other dieting method. Hitting the [POST] button here is the legal equivalent to self-publishing. This means that you are liable and therefore RESPONSIBLE for all consequences of what you are writing and publishing. S|A is not and will not be held liable for your publications using our platform. We will happily turn over your IP address to any legal authority with a valid search warrant.