AMD paves the way for future GPUs

TFE 2009: Power and thermal solutions for 2011

ONE OF THE more interesting conferences of the year is AMD’s TFE, or Technical Forum and Exposition. If you have never heard of TFE, that is likely because this conference on near-future thermal and power management technology is by invitation only.

The conference was founded by Dr. Gamal Refai-Ahmed, AMD’s Chief Thermal Architect for the Graphics Product Group, and was initially not open to the public or press. For the last two years, TFE was held at the NTUH International Convention Center in Taipei, Taiwan. TFE discussed technologies to enable products that were far from announced – not the next generation product, but the ones after that, 18-24 months out or more. Every new generation of chips, especially high power and high speed CPUs and GPUs, has implementation challenges, and with each new generation, those challenges get harder and harder.

Graphs aplenty

A typical slide from TFE 2008

This is a very different type of meeting from the normal product or technology conference. Instead of talking about raw technology or end user products, TFE goes for the middle ground. Intel’s IDF focuses on how to use its products, IEDM on much lower level work, and TFE is all about enabling the manufacturing of those products. Making a semiconductor device on a far sub-micron process has become so complex that few companies can afford the engineering costs to make their own. Fewer still can afford the much higher cost of the basic R&D needed to make the processes that those designers use.

A modern fab costs billions of dollars to build and equip. Designing a CPU costs an order of magnitude or two less, and GPU development starts a bit lower, but can end up costing as much as a low end CPU before things are done. This may sound cheap until you stop and think about the numbers; we are still talking about 8 digits or tens of millions of dollars. This isn’t news to anyone familiar with computers. Everyone realizes that researching fab technologies and chips is hard work. What they don’t realize is how hard the next step has become lately.

In the old days, making a board was simple, or at least it was mainly a logic problem combined with a little silk screening and soldering. Lately though, it has become hugely complex and problematic, with multi-layer boards, interference, routing, power delivery and cooling factors involved. Each new generation of boards pushes the boundaries a bit farther, and in so doing, requires new tools and devices to make it all work.

If a company waits for delivery of an IC to start the board design process, it will delay the product launch by months or years. The ancillary technologies needed must be ready before the chips are so the testing of parts can begin almost simultaneously. Those tools are now becoming as complex and expensive to make as the last few rounds of chips have been. Development times are going from months to years, and if a company is not proactive in laying the groundwork, it will be left in the dust.

Chipmakers, especially the larger ones, can afford to do the R&D because their sales volumes are in the millions of parts. Motherboard and GPU makers sell far fewer units, so this increasing cost hits them disproportionately. What was once a CPU company problem is now an industry-wide problem, and the only way to solve it is to work collaboratively.

R&D is a competitive differentiator though, and most large OEMs, ODMs, and AIBs are about as friendly as two politicians running for the same office – they smile at each other in public, but that is about it. Chipmakers asking these groups to work together on their own is about as useful as asking a politician to keep their campaign promises. It theoretically could work, but don’t count on it.

That is where TFE comes in. The conference is meant to be a showcase for technologies that these competitors need to know about, and it allows them to talk together in the semi-open about problems they face. The main focuses are power delivery and thermal management: showcasing companies that make innovative solutions in this space, and sharing implementation details.

A second and possibly more important function of TFE is to bridge the gap from academic research to purchasable products in ways that don’t necessarily make sense for a single OEM to invest in. Academic research is great for answering questions that may not ever have commercial value, but it is not necessarily good at taking that knowledge and putting it into commercial production.

The goals of AMD’s TFE are to make partnerships with academic institutions, be they individuals, teams, or the institutions themselves, and bring the research to people who can implement it. If a team in a dark lab somewhere comes up with a new heatsink design that is more efficient, how does that get to companies like Gigabyte, Asus and Sapphire?

If you attend TFE, it seems very much like any other technical conference, with keynotes, informational booths, and talks about technologies. What you tend not to see, however, is anything related to a product that an end user eventually buys. There was no chatting at TFE about the HD7870, no talk about Bulldozer-cored Phenom Vs, or anything an enthusiast would care about. Instead, you heard about GPU power density curves as they intersect with the physical limits of air and water cooling. How do you feed power to the next-next generation GPU, and how do you keep it from cooking once you get that power there?

These are the burning questions, pun intended, that need to be answered by anyone making that slick gaming GPU you will see next year. In the past, you just got a chunk of aluminum and glued it to the GPU package. That led to larger aluminum heatsinks, then copper, followed by faster fans, ducting, heat pipes, and now vapor chambers.

While the technology has been able to keep things functioning, bounds are being pushed harder and harder with every succeeding generation. A simple slip-up can mean thousands of cards dying in the field, with warranty claims hitting millions of dollars, and damage to reputations exceeding that.

The stakes are getting higher while the engineering difficulty climbs exponentially in the background. A big slip ends up with a Pentium III/1.13 problem or an Xbox 360 RROD. Incompetent engineering coupled with bad management can finish a company, as Bumpgate will likely do to Nvidia before it is all finished. To say that thermal and power management is of paramount importance to IC makers is understating the problem.

If you are Intel or AMD, you invest the money you need to have a stable and functional product. That investment pays for itself many times over. If you are a mid-tier AIB, you simply don’t have the resources to develop the needed technology to ensure your products will live, so you have to use third party solutions. Who makes those? Who will tailor those technologies for your particular need?

Once again, that is why Dr. Refai-Ahmed started TFE and is expanding it to bring more academic research into the fold. Think of it as a cocktail party for researchers, power and thermal solution vendors, and the consumers of the results, only without the cocktails or smoky atmosphere. It might not seem exciting to a non-engineer, but you won’t have next year’s GPU without it.

On the academic side, AMD is involved from the earliest stages to the end products stage. One of the two keynotes at TFE this year was by Dr. Matthew M. F. Yuen, Acting Vice President for Research and Development at the Hong Kong University of Science and Technology. The university is working with AMD to develop an infrastructure to move ideas developed there from lab research to end user products.

AMD bridges the gap between the university and the AIBs needing something to power and cool their wares. Even if that gap is bridged and the newly developed electro-thermal-cooling widget is perfect for the upcoming HD7870, it takes a lot of time and energy to educate each potential user about what is coming in the near future. Multiply those dozens of buyers by dozens of solutions, and you have a recipe for a lot of frequent-flyer miles.

It would be a lot easier to get all those people together in the same room at the same time and just tell them about all the upcoming electro-thermal-cooling widgets at once, right? That is the end goal of TFE, to educate the target audience about the coming solutions in time for them to implement the technologies. This in turn allows AMD to come out with GPUs and CPUs knowing that the products will have appropriate power and cooling solutions waiting for them. Can you say time to market?

A good example of this is AMD’s collaboration with the Stokes Institute at the University of Limerick, Ireland. The researchers looked at the efficiency of a finless heatsink design which, quite counterintuitively, worked better than the ‘normal’ finned heatsink. That heatsink is being put into production by Asia Vital Components (AVC), and it should show up in some upcoming low or mid-range GPUs in the not so distant future.

New vs Old heatsinks

Smaller is better, cheaper, and quieter

Since it is more efficient, you can get away with a smaller heatsink. This saves material cost, shipping, and of course end user power bills. Building a cheaper card cooling solution that works better than the alternative, a standard copper heatsink and fan – what’s not to like?
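For readers who want to see why efficiency translates into size, here is a rough back-of-the-envelope sketch. It uses the standard steady-state approximation that die temperature equals ambient temperature plus power times the junction-to-ambient thermal resistance. The function names and every number below are hypothetical and chosen only for illustration; none of them come from AMD, AVC, or the Stokes Institute.

```python
def junction_temp_c(power_w, ambient_c, theta_ja_c_per_w):
    """Steady-state die temperature: T_j = T_ambient + P * theta_ja."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_theta_ja(power_w, ambient_c, t_j_max_c):
    """Largest thermal resistance (C/W) that keeps the die at or below t_j_max_c."""
    return (t_j_max_c - ambient_c) / power_w


# Hypothetical card: 100 W GPU, 45 C air inside the case, 95 C die limit.
POWER_W = 100.0
AMBIENT_C = 45.0
T_J_MAX_C = 95.0

# The cooler, whatever its shape, must come in at or under this figure.
budget_c_per_w = max_theta_ja(POWER_W, AMBIENT_C, T_J_MAX_C)

# Two hypothetical coolers: one just meets the budget, one beats it.
baseline_temp_c = junction_temp_c(POWER_W, AMBIENT_C, 0.50)
efficient_temp_c = junction_temp_c(POWER_W, AMBIENT_C, 0.40)

print(f"thermal budget:   {budget_c_per_w:.2f} C/W")
print(f"baseline cooler:  {baseline_temp_c:.1f} C")
print(f"efficient cooler: {efficient_temp_c:.1f} C")
```

In this toy model the more efficient cooler leaves headroom below the die limit. A designer could spend that headroom on a smaller or lighter heatsink or a slower fan, shrinking until the temperature climbs back to the budget. Real designs also have to account for transients, hot spots, and airflow, which this sketch ignores.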

Before you pencil TFE 2010 into your entertainment calendar, just be aware that it is a hardcore science and engineering conference. Three of the four sessions were entitled “Best Design Practices in Power Electronics”, “Innovative Thermal Management Solutions”, and “Testing and Manufacturing”. Tourists looking for shiny things to play with, upcoming games, and free t-shirts will be very disappointed. Engineers working in the trenches on the current GPU solution-plus-two will be more than amused.

In the end, what Dr. Refai-Ahmed has done with TFE is to simply lay the groundwork for the people making the next-next generation GPUs. Starting from the raw research, moving it to component makers, and then on to the OEMs and AIBs themselves, all the pieces are put into place. TFE then pulls all of the players into the same room and gets them talking, listening, and mingling.

Even without umbrella-laden drinks, TFE is an interesting place to go to learn about the components that go into making a graphics card, with emphasis on the card, not the GPU itself. TFE is an invitation-only event, but if you are interested in attending, contact the people listed here. S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.