AT ANALYST DAY, Chekib Akrout of AMD spilled the beans on the chip company’s two new CPU cores, and Fusion. AMD’s Bulldozer and Bobcat cores, plus the memory controller at the heart of Fusion, were all outed.
The Bobcat core was the first mentioned. It is two-thirds of a Bulldozer, with a very interesting structure. The claimed sub-1W capable core has four integer (Int) pipes and two floating-point (FP) pipes, making it fairly ‘wide’ in terms of execution units. More interestingly, AMD is claiming only the SSE1-3 and AMD-V instruction sets will be in the Bobcat core. Is this just a slightly tarted up K8?
Bulldozer is more interesting. If you think of it as Bobcat with another 4-pipe integer unit bolted on the side, that is a good starting point. Instead of having one combined Int and FP scheduler like Bobcat and just about every other modern CPU out there, Bulldozer has two Int schedulers that share an FP scheduler.
This is the heart of Bulldozer. It shares an FP unit with two Int units, making it a 1.5 core chip. Since the FP unit has two 128-bit floating-point multiply-accumulate (FMAC) capable pipes without restrictions like Bobcat, it should be able to function like two separate FP units. If one core is not heavily using the FP unit, the second core should be able to temporarily use twice the resources usually available to it.
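To make the sharing idea concrete, here is a toy per-cycle model of two integer cores competing for two FP pipes. AMD has not disclosed how the real scheduler arbitrates, so the function name `allocate_fp_pipes`, the "fair share first, then hand out spare pipes" policy, and the per-cycle demand numbers are all assumptions made purely for illustration.

```python
def allocate_fp_pipes(demand_a, demand_b, total_pipes=2):
    """Toy model: split shared FP pipes between two Int cores for one cycle.

    demand_a / demand_b are the number of FP ops each core would like to
    issue this cycle. The policy here is an assumption, not AMD's design.
    """
    fair_share = total_pipes // 2

    # Each core first gets up to its fair share of the pipes.
    grant_a = min(demand_a, fair_share)
    grant_b = min(demand_b, fair_share)

    # Any pipes left idle are offered to whichever core still wants more.
    spare = total_pipes - grant_a - grant_b
    extra_a = min(spare, demand_a - grant_a)
    grant_a += extra_a
    spare -= extra_a
    grant_b += min(spare, demand_b - grant_b)

    return grant_a, grant_b


both_busy = allocate_fp_pipes(2, 2)   # both cores FP-heavy
one_idle = allocate_fp_pipes(2, 0)    # second core doing integer work only
print(both_busy, one_idle)
```

In this sketch a core gets one pipe when its neighbour is also FP-heavy and both pipes when the neighbour is idle, which is the "twice the resources" behaviour described above.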
As long as AMD does not totally screw up the ‘inter-core’ FP scheduler, the Bulldozer core should have decent FP capabilities without the die area that normally accompanies a beefy FP unit. It is a best of both worlds approach, and as long as the power and communications needs of that scheduler do not limit frequency scaling, it should be a clear win.
Bulldozer will be built by Global Foundries on a 32nm silicon on insulator (SOI) process with High-K Metal Gates. The only chips like it that I am aware of are the Niagara line from Sun, but that is a little more heterogeneous than Bulldozer.
AMD promised to sample the chips next year, so we will see what happens when the silicon hits the OEMs. At the very least, it looks like AMD’s strategy of clear departures from the status quo is alive and well. If it pays off this time like the IMC and AMD-64 did the last couple of times, we will be in for a very interesting 2011.
The last part is how Fusion is done. If you look at the above picture, the memory controller is the heart of the beast. This means two very interesting things. First is that the initial Fusion cores are going to have x86 and GPU as discrete chunks, connected via a memory controller or crossbar.
More important is that GPUs need massive memory bandwidth. Currently, the top-line HD5870 from ATI has a 256b wide GDDR5 interface running at 5GHz (effective), which works out to about 160GBps, 1280Gbps, or a lot of bandwidth. A current Opteron has two channels of 64b wide DDR3/1333, or 21.33GBps, and a Nehalem i[somemeaninglessnumber] with three-channel DDR3/1333 has a ‘mere’ 32GBps.
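Those numbers all fall out of one formula: bus width in bytes, times effective transfer rate, times channel count. A minimal sketch, assuming the round figures quoted above; the function name is made up for illustration, and these are theoretical peaks rather than what real parts sustain.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, mega_transfers_per_s, channels=1):
    """Theoretical peak memory bandwidth in decimal GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# DDR3/1333 actually runs at 1333.33... MT/s, i.e. 4000/3.
DDR3_1333 = 4000 / 3

hd5870 = peak_bandwidth_gb_per_s(256, 5000)                 # 256b GDDR5 at 5GHz effective
opteron = peak_bandwidth_gb_per_s(64, DDR3_1333, channels=2)
nehalem = peak_bandwidth_gb_per_s(64, DDR3_1333, channels=3)

print(f"HD5870:  {hd5870:.2f} GB/s")
print(f"Opteron: {opteron:.2f} GB/s")
print(f"Nehalem: {nehalem:.2f} GB/s")
```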
You can see why the memory controller is so important. If you throw two more cores into the mix, and then add even a quarter of an HD5870 – likely the minimum needed for the 2011 time frame – then you are going to have to at least triple your memory bandwidth from current levels.
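A back-of-the-envelope version of that estimate, under two loud assumptions that are mine and not AMD's: GPU bandwidth demand scales linearly with the fraction of an HD5870 you bolt on, and CPU demand scales linearly with a `cpu_scale` factor. The function name and parameters are hypothetical.

```python
# Baselines taken from the figures quoted earlier in the article.
OPTERON_GB_S = 2 * 8 * (4000 / 3) / 1000   # two 64b DDR3/1333 channels, ~21.33 GB/s
HD5870_GB_S = 32 * 5000 / 1000             # 256b GDDR5 at 5GHz effective, 160 GB/s


def fusion_bandwidth_estimate(gpu_fraction, cpu_scale=1.0):
    """Rough GB/s needed for a CPU plus a slice of an HD5870 (linear-scaling assumption)."""
    cpu_need = OPTERON_GB_S * cpu_scale
    gpu_need = HD5870_GB_S * gpu_fraction
    return cpu_need + gpu_need


need = fusion_bandwidth_estimate(gpu_fraction=0.25)
ratio = need / OPTERON_GB_S
print(f"Need about {need:.1f} GB/s, {ratio:.2f}x a current Opteron")
```

Even with CPU demand held flat, a quarter of an HD5870 pushes the total to just under three times today's Opteron bandwidth; any `cpu_scale` above 1.0 for the extra cores takes it past triple.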
While this is possible by adding pins and board layers, it isn’t sane or cost-effective to do it that way. AMD will have to get really smart to pull this off, and that is where the interesting parts of Fusion are. If you think making a new core is hard, and it is, feeding eight of the new bouncing baby chips with a few hundred GPU ‘cores’ attached is quite another level of pain. Keep a close eye on this part – it may be far less sexy than a new processor core, but it is deadly important. S|A