AMD’s (NYSE:AMD) Phil Rogers gave the first keynote at AFDS/Fusion 11, and the topic was where things are going on the software side. The software side of Fusion is every bit as critical as the hardware, and likely harder to put in place.
We will skip the history lesson for the most part and jump to where Phil sees the state of GPU and heterogeneous computing: the end of the ‘Standards Drivers Era’. Basically, we went from a time when you had graphics-only or graphics-mostly APIs and had to hack GPU compute on top of them. From there, things evolved into real GPGPU-supporting APIs and the drivers that supported them.
Proprietary, standards, and architected eras
This was better than the ‘Proprietary Era’, but not good. It came with multiple address spaces, manual data management, APIs designed for people who are proud of their pocket protectors, and other things that drive even strong men insane. It worked, and it could be learned, but it wasn’t nice, and it sure wasn’t elegant. That is the state of things now.
The next step, starting in 2012, maybe a little earlier if the TSMC fairies deliver with their 28nm wand of metal gates, is called the ‘Architected Era’. The idea is simple: the GPU should be part of the CPU, with one address space and the same memory space, and everything you can do to a CPU you can also do to a GPU. The GPU becomes a fully pre-emptable co-processor. In theory anyway, it works well.
The hardware roadmap for the software roadmap
In the current state of the art, basically Llano, and Ontario/Zacate if you stretch it, we are somewhat there, but some big pieces are missing, the memory model being one of them. Basically we are between columns 2 and 3 of the roadmap slide, with most of 3 and 4 coming in the next generation of CPUs and GPUs. By the time the hardware is in place, the software to make use of the new capabilities should be quite far along. That is the idea behind Fusion 11: get people interested and working on the software. Imagine the coincidence!
The slide above has the term FSA at the top, and that is an important acronym. It stands for Fusion System Architecture. This is the idea that a CPU and GPU combo should have an open way of communicating, and that it should be hardware agnostic. An open architecture allows others to join in, innovate, and move things forward. One company doesn’t have to do everything, or at least drive everything, and that is usually the best way.
AMD is promising a fully open FSA, and has stated that it will publish the FSA virtual ISA (FSAIL), the FSA memory model, and FSA dispatch. These specs will be ISA agnostic for both the CPU and GPU, so anyone can play in the new sandbox, and AMD wants others to join. It has even pledged a review committee for FSA, so break out the party hats and get ready, there will be punch and pie.
More important is FSAIL itself, or FSA Intermediate Layer. FSAIL is a virtual ISA, and a thin and light JIT translates it to the real ISA. This JIT is called a “Finalizer”. FSAIL is fully threaded and supports all the debugging features of a CPU. The GPU can hit all the CPU services, and the CPU can hit the same on the GPU. Once you are here, the GPU becomes part of the CPU, or at least is theoretically seamless enough that such distinctions are pretty pointless.
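To make the Finalizer idea a little more concrete, here is a toy sketch in Python. Nothing in it reflects the real FSAIL, which AMD had only promised to publish at this point. The virtual opcodes, the two translation tables, and the `finalize` function are all invented for illustration. It only shows the shape of the idea: one portable instruction stream, lowered late to whatever hardware happens to be present.

```python
# Toy illustration only: these opcodes and targets are hypothetical,
# not the actual FSAIL or any real CPU/GPU instruction set.

# A "virtual ISA" program: hardware-agnostic (opcode, operands...) tuples.
VIRTUAL_PROGRAM = [
    ("load", "r0", "a"),
    ("load", "r1", "b"),
    ("add", "r2", "r0", "r1"),
    ("store", "out", "r2"),
]

# Per-target tables mapping virtual opcodes to made-up native mnemonics.
TARGETS = {
    "cpu": {"load": "mov", "add": "add", "store": "mov"},
    "gpu": {"load": "buffer_load", "add": "v_add", "store": "buffer_store"},
}


def finalize(program, target):
    """Lower a virtual program to 'native' text lines for one target."""
    table = TARGETS[target]
    return [" ".join((table[op],) + tuple(args)) for op, *args in program]


if __name__ == "__main__":
    for target in ("cpu", "gpu"):
        print(target, finalize(VIRTUAL_PROGRAM, target))
```

The same `VIRTUAL_PROGRAM` is handed to both targets unchanged, which is the point: whoever writes the program never sees the real ISA, only the finalizer does.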
As an interesting side note, the FSA Memory Model is designed to support C++, Java, and .NET, should that still be around when the hardware comes out. This is a BIG hint about what tools AMD thinks people will be using to write for the future Fusion chips. No one will be able to argue about being able to write code for both components with their high level tool of choice.
The last piece is the most interesting and likely the most misunderstood: command and dispatch between the two components. Right now, CPU and GPU are very distinct, and you need to explicitly put threads on one or the other, and explicitly move them between the two should that be necessary. This more or less kills any speed advantage you might get when moving from one side to the other in the middle of a job. The way things are now, you really need to pick one spot and run the task there start to finish.
Command and Dispatch is now seamless
In the new order of Fusion, you can take a task and bounce it around between cores, shaders, and whatever else you want, whenever you want. It won’t be a free swap, but it is theoretically painless enough to be quite useful. Many people at AFDS talked about executing outer loops on the CPU cores and inner loops on the GPUs, and the granularity is said to be good enough to make this idea a reality.
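The outer-loop/inner-loop split, and why it hurts today, can be sketched in a few lines of Python. This is a hypothetical model, not real GPU code: `gpu_inner_loop` is an ordinary function standing in for a GPU kernel, and the copy counter is just a crude way to show the cost of separate address spaces. No real dispatch API is used or implied.

```python
# Hypothetical model: plain Python stands in for both CPU and GPU.
# The only thing being illustrated is how many elements must be copied.

def gpu_inner_loop(row):
    """Stand-in for the data-parallel inner loop (here: sum of squares)."""
    return sum(x * x for x in row)


def run_with_copies(matrix):
    """Separate address spaces: every dispatch copies its data over and back."""
    elements_copied = 0
    results = []
    for row in matrix:                  # outer loop stays on the CPU
        device_row = list(row)          # explicit host-to-device copy
        elements_copied += len(device_row)
        results.append(gpu_inner_loop(device_row))
        elements_copied += 1            # copy the single result back
    return results, elements_copied


def run_shared(matrix):
    """One address space: the inner loop reads the caller's data in place."""
    return [gpu_inner_loop(row) for row in matrix], 0


if __name__ == "__main__":
    matrix = [[1, 2, 3], [4, 5, 6]]
    print(run_with_copies(matrix))
    print(run_shared(matrix))
```

Both versions compute the same results. The difference is the copy traffic, which in the first version grows with every trip through the outer loop and is the reason fine-grained hand-offs are not worth it on current hardware.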
If you have been wondering what the holdup was for an OpenCL SDK that could automatically split work between the CPU and GPU, now you know. It may never be automatic, but in about a year, it will be much more seamless, and far less painful to implement. As time goes on, even those barriers should fade away.
In the end, Phil Rogers’ talk was about making things open, available, and seamless. Fusion isn’t about locking people out; it is a method for enabling the next generation, be it from AMD or anyone else. In about a year, the hardware and initial software will be in place for people to play with and, more importantly, validate the concept. It will be fun to watch. S|A