One thing that keeps cropping up in geek circles is the use of GPUs in ‘cloud’ environments, and with the advent of Onlive’s desktop offering, we asked them about it. At GDC, the topic of the conversation was their unique new offering, and how the company used GPUs in the cloud.
If you don’t know what Onlive is, the company offers gaming over a remote connection, basically VNC for fun instead of work. To do this badly is easy: just install VNC and off you go. To do it well is a lot trickier than that, because latency is a killer. If you know anything about how TCP/IP networks work, you know it isn’t easy to measure latency over arbitrary and often changing connections, much less to ‘fix’ the problem. Onlive has done it, but they jumped through a lot of hoops to get there. VNC and a rack of servers is not what you get with their products.
Without going into the networking side of Onlive, let’s just focus on the things they have in the data center, and what services they offer. The first iteration of the service was gaming in the cloud: you get a little box with a controller that is a glorified KVM on one side, a NIC on the other, and a video decompressor in the middle. The same job can be done in software if you choose, and it is being built into several TVs and set top boxes as we speak. Any modern ARM CPU is more than up to the job of decoding the stream and passing controller moves back in sane time frames.
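To give a feel for why measuring latency on a changing connection is awkward, here is a minimal sketch of the smoothed round-trip-time estimator that TCP itself uses (the gains come from RFC 6298). This is not Onlive’s method, which the company did not describe; the class and function names are made up for illustration, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ALPHA = 1 / 8  # gain for the smoothed RTT, as in RFC 6298
BETA = 1 / 4   # gain for the RTT variation, as in RFC 6298


@dataclass
class RttEstimator:
    """Hypothetical helper: tracks a smoothed RTT and its variation in ms."""
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def update(self, sample_ms: float) -> None:
        if self.srtt is None:
            # First measurement seeds both values.
            self.srtt = sample_ms
            self.rttvar = sample_ms / 2
            return
        # Variation is updated first, using the previous smoothed RTT.
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample_ms)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample_ms


def estimate(samples_ms: Iterable[float]) -> RttEstimator:
    """Feed a sequence of RTT samples (ms) through the estimator."""
    est = RttEstimator()
    for sample in samples_ms:
        est.update(sample)
    return est


if __name__ == "__main__":
    est = estimate([40.0, 80.0, 42.0, 41.0])  # invented samples
    print(f"smoothed RTT: {est.srtt:.1f} ms, variation: {est.rttvar:.1f} ms")
```

A single spike only nudges the smoothed value, which is the point: one slow packet should not make the service panic, but a sustained change should show up quickly.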
The result, short version, is that it works, but it will never replace a dedicated gaming PC. Then again, for road warriors and others without a dedicated monster gaming PC, it is a decent enough offering. That brings us to the obvious question: if you have this massive low latency, high bandwidth network, why not offer productivity apps? The end points are all there, and if a tablet or phone can run a game acceptably, it sure can run a word processor. That is just what Onlive released a few weeks ago.
Other than being a good idea, there is one reason this makes more sense for Onlive than you might think: user habits. Gaming is a recreational activity for most people, something you do at home or after hours; gaming at work is generally frowned upon. That means all those racks and racks of gaming PCs in the cloud sit idle for the majority of the day. On evenings and weekends, basically when people are not at work, the usage picks up. If you offer a service to be used by businesses, those users have a very low probability of stepping on the toes of gamers. Onlive gets a massive PC cloud for apps backed by a low-latency network for ‘free’.
Once again, while easy to describe, the interesting parts are below the surface in how this is getting done. For that, we need to take a look at the hardware the company has in the cloud. Onlive upgrades their back end servers quite often, somewhere in the 6-18 month range depending on a lot of things. The net result is that they have racks of various machine types with both AMD and Nvidia GPUs, plus some with no GPU at all. There isn’t one Onlive server specification at any given time.
All the boxes are virtualized; it would be borderline silly not to. Unfortunately, most commercial VMs don’t care much about latency minimization, and their GPU support can best be described as antagonistic. Neither of these situations is good for a low-latency gaming cloud offering. To fix this, Onlive wrote their own VMM, called Olives. Olives is not what you might think; its objectives are not necessarily the same as those of VMware, Citrix, et al. Olives probably cares about low latency and GPU support first, and then other ‘normal’ things. Since Olives runs only on Onlive servers, the company can tailor the VMM to support only the hardware they need, and tune the heck out of it.
Another interesting bit is how Onlive does video compression. The company has adaptive compression, state of the art real time bandwidth measurement, and can adjust the video stream on the fly. With all the talk about GPU video compression from AMD, Nvidia, and Intel, Onlive is the perfect candidate for the technology, right? Wrong.
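Onlive did not say how its adaptive compression picks a stream quality, so the following is only a generic sketch of the idea: measure the available bandwidth, leave some headroom, and choose the highest bitrate that still fits. The bitrate ladder, the 0.8 headroom factor, and the function name are all assumptions made up for illustration.

```python
from typing import Sequence

# Hypothetical bitrate ladder in kbit/s; not Onlive's actual settings.
LADDER_KBPS = (1000, 2000, 3500, 5000)


def pick_bitrate(measured_kbps: float,
                 ladder: Sequence[int] = LADDER_KBPS,
                 headroom: float = 0.8) -> int:
    """Return the highest rung that fits within a fraction of measured bandwidth.

    Falls back to the lowest rung when even that does not fit, since
    some stream is better than none.
    """
    budget = measured_kbps * headroom
    fitting = [rate for rate in sorted(ladder) if rate <= budget]
    return fitting[-1] if fitting else min(ladder)


if __name__ == "__main__":
    for bandwidth in (500, 3000, 6000, 10000):  # invented measurements
        print(bandwidth, "kbit/s measured ->", pick_bitrate(bandwidth), "kbit/s stream")
```

Re-running the selection every time a fresh bandwidth measurement arrives is what makes the stream adjust on the fly.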
One thing about GPU video compression is that it takes a lot of time to do. No, not to compress the frame, but to move it in and out. If you think about the normal pathway for data to reach the screen in the form of pixels, it goes CPU -> PCIe -> GPU -> monitor. Onlive adds compression and network transfer to the mix, so it hypothetically now goes CPU -> PCIe -> GPU -> GPU (for compression, hopefully on the same card) -> PCIe -> CPU -> PCIe -> NIC -> network -> user. This is slow, but not nearly as slow as doing the compression on the CPU would be.
For LAN use, the added latency is barely noticeable. Over a WAN, however, what may be borderline locally adds far too much latency to meet Onlive’s stringent requirements. GPU and CPU compression were right out. That is where Onlive’s video compression ASIC comes in. While it was not stated by the company, mainly because we forgot to ask, this ASIC is probably a video compression chip on one side with a NIC on the other. The compression chain now goes CPU -> PCIe -> GPU -> Onlive ASIC -> network -> user. The latency saved by going this route is tremendous, and may very well make the difference between Onlive being viable versus frustrating to a large percentage of potential users. The company says it also saves a bunch of power, something we don’t doubt at all.
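To make the difference between the two chains concrete, here is a small sketch that simply counts the stages and the PCIe crossings in each. It assumes the GPU-compression chain starts at the CPU, as the normal display path does, and it attaches no timing numbers, since none were given; the helper names are made up for illustration.

```python
# The two chains as described in the text; the first stage of the GPU
# chain is assumed to be the CPU, matching the normal display path.
GPU_PATH = "CPU -> PCIe -> GPU -> GPU -> PCIe -> CPU -> PCIe -> NIC -> network -> user"
ASIC_PATH = "CPU -> PCIe -> GPU -> Onlive ASIC -> network -> user"


def stages(path: str) -> list:
    """Split an arrow-separated chain into its individual stages."""
    return [stage.strip() for stage in path.split("->")]


def pcie_crossings(path: str) -> int:
    """Count how many times a frame has to cross the PCIe bus."""
    return stages(path).count("PCIe")


if __name__ == "__main__":
    for name, path in (("GPU compression", GPU_PATH), ("Onlive ASIC", ASIC_PATH)):
        print(f"{name}: {len(stages(path))} stages, {pcie_crossings(path)} PCIe crossings")
```

Every stage dropped is a copy or a handoff that no longer adds delay, which is where the saving comes from.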
The ASIC is obviously the most important piece in the chain; everything else in the data center is interchangeable and, as mentioned, is often changed. You just can’t do things the ‘old way’ and meet the latency numbers needed to make a user not give up in frustration. In case you haven’t been paying attention though, the ‘old way’ is on its last legs. Intel and AMD have integrated GPUs in their chips, and while they are not suitable for gaming yet, the writing is on the wall.
Intel’s GPUs are about to get a massive boost in hardware performance, but lack even basic driver functionality, so they are out for any real use. Fast and broken is still broken. AMD, on the other hand, has some features that are directly applicable to this problem: Zero Copy and Pin In Place. Between these and whatever the upcoming Trinity chip brings to the table, it may be possible to do what Onlive does on the CPU itself. More importantly, you not only get rid of latency, but also a full GPU, an ASIC, and a lot of space, power, and complexity. Now do you get this and this? GPUs in the cloud may be hit and miss now, but they will be a whole lot more useful in a few months. S|A
Editor’s note: You can learn more about this type of material at AFDS 2012. More articles of this type can be found on SemiAccurate’s AFDS 2012 links page. Special for our readers: if you register for AFDS 2012 and use promo code SEMI12, you get $50 off. Early bird registration ends April 8th.