SEAMICRO IS A STARTUP that is attacking data center power use through a novel idea: optimizing hardware for a specific workload. For web serving workloads, the idea is simple: cram 512 Atom servers into a box with a virtual ‘shared nothing’ configuration.
Updated: Fixed wrong CPU speeds
Seamicro did not just decide to make a machine with the highest CPU socket count out there. Instead, they began with the intended workload and tried to make the best hardware to execute those tasks. The workload they are attempting to optimize for is the key to all of this: small, simple, independent tasks.
By that, we mean things that you do a lot of in parallel, where task A does not depend on task B in any way. If you are doing a search on Google for Taco Bell locations near Barstow, CA, and someone two countries away is doing a search on mountains in Peru, even if they end up on the same server, the workloads don’t depend on each other. What one thread needs for data, and how it uses it, is not related to what the other thread does. It is ‘shared nothing’ for data.
In this case, multi-core CPUs with huge caches don’t buy you much over many small CPUs with decent caches. The workload is likely to be more I/O bound than CPU bound, and it will be exceedingly rare that any single task needs data from another. In this case, the performance per watt of an Atom is 3.2x as good as a Xeon according to Seamicro’s numbers.
Then there is the problem of things on the server that are not the CPU, and how much power they consume. The wattage consumed by the ‘other’ category has been steadily climbing over the years, and hasn’t undergone the same power reduction scrutiny that the CPUs have. Using Seamicro’s numbers once again, ‘other’ takes up about two-thirds of the wattage consumed by PCs. Similar numbers have been bounced around by a number of vendors lately, so it is likely very close to real world levels.
There is one last thing to consider, and that is cooling. Cooling takes power, and the less power you use, the less heat you make and the less power you have to spend removing it. The cooler things run, generally the more efficiently they run; temperature in semiconductors is a fairly virtuous cycle.
As far as engineering goes, it is not simply a matter of cooling X watts; airflow, heat density, and several other things complicate the cooling process. To oversimplify things, think about this: is it easier to cool one 1cm square chip that puts out 100W, or ten 0.5cm square chips that put out an aggregate 100W?
Now think about cooling those same loads in a space constrained area, say 1cm of available height. With ten 10W chips, you can cool it with a heatsink and decent airflow from fans. The 100W solution is going to need exotic cooling, and more height, to avoid meltdown. This is one serious advantage lots of little chips have over one big one: even if the little chips pull more net wattage, the end result can be less total power use when you include cooling power.
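To put rough numbers on that thought experiment, here is a minimal Python sketch of the heat density involved. It assumes "square chip" means a die that many centimeters on a side, and it ignores everything else that matters in real cooling (airflow, spacing, heatsink design), so treat it as back-of-the-envelope arithmetic only.

```python
# Heat flux for the article's illustrative chips.
# Assumption: a "0.5cm square chip" is 0.5 cm on a side (0.25 cm^2 of area).

def heat_flux_w_per_cm2(power_w: float, side_cm: float) -> float:
    """Power divided by area for a square chip."""
    return power_w / (side_cm * side_cm)

big = heat_flux_w_per_cm2(100.0, 1.0)    # one 100 W chip, 1 cm on a side
small = heat_flux_w_per_cm2(10.0, 0.5)   # each of ten 10 W chips, 0.5 cm on a side

print(f"big chip:   {big:.0f} W/cm^2")
print(f"small chip: {small:.0f} W/cm^2")
print(f"ratio:      {big / small:.1f}x")
```

Under those assumptions each small chip has less than half the heat density of the big one, and the ten heat sources can also be spread across the board instead of sitting in one spot.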
Getting back to Seamicro, they looked at the total problem of web serving apps, like search, email, social networking, casual gaming, and other tasks where thread A has little or no interaction with thread B. Instead of doing things from the top down, they started with the task at hand, and looked at how to optimize for it with the building blocks at hand.
Front full of drives, side full of mobos, that is the SM10000
The end result is called the SM10000, a 10U unit that houses 512 Atom based servers, 1 terabyte of DRAM, and 64 SSDs. For I/O, you can have 64 GbE adapters or 16 10GbE links, your choice. The front is a big stack of SSDs, with holes left over for airflow. The sides have 32 motherboards each, in two columns of 16.
NICs and fans make up the rear
The back of the box has three main features: fans, networking, and PSUs. There are six fans, three per side, and four redundant power supplies on top. The center is dominated by network cards, either eight 8-port GigE units or eight 2-port 10GigE boards, take your pick. There are also two ethernet ports on the top left, presumably for out of band management.
One mobo or eight?
The motherboards are where the magic starts happening. They have eight ‘servers’ on a single 5 x 11” board, or less than 7 square inches per ‘server’. I use quotes around server because this is the first place where Seamicro’s rethinking of how servers are made shows up. If you look at the board picture, you will see eight Atoms (1.6GHz ULV Menlows for the curious), eight chipsets, and four Seamicro ASICs. The back of the board is dominated by eight DDR2 SO-DIMMs, 2GB per CPU/server.
Backs for ‘da memories
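The chassis and board figures quoted so far can be cross-checked with a few lines of Python. This is only a consistency check on the article's own numbers; the variable names are made up for illustration.

```python
# Figures taken from the article; names are illustrative only.
SIDES = 2
BOARDS_PER_SIDE = 32
SERVERS_PER_BOARD = 8
DRAM_GB_PER_SERVER = 2
BOARD_WIDTH_IN, BOARD_LENGTH_IN = 5, 11

boards = SIDES * BOARDS_PER_SIDE                  # motherboards in the chassis
servers = boards * SERVERS_PER_BOARD              # Atom 'servers' in the chassis
dram_gb = servers * DRAM_GB_PER_SERVER            # total DRAM in GB
area_per_server = BOARD_WIDTH_IN * BOARD_LENGTH_IN / SERVERS_PER_BOARD

print(f"{boards} boards, {servers} servers, {dram_gb} GB DRAM")
print(f"{area_per_server} square inches of board per server")
```

The totals line up with the headline numbers: 64 boards, 512 servers, 1TB of DRAM, and just under 7 square inches of board per server.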
The first problem solved by Seamicro is heat density. Atoms sip power, and have excellent performance per watt figures when compared to any other x86 chip out there. Sixteen low level heat sources are fairly easily cooled with small heatsinks, allowing the close packing of boards you see above. A single big, hot chip like Larrabee or Beckton could outperform an Atom, but is unlikely to prevail when you take density or system performance per watt into account.
Much of the magic is centered around the Seamicro ASIC, or what they call CPU I/O Virtualization Technology. What this is can be summarized by calling it a virtual motherboard: there are no I/O components, but the ASIC presents the interfaces to the chipset as if there were. As far as the Atom is concerned, it is sitting alone on its own mobo with its own drive and NIC.
The ASIC connects to the chipset via a standard PCIe 2.5Gbps connection, more than enough for the I/O needs of a single server. CPUs, chipsets, memory and ASICs, that is all there is on the motherboards. This not only simplifies the board design, but it drops power use by a huge amount. If a device is not present, it does not use power.
One interesting side note is that the ASIC talks to the CPU’s chipset via a standard PCIe interface. Seamicro says that the ASIC is CPU independent, so you could theoretically put in ARM, PPC, full power x86, or any other CPUs you wanted. If you are thinking there may be SM10000s with non-Atoms in them, you are probably right. New models likely depend more on customer demand than technical complexity.
Those ASICs connect to each other in a torus configuration. If you don’t know what that looks like, take a look at this. The idea is that you have a CPU connected to six neighbors in a grid, N, S, E, W, up, and down, and the edges wrap around. In the Seamicro configuration, any board can fail, and the torus will route around it seamlessly.
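For readers who think better in code, here is a minimal Python sketch of a 3D torus with wraparound edges, plus a breadth-first search that routes around failed nodes. The 8 x 8 x 8 dimensions are hypothetical, since the article does not give the real topology dimensions, and the search is a generic illustration, not Seamicro's actual routing logic.

```python
from collections import deque
from typing import FrozenSet, List, Optional, Tuple

Coord = Tuple[int, int, int]

def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Six neighbors (E, W, N, S, up, down); the modulo makes the edges wrap."""
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]

def hops(src: Coord, dst: Coord, dims: Coord,
         failed: FrozenSet[Coord] = frozenset()) -> Optional[int]:
    """Fewest hops from src to dst, skipping failed nodes; None if unreachable."""
    if src in failed or dst in failed:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in torus_neighbors(node, dims):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

DIMS = (8, 8, 8)  # hypothetical dimensions, for illustration only

print(torus_neighbors((0, 0, 0), DIMS))
print(hops((0, 0, 0), (7, 0, 0), DIMS))                            # wraps around the edge
print(hops((0, 0, 0), (2, 0, 0), DIMS))                            # direct path
print(hops((0, 0, 0), (2, 0, 0), DIMS, frozenset({(1, 0, 0)})))    # detour around a failure
```

In this toy model, knocking out the node between source and destination costs two extra hops instead of cutting the connection, which is the property the torus is being used for.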
Not only does this provide failover capabilities, but it also allows for pretty massive bandwidth. Seamicro says that the cross-sectional bandwidth of the system is 1.28Tbps, more than enough to pass the data needed for their intended workload.
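For a sense of scale, the quoted figure can be divided evenly across the 512 servers. The even split is an assumption made purely for illustration; the article does not say how Seamicro derives or allocates the number.

```python
# Quoted cross-sectional bandwidth split evenly across servers (an assumption).
CROSS_SECTION_BPS = 1.28e12   # 1.28 Tbps, from the article
SERVERS = 512                 # from the article

per_server_gbps = CROSS_SECTION_BPS / SERVERS / 1e9
print(f"{per_server_gbps} Gbps per server")
```

That works out to 2.5Gbps per server, the same figure as the PCIe link between each chipset and its ASIC.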
ASICs are not the only thing connected to the torus though. You might recall that the SM10000 has drives and ethernet ports too. Since they are not on the motherboards, they have to be on the torus, and that is exactly where they are.
Both the ethernet and storage cards have an FPGA that interfaces with Seamicro’s interconnect fabric. They take virtual storage or networking calls, service them, and pass the info back to the appropriate motherboard ASIC. The ASIC then presents the data as if it came from an attached drive or NIC.
Once again, this allows Seamicro to drop component counts, and bump utilization up to very high levels. Parts can be chosen for efficiency, and only the bare minimum number are needed. Power used by the ‘other’ parts of the motherboard is dropped well below that of having many under-utilized components taking up rack space. It also allows for a level of redundancy not typically found in racks of servers, and even works as a hardware ethernet switch and load balancer.
In the end, Seamicro did the exact opposite of what companies normally do. Instead of picking the best hardware available for a particular problem, they picked the problem they wanted to solve and made optimized hardware for it. To do this, they stripped out everything that wasn’t necessary to solve the problems at hand, and then picked the parts they needed wisely.
The end result is the Seamicro SM10000, a 10U box stuffed with 512 ‘servers’. It takes 2kW of power, and should be available on July 30, 2010. Base price for a 512 CPU model with full memory and 8 GbE ports but no HDDs is $139,000. Buy three, one in each color. S|A
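Broken down per server, those headline figures look like this. The sketch below simply divides the quoted system power and base price by 512, which assumes the 2kW figure covers the whole box and ignores that the base configuration ships without drives.

```python
# Per-server breakdown of the quoted system figures (simple division only).
SYSTEM_POWER_W = 2000      # 2kW, from the article
BASE_PRICE_USD = 139_000   # base configuration, no HDDs, from the article
SERVERS = 512

watts_per_server = SYSTEM_POWER_W / SERVERS
price_per_server = BASE_PRICE_USD / SERVERS

print(f"{watts_per_server:.2f} W per server")
print(f"${price_per_server:.2f} per server")
```

That comes to roughly 3.9W and about $271 per ‘server’, fans, fabric, and power supplies included.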