Calxeda has a new paradigm, but what for?

Silicon is easy compared to changing things

Calxeda launched their new chip, the EnergyCore ECX-1000 line, last Tuesday, and that changes the question from what to who. The chip is real; the elephant in the room is who will buy said chip.

We went over the tech behind the new CPU/SoC earlier, and said it was a game changer, but didn’t really describe why we think it is. That is a long story, and one that most people don’t seem to understand. Luckily, there are enough people spewing opinions that range from insightful to purposefully misleading, adding to the confusion. The short story is that the time has come for Calxeda and other companies aiming at the Big-Box-Of-Shared-Nothing-Little-Servers (BBOSNLS).

Markets:

The obvious question to ask is who would want one of these boxes? The answer is simple, not everyone, in fact, there is only a relatively small portion of the market that can use this type of machine at all. Architectures like Calxeda are really good at doing problems that are parallelizable, and positively stink at single threaded workloads. Basically, if your problem splits up well, you could possibly benefit from this architecture, if it doesn’t, you shouldn’t even consider a BBOSNLS server.

Performance per watt, an absolutely key metric for any modern data center, perhaps the only metric for a modern data center, is also a factor. That said, it isn’t the deciding factor. The work/not work test is a pass/fail one for Calxeda, and performance per watt and performance per dollar are only considered after you pass that first hurdle. Even if Calxeda delivers 100x better performance per watt than a rack of Xeons, and does so at 1/10th the cost of said rack, does “MIPS per metric” matter if it can’t do the job? A Prius is far more fuel efficient, not to mention cheaper, than a 747, but good luck getting 500 people from New York to Paris in said Prius. Then again, if you only need to go on a day trip, you could fly, but a Prius is a much better tool for that job.
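The order of those two tests can be sketched in a few lines of Python. This is only an illustration of the reasoning, not anything Calxeda or HP provides; the Platform record, its fields, and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    runs_workload: bool  # the pass/fail gate: can it do the job at all?
    perf: float          # hypothetical work units per second
    watts: float         # hypothetical power draw


def pick_platform(candidates):
    """Apply the pass/fail gate first, then rank survivors by perf per watt."""
    viable = [p for p in candidates if p.runs_workload]
    if not viable:
        return None
    return max(viable, key=lambda p: p.perf / p.watts)


# Made-up numbers: the efficient box fails the gate, so its
# 100x better performance per watt never enters the comparison.
candidates = [
    Platform("little-cores", runs_workload=False, perf=1000.0, watts=10.0),
    Platform("big-cores", runs_workload=True, perf=1000.0, watts=1000.0),
]
best = pick_platform(candidates)
print(best.name)  # prints "big-cores"
```

Flip runs_workload to True for the first entry and the efficiency ranking takes over, which is the whole point of treating the gate and the metric as separate steps.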

This is one place where Calxeda is being quite frank about their abilities, they don’t promise the moon in general, but they do promise a lot on specific workloads. The workloads Calxeda is targeting officially are offline analytics, web applications, middle tier applications, and file serving. Of these, Hadoop, MemCacheD, and web serving are the key areas. Most people haven’t heard of Hadoop or MemCacheD, but any big server farm depends on them.

The list of target workloads for a Calxeda box may be small, but each of the targets buys a lot of servers. If the list of names HP was tossing around at their Calxeda based server launch has even a moderate conversion rate, they should sell pretty well. It is hard to find even one of the top 100 web sites that doesn’t use a few of the listed open source programs, Microsoft included.

Software:

That brings up the next big question, what runs on this architecture, and the answer there is everything. Anything Linux or *nix based will compile for Calxeda in almost no time. The cores are vanilla ARM A9s, and the interconnects look just like ethernet. The tools are already there, open source and mostly free. Windows does not run on ARM, but Windows is largely irrelevant in any data center not owned by Microsoft. If you look at the top 500 web sites, the number of them running on Windows is almost zero and shrinking.

Talking to people working on code for this type of cutting edge box, one thing became quite clear, high performance computing (HPC) work doesn’t care about architectures, it just cares about solutions that are actual solutions, not presentations and smiles. They test, evaluate, and if it provides an ROI advantage, they buy.

To this crowd, money matters, period. What the metric for evaluating money is depends a lot on the company itself, sometimes it is raw performance, other times performance per watt or dollar. What doesn’t matter is software compatibility. Google, Yahoo, Amazon, Facebook, and all the other big players have their own code base, or use open source code. Either way, the software is tuned for the target architecture, and can be re-tuned, re-compiled, and tweaked as needed.

If you are buying hardware by the hundreds or thousands of servers per month, software optimization costs pale vs running costs. On the other side of the coin, if you are one of the performance at any cost crowd, and we mean literally any cost, you want the best solution for your needs. In any case, the first gating question, does Calxeda work for you, is still in force. Then you define performance metrics and look at them.

Memory:

That brings us back to what some believe is the key weakness of the EnergyCore ECX-1000 architecture, the A9 CPU, specifically its memory controller. The A9 core is limited to 32 bits of memory addressing, and that means 4GB of memory per controller. The shipping memory quantity is the maximum, and you only get 4GB for each 4-core node. On cell phones, 4GB of RAM is getting tight, but on desktop PCs, it is unacceptably small.

Servers are another kettle of silicon, they tend to have varying bottlenecks based on workload. Intel CPUs are seen as generalist machines, high performance and applicable to almost any workload. Fast or narrow are much easier to engineer than fast and wide, but far less applicable to the real world. Calxeda has no pretensions of generalist use, they have a niche, and that is their current lot in life.

Some server workloads can fit comfortably in 4GB of RAM, others not. Some are bottlenecked by disk, memory, network, or a dozen other choke points, so a chip like Calxeda’s may very well work nicely. For some, 32-bit memory is a non-starter, period. For others, the raw CPU power is not close to sufficient. The workloads targeted by Calxeda all partition well, so both concerns can be addressed by what they do best, pack lots of little cores in a small space. MemCacheD and Map/Reduce algorithms like Hadoop have portions that comfortably scale well below 4GB, and the single threaded performance doesn’t matter nearly as much as aggregate performance per watt, dollar, or rack unit.
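To make the partitioning argument concrete, here is a minimal Python sketch of splitting a cache across 4GB nodes. It is an illustration only: the 75% usable fraction is an assumption, the function names are hypothetical, and plain modulo hashing stands in for whatever distribution scheme a real MemCacheD client uses.

```python
import hashlib

NODE_RAM_BYTES = 4 * 2**30  # the 4GB per-node cap discussed above
USABLE_PERCENT = 75         # assumption: leave headroom for the OS and code


def nodes_needed(dataset_bytes):
    """Smallest node count whose combined usable RAM holds the data set."""
    usable = NODE_RAM_BYTES * USABLE_PERCENT // 100
    return max(1, -(-dataset_bytes // usable))  # integer ceiling division


def node_for_key(key, node_count):
    """Map a key to a node with a stable hash (simple modulo sharding)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


count = nodes_needed(96 * 2**30)  # a hypothetical 96GB cache
print(count)                      # prints 32
print(node_for_key("user:42", count))
```

No single node ever needs to see more than its own slice, which is why the 4GB cap is not a problem for this class of workload.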

In the end, memory is a dealbreaker for many workloads, but irrelevant for a few others. Those few are, once again, huge target markets. If the pass/fail test is passed, BBOSNLS are unquestionably the way to go. For the rest, Intel and AMD have solutions for your needs.

How big is the TAM?

Last up is a rather open question, the target markets for Calxeda products may be well known, but how big are they? The list of Hadoop or MemCacheD users is stunning for not only its breadth, but the number of servers used at each location. Apache and the LAMP stack in general power the majority of the internet, and a much higher percentage of ‘big’ sites than small. The workloads Calxeda says they are suited for comprise a TAM of massive proportions.

This silver lining has a big asterisk behind it, the workloads targeted are not homogeneous, they tend to have a lot of parts, and some parts pass the pass/fail test, others don’t. Map/reduce has two distinct portions, mapping and reducing. LAMP has four programs that make the stack, each with potentially dozens of sub-functions. Of these, many, if not most, are not suitable for Calxeda chips.

So the TAM question is tricky. On the surface, the answer is mind-bogglingly huge. Once you dig in, that size gets pared down to a fraction of what it was, but it is still a pretty impressive number, more than enough to keep Calxeda in executive perks for the next few hundred years. Of those potential markets, how many would consider Calxeda? How many will buy? That is an open question, but from SemiAccurate’s view, the realistic TAM is pretty large.

Support:

Support for anything that goes into a data center environment is the key. Without credible support, well, you likely won’t be considered, much less get in the door. This is one area where Calxeda has things covered, or at least it is an area that they don’t have to cover. Calxeda, you might recall, doesn’t sell servers, or even make servers, they make chips. In this case HP makes servers, and sells them too.

After the make or break question of work or not work, the next most important question for many organizations is support. Here Calxeda has things covered, you can’t do much better than HP in the server support department. They have options, albeit pricey ones, for just about any level of support you want or need. If you want 24/7 on-site support with hot spares, HP has it, and that means Calxeda now has it too.

Given the redundant nature of the HP box shown off, and the failure tolerant nature of the network, you probably won’t need a babysitter for the box. If you do need 100% failure-proof computing, well, it is likely impossible, you are still stuck with the same old trade-off of adding a few more zeros to the price tag for every extra 9 in reliability. This is one area where the EnergyCore line has a killer feature, the switch fabric.

Networking with fabric:

Remember the bit about a network that you can configure to your heart’s content, both easily and cheaply? If you need a box with redundant links, or a box that can route around a failed node, it is easy enough to do, and orders of magnitude cheaper than a new chipset that does the same. These options are included in the price of the chip, and having a PCB made is not a killer roadblock. It is possible and actually likely to happen.
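As a rough picture of what routing around a failed node means, here is a small Python sketch using a breadth-first search over a four-node mesh. The topology and node names are hypothetical, and this does not claim to be how the EnergyCore fabric picks routes; it only shows that redundant links leave a path when one node drops out.

```python
from collections import deque


def shortest_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a shortest path that avoids failed nodes."""
    if src in failed or dst in failed:
        return None
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no surviving route


# Hypothetical 2x2 mesh: A-B, A-C, B-D, C-D
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
print(shortest_path(links, "A", "D"))                # ['A', 'B', 'D']
print(shortest_path(links, "A", "D", failed={"B"}))  # ['A', 'C', 'D']
```

With both B and C marked as failed the function returns None, which is the case where no amount of fabric flexibility helps.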

More important are management tools. HP has a suite of them, and while nothing was said about integrating Calxeda into the mix, it is very likely that they will have at least some integration there. The tools Calxeda provides weren’t mentioned at the initial briefing, but there have to be some, you can’t make a custom BMC without them. Even rudimentary tools from either company would be better than some new architectures SemiAccurate has seen in the past.

Things get interesting when you take custom architectures into account. If you have an app that has 10 distinct stages of work, you could design an architecture where each chip does a stage, and passes it to a directly connected next stage, rinse and repeat. If each stage can break down into less than 4GB of code and data, you are set. If not, see the deal breaker above. Network processing, crypto, deep packet inspection, firewalls/intrusion detection, and other similar tasks may be a good fit here. Doing a custom backplane for a Calxeda card is cheaper than spinning a custom network processor by several orders of magnitude. If the workload fits... then this is a killer app.

If a tree falls…..

The last open question is simple, who would want this architecture, and why? To look a little more closely at that, the summary is simple, if your problem can fit in a BBOSNLS architecture, and live under a 4GB/instance cap, then Calxeda will be your dream system. If you need large memory, massive single threaded performance, or coherency among CPUs, then, well, those are flat out deal breakers. So, the remaining subset of potential customers, how many are there? The short answer is a lot.

A modern server CPU hasn’t been a single core for a long time, and while single threaded performance has hit a wall, hard, over the last few years, core count continues to grow. Intel/AMD/x86 servers have the same rough problem as a Calxeda cluster, if your app doesn’t thread, you have serious problems.

Intel/AMD/x86 has two advantages though, they scale to 4+ sockets, 64+ cores right now with coherent memory systems. Not just coherent, but they can go to hundreds of GB per system, so any single thread can access it all, and have single threaded performance that ARM cores can only dream of. Calxeda on the other hand has a maximum of 4GB/node, with each node being four comparatively small cores.

For some, no, for many, that is an absolute deal breaker, and will unquestionably cap Calxeda’s market. The ARM A15/v7a architecture improves performance a lot and has a 40-bit limit per memory controller, so ~1TB of RAM is possible per controller. Each thread is still limited to 4GB of work space, so the glass ceiling isn’t raised very far. Allowing your code and OS to run in different spaces has a lot of benefits, but not nearly enough to do much more than put lipstick on a small side of bacon. ARM’s v8/64-bit architecture will change all of this, but that is a couple of years out.
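The 4GB and ~1TB figures fall straight out of the address widths. A quick Python check, assuming byte-addressable memory (the helper name is just for illustration):

```python
def addressable_bytes(address_bits):
    """Bytes reachable with byte addressing and the given address width."""
    return 2 ** address_bits


GIB = 2 ** 30
a9_limit = addressable_bytes(32)   # 32-bit addressing, as on the A9
a15_limit = addressable_bytes(40)  # 40-bit addressing, as on the A15

print(a9_limit // GIB)        # prints 4     (4GB)
print(a15_limit // GIB)       # prints 1024  (1TB)
print(a15_limit // a9_limit)  # prints 256
```

The extra eight address bits buy 256 times the physical memory, but a single 32-bit thread still only sees a 4GB window of it.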

The end of the beginning, or the beginning of the end?

Back to Calxeda, how big is the potential market? In SemiAccurate’s semi-educated estimates, we would say big enough. Tasks like the ones mentioned earlier, MemCacheD and Hadoop, are huge, as is light duty web serving. For things like SAAS and applets that don’t need huge CPU power, you can serve an awful lot of those from a single Calxeda node. If you recall, in the previous part of this article we mentioned Zoho; at the launch, they seemed very interested. The EnergyCore architecture is perfect for them, the non-presentation/GUI side of an office app is pretty light duty. There are a lot of similar tasks out there.

How many servers will Calxeda sell? Initially, we think they will sell a fair amount, there are a lot of problems that fit their solution now, so these boxes will unquestionably sell. If there wasn’t a market identified, do you really think HP would have stepped in? They aren’t exactly known for taking risks when it comes to data center customers.

More importantly, the first set of Calxeda servers will let customers kick the tires on a BBOSNLS machine and see what it can and cannot do. A few will also design a custom machine with interconnects to suit their purposes, remember the three socket Opteron systems that IBM was showing off? Same idea, different monetary scale. The interconnect flexibility on the ECX-1000 chips will allow customers to think outside the box. And they will. And they will design around it. And it will take some time. In the meantime, Calxeda is going to sell boxes to those whose problems fit it, and a lot more to people who want to experiment.

Changing things:

The future is what really matters. The ECX-1000 looks very much like a modular architecture, one that combines a vanilla core with a flexible interconnect. If you can swap that core out relatively easily, an A15 based ECX-1000 would expand the set of those whose problems fit the solution by a fair number. An ARM64/v8 ECX-1000 would take the list of restrictions down to one, single threaded performance, and ease that one notably. The set of applicable problems expands yet again.

Where things will change, and change radically, is how people think of what a server is. Up until a few years ago, you had a server that simply is what it is. It is a big box with a small number of cores, and performance ramped up with clock speed/single threaded performance. Within the last five years, that changed, single threaded performance gains slowed dramatically, almost stopping, and core counts exploded. That trend isn’t going to change.

Unfortunately, servers have the same architecture that they always did, just with more and more cores. The engineering needed to make that work, to feed the cores and keep them coherent with each other, grew exponentially, but the same box with 1-4 sockets was being sold as a server. Almost no one stepped outside that proverbial server box.

A few have recently, but those are all fixed architectures. Some are very good at what they do, SeaMicro being the current best example of a BBOSNLS and Tilera being the poster child for flexible architectures on a chip. No one has produced a credible flexible chip interconnect design/paradigm until Calxeda. Potentially, this combines the best of both worlds, and allows designers, architects, and other data center trolls to do something that they never could before. That is the very definition of game changing.

And that is how Calxeda will change how things are done. If it works as well as it does in theory, it will validate the market for others to step in to. If it doesn’t, it will show the world that such markets aren’t real, and we are all doomed to a world of performance about where it is now. In time, others may do similar things, but right now, there is only one way to go if you want to play with this new paradigm.

It has huge potential, and by the time the inevitable ARM64/v8 based EnergyCores get here, designers will have an understanding of what to do with it, and there will either be a market for it or there won’t be. Then the prevailing paradigm will change, the market potential will be known, and a new crop of winners and losers will appear. There will always be a place for big, loud, hot, single threaded performance monsters with massive memory spaces, but architectures like Calxeda will eat their markets from the bottom up. If this isn’t exactly the future, flexible interconnects are a large part of it. S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.