LSI and 6WIND are teaming up to release a hardware and software combination to address several high performance networking workloads. The main goal is LTE base station processing, but many other software defined networking applications are addressed.
SemiAccurate has written extensively about LSI’s drive controller offerings in both LSI and SandForce guises, but never about their network processing chip line called Axxia. Bundled with an optimized version of software from 6WIND, these chips are the basis of today’s announcement. The idea is to make a functional, out-of-the-box, high performance networking solution that lets developers focus on adding value, not bringup and plumbing.
The hardware is nothing like a normal desktop processor or SoC; Axxia is a network processor in the vein of Cavium, Netronome, and Tilera. If you come from a PC background, these CPUs look very odd, with lots of little cores, more fixed function blocks, and massive internal networks that shuttle data around to all of them. The LSI Axxia is no exception: it starts out with 2-32 ARM or PowerPC cores and two complete busses. From there you can add 15 types of accelerators and lots and lots of I/O. The best diagram we could find looks like this.
Axxia overview without much detail
If it isn’t obvious, the Axxia 3400 family is fairly flexible, with a wide range of available CPU and accelerator options. If you are a large enough customer you can pick your CPU type, pick the count, and spec out the accelerators you want. For the truly committed, you can even put your own IP blocks in; this line is quite malleable. If you have a penchant for coding problems and debugging nightmares, you can theoretically mix and match ARM and PPC cores too, but it is not recommended. If you can pay for it, LSI will likely make it.
Another view of the Axxia chips
The chips start to get interesting when you look at one of their main markets, cell phone/LTE base stations. This is where 6WINDGate enters the picture, but more on that later. Cell phone providers are notoriously picky about who gets what data, be it customers or competitors. The Axxia line is aimed at both of the workloads that encompass LTE base station hardware, data plane and processing plane work.
Since cell towers are often shared between providers, sub-providers, and all sorts of middle men that you will never be aware of, keeping the disparate data streams actually disparate is critical. Axxia has two separate busses, one for the CPU side of the chip and one for the accelerators or other data plane oriented blocks. You can set the two sides to be coherent with each other or non-coherent; it is up to the user to decide. More interesting is that you can chop the coherency domains into pretty much any subdivisions you choose.
If you have a 32-core Axxia with 16 accelerators, you can have eight hard coherency domains with four CPUs and two accelerators each, or whatever mix and match arrangement you want to program. This seemingly crazy feature is necessary because of the nature of the data being passed across it. If you want to sell to this market, you damn well better have partitioning like this. Somewhat counter-intuitively, the coherency comes at no cost, so unlike most consumer CPUs you don’t gain any benefit by turning it off globally. It is simply there to satisfy some customer requirements. The CPU plane and data plane can also be run autonomously if the user requires, each on its own bus.
One last bit that came up during the talks with LSI about today’s announcement: the commonality between the LSI storage controllers and the Axxia line. On the surface, there is none. LTE processing hardware is a far cry from a SAS RAID chip, right? In practice there are a lot of common blocks that both use; from PCIe to Ethernet, physical layer to memory interfaces, both do a lot of the same things. LSI has done the smart thing and co-develops blocks for both lines, so while you will never see any similarities in the products, they are there under the hood.
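To make the arithmetic of that example concrete, here is a minimal sketch that splits 32 CPUs and 16 accelerators into eight non-overlapping domains. It is purely illustrative: the `CoherencyDomain` class and `partition_evenly` function are hypothetical names invented for this sketch and say nothing about how LSI actually exposes or programs coherency domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoherencyDomain:
    """Hypothetical model of one hard coherency domain (not an LSI structure)."""
    domain_id: int
    cpus: tuple
    accelerators: tuple


def partition_evenly(num_cpus, num_accels, num_domains):
    """Split CPU and accelerator indices into equal, non-overlapping domains."""
    if num_domains <= 0:
        raise ValueError("need at least one domain")
    if num_cpus % num_domains or num_accels % num_domains:
        raise ValueError("resources do not divide evenly across domains")
    cpus_per = num_cpus // num_domains
    accels_per = num_accels // num_domains
    domains = []
    for d in range(num_domains):
        # Each domain gets a contiguous, exclusive slice of each resource.
        cpus = tuple(range(d * cpus_per, (d + 1) * cpus_per))
        accels = tuple(range(d * accels_per, (d + 1) * accels_per))
        domains.append(CoherencyDomain(d, cpus, accels))
    return domains


# The example from the text: 32 cores, 16 accelerators, eight domains.
domains = partition_evenly(num_cpus=32, num_accels=16, num_domains=8)
for dom in domains:
    print(dom.domain_id, dom.cpus, dom.accelerators)
```

An even split is only one of the arrangements the text describes; an uneven mix would simply hand out different slice sizes per domain, as long as no CPU or accelerator lands in two domains.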
By now you are probably thinking that these chips are very complex, and more importantly there are enough variants to confuse just about any developer. How do you develop software for a two core ARM chip with four different accelerators and make it run on a 16-core PowerPC device with six totally different accelerators? Just for fun, imagine you are writing code for something with 4+ nines of fault tolerance where any down time is basically unacceptable. Needless to say, this is a big problem, and a bigger expense for anyone attempting it. The barriers to entry in this space are huge and bugs cost contracts.
Enter 6WIND. They are shipping a customized version of their software 6WINDGate that is optimized for the Axxia chip family and runs out of the box on LSI’s reference designs. Instead of beating your head against the wall in a nightmarish bring-up and debug marathon, it just works. The hard parts are taking this software and customizing it for your device, service, or whatever else you want to make, not the low level chores. This is not a trivial effort but it is orders of magnitude less effort than doing it yourself from the ground up. Instead of redoing the basics, you start with added value code.
The end result is something akin to the buzzword du jour, software defined networking. 6WINDGate is a complex piece of low level software that provides a lot of networking functions, replacing some OS features with optimized and customized protocols and stacks. The idea is to increase throughput and reliability while reducing latency.
On Axxia, 6WINDGate runs on dedicated cores outside of the OS. It can do data plane or packet work on a single core or, more often, on several dedicated cores, and is VM aware. One feature that sounds very intriguing is that 6WIND says the software scales almost as high as you would care to go, and does so in a single instance or VM. This includes multi-core, multi-socket, and even cross-rack scaling. In theory, you can have a single 6WINDGate instance twiddling packets for an entire data center. It is very modular and runs as an abstraction layer between the OS and the hardware, hopefully transparently to higher levels of user space code.
The initial obvious use is cell phone base stations, which need a lot of signal processing and data movement to support LTE workloads. In addition to the basics, 6WINDGate can do a lot of value added work. Things like packet classification, packet prioritization, and video streaming all fall under the umbrella of unpredictable performance-hungry workloads. The Axxia + 6WIND team wants to accelerate all of that and more.
From there you get into things like virtual devices and related workloads. If you have a high end Xeon server with 32 cores and a terabyte of memory, you can run lots and lots of VMs on it. Those VMs need to talk to each other, and they need to have various controls run on the inter-VM traffic as well as the generic server I/O. There is a lot of traffic that passes through these virtual switches, and some of it needs a lot of rules applied. Worse yet, each day brings new and more complex services that quickly become mandatory to keep the whole paradigm not just functional, but secure and optimized. Blame management, not techs, for that. These types of workloads are the targets for the joint offering.
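For readers who have not met packet classification and prioritization before, the idea can be sketched in a few lines. This is a toy model, not 6WINDGate code: the traffic classes, the port-to-class mapping, and the `PriorityScheduler` class are all assumptions made up for illustration, and a real data plane does this in optimized code on dedicated cores and accelerators, not in Python.

```python
import heapq
from itertools import count

# Hypothetical traffic classes; a lower number is served first.
PRIORITY = {"control": 0, "video": 1, "best_effort": 2}


def classify(packet):
    """Toy classifier keyed on destination port; the mapping is illustrative."""
    port = packet["dst_port"]
    if port == 5060:            # commonly SIP signaling
        return "control"
    if port in (554, 1935):     # commonly RTSP / RTMP streaming
        return "video"
    return "best_effort"


class PriorityScheduler:
    """Strict-priority queue that keeps FIFO order within each class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # unique tie-breaker, so packets are never compared

    def enqueue(self, packet):
        cls = classify(packet)
        heapq.heappush(self._heap, (PRIORITY[cls], next(self._seq), cls, packet))

    def dequeue(self):
        _, _, cls, packet = heapq.heappop(self._heap)
        return cls, packet


sched = PriorityScheduler()
for port in (80, 554, 5060, 443):
    sched.enqueue({"dst_port": port})

order = [sched.dequeue()[0] for _ in range(4)]
print(order)
```

Strict priority is the simplest possible policy and can starve low-priority traffic; it is used here only because it keeps the sketch short.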
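As a rough illustration of what "rules applied" to inter-VM traffic can mean, here is a minimal first-match rule table of the kind a virtual switch might consult. Everything in it is hypothetical: the `Rule` class, the `decide` function, and the VM names are invented for this sketch and are not taken from 6WINDGate or any particular virtual switch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical first-match rule; None acts as a wildcard."""
    src_vm: Optional[str]
    dst_vm: Optional[str]
    action: str  # "allow" or "drop"

    def matches(self, src, dst):
        return self.src_vm in (None, src) and self.dst_vm in (None, dst)


def decide(rules, src, dst, default="drop"):
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(src, dst):
            return rule.action
    return default


rules = [
    Rule("web", "db", "allow"),   # only the web tier may reach the database
    Rule(None, "db", "drop"),     # everything else headed to the database is dropped
    Rule(None, None, "allow"),    # remaining inter-VM traffic is allowed
]

print(decide(rules, "web", "db"))
print(decide(rules, "guest", "db"))
print(decide(rules, "guest", "web"))
```

A linear scan like this gets slow as the rule list grows, which is exactly the kind of per-packet work the article describes as worth offloading.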
In the end, you can think of the addressable markets for Axxia and 6WINDGate as anything with high bandwidth and high complexity that can benefit from added services. To the carriers this means better monetization, so they want as much performance as they can get as fast as they can get it. Security and stability are assumed, and time to market is critical as well. The more features you can add to the mix, the better your device looks vs the next guy in line.
So that is what LSI and 6WIND are doing, making a platform for companies to create this class of device on. Buyers get a hardware platform that is bundled with software that runs out of the box, with lots of extras thrown in too. From there, you can customize the hardware, software, or both to your requirements, and hopefully end up with a low-latency, high-bandwidth smart NIC, a router card for complex VMs, or a cell phone base station. If LSI and 6WIND did their job, the plumbing is done and debugged, and you can concentrate on value-added code. On the surface it sounds like a very interesting joint play. S|A