What do you get if you add ARM cores to a Cavium Octeon CPU? An Octeon TX SoC. It is the why and how of this new concept that SemiAccurate finds interesting.
For those not familiar with the Cavium line of products, the three surrounding the new Octeon TX are the Thunder X ARM CPU, the Octeon Fusion-M ‘base station on a chip’, and the older MIPS-based Octeon III. Each does a different job. The Thunder X is a throughput-oriented, compute-heavy product with accelerators; think control plane here. The Octeon III is more of a packet pusher with heavy accelerators and relatively less raw compute power, 16 cores vs 48 in the Thunder X, and is definitely aimed at the data plane side. The odd one out is the Fusion-M, in essence an Octeon III with a DSP to handle the signaling side of the market; it lives up to the nickname of base station on a chip.
The newer Octeon TX is best described as an Octeon III with much more compute power, a combination of the Thunder X and the older Octeon lines. It is still much more accelerator laden than the Thunder X but carries over several advances from that side of the product spectrum too. You might want to think of it as a network service provider box on a chip. The layout looks like this.
Rough Octeon TX block diagram
It can have up to 24 of Cavium’s custom ARM-TX (v8.1 ~= A72 ISA) cores and all of the subsystems and accelerators you would find in the older Octeon lines. It also brings a bunch of features over from the Thunder X line of CPUs, most notably some virtualization and VNIC functionality. These may not strictly be needed, since the TX can already parse every byte in the headers of all inbound streams at line rate; they seem to be included more for software compatibility than anything else.
As an aside for the CPU geeks out there, there has been a raging debate among the uninformed about whether 32-bit hardware was required on a 64-bit ARM core. ARM has been very adamant that if a company could show that they would never need 32-bit functionality in their products, they could skip the hardware implementation, but there haven’t been any cores to point to that left it out. Guess what? The cores in the Octeon TX do not have hardware 32-bit functionality; they emulate it in firmware and software. Cavium never had any 32-bit ARM cores on the market and there is no 32-bit software that will run on the Octeon TX, so they got the green light to skip that unit. Debate over, back to the story.
That brings us to the biggest question out there, why is an Octeon + compute SoC needed? The answer there is the changing market, specifically on the service provider side of the world. As more and more things get moved to the cloud, the requirements for compute at each step of a packet’s path go up. Security, firewalls, content streaming, and simple networking features are getting more and more complex. That is the market for the Octeon TX.
Service providers used to run these tasks on an x86 CPU that had another CPU, SoC, or even separate computer attached for data plane work. The x86 would do the so-called open control plane duties while the SoC or second server would have closed control plane software and of course the data plane wares. For those not familiar with these terms, think about the open control plane duties as software you would recognize as a user, the closed control plane wares as an embedded OS like you would find on a smart switch or firewall, and the data plane as a more realtime OS that makes sure packets get where they need to in hard realtime.
Going back to the changing world part, the open control plane tasks are multiplying fast. The TX line is aimed at what is called the embedded CPU market which now means things like security routers, storage servers, industrial controls, smarter switches, and content aware wireless endpoints. This category is not the dishwasher controllers you probably first thought of, and it is growing fast in both unit volume and compute requirements.
This open side of the control plane used to be open because it was traditionally an x86 box tethered to the closed control plane and the closed data plane. As happened in the PC world, the energy cost, latency, and overall efficiency of a multi-chip solution are under pressure and everything possible is moving to the SoC model. Since Intel is unlikely to give Cavium x86 cores to put on their chips, another way was needed and the industry has pretty much coalesced on ARM as the way forward. This explains the shift from MIPS cores in Octeon III to ARM ISA cores in the Octeon TX.
More importantly those customers are used to running open software and tools, Linux, LAMP stacks, and all the thousands of other plumbing bits that make the internet work. With x86 this is taken for granted, the overwhelming majority of these distros and components are made for x86 and may be ported to other architectures, sometimes. Over the last few years just about all open source software has been ported to ARM so it is on par with x86 for these types of applications.
For the service providers, OEMs, and the rest, their software stack doesn’t need massive upheavals when they go from x86 to ARM any more, it should just work out of the box. This solves the major complaint of the open control plane folk. The closed side is a bit thornier but much of that will be done by Cavium so the pain should be minimal. Most importantly the data plane side, the tricky hard realtime, hiccup sensitive bits are going to stay the same, the accelerators and uncore bits should be the familiar Octeon line wares. So in short the movement from Octeon + x86 to Octeon TX should be pretty easy, the hardest bits stay the same.
Embedded processor markets
We are now in a world of service provider applications running on virtualized hardware, often moving around closer to the edge and back out as demand warrants. Having a single architecture, ARM, to run this on means added flexibility for the service providers and better results for the users. Additionally customers are starting to demand that their apps run both in the ‘cloud’ and at specific points in the data path, think content streaming caches or advertising injection points.
Do you want that running in a data center half a continent away, or would one CPU and some heavy packet inspection at the cable modem endpoint be a better deal? For the local ‘big game’, it probably makes more sense to run ad injection as close to the users as possible, both for latency and locality of content reasons. For the rest of the time a datacenter may suffice so movement of the VM that does the job is critical, but things are probably all allocated dynamically now.
This is the world we live in, and those VMs and other wares running on the edge are coming on fast. They also explain some of the hardware features the Octeon TX pulled in from the Thunder X line, specifically VNICs. The Octeon line is made for packet twiddling in realtime so why bother with an obfuscation layer? The apps that are the core purpose of this line both need and expect them, so they were pulled in. Don’t think of things like this as a functional requirement, think of them as lowering the bar for software developers, that is the key. Cavium is making a software friendly SoC for users and service providers alike.
So what does the new CN82xx/83xx line bring to the table? Up to 24 ARM v8.1 cores running at up to 2.2GHz, 8MB of L2 (note: this is 4x what the old Octeons had and should give you a clue about how the TX is catering to random user applications now), and 2 DDR3/4 channels. For I/O there are 12 10GbE channels which can be arranged as 3x40GbE, 6x SATA 3.0 lanes, and 8x PCIe 3.0 lanes. Cavium says that the Octeon TX can do line rate crypto/SSL on a 40Gb link, quite the impressive trick for a chip of this class. In short it is a throughput oriented SoC with multi-user VM support built-in.
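For those who like to check the math, the I/O and cache figures above can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch: the names are made up for illustration, the rates are nominal link speeds with no encoding or protocol overhead, and the four-lanes-per-40GbE grouping is an assumption taken from the 12-to-3 ratio quoted above.

```python
# Nominal figures quoted in the article; all names here are illustrative.
CHANNELS_10GBE = 12      # 10GbE channels on the top-end part
CHANNEL_RATE_GBPS = 10   # nominal rate per channel, overhead ignored
LANES_PER_40GBE = 4      # assumption: one 40GbE port groups four 10G channels
L2_MB = 8                # L2 size on the Octeon TX
L2_GROWTH = 4            # "4x what the old Octeons had"


def ethernet_summary(channels=CHANNELS_10GBE,
                     rate_gbps=CHANNEL_RATE_GBPS,
                     lanes_per_port=LANES_PER_40GBE):
    """Return (aggregate Gb/s, number of 40GbE ports) for the given channels."""
    aggregate_gbps = channels * rate_gbps
    ports_40g = channels // lanes_per_port
    return aggregate_gbps, ports_40g


def implied_old_l2_mb(l2_mb=L2_MB, growth=L2_GROWTH):
    """L2 size the older Octeons would have had if 8MB is a 4x increase."""
    return l2_mb / growth


if __name__ == "__main__":
    total, ports = ethernet_summary()
    print(f"{total} Gb/s aggregate Ethernet, or {ports} x 40GbE ports")
    print(f"Implied L2 on older Octeons: {implied_old_l2_mb()} MB")
```

The point is simply that the 12x10GbE and 3x40GbE configurations describe the same 120Gb/s of nominal Ethernet bandwidth, and that the quoted 4x jump implies roughly 2MB of L2 on the older parts.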
The entire family has six members ranging from the 1 memory channel CN8235 to the fully specced CN8350; the second digit denotes the memory spec more than anything else. MSRP ends under $500 for the high-end but volume customers usually don’t pay MSRP. This lines up pretty well with the competition Cavium lists, Intel’s Xeon-D, Rangeley, and 64-bit ARM SoCs from Marvell and NXP. These are all 4-8 core devices with more single threaded performance per core than the Octeon TX but likely far less throughput and hardware dedicated to process separation. Pick the SoC that best suits your intended use case, or more likely use the one your service provider picked and smile.
So that is the world we live in now, or will be living in soon enough. Apps are provisioned dynamically and shuffled around to the best point along the wire to do the job right. Static isn’t really an option here, nor are separate data and control planes, be they open or closed. Cavium is addressing this market with their new ‘embedded’ SoC that blends Octeon and Thunder X strengths into the Octeon TX. At least to SemiAccurate, it makes a lot of sense. S|A