Cavium Thunder X ups the ARM core count to 48 on a single chip

Computex 2014: While others are stuck at eight, Cavium goes 6x higher

Cavium is upping the stakes in the ARM server SoC core count race with the new 48-core Thunder X chip. As SemiAccurate has been saying for years now, core counts are just a number if you can’t feed them, and Cavium has that part down too.

Most people will be rather stunned by the 48-core count, but that is not the only interesting part of the Cavium Thunder SoC; the uncore is equally compelling. Those cores are not just generic ARM A53/A57 or even this one; they are custom architectural license designs from Cavium.

I guess custom core is the new replacement for fab in that classic Jerry Sanders quote, and Cavium has them. Not only are they bespoke, they run at up to 2.5GHz and number up to 48 per die. Unfortunately those are all the details Cavium would give out on them. Given TDPs, we doubt you will see the full 48 cores at 2.5GHz without a shrink, but the end results should not be too far off that.

You might have noticed that all other ARM server SoC vendors are limited to eight cores per SoC, and there is a good reason for that. GICv2, currently found in all ARMv8 SoCs, hard-limits core count to eight, one reason why we don’t have even sillier numbers in Asia market phones and tablets at the moment. Cavium wisely didn’t use GICv2, which could be a problem seeing as it is mandatory for SBSA Level 0.

In case you didn’t see where this is going, Cavium is the first and so far only company doing GICv3 based designs, something they pushed hard to do. GICv3 raises the possible core count to, wait for it, at least 48. Actually, according to the SBSA guide, Level 2 needs GICv3 (Note: Level 1 uses something called GICv2m for some reason) and ups the core count to 2^28. This is projected to be sufficient for even the highest end Asia market phones through late 2016 and server SoCs roughly forever. That is how Cavium broke the core count ceiling: they just made the first GICv3, and SBSA Level 2, device.
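To put the two ceilings side by side, here is a back-of-the-envelope sketch in Python. The GICv2 limit follows from its 8-bit CPU target mask, one bit per core. The breakdown of 2^28 into three 8-bit affinity levels plus 16 cores at the lowest level is our assumption about where the SBSA figure comes from, not something Cavium stated, and the function names are made up for illustration.

```python
# Rough arithmetic on interrupt-controller core ceilings.
# The field widths are assumptions for illustration, not Cavium data.

GICV2_TARGET_MASK_BITS = 8              # one bit per CPU interface
GICV3_AFFINITY_LEVEL_BITS = (8, 8, 8)   # assumed widths of the three upper affinity levels
GICV3_CORES_AT_LOWEST_LEVEL = 16        # assumed cores addressable at the lowest level


def gicv2_max_cores() -> int:
    """Each core needs its own bit in the target mask."""
    return GICV2_TARGET_MASK_BITS


def gicv3_max_cores() -> int:
    """Multiply out the assumed affinity hierarchy."""
    total = GICV3_CORES_AT_LOWEST_LEVEL
    for bits in GICV3_AFFINITY_LEVEL_BITS:
        total *= 2 ** bits
    return total


def fits(cores: int, limit: int) -> bool:
    """True if a chip with `cores` cores stays within `limit`."""
    return cores <= limit


print(fits(48, gicv2_max_cores()))   # False
print(fits(48, gicv3_max_cores()))   # True
print(gicv3_max_cores() == 2 ** 28)  # True
```

Under those assumptions a 48-core part simply cannot be described to GICv2, while GICv3 leaves several orders of magnitude of headroom.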

As you might be aware Cavium has been in the business of making networking and throughput oriented SoCs for a long time. Their current large chips are called Octeon and are based on MIPS CPU cores which are also custom designed by the firm. These similarly large beasts are designed for markets like telco switching, backbone routing, and low latency packet manipulation at crazy speeds. They are packed with accelerators, offload engines, and offer lots and lots of I/O. As you can see from the smallest Octeon there is almost as much in the uncore as the core.

Thunder continues that tradition with a similarly complex uncore. Once again the picture below is the best block diagram Cavium gave out on the chip, but it should give you an idea of what it is capable of. Octeon is aimed at switching, routing, and infrastructure tasks, while Thunder is more compute and server oriented. While both have many uncore accelerator blocks in common, Thunder has some unique ones more suited to compute but lacks a few networking oriented features.

Cavium Thunder X block diagram

Thunder block diagram in really rough terms

The first uncore function is probably the most important for this class of chip: I/O. Cavium has a long history of doing this well, and they have an Ethernet fabric built in to the SoC. It supports up to 100Gbps of external connectivity which you can split up almost any way you like. That would be 10×10, 2×40 plus a few, or a single 100Gbps link. This should be nearly enough for most use cases, or at least enough to saturate the compute side of the device. You can’t say the uncore won’t be enough to feed the cores with 100GbE capabilities.
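The splits mentioned above all come down to a set of ports whose speeds add up to no more than 100Gbps. A minimal Python sketch of that check follows. The allowed port speeds (10, 40, and 100Gbps) and the reading of "2×40 plus a few" as two 40Gbps ports plus two 10Gbps ports are our assumptions, and `is_valid_split` is a hypothetical helper, not a Cavium tool.

```python
# Sanity-check a proposed split of the 100Gbps Ethernet budget.
# Allowed speeds are an assumption based on the splits named in the text.

FABRIC_BUDGET_GBPS = 100
ALLOWED_PORT_SPEEDS_GBPS = (10, 40, 100)


def is_valid_split(port_speeds_gbps) -> bool:
    """A split is valid if every port uses an allowed speed and the sum fits the budget."""
    speeds = list(port_speeds_gbps)
    if any(speed not in ALLOWED_PORT_SPEEDS_GBPS for speed in speeds):
        return False
    return sum(speeds) <= FABRIC_BUDGET_GBPS


print(is_valid_split([10] * 10))         # True: 10x10
print(is_valid_split([40, 40, 10, 10]))  # True: 2x40 plus two 10s
print(is_valid_split([100]))             # True: a single 100Gbps link
print(is_valid_split([40, 40, 40]))      # False: 120Gbps exceeds the budget
```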

If you do need more, well, there are six 4x PCIe3 controllers which you will probably see as three 8x slots in the first few reference designs. Either way there is a decent but not overwhelming amount of I/O here. There are also a few SATA6 ports. How many, Cavium would not say, but considering one of the ‘personalities’ they identified is a storage server, it should be a fair number. One nice touch is that you can hard partition off segments of the chip to do things that telcos and other confidentiality aware verticals care about. I/O is mappable to the segments, so for example you could map a 40GbE link and 4 SATA ports to a 16-core partition, and the rest to the other 32 cores. This is a necessary feature for this class of device; all high-end server SoCs have it now.
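The partitioning example above can be modeled as a simple resource plan. The Python sketch below is purely illustrative: the `Partition` type and `check_partitions` helper are hypothetical, it says nothing about how Cavium actually configures partitions, and the SATA port total is a placeholder because Cavium did not disclose the real number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one hard partition."""
    name: str
    cores: int
    ethernet_gbps: int
    sata_ports: int


def check_partitions(partitions, total_cores, total_ethernet_gbps, total_sata_ports):
    """Return a list of problems; an empty list means the plan fits the chip."""
    problems = []
    used_cores = sum(p.cores for p in partitions)
    used_eth = sum(p.ethernet_gbps for p in partitions)
    used_sata = sum(p.sata_ports for p in partitions)
    if used_cores > total_cores:
        problems.append(f"cores oversubscribed: {used_cores} > {total_cores}")
    if used_eth > total_ethernet_gbps:
        problems.append(f"ethernet oversubscribed: {used_eth} > {total_ethernet_gbps}")
    if used_sata > total_sata_ports:
        problems.append(f"sata oversubscribed: {used_sata} > {total_sata_ports}")
    return problems


# Placeholder only: the real SATA port count was not disclosed.
PLACEHOLDER_SATA_PORTS = 8

plan = [
    Partition("tenant-a", cores=16, ethernet_gbps=40, sata_ports=4),
    Partition("tenant-b", cores=32, ethernet_gbps=60, sata_ports=4),
]
problems = check_partitions(plan, 48, 100, PLACEHOLDER_SATA_PORTS)
print(problems)  # []
```

The point of the sketch is only that cores and I/O are carved up together, so a partition plan has to balance against every resource at once.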

Cavium Thunder 2S system diagram

A 2S reference system diagram

On the memory side Thunder X supports DDR3 and DDR4 natively at speeds of 2133/2400 respectively. One reference design picture had eight DIMMs per socket and a cap of 256GB per socket so 32GB DDR4 DIMMs are supported out of the box. This is a decent chunk of RAM but on a per core basis it is not that impressive. I am sure any OEM out there needing more has the capabilities to design it in so 256GB should be enough for the initial testing.
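The arithmetic behind that per-core remark is short enough to spell out. This Python snippet uses only the figures from the reference design above; the variable names are ours.

```python
# Memory capacity per socket and per core, from the reference design figures.

DIMMS_PER_SOCKET = 8
DIMM_SIZE_GB = 32
MAX_CORES_PER_SOCKET = 48

capacity_gb = DIMMS_PER_SOCKET * DIMM_SIZE_GB
gb_per_core = capacity_gb / MAX_CORES_PER_SOCKET

print(capacity_gb)            # 256
print(round(gb_per_core, 2))  # 5.33
```

A bit over 5GB per core at the full 48-core count is workable, but it explains why the total looks less generous than 256GB sounds.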

Cavium only lists a generic “Cloud Accelerators” block for the rest of the uncore so we can’t say much about them. Two uncore blocks that were identified in the presentation were a vNIC accelerator and the ability to support SMMUv2. The vNIC accelerator is kind of a no-brainer given that 48 cores pretty much guarantee virtualization on some level. Hardware handling of vNICs will make things a lot nicer for admins so no complaints about that one. We aren’t really sure about the SMMUv2 but we can assume that it will pop up in the next major SBSA revision in due time.

If those specs aren’t enough for you, Thunder X supports 2S configurations via CPI, also known as Cavium Processor Interconnect. You can coherently connect two Thunders for double the above specs: 96 cores, 512GB, 48 PCIe3 lanes, and all the rest. That makes a pretty impressive system if you think about it. Per-core performance isn’t that impressive, but per-socket throughput is simply massive.
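For the record, the 2S totals are just the single-socket numbers doubled. The Python sketch below assumes linear scaling, meaning it ignores anything the socket-to-socket link itself might consume; `SocketSpec` and `scale` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketSpec:
    """Headline per-socket figures quoted in the article."""
    cores: int
    memory_gb: int
    pcie3_lanes: int


def scale(spec: SocketSpec, sockets: int) -> SocketSpec:
    """Assume every resource scales linearly with socket count."""
    return SocketSpec(
        cores=spec.cores * sockets,
        memory_gb=spec.memory_gb * sockets,
        pcie3_lanes=spec.pcie3_lanes * sockets,
    )


# Six x4 PCIe3 controllers give 24 lanes per socket.
single = SocketSpec(cores=48, memory_gb=256, pcie3_lanes=6 * 4)
dual = scale(single, 2)

print(dual.cores, dual.memory_gb, dual.pcie3_lanes)  # 96 512 48
```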

There are two main Thunder SoCs coming, the 8-16 core CN87xx and the 24-48 core CN88xx, both sampling in Q4. Both devices are made at Globalfoundries on their 28HP process. More interesting is that the two reference designs, an ATX and a 1/2 SSI form factor, are being built by Gigabyte’s server division. They make solid stuff but get little credit, so this pairing should be quite interesting. ARM server SoCs and standard form factors have promise.

Those two Thunder chips are being sold in four different guises, per ASIC we assume, called ‘personalities’. These personalities are defined by what Cavium fuses off to suit a workload. The Thunder X_ST is meant for storage servers, so it has all the SATA and storage accelerators left on. The X_CP is aimed at compute workloads, so it lacks some of the storage I/O and accelerators but has a full complement of cores and memory. X_SC is aimed at security and X_NT is for data plane applications; you can probably guess what those two bring to bear based on the names.

In a nutshell that is the new Cavium Thunder X line: two ASICs, four personalities, 48 cores, 2.5GHz, and lots of uncore goodies. It is similar in concept to the MIPS based Octeon line but replaces the MIPS cores with ARM and then goes on from there. Given Cavium’s history of telco oriented throughput processors, the new line should meet the most important goal of any many-many-many-core SoC, keeping the cores fed. We won’t know how things will perform for certain until Q4, but on paper it looks good so far. S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also a council member with Gerson Lehman Group.