Intel is releasing two new D-1500 Broadwell server CPUs today but the big news is the market segment. You might recall this new CPU from its previous names, Denverton or Broadwell SoC.
Update March 10, 2015 @ 1:10pm: Intel says this part is not Denverton, that is a different and still unreleased server SoC based on Atom cores. We won’t search and replace the names here and instead use this as an opportunity for readers to show off their coding skills by writing their own app to do that. Have at thee!
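For anyone taking us up on that, a minimal sketch of such an app follows. The function name `rename_part` and the default replacement `D-1500` are our own assumptions for illustration; pick whatever name you prefer.

```python
import re


def rename_part(text: str, new_name: str = "D-1500") -> str:
    """Replace the codename 'Denverton' (any capitalisation) with new_name.

    new_name defaults to "D-1500", which is an assumption made for this
    sketch and not an official naming rule.
    """
    # \b keeps us from touching longer words that merely contain the codename.
    return re.sub(r"\bDenverton\b", new_name, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(rename_part("Denverton adds PCIe ECRC to the mix."))
```

The word boundaries are a deliberate choice so that only the standalone codename is swapped.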
Today there are two new SKUs in the Intel server lineup, the 4-core D-1520 and the 8-core D-1540. Both run at 2.6GHz max for one core turbo, 2.5GHz for all core turbo, and 2.0/2.2GHz base clocks for the 1540/1520 respectively. Both have HT enabled, pull 45W max, and have the same 37.5 x 37.5mm package. Think of it as half a Grantley/Haswell-EP die with a generational update as polish.
Since they are Broadwell based both are 14nm chips with a die size of 14.764816 x 10.8526mm or 160.236642122mm^2. As an aside we are a little skeptical of the claims, our ruler says the die is 14.764813 x 10.8522mm, a pretty substantial difference. That is a joke, we didn’t measure it but the first numbers are really what Intel gave us. For future reference we will just say it is 160mm^2.
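If you want to check that arithmetic yourself, here is a small sketch using exact decimal math on the two dimensions quoted above. The variable and function names are ours.

```python
from decimal import Decimal

# Die dimensions in mm, as quoted in the text above.
WIDTH_MM = Decimal("14.764816")
HEIGHT_MM = Decimal("10.8526")


def die_area_mm2(width: Decimal, height: Decimal) -> Decimal:
    """Area of a rectangular die in square millimetres."""
    return width * height


area = die_area_mm2(WIDTH_MM, HEIGHT_MM)
print(area)                         # exact product of the two quoted figures
print(area.quantize(Decimal("1")))  # rounded to whole mm^2
```

`Decimal` is used instead of floats so the long string of digits is not blurred by binary rounding.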
Cache is the standard Broadwell 32K each for L1 Icache and Dcache, 256K of L2 per core, and 1.5MB of (presumably shared) L3 per core. On the memory side, Denverton has a single 2-channel memory controller that supports DDR3L/1600 or DDR4/2133 with two DIMMs per channel. The family supports up to 32GB DIMMs so you can make a 128GB server based on these chips.
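The capacity numbers above are simple multiplication, sketched below. The channel, DIMM, and L3 figures come from the text; the 8-byte channel width is our assumption of a standard 64-bit DDR channel, so treat the bandwidth figure as a theoretical ceiling and not a measured result.

```python
CHANNELS = 2           # one 2-channel memory controller (from the text)
DIMMS_PER_CHANNEL = 2  # from the text
MAX_DIMM_GB = 32       # from the text
L3_MB_PER_CORE = 1.5   # from the text
BUS_BYTES = 8          # assumption: standard 64-bit DDR channel


def max_memory_gb() -> int:
    """Largest memory configuration with every slot filled."""
    return CHANNELS * DIMMS_PER_CHANNEL * MAX_DIMM_GB


def total_l3_mb(cores: int) -> float:
    """Total L3 for a SKU with the given core count."""
    return cores * L3_MB_PER_CORE


def peak_bandwidth_gbs(mega_transfers: int) -> float:
    """Theoretical peak in GB/s (10^9 bytes) across all channels."""
    return mega_transfers * BUS_BYTES * CHANNELS / 1000


if __name__ == "__main__":
    print(max_memory_gb(), "GB max memory")
    print(total_l3_mb(8), "MB L3 on the D-1540,", total_l3_mb(4), "MB on the D-1520")
    print(peak_bandwidth_gbs(2133), "GB/s peak with DDR4/2133")
```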
The package block diagram
Rounding out the features list we have 24 lanes of PCIe3 from the main die and 8 lanes of PCIe2 pulled from the chipset so prepare for latency there. The chipset also adds 6 SATA3 lanes, 4 USB3, 4 USB2, and two 10GbE lanes, a fairly complete offering given the CPU power available. The D-1500 line is a well-balanced low-end server part, something that has been missing from the Intel lineup for far too long.
Let’s look a little deeper at the technical minutiae on this first and only Broadwell based server offering. We will add the caveat that there is probably a lot more interesting stuff than we are going to discuss because Intel has refused to give any real core briefings since Westmere. While we would like to tell you about that kind of detail, Intel has made it impossible to do. That said, here is some of what we learned.
The core CPU block diagram
Broadwell looks to have a single ring similar to those present in server parts since Beckton/Nehalem-EX. There is one Home Agent/Memory Controller block hence the two channels of DDR plus one I/O ring stop. As we mentioned earlier, this keeps the I/O to CPU performance ratio about where it is in the bigger dies, everything is scaled down fairly equally. Some of the nifty RAS tricks that could be done with two channels are lost but that is pretty minor in the grand scheme of things. Everything is about what you would expect so far.
The Broadwell core itself adds AVX2 to the mix, that would be 256b integer math which is important to some. There is now a dual FMA engine so that means 2x the performance on that front, but since we have no clue what Haswell had, that is about all we can say. Most FP math has been sped up, it now stands at 3 cycles instead of 5 for everything but FMA. The STLB is now 1.5K instead of 1K too in case you were wondering. Other instructions have been sped up too, a little here, a little there.
VM enter/exit latencies have been dropped from ~500 cycles to ~400 cycles, a major improvement given how much of everything is virtualized in the modern data center. There are a lot of other improvements on this front, starting with improved cache allocation which we wrote about here.
It now has eight IDs per cache slice but the storage side is still a generation away. Bandwidth monitoring has been added to the mix too but unfortunately we can’t tell you anything more than that. Most important though is the ability to post interrupts, effectively pooling them to avoid a VM enter to service only that. This should save a lot of needless work, especially on certain types of tasks which generate high interrupt counts.
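To put the VM enter/exit figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes the ~500 and ~400 cycle numbers are the cost of a single transition and uses the D-1540's 2.0GHz base clock. The 100,000 transitions per second is a made-up rate for illustration, not a measurement.

```python
def transition_ns(cycles: int, clock_ghz: float) -> float:
    """Wall-clock time of one VM transition in nanoseconds."""
    return cycles / clock_ghz


def core_share(transitions_per_sec: float, cycles: int, clock_ghz: float) -> float:
    """Fraction of one core's cycles spent on VM transitions."""
    return transitions_per_sec * cycles / (clock_ghz * 1e9)


if __name__ == "__main__":
    BASE_GHZ = 2.0   # D-1540 base clock, from the text
    RATE = 100_000   # hypothetical transitions per second
    for cycles in (500, 400):
        print(cycles, "cycles:",
              transition_ns(cycles, BASE_GHZ), "ns each,",
              f"{core_share(RATE, cycles, BASE_GHZ):.1%} of a core")
```

Posted interrupts attack the other term in that product, the number of transitions, which is why they matter for interrupt-heavy workloads.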
On the RAS side things look a lot like Grantley and far less like Centerton which lacked most of the big core CPU’s goodies. Denverton adds PCIe ECRC or End-to-End CRC to the mix, about the only real RAS advance from Grantley. It does lack the memory lockstep, mirroring, sparing, and scrubbing features of the bigger platform because these need a second memory controller to operate. In short the D-1500 is RAS feature complete with the hardware at hand.
There are a few more platform features on D-1500 that don’t fall cleanly into any category. Crypto instructions have been sped up and a few more added, plus there is a new random seed generator. Because Intel has been shown to weaken hardware security for the benefit of 3rd parties, we don’t suggest you trust this stuff, stick with open and verifiable software which may be technically inferior but isn’t compromised. That said, Broadwell can now generate compromised keys faster, this is good, right?
On the good side transactional memory is now available on an Intel server part, the bug in the Haswell iteration has been fully quashed. Other than the basics of the tech though, we can’t tell you what it actually does in Broadwell or how, sorry. There is also a built-in processor trace now too, a very useful feature if you are debugging at a very low level.
Another newly added feature is called Hardware Controlled Power Management, basically pulling power management down to a much lower level. That means the server can react to changes in far less time potentially saving a lot of power. The OS can give the hardware hints as to higher level operation or it can just let things go on without intervention. Since this is the first outing of the tech, the software support isn’t really there but in a year or two it will be and users should see big gains. This is a useful tuning knob for coders and admins to be able to turn.
Similarly to Haswell, the Broadwell generation has added intelligence to turbo functionality. If there are a lot of stalls registered but there is power available, upping the core clocks would just waste energy. Instead the D-1500 line will up the uncore clocks and speed everything up rather than making a core idle faster. This again could prove very worthwhile in the real world.
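The decision described above can be pictured with a toy policy like the one below. This is purely our illustration of the idea, not Intel's algorithm: the function name, the inputs, and the 0.5 stall threshold are all hypothetical.

```python
def choose_boost(stall_fraction: float, headroom_watts: float,
                 stall_threshold: float = 0.5) -> str:
    """Toy model: decide where spare power should go.

    stall_fraction  -- share of core cycles spent stalled (0.0 to 1.0)
    headroom_watts  -- power budget still available
    stall_threshold -- hypothetical cutoff, chosen only for illustration
    """
    if headroom_watts <= 0:
        return "none"    # no budget left, nothing to boost
    if stall_fraction >= stall_threshold:
        return "uncore"  # cores are waiting, so speed up what they wait on
    return "core"        # cores are busy, so raise core clocks


if __name__ == "__main__":
    print(choose_boost(stall_fraction=0.7, headroom_watts=5.0))
    print(choose_boost(stall_fraction=0.1, headroom_watts=5.0))
```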
But wait, there’s more… none of which is directly related, take ADR or Asynchronous DRAM Refresh for example. This tech essentially sees a system power loss and puts the DRAM into a battery backed self-refresh mode that lasts until power comes back on. Hopefully. In any case you will not see the real reason this exists for three generations, i.e. Broadwell+3, that is where this groundwork pays off.
SMAP or Supervisor Mode Access Protection is not related to ADR at all but it is related to SMEP or Supervisor Mode Execute Protection. SMAP stops supervisor mode code from reading or writing user mode data unless it explicitly asks to, SMEP does the same for executing code from user mode pages. Like the 734 things before it, this one will make Windows secure, really, this time for sure. For the rest it will be transparent.
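On Linux, several of the features discussed so far show up as flags in `/proc/cpuinfo`, so a quick check is easy to sketch. The pairing of flag names to the features in this article is our own reading, and the helper names are made up for illustration.

```python
# Linux /proc/cpuinfo flag -> feature discussed in this article.
# The pairing is our assumption about which flag matches which feature.
FEATURES = {
    "avx2": "AVX2",
    "fma": "FMA",
    "rdseed": "Random seed generator",
    "rtm": "Transactional memory",
    "intel_pt": "Processor trace",
    "hwp": "Hardware Controlled Power Management",
    "smep": "SMEP",
    "smap": "SMAP",
}


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def report(flags: set) -> dict:
    """Map each feature label to True/False depending on its flag."""
    return {label: flag in flags for flag, label in FEATURES.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = cpu_flags(f.read())
    except OSError:
        found = set()  # not on Linux, or the file is unreadable
    for label, present in report(found).items():
        print(f"{label}: {'yes' if present else 'no'}")
```

A missing flag can also mean an older kernel that does not report it or a BIOS that has the feature turned off, so a "no" is a hint and not proof.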
On the PCIe front there are two more additions, Non-Transparent PCIe Bridging which does what it says, allows redundancy across multiple PCIe lanes. Since it is non-transparent it requires software support but that won’t be an issue to anyone that needs this feature. This is useful as it stands but PCIe DualCast makes it far more sane to implement and could be interesting on the consumer side for GPUs too someday.
There are a lot more little details that we simply lacked the time and architectural background to dig into. Most everything on the Xeon-D/Denverton/D-1500 CPUs is improved, some of which is due to the Broadwell core, the rest to a new generation of uncore. Both SKUs are on sale today with the 4-core D-1520 costing $199 and the D-1540 a mere $581. Both replace the unloved and rather pointless Centerton but this generation fills a much more important niche, a server part that isn’t overkill for non-datacenter uses. That is a very good thing for users and possibly Intel too, mainly for financial reasons.
Disclosures: Charlie Demerjian and Stone Arch Networking Services, Inc. have no consulting relationships, investment relationships, or hold any investment positions with any of the companies mentioned in this report.