Intel is talking about three things at this ISC: Phi, SSF, and machine learning. In short, it is everything SemiAccurate expected them to talk about at the show, and it all ties together.
Let's start out with the good stuff, the Knights Landing CPU, aka the Xeon Phi 72xx series. We told you about the architecture in-depth last year and the chips are finally out as a product. Take a Silvermont Atom, knead thoroughly, and add a sprinkle, that would be two, of 512-bit vector units. Multiply by 72 cores and bake in a 14nm fab until a golden-brown crust forms. Salt to taste. Optionally you can add whipped cream or an on-package Omni-Path adapter. SemiAccurate feels the whipped cream would look MUCH better than the ungainly Omni-Path adapter, but Omni-Path definitely has higher bandwidth, see?
Does this remind you of anything clammy?
We will skip all the superlatives and performance claims and get right down to the heart of the value proposition, if a $2438-6254 CPU can be labeled ‘value’. The first Phis out on the market are going to be socketed parts rather than PCIe cards. They will boot a full, standard OS; in this case RHEL and SUSE Linux are blessed, but others should work like they do on a normal x86 system. For the masochistic, Windows will run but it isn’t officially supported initially; expect it to get the blessing in a few months. The card variant will have an embedded Linux, but that is more of a housekeeping OS; the host system will run the user-facing OS.
All four SKUs for the new Phi
As you can see from the specs, the two middle SKUs, the sweet spots for perf/watt and bandwidth, sit at the low end of the large-GPU energy draw spectrum and are comparable to those GPUs on price. Add in 15W for Omni-Path and you are still below the 300W usually sucked up by GPUs. Since these parts are a big vector engine with a little CPU bolted on, there won’t be much dark silicon here, so Intel’s 14nm process is definitely a cut above the competition. For those into statistics, the Knights Landing die is 658mm^2 (20.853mm x 31.558mm) and has a bit more than 8B transistors.
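For the statistics-minded, the die numbers above can be checked with a few lines of arithmetic. This is only a sketch: the dimensions and the "8B" figure come from the text, the function names are made up for illustration, and since the transistor count is "a bit more than 8B" the density it produces is a floor, not an exact value.

```python
# Die dimensions and transistor count as quoted in the article.
DIE_WIDTH_MM = 20.853
DIE_HEIGHT_MM = 31.558
TRANSISTORS = 8e9  # "a bit more than 8B", so treat this as a lower bound


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


area = die_area_mm2(DIE_WIDTH_MM, DIE_HEIGHT_MM)
density = density_millions_per_mm2(TRANSISTORS, area)

print(f"die area: {area:.1f} mm^2")
print(f"density: at least {density:.1f} M transistors/mm^2")
```

The quoted dimensions do multiply out to roughly 658mm^2, and the density lands a little above 12 million transistors per mm^2 averaged over the whole die.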
Better yet, a Phi will run bog-standard x86 code with no latency hit for PCIe traversal and no odd memory caps or mappings. 384GB of DDR4 plus 16GB of fast MCDRAM is a tad more than you can get on a modern GPU, by an order of magnitude or so, plus if you read the above link, the allocation is pretty flexible too. Couple this with modern GPU competitive performance of 6.09/3.05TF SP/DP and you have a pretty nice chip. The tools Intel provides for parallel programming, optimization, and the rest are top-notch too. In short, Intel is offering a GPU performance part with an x86 learning curve, i.e. little to none.
For those wanting in on it, you can either buy a 1S workstation/vertical server from Intel or wait for system builders to start offering their wares. Intel is effectively sold out of the 7290 for the first few months, but it should be back in stock by September. By then you should see a flood of other Phi/Knights Landing based systems on the market from a host of other vendors. This is partly because of part two of the story, SSF.
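The 6.09/3.05 SP/DP figures can be sanity-checked from the recipe earlier in the piece: two 512-bit vector units per core, with a fused multiply-add counted as two floating-point operations per lane per clock. The sketch below is a back-of-the-envelope peak calculation, not Intel's official method. The 68-core, 1.4GHz configuration is an assumption picked because it reproduces the quoted numbers; neither value appears in the text, and the function name is hypothetical.

```python
def peak_gflops(cores: int, ghz: float, element_bits: int,
                vpus_per_core: int = 2, vector_bits: int = 512,
                flops_per_lane: int = 2) -> float:
    """Theoretical peak GFLOPS for a chip with wide vector units.

    flops_per_lane=2 assumes one fused multiply-add per lane per clock.
    """
    lanes = vector_bits // element_bits  # 16 lanes for SP, 8 for DP
    return cores * vpus_per_core * lanes * flops_per_lane * ghz


# ASSUMPTION: 68 cores at 1.4 GHz; these values are not given in the article.
ASSUMED_CORES = 68
ASSUMED_GHZ = 1.4

sp = peak_gflops(ASSUMED_CORES, ASSUMED_GHZ, element_bits=32)
dp = peak_gflops(ASSUMED_CORES, ASSUMED_GHZ, element_bits=64)

print(f"SP peak: {sp / 1000:.2f} TFLOPS")
print(f"DP peak: {dp / 1000:.2f} TFLOPS")
```

Under those assumptions the peaks come to roughly 6,093 and 3,046 GFLOPS, in other words about 6.09 and 3.05 TFLOPS. Real clocks under heavy vector load may well be lower than the nominal frequency, so treat this as a ceiling.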
SSF stands for Scalable System Framework, an all-encompassing hardware, design, and certification system for Intel products in the HPC/supercomputing market. Designing things well on this end of the market is not as easy as plugging in Ethernet ports, and certification for various software packages is another painful slog. Intel is trying to take the pain out of the whole process by making and certifying all the building blocks possible beforehand. With luck a VAR will only have to pick the right blocks, build the system, and spend the time tuning and optimizing rather than reinventing the wheel and doing the paperwork to prove it. If this sounds a lot like the old Cluster Ready program, that is because SSF has now subsumed and replaced that program and expanded its reach.
Last up is an interesting one, Artificial Intelligence. Intel has been rather quiet about this area while the GPU and ASIC guys get all the credit. That changes today with Intel shouting quite loudly about their AI program, both hardware and software. The first point they make is pretty interesting: most AI tends to run on one box because scaling it is really hard. This is why you see 8-GPU systems with as much memory as possible, a painfully expensive configuration vs 8x 1-GPU 1-CPU or 4x 2-GPU 1-CPU systems.
One of these things scales well
In case you missed the bit about Phi being bootable, having Omni-Path on package, and having 10x+ the memory of a GPU, these things matter a lot for AI. Intel’s chips provide better than GPU performance per Watt, don’t need an expensive CPU to host the code, and have more local memory space too. The most informative part about the scaling on AI workloads is the transition from 32 nodes to 128. Intel’s offerings can make that jump; GPUs possibly could, but like Intel, I couldn’t find any public data on it. Now do you see why Nvidia is pushing NVLink?
Intel has all the relevant AI libraries optimized for their CPUs, like they do with most software, so the optimization of your cluster should be less about plumbing and more about useful things. Add portability between Phi and normal Xeons and you have a lot of headaches simply avoided. With Nvidia’s Pascal GP100 device delayed until 2017, Phi is looking pretty good, especially for greenfield builds.
If you haven’t got the big picture yet, let me fill in what Intel is trying to do in the HPC/supercomputer arena. Instead of selling CPUs or even servers, they are trying to push out complete solutions, be they VAR-built or self-made clusters. To do this they are putting the pieces together into building blocks, offering tools, libraries, and optimized software, and pre-certifying the results as much as possible. This should seriously lower the cost of entry into the space and hopefully give the Intel based solutions a much better TCO than the competition. With the new offerings today, they look to have nearly all the pieces in place from silicon to VARs. S|A