Intel bolts bonus gubbins onto Skylake cores, bungs dozens into Purley Xeon chips

Tech news | Source: The Register

Deep dive Intel has taken its Skylake cores, attached some extra cache and vector processing stuff, thrown in various other bits and pieces, and packaged them up as Xeon CPUs codenamed Purley.

In an attempt to simplify its server chip family, Chipzilla has decided to rebrand the components as Xeon Scalable Processors, assigning each a color depending on the sort of tasks it’s good for. It’s like fan club membership tiers. There’s Platinum for big beefy parts to handle virtualization and mission-critical stuff; Gold for general compute; and Silver and Bronze for moderate and light workloads.

These new 14nm Skylake-based Purley Xeons fall under the rebrand. And in typical Intel fashion, it’s managed to complicate its simplification process by introducing a bajillion variations.

Before we get stuck in, here’s a summary of the base specifications, compared to last year’s Broadwell-based Xeon E5 v4 gang, plus the system architecture and socket topology:

[Intel slides: base specifications, system architecture and socket topology. Source: Intel]

And here’s Intel’s slide laying out the main changes between the Skylake desktop cores and the Skylake cores in the Scalable Processor packages – AVX-512 vector processing and more cache, basically.

So let’s look at what you can now order, or at least enquire about, from today:

Xeon Platinum 81xx processors

Up to 28 cores and 56 hardware threads, can slot into one, two, four or eight sockets, can clock up to 3.6GHz, and each has 48 PCIe 3.0 lanes, six memory channels handling 2666MHz DDR4 and up to 1.5TB of RAM, up to 38.5MB of L3 cache, three UPI interconnects, AVX-512 vector processing with two fused multiply-add units (FMAs) per core. The power really depends on the part, going all the way up to about 200W.

Xeon Gold 61xx processors

Up to 22 cores and 44 hardware threads, can slot into one, two or four sockets, can clock up to 3.4GHz, and each has 48 PCIe 3.0 lanes, six memory channels handling 2666MHz DDR4 and up to 768GB of RAM, up to 30.25MB of L3 cache, three UPI interconnects, AVX-512 vector processing with two FMAs per core. The power really depends on the part, going all the way up to about 200W.

Xeon Gold 51xx processors

Up to 14 cores and 28 hardware threads, can slot into one, two or four sockets, can clock up to 3.7GHz, and each has 48 PCIe 3.0 lanes, six memory channels handling 2400MHz DDR4 and up to 768GB of RAM, up to 19.25MB of L3 cache, two UPI interconnects, AVX-512 vector processing with a single FMA per core. The power really depends on the part, going up to about 100W.

Xeon Silver 41xx processors

Up to 12 cores and 24 hardware threads, can slot into one, two or four sockets, can clock up to 2.2GHz, and each has 48 PCIe 3.0 lanes, six memory channels handling 2400MHz DDR4 and up to 768GB of RAM, up to 16.5MB of L3, two UPI interconnects, AVX-512 vector processing with a single FMA per core. The power really depends on the part, going up to about 85W.

Xeon Bronze 31xx processors

Up to eight cores and eight hardware threads, can slot into one or two sockets, can clock up to 1.7GHz, and each has 48 PCIe 3.0 lanes, six memory channels handling 2133MHz DDR4 and up to 768GB of RAM, up to 11MB of L3 cache, two UPI interconnects, AVX-512 vector processing with a single FMA per core. The power really depends on the part, going up to about 85W.
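For a rough feel of what those memory and FMA figures imply, here is a back-of-the-envelope sketch. It is not from Intel's materials: it assumes each DDR4 channel moves 8 bytes per transfer, reads the quoted "MHz" figures as mega-transfers per second, and counts one fused multiply-add as two floating-point operations. The function names are made up for illustration, and the results are theoretical peaks rather than measured numbers.

```python
# Back-of-the-envelope peaks derived from the tier specs listed above.
# Assumptions (not from the article): a DDR4 channel moves 8 bytes per
# transfer, the quoted "MHz" speeds are really mega-transfers per second,
# and one FMA counts as two floating-point operations.

BYTES_PER_TRANSFER = 8   # 64-bit data bus per DDR4 channel (assumption)
VECTOR_BITS = 512        # AVX-512 register width


def peak_memory_bandwidth_gbs(channels: int, mega_transfers: int) -> float:
    """Theoretical peak DRAM bandwidth per socket, in decimal GB/s."""
    return channels * mega_transfers * BYTES_PER_TRANSFER / 1000


def peak_dp_flops_per_cycle(fma_units: int) -> int:
    """Double-precision FLOPs per core per clock cycle using AVX-512 FMAs."""
    lanes = VECTOR_BITS // 64          # 64-bit doubles per 512-bit vector
    return fma_units * lanes * 2       # multiply + add per lane


# (memory channels, DDR4 speed, FMA units per core), as quoted per tier
tiers = {
    "Platinum 81xx": (6, 2666, 2),
    "Gold 61xx": (6, 2666, 2),
    "Gold 51xx": (6, 2400, 1),
    "Silver 41xx": (6, 2400, 1),
    "Bronze 31xx": (6, 2133, 1),
}

for name, (channels, speed, fmas) in tiers.items():
    bandwidth = peak_memory_bandwidth_gbs(channels, speed)
    flops = peak_dp_flops_per_cycle(fmas)
    print(f"{name}: {bandwidth:.1f} GB/s peak, {flops} DP FLOPs/cycle/core")
```

The takeaway is the split between tiers: on paper, the two-FMA parts can do twice the vector math per core per clock of the single-FMA parts, before clock speeds and core counts enter the picture.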

Intel has made a handy decoder chart for the part numbers. We note that the old Xeon E5 and E7 family map to the Gold 5xxx group.

How to understand the part numbers

So what’s new? What makes these server-grade Skylakes as opposed to the Skylakes in desktops and workstations? The big change is Intel’s new mesh design. Previously, Chipzilla arranged its Xeon cores in a ring structure, spreading the L3 cache across all the cores. If a core needed to access data stored in an L3 cache slice attached to another core, it would request this information over this ring interconnect.

This has been replaced with a mesh design – not unheard of in CPU design – that links up a grid of cores, as seen in the Xeon Phi family. This basically needed to happen in order to support more cores in an efficient manner. The ring approach only worked well up to a point, and that point is now: if you want good bandwidth and low latency when accessing L3 caches, a mesh – while more complex than a ring – is the way forward.

That’s a fine mesh you’ve got me into … For the mesh, the red lines represent bidirectional transfer paths and the yellow squares are switches at intersections

A core accessing an adjacent core’s L3 cache, horizontally or vertically, takes one interconnect clock cycle, unless it has to hop over an intersection, in which case it takes three cycles. The mesh is clocked somewhere between 1.8 and 2.4GHz depending on the part and whether or not turbo mode is engaged. So in Intel’s mesh diagram, a core in the bottom right corner accessing the L3 cache of the core to its immediate left takes one cycle, and four cycles to reach the next cache along to the left (one cycle for the first hop, then three for the second).
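The hop arithmetic above can be sketched as a toy model. This is only an illustration of the numbers quoted in this article, not Intel's routing logic: the function names and the idea of describing a path as a list of hops are assumptions made for the example.

```python
# Toy model of the mesh hop costs quoted above: one interconnect cycle
# to reach a neighbouring tile, three cycles for a hop that has to cross
# an intersection. Names and structure here are illustrative only.

PLAIN_HOP_CYCLES = 1
INTERSECTION_HOP_CYCLES = 3


def mesh_cycles(hops):
    """Total cycles for a path.

    hops: iterable of booleans, True where that hop crosses an intersection.
    """
    return sum(
        INTERSECTION_HOP_CYCLES if crosses else PLAIN_HOP_CYCLES
        for crosses in hops
    )


def cycles_to_ns(cycles, mesh_ghz):
    """Convert mesh cycles to nanoseconds at a given mesh clock in GHz."""
    return cycles / mesh_ghz


adjacent = mesh_cycles([False])          # immediate neighbour: 1 cycle
two_away = mesh_cycles([False, True])    # next cache along: 1 + 3 = 4 cycles

# The article puts the mesh clock between 1.8 and 2.4GHz.
for ghz in (1.8, 2.4):
    print(f"{ghz}GHz mesh: neighbour {cycles_to_ns(adjacent, ghz):.2f}ns, "
          f"two tiles away {cycles_to_ns(two_away, ghz):.2f}ns")
```

These figures cover only the trip across the interconnect; the time to actually look up the line in the remote L3 slice comes on top.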

Speaking of caches, the shared L3 blob has been reduced from 2.5MB per core to 1.375MB per core but the per-core private L2 has been increased from 256KB to a fat 1MB. That makes the L2 a primary cache with the L3 as an overflow. The L3 has also changed from inclusive to non-inclusive, meaning lines of data in the L2 may not exist in the L3. In other words, data fetched from RAM directly fills the core’s L2 rather than both the L2 and the L3.
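That 1.375MB-per-core figure lines up with the maximum L3 sizes quoted for each tier earlier in this piece. A quick check, using only numbers from this article (the dictionary layout and function name are just for illustration):

```python
# Check that top core count x 1.375MB per core matches the maximum L3
# sizes quoted for each tier earlier in the article.

L3_MB_PER_CORE = 1.375

# tier -> (maximum cores, quoted maximum L3 in MB)
quoted = {
    "Platinum 81xx": (28, 38.5),
    "Gold 61xx": (22, 30.25),
    "Gold 51xx": (14, 19.25),
    "Silver 41xx": (12, 16.5),
    "Bronze 31xx": (8, 11.0),
}


def total_l3_mb(cores: int) -> float:
    """Shared L3 capacity if every core contributes one 1.375MB slice."""
    return cores * L3_MB_PER_CORE


for tier, (cores, quoted_l3) in quoted.items():
    computed = total_l3_mb(cores)
    verdict = "matches" if computed == quoted_l3 else "differs from"
    print(f"{tier}: {cores} cores x {L3_MB_PER_CORE}MB = {computed}MB, "
          f"{verdict} the quoted {quoted_l3}MB")
```

By the same sum, a 28-core part at the previous generation's 2.5MB per core would have carried 70MB of L3, which gives a sense of how much shared cache was traded away for the bigger private L2.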

This is supposed to be a tune-up to match patterns in data center application workloads, particularly virtualization where a larger private L2 is more useful than a fat shared L3 cache.

You can also carve up a die into sub-NUMA clusters, a system that supersedes the previous generation’s cluster-on-die design. This – as well as the mesh architecture, various new power usage levels, and the new inter-socket UPI interconnect – is discussed in detail, and mostly spin-free, by Intel’s David Mulnix here. UPI is, for what it’s worth, a coherent link between processors that replaces QPI.

There’s also an interesting new feature called VMD aka Intel’s volume management device: this consolidates PCIe-connected SSDs into virtual storage domains. To the operating system, you just have one or more chunks of flash whereas underneath there are various directly connected NVMe devices. This technology can be used to replace third-party RAID cards, and it is configured at the BIOS level. The Purley family also boasts improvements to the previous generation’s memory reliability features for catching bit errors.

While these new Xeons share many features present in desktop Skylake cores, there’s another new thing called mode-based execution (MBE) control. This is supposed to stop malicious changes to a guest kernel during virtualization. It repurposes the execution enable bit in extended page table entries so that execution can be permitted separately for user mode and for supervisor (aka kernel) mode. By ensuring executable kernel pages cannot be writeable, a hypervisor can prevent guest kernels from being tampered with and hijacked by security exploits. This is detailed in section 3.1.1 in this Intel datasheet.
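To make the idea concrete, here is a small sketch of how a hypervisor might reason about those permissions. It is a toy model, not hypervisor code: the bit positions are assumptions based on our reading of Intel's Software Developer's Manual (bit 2 for supervisor-mode execute, bit 10 for user-mode execute when MBE is on) and should be checked against the manual, and every name below is invented for illustration.

```python
# Toy model of extended page table (EPT) permissions under mode-based
# execute control. Bit positions are assumptions taken from a reading of
# Intel's SDM; verify against the manual before relying on them.

EPT_READ = 1 << 0
EPT_WRITE = 1 << 1
EPT_EXEC_SUPERVISOR = 1 << 2   # the ordinary execute bit when MBE is off
EPT_EXEC_USER = 1 << 10        # only consulted when MBE is enabled


def can_execute(entry: int, supervisor_mode: bool, mbe_enabled: bool = True) -> bool:
    """Would an instruction fetch from this guest page be allowed?"""
    if not mbe_enabled:
        # Without MBE a single bit governs execution in both modes.
        return bool(entry & EPT_EXEC_SUPERVISOR)
    bit = EPT_EXEC_SUPERVISOR if supervisor_mode else EPT_EXEC_USER
    return bool(entry & bit)


def violates_kernel_wx(entry: int) -> bool:
    """True if a page is both writable and kernel-executable."""
    return bool(entry & EPT_WRITE) and bool(entry & EPT_EXEC_SUPERVISOR)


kernel_text = EPT_READ | EPT_EXEC_SUPERVISOR   # runs in kernel mode, read-only
user_code = EPT_READ | EPT_EXEC_USER           # runs in user mode only
kernel_data = EPT_READ | EPT_WRITE             # writable, never executable

for name, entry in [("kernel_text", kernel_text),
                    ("user_code", user_code),
                    ("kernel_data", kernel_data)]:
    print(name,
          "kernel-exec:", can_execute(entry, supervisor_mode=True),
          "user-exec:", can_execute(entry, supervisor_mode=False),
          "W+X violation:", violates_kernel_wx(entry))
```

The point of the split is visible in the `user_code` case: the page can run in user mode but a compromised guest kernel cannot jump into it with kernel privileges, while the hypervisor keeps anything kernel-executable read-only.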
