serverintelxeon-phi

Differences between current gen Xeon Processors


What's the actual differences between Xeon W series, Bronze, Silver, Gold and Platinum series?

With earlier versions of Xeons, The E3 were single socket CPU's. whereas E5's could be used in motherboards with two sockets. The E7's were quad sockets supported (probably 8 too)

However, with the current generation Xeon's, Most of the lineup has a scalability of 2S (2 processors in one Motherboard)

If Xeon Silver and Xeon Platinum could be used in a dual-socket motherboard, why would I need a platinum processor, which is atleast 5X more expensive than the silver part? Unless there are other differences.

What are the differences between the current-gen Xeon processors? I see some differences in cache size. Other than that, I couldn't find anything else.


Solution

  • Gold/Platinum has more cores per socket, and/or higher base or turbo clocks. That's most of what you're paying for.

    The extra UPI links that let them work in 4S or higher systems aren't relevant when being used in a 2 socket system, but that's not the only feature. Presumably it's only a small part of the cost. With the change from inclusive L3 cache to non-inclusive, Skylake Xeon and later already need a snoop filter separate from L3 tags even for single-socket, unlike Xeon E5 which just broadcast everything to the other socket. Presumably Xeon-SP's snoop filter can work for filtering snoops to the other socket as well so it didn't need to be a separate feature for 1S vs. 2S.


    e.g. the top-end 2nd-gen (Cascade Lake) Intel® Xeon® Platinum 9282 Processor has 56 cores (112 threads), max turbo = 3.8 GHz, base clock = 2.6 GHz, and 77 MB of L3 Cache.

    The top-end Silver is Intel® Xeon® Silver 4216: 16c/32t 3.2 GHz turbo, 2.10 GHz base, 22 MB L3 cache.

    Despite have almost 4x the cores, sustained and peak turbo clocks are higher on the Platinum. (With a 400W TDP, vs. 100W for the Silver! Less-insane Platinum chips are lower TDP, e.g. a 32c/64t with 2.3GHz base / 3.7GHz turbo is 250W TDP).


    Also, some (all?) Silver / Bronze CPUs only have one AVX512 FMA execution unit so throughput for 512-bit SIMD FP math instructions is reduced, including all FP math and int<->FP conversions, and _mm512_lzcnt_epi32. Look for the # of AVX-512 FMA Unit line on the Ark page for a specific CPU. For integer SIMD, only multiply is affected. (In hardware, SIMD integer multiply uops run on the FMA units.) Shifts, blends, shuffles, add/sub, compare, and boolean all have separate vector ALUs which are 512 bits wide and don't take as much die area as multipliers.

    Even that top-end Silver 4216 Cascade Lake has only 1 512-bit FMA unit.

    Running AVX2 code there's zero difference. Even AVX512 using only 256-bit vectors is fine. (gcc -march=skylake-avx512 defaults to -mprefer-vector-width=256 because using 512-bit vectors at all reduces max turbo temporarily. It wants to avoid the case where one unimportant 512-bit-vectorized loop gimps the clock speed for the rest of the program that spends most of its time in scalar code.)

    But if you're doing heavy AVX-512 FP number crunching you probably want a CPU with 2 FMA units and to compile with 512-bit vectors.


    IDK why you tagged this Xeon Phi; that's a totally different microarchitecture.