Tags: avx, avx2, xeon-phi, avx512, knights-landing

How to detect a Xeon Phi (Knights Landing)


Intel engineers wrote that we should use VZEROUPPER/VZEROALL to avoid the costly transition to non-VEX state on all processors, including future Xeon processors, but not on Xeon Phi: https://software.intel.com/pt-br/node/704023

People have also measured and found out that VZEROUPPER and VZEROALL are expensive on Knights Landing:

36 clock cycles for both instructions in 64-bit mode (30 clock in 32-bit mode).

See the above link.

So my code will be the following, if I have just used ymm0 and ymm1:

if [we are running on a Xeon Phi]
     vpxor       ymm0,ymm0,ymm0
     vpxor       ymm1,ymm1,ymm1
else
     vzeroall
endif
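
The branch above only needs to be decided once per process, not at every call site. A minimal C++ sketch of that idea follows. The names `YmmCleanup`, `chooseYmmCleanup` and `cachedYmmCleanup` are hypothetical, and the detection routine is passed in as an assumption, since how to detect Xeon Phi is exactly what is being asked here.

```cpp
// Hypothetical helper names; nothing here comes from an Intel header.
enum class YmmCleanup {
    VZeroAll,          // emit vzeroall (cheap on Xeon/Core)
    XorUsedRegisters   // emit vpxor on each ymm register that was used
};

// Pure decision: which cleanup sequence to use after AVX code.
inline YmmCleanup chooseYmmCleanup(bool runningOnXeonPhi) {
    return runningOnXeonPhi ? YmmCleanup::XorUsedRegisters
                            : YmmCleanup::VZeroAll;
}

// Runs the (assumed) detection routine on the first call and caches the
// decision. Later calls ignore `detect` and return the cached value.
inline YmmCleanup cachedYmmCleanup(bool (*detect)()) {
    static const YmmCleanup choice = chooseYmmCleanup(detect());
    return choice;
}
```

The cached value can then select between two code paths, so the CPUID query is not repeated in hot code.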

How can I detect Xeon Phi (Knights Landing and later Xeon Phi processors) to implement the above code?

The current situation regarding VZEROUPPER/VZEROALL is as follows:

  1. These instructions are not needed and are very costly on Xeon Phi Knights Landing: 36 clock cycles for both instructions in 64-bit mode (30 clocks in 32-bit mode).
  2. These instructions are very cheap and are needed on Xeon and Core processors (Skylake/Kaby Lake), and will be needed on Xeon for the foreseeable future, to avoid the costly transition to non-VEX state.

The advertising materials claim that Xeon Phi (Knights Landing) is fully compatible with other Xeon processors.

Is there a reliable way to detect Xeon Phi, for the purpose of avoiding VZEROUPPER/VZEROALL?

There is an article "How to detect Knights Landing AVX-512 support (Intel® Xeon Phi™ processor)" by James R., updated February 22, 2016, but it focuses only on the specific new instructions that became available on Knights Landing. So it does not settle the question of VEX transitions.

It would be good to know whether Intel plans to add a CPUID bit that indicates whether transitions to non-VEX state are costly.

The above-mentioned article about detecting Knights Landing suggests checking the AVX-512F, CD, ER and PF bits introduced with Knights Landing.

So the code checks all these bits at once, and if all are set, we are on Knights Landing:

// Feature bits in EBX from CPUID with EAX=7, ECX=0
uint32_t knl_avx512_mask = (1u << 16) | // AVX-512F
                           (1u << 26) | // AVX-512PF
                           (1u << 27) | // AVX-512ER
                           (1u << 28);  // AVX-512CD
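
As a hedged sketch of how such a mask might be applied: the constants and the names `hasKnlAvx512Set` and `readLeaf7Ebx` below are hypothetical, not taken from the Intel article. The bit positions are those of EBX from CPUID leaf 7, subleaf 0. The reader function assumes GCC or Clang on x86 and uses `__get_cpuid_count` from `<cpuid.h>`. This checks CPUID bits only; it does not verify that the OS has enabled the AVX-512 register state.

```cpp
#include <cstdint>

// CPUID.(EAX=7,ECX=0):EBX feature bits (hypothetical constant names).
constexpr std::uint32_t kAvx512F  = 1u << 16;
constexpr std::uint32_t kAvx512PF = 1u << 26;
constexpr std::uint32_t kAvx512ER = 1u << 27;
constexpr std::uint32_t kAvx512CD = 1u << 28;
constexpr std::uint32_t kKnlAvx512Mask =
    kAvx512F | kAvx512PF | kAvx512ER | kAvx512CD;

// True only if every bit of the mask is set in the given EBX value.
constexpr bool hasKnlAvx512Set(std::uint32_t leaf7Ebx) {
    return (leaf7Ebx & kKnlAvx512Mask) == kKnlAvx512Mask;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
// Reads EBX of leaf 7, subleaf 0 on the current CPU.
// Returns 0 (no features) if the CPU does not support leaf 7.
inline std::uint32_t readLeaf7Ebx() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return ebx;
}
#endif
```

Comparing against the full mask, rather than testing for a non-zero result, matters here: a processor with only AVX-512F and AVX-512CD would pass a non-zero test but lacks ER and PF.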

It would also be good to know whether Intel plans to add all of these bits to ordinary (non-Phi) Xeon or Core processors in the near future, so that they too would support the AVX-512F+CD+ER+PF features introduced with Knights Landing.

If Xeon and Core processors come to support AVX-512F+CD+ER+PF, we won't be able to distinguish Xeon from Xeon Phi this way.

Please advise.


Solution

  • If you specifically want to check for being on a KNL (rather than the more general "Does the CPU I am running on have feature X?") you can do that by looking at the "Extended Model", "Family" and "Model" fields in %eax after calling cpuid with %eax == 1 and %ecx == 0. C++ code something like that below will do the job.

    However, as others are implicitly pointing out, this is a very specific test, and will, for instance, fail on future Knights cores, so you would likely be better off doing as has been suggested and checking for AVX-512 features that are not in Xeon, namely AVX512-ER and AVX512-PF. (Of course, such instructions could appear in future Xeons, so this is not guaranteed in the long term, but, quoting Keynes: "In the long run we are all dead" :-))

    #include <stdint.h>                         /* uint32_t */

    class cpuidState
    {
        uint32_t orig_eax;                      /* Values sent in to the cpuid instruction */
        uint32_t orig_ecx;
    
        uint32_t eax;                           /* Values received back from it. */
        uint32_t ebx;
        uint32_t ecx;
        uint32_t edx;
    
        void cpuid()
        {
            __asm__ __volatile__("cpuid"
                                 : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
        }
    
        void update (uint32_t eaxVal, uint32_t ecxVal)
        {
            orig_eax = eaxVal;
            orig_ecx = ecxVal;
            eax      = eaxVal;
            ecx      = ecxVal;
            cpuid();
        }
    
        void ensureCorrectLeaf(uint32_t eaxVal, uint32_t ecxVal)
        {
            if (orig_eax != eaxVal || orig_ecx != ecxVal)
                update (eaxVal, ecxVal);
        }
    
     public:
        cpuidState() : orig_eax (-1), orig_ecx(-1) { }
    
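        // Include the Extended Model in the test. Without it we see some Xeons as KNL :-(
        // The mask keeps Extended Model (bits 19:16), Family (11:8) and Model (7:4);
        // 0x50670 is Family 6, Model 0x57.
        bool onKNL()            { ensureCorrectLeaf(1,0); return (eax & 0x0f0ff0) == 0x50670; }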
    };
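
To make the magic numbers in `onKNL()` easier to follow, here is a sketch that decodes the signature returned in %eax by CPUID leaf 1 into display family and model. The names `CpuSignature`, `decodeSignature` and `isFamily6Model57` are hypothetical. The field layout and the rule for combining the extended fields follow the CPUID description in Intel's manuals as I understand it.

```cpp
#include <cstdint>

// Hypothetical helper type: decoded CPUID leaf 1 EAX signature.
struct CpuSignature {
    std::uint32_t family;
    std::uint32_t model;
    std::uint32_t stepping;
};

inline CpuSignature decodeSignature(std::uint32_t eax) {
    const std::uint32_t stepping   = eax & 0xF;          // bits 3:0
    const std::uint32_t baseModel  = (eax >> 4) & 0xF;   // bits 7:4
    const std::uint32_t baseFamily = (eax >> 8) & 0xF;   // bits 11:8
    const std::uint32_t extModel   = (eax >> 16) & 0xF;  // bits 19:16
    const std::uint32_t extFamily  = (eax >> 20) & 0xFF; // bits 27:20

    CpuSignature sig;
    // Extended Family is added only when the base Family is 0xF.
    sig.family = (baseFamily == 0xF) ? baseFamily + extFamily : baseFamily;
    // Extended Model forms the high nibble when Family is 0x6 or 0xF.
    sig.model = (baseFamily == 0x6 || baseFamily == 0xF)
                    ? ((extModel << 4) | baseModel)
                    : baseModel;
    sig.stepping = stepping;
    return sig;
}

// Same condition as (eax & 0x0f0ff0) == 0x50670, written on decoded fields.
inline bool isFamily6Model57(std::uint32_t eax) {
    const CpuSignature sig = decodeSignature(eax);
    return sig.family == 6 && sig.model == 0x57;
}
```

For example, a signature of 0x50671 decodes to family 6, model 0x57, stepping 1, while 0x506E3 decodes to family 6, model 0x5E and is rejected. Because family 6 does not use the Extended Family field, leaving it out of the mask is harmless for this particular test.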