
What's the difference between __popcnt() and _mm_popcnt_u32()?


MS Visual C++ supports 2 flavors of the popcnt instruction on CPUs with SSE4.2:

  1. __popcnt()
  2. _mm_popcnt_u32()

The only difference I found was that the docs for __popcnt() are marked as "Microsoft Specific", while _mm_popcnt_u32() appears to be a standard, non-Microsoft-specific intrinsic name.

Is that the only difference, i.e. does the MS-specific __popcnt() simply map to the same hardware instruction as _mm_popcnt_u32()?


Solution

  • These are two different intrinsic names for the same machine instruction. The two names exist because AMD and Intel each introduced the instruction under their own extension. The instruction behaves identically on every CPU that supports it, and in C or C++ the two intrinsics produce the same result.


    The __popcnt*() builtins are for AMD's Advanced Bit Manipulation (ABM) instructions. See http://blogs.amd.com/developer/2007/09/26/barcelona-processor-feature-advanced-bit-manipulation-abm/

    The _mm_popcnt_u*() intrinsics come from Intel's implementation, which isn't part of SSE4.2 per se but was introduced around the same time. See http://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT

    According to https://www.chessprogramming.org/Population_Count , both implementations are binary compatible, in spite of their different intrinsic names.

    Intel's architecture manual states that:

    Before an application attempts to use the POPCNT instruction, it must check that the processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit 20] = 1) and POPCNT (if CPUID.01H:ECX.POPCNT[bit 23] = 1).

    AMD's AMD64 Architecture Programmer's Manual Volume 3: General Purpose and System Instructions says:

    Support for the POPCNT instruction is indicated by ECX bit 23 (POPCNT) as returned by CPUID function 0000_0001h. Software MUST check the CPUID bit once per program or library initialization before using the POPCNT instruction, or inconsistent behavior may result.

    I can't see any reason why popcnt would require the presence of SSE4.2, so I think that checking bit 23 of ECX is sufficient to determine popcnt's presence.


    AMD's Barcelona, the first AMD CPU to have popcnt, didn't fully implement SSE4, so it's possible that Intel's architecture manual suggested a detection method that works on Intel CPUs but fails even on AMD CPUs that do support popcnt.

    Intel's current documentation for popcnt in their Vol. 2 instruction-set reference manual only lists the exception "#UD If CPUID.01H:ECX.POPCNT [Bit 23] = 0", so the anti-competitive suggestion, which would have led software to skip popcnt on AMD CPUs without SSE4.2, is gone.