assembly x86-64 sse floating-point-conversion sse3

Conversion of four packed single-precision floating-point values to unsigned doublewords in x86 SSE


Is there a way to convert four packed single-precision floating-point values to four doublewords in x86 with the SSE extension? The closest instruction would be CVTPS2PI, but it cannot operate on two XMM registers; its only form is CVTPS2PI MM, XMM/M64. What if I want something like <conversion_mnemonic> XMM, XMM/M128?

Thanks. Iman.


Solution

  • x86 doesn't have native support for FP<->unsigned until AVX512, with vcvtps2udq (https://www.felixcloutier.com/x86/vcvtps2udq). For scalar you normally just convert to 64-bit signed (cvtss2si rax, xmm0) and take the low 32 bits of that (in EAX), but that's not an option with SIMD.
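
As a rough illustration of the scalar approach, here is a minimal C sketch using intrinsics. The helper name `f32_to_u32_scalar` is made up for this example. It assumes an x86-64 target (the 64-bit form of cvtss2si does not exist in 32-bit mode) and an input in the 0 .. 2^32 range.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_cvtss_si64 (x86-64 only) */

/* Hypothetical helper: float -> uint32_t via a 64-bit signed conversion.
   Assumes 0 <= f < 2^32; rounds using the current MXCSR rounding mode. */
static uint32_t f32_to_u32_scalar(float f)
{
    int64_t wide = _mm_cvtss_si64(_mm_set_ss(f));  /* cvtss2si rax, xmm0 */
    return (uint32_t)wide;                         /* keep the low 32 bits (EAX) */
}
```

Every value a uint32_t can hold fits in the positive range of an int64_t, so the wide signed conversion never overflows for in-range inputs.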

    Without AVX-512, ideally you can use a signed conversion (cvtps2dq) and get the same result. That works if your floats are non-negative and <= INT_MAX (2147483647.0).
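
One way to use the signed conversion safely is to check the range first. The sketch below is only an illustration: the function name `f32x4_to_u32x4_checked` and the choice to return a success flag are assumptions of this example, not part of any library.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32 and friends */

/* Hypothetical helper: converts 4 floats with cvtps2dq, but only if every
   lane is in [0, 2^31). Returns 1 and writes out[] on success, else 0. */
static int f32x4_to_u32x4_checked(const float in[4], uint32_t out[4])
{
    __m128 v     = _mm_loadu_ps(in);
    __m128 ge0   = _mm_cmpge_ps(v, _mm_setzero_ps());
    __m128 lt231 = _mm_cmplt_ps(v, _mm_set1_ps(2147483648.0f));

    /* All four lanes must pass both comparisons (NaN fails both). */
    if (_mm_movemask_ps(_mm_and_ps(ge0, lt231)) != 0xF)
        return 0;

    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(v));  /* cvtps2dq */
    return 1;
}
```

If the check fails, the caller would have to fall back to a full-range conversion.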

    See "How to efficiently perform double/int64 conversions with SSE/AVX?" for a related double->uint64_t conversion. The full-range one should be adaptable from double->uint64_t to float->uint32_t if you need it.

    Another possibility (for 32-bit float->uint32_t) is just range-shifting to signed in FP, then flipping back in integer. INT32_MIN ^ convert(x + INT32_MIN). But that introduces FP rounding for small integers because INT32_MIN is outside the -2^24 .. 2^24 range where a float can represent every integer. e.g. 5 would be rounded to the nearest multiple of 2^8 during conversion. So that's not usable; you'd need to try straight conversion and range-shifted conversion, and only use the range-shifted conversion if straight conversion gave you 0x80000000. (Perhaps using the straight conversion result as a blend control for SSE4 blendvps?)
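
The two-conversion idea could be sketched in C roughly as follows. This is an untuned illustration, not a reference implementation: the function names are invented here, inputs are assumed to be non-NaN and in the 0 .. 2^32 range, and the select is done with SSE2 and/andnot/or instead of the SSE4.1 blendvps mentioned above, so the sketch needs only SSE2.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: packed float -> uint32_t for inputs in [0, 2^32). */
static __m128i cvtps_epu32_sketch(__m128 x)
{
    const __m128  int_min_f = _mm_set1_ps(-2147483648.0f);  /* (float)INT32_MIN */
    const __m128i int_min_i = _mm_set1_epi32(INT32_MIN);    /* 0x80000000 */

    /* Straight signed conversion: correct whenever x < 2^31. */
    __m128i direct = _mm_cvtps_epi32(x);

    /* Range-shifted conversion: INT32_MIN ^ convert(x + INT32_MIN).
       For x >= 2^31 the FP add is exact, so nothing is lost to rounding. */
    __m128i shifted = _mm_xor_si128(
        _mm_cvtps_epi32(_mm_add_ps(x, int_min_f)), int_min_i);

    /* Lanes where the straight conversion produced "integer indefinite". */
    __m128i overflow = _mm_cmpeq_epi32(direct, int_min_i);

    /* Per lane: overflow ? shifted : direct */
    return _mm_or_si128(_mm_and_si128(overflow, shifted),
                        _mm_andnot_si128(overflow, direct));
}

/* Convenience wrapper for plain arrays (also hypothetical). */
static void f32x4_to_u32x4_full(const float in[4], uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, cvtps_epu32_sketch(_mm_loadu_ps(in)));
}
```

With this sketch, negative or too-large inputs do not produce a meaningful result; handling them would need extra compares.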


    For packed conversion of float->int32_t, there is SSE2 cvtps2dq xmm, xmm/m128. (cvttps2dq converts with truncation toward 0, instead of using the current rounding mode, which is round-to-nearest by default if you haven't changed it.)

    Any negative float less than -0.5 will convert to integer -1 or lower; as a uint32_t that bit-pattern represents a huge number. Floats outside the -2^31 .. 2^31-1 range get converted to 0x80000000, Intel's "integer indefinite" value.
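
To see these behaviours side by side, a small C sketch with the corresponding intrinsics may help. The wrapper names `cvt_nearest` and `cvt_trunc` are invented for this example, and the default MXCSR rounding mode (round to nearest) is assumed.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* cvtps2dq: converts using the current rounding mode (nearest by default). */
static void cvt_nearest(const float in[4], int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_loadu_ps(in)));
}

/* cvttps2dq: always truncates toward zero. */
static void cvt_trunc(const float in[4], int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(_mm_loadu_ps(in)));
}
```

For the inputs 2.7, -0.7, -0.4 and 3e9, `cvt_nearest` gives 3, -1, 0 and INT32_MIN, while `cvt_trunc` gives 2, 0, 0 and INT32_MIN. Reinterpreted as uint32_t, the -1 becomes 0xFFFFFFFF and INT32_MIN is the 0x80000000 "integer indefinite" pattern.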


    If you didn't find that, and only found cvtps2pi (signed conversion into an MMX register), you need better places to search: