assembly, floating-point, arm, cortex-m, floating-point-conversion

Cortex-M7: What's the most efficient way to convert a 64-bit unsigned integer to a single-precision floating point number in assembler?


When I want to convert a 32-bit unsigned integer (e.g. residing in register r0) to a single-precision floating-point number for the VFP (e.g. to be stored in s0), I use:

vmov.f32        s0, r0
vcvt.f32.u32    s0, s0

However, surprisingly (to me at least) there's no assembly instruction for the conversion of 64-bit unsigned or signed integers to single-precision (or double-precision) floating-point numbers.

My way of getting this done looks like this:

bottomInt       .req r0
topInt          .req r1
bottomFloat     .req s0
topFloat        .req s1

@ Convert the 64-bit unsigned int:
vmov.f32         bottomFloat, bottomInt             
vcvt.f32.u32     bottomFloat, bottomFloat
vmov.f32         topFloat, topInt
vcvt.f32.u32     topFloat, topFloat

@ Prepare the multiplier 2^16 (applied twice below to scale by 2^32):
multiplierInt    .req r2                            
multiplierFloat  .req s2
mov              multiplierInt, #0x10000
vmov.f32         multiplierFloat, multiplierInt
vcvt.f32.u32     multiplierFloat, multiplierFloat

@ Scale the upper word by 2^16 twice, i.e. by 2^32:
vmul.f32         topFloat, multiplierFloat
vmul.f32         topFloat, multiplierFloat

@ Add the two floating-point numbers:
finalFloat       .req s0                            @ not defined in the original listing; s0 is assumed here
vadd.f32         finalFloat, topFloat, bottomFloat

Is there a better, more elegant way to accomplish this?


Solution

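  • The method you propose is not always correctly rounded: each 32-bit half is rounded to a 24-bit significand on its own, and the final addition rounds again, so the result can differ from the correctly rounded value. I wouldn't use it.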

    The runtime library functions __aeabi_ul2f and __aeabi_ul2d (the ARM EABI helpers that convert an unsigned 64-bit integer to float and double, respectively) provide exactly the behaviour you asked for.

    In the general case I suggest you simply call these functions. For example: https://godbolt.org/z/j7jT6eWGY

    If (and only if) you need to do this repeatedly in a piece of code that is such a hot spot within your program that you cannot afford the overhead of a function call, then I suggest you disassemble the library code for these functions and place it inline.
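To make the rounding concern concrete, here is a minimal C sketch that models the split-and-add sequence from the question and compares it with a plain cast. The names split_convert and direct_convert are hypothetical, chosen only for illustration. The sketch assumes round-to-nearest-even and a toolchain whose 64-bit cast is correctly rounded; on an ARM EABI target the cast is typically lowered to a call to __aeabi_ul2f, but that lowering is an assumption about the compiler, not something this code checks.

```c
#include <stdint.h>

/* Hypothetical model of the sequence in the question: convert each
   32-bit half to float, scale the upper half by 2^16 twice, then add.
   The volatile temporaries force every intermediate value to be
   rounded to single precision, as the VFP instructions would do. */
float split_convert(uint64_t x)
{
    volatile float lo = (float)(uint32_t)x;          /* low word, rounded to float  */
    volatile float hi = (float)(uint32_t)(x >> 32);  /* high word, rounded to float */
    volatile float scaled = hi * 65536.0f;           /* first multiply by 2^16      */
    scaled = scaled * 65536.0f;                      /* second multiply by 2^16     */
    volatile float sum = scaled + lo;                /* final add rounds once more  */
    return sum;
}

/* Direct conversion with a single rounding step. On an ARM EABI
   toolchain this cast is assumed to become a library call. */
float direct_convert(uint64_t x)
{
    return (float)x;
}
```

For the input 0x0080000080000001 (that is, 2^55 + 2^31 + 1), the low word is first rounded down to 2^31, which leaves the sum exactly halfway between two floats, and the tie then rounds to 2^55. The direct conversion sees the full value, which lies just above the halfway point, and rounds up to 2^55 + 2^32. Inputs that fit in 24 significant bits, such as 0x0000000180000000, give the same result either way.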