assembly x86 llvm bmi

Use a single bit as mask for a word(s)


I am writing an LLVM pass module to instrument every single memory operation in a program, and part of my logic needs to do some very hot binary logic on pointers.

How can I implement "bit ? u64_value : zero" in as few cycles as possible, preferably without using an explicit branch? I have a bit in the least significant bit of one register, and a value (assume u64) in another. If the bit is set, I want the value preserved. If the bit is zero, I want to zero out the register.

I can use x86 BMI instructions.


Solution

  • On AMD, and on Intel Broadwell and later, CMOV is only 1 uop with 1 cycle of latency; on Haswell and earlier it is 2 uops / 2 cycles. It's your best bet for conditionally zeroing a register.

    xor  r10d, r10d   # r10=0.  hoist out of loops if possible
    
    test    al, 1           # test the low bit of RAX, setting ZF
    cmovz   rax, r10        # zero RAX if the low bit was zero, otherwise unmodified
    

    (There is no test r64, imm8 encoding, so use the low-8 register when the mask you're testing is all zero outside the low 8 bits.)

    If the bit position is in a register, bt reg, reg is only 1 uop on Intel and AMD. (bts reg,reg is 2 uops on AMD K8 through Ryzen, but plain bt, which sets CF according to the value of the selected bit, is cheap on both AMD and Intel.)

    bt     rax, rdx      # CF = bit (RDX mod 64) of RAX, i.e. (RAX >> RDX) & 1
    cmovnc rax, r10
    

    With both of these, the register you test can be different from the CMOV destination.

    See https://agner.org/optimize/ for more performance info, and also https://stackoverflow.com/tags/x86/info
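For reference, the semantics of the two sequences above can be sketched in C. This is only an illustration of the intended result, not a claim about what any particular compiler emits; the function names (keep_if_low_bit, keep_if_low_bit_mask, keep_if_bit) are hypothetical and chosen for this example. The mask variant is an alternative to CMOV that builds an all-ones or all-zeros mask from the bit and ANDs it with the value.

```c
#include <stdint.h>

/* Keep `value` if the low bit of `flag` is set, otherwise return 0.
   Same result as: test al, 1 / cmovz rax, r10 (with r10 = 0). */
static inline uint64_t keep_if_low_bit(uint64_t flag, uint64_t value) {
    return (flag & 1) ? value : 0;
}

/* Alternative without a select: 0 - 1 wraps to all-ones in unsigned
   arithmetic, and 0 - 0 stays 0, so `mask` is either ~0 or 0. */
static inline uint64_t keep_if_low_bit_mask(uint64_t flag, uint64_t value) {
    uint64_t mask = (uint64_t)0 - (flag & 1);
    return value & mask;
}

/* Variable bit position, same result as: bt flags, pos / cmovnc value, r10.
   With register operands, bt uses the position modulo 64, hence `pos & 63`. */
static inline uint64_t keep_if_bit(uint64_t flags, unsigned pos, uint64_t value) {
    return ((flags >> (pos & 63)) & 1) ? value : 0;
}
```

As in the assembly, the tested word and the preserved value are separate arguments, so the flag can live in a different register from the pointer being masked.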