assembly x86 nasm sse sse4

Move data from memory (could be of any length) to an XMM register


I know a little assembly (NASM), and I want to perform a string operation (checking whether a substring is present) using SSE4.2, so I learned how PCMPESTRI and PCMPISTRM work. I am stuck on the data transfer from memory to an XMM register. I want to take input from the command line (e.g. ./a.out ABCD) and load it into xmm1. The command-line string can be of any length (from 1 to more than 16 bytes). It is stored with a terminating 0 byte (i.e. ABCD\0), and its starting address is available on the stack. So how do I make the command-line data aligned and padded to 16 bytes (ABCD\0\0\0\0... up to 16)?

Also, I don't want to allocate memory with the brk system call, copy all the command-line data into it, and then load it into xmm1. (I want to do the substring check in one go, rather than first copying everything into newly allocated memory, which may increase execution time.)

I tried this:

section .data
align 16 ; I thought that command line data is stored in data section and may align to 16. :-(
 ...

section .bss
...
section .text
...

But it didn't work. So how do I transfer the data to an XMM register, given that the input can be of variable length (from 1 to more than 16 bytes)?

Which move instruction should I use?

How should I handle this data movement when the input comes from the command line and can be of any length?

My CPU flags (from /proc/cpuinfo) include: sse sse2 ssse3 sse4_1 sse4_2


Solution

  • Command line args are on the stack, not in .data. Aligning .data is totally irrelevant.

    Related: Is it safe to read past the end of a buffer within the same page on x86 and x64?. You don't align your buffer; you just check that a 16-byte load won't cross into a new page (i.e. that (ptr & 4095) <= (4096-16)).

    If that check fails, you can't safely use movdqu on the pointer itself and have to fall back to another strategy. One option is a 16-byte load of the last 16 bytes of the page, combined with a pshufb control vector looked up from a sliding window of db 0,1,2,3,4,...,-1,-1,-1 that shuffles the bytes you actually want to the bottom of an XMM register.

    Processing unaligned implicit-length strings with SIMD is generally inconvenient because the semantics of what's safe to read depend on looking at one byte at a time. The one thing you can take advantage of is that memory protection has page granularity.
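
The page-crossing check and the pshufb fallback described above could look roughly like this in NASM. This is a minimal sketch, assuming x86-64 Linux and a 4 KiB page size. The names load16_safe and shuf_window are hypothetical (they are not from the question), and the fallback path needs SSSE3 for pshufb.

```nasm
; Hypothetical helper (illustrative names), x86-64, NASM syntax.
; In:       rdi  = pointer to a NUL-terminated string (e.g. argv[1])
; Out:      xmm0 = 16 bytes starting at rdi; in the fallback path, positions
;                  that would lie past the end of the page are zero
; Clobbers: rax, rcx, xmm1
; Assumes a 4096-byte page size.

default rel

section .rodata
align 16
; Sliding window of pshufb controls: indices 0..15, then 16 bytes of -1.
; A control byte with its high bit set makes pshufb write a zero byte.
shuf_window:
        db 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
        times 16 db -1

section .text
global load16_safe
load16_safe:
        mov     eax, edi
        and     eax, 4095               ; offset of the pointer within its page
        cmp     eax, 4096 - 16
        ja      .near_page_end          ; a 16-byte load would cross into the next page
        movdqu  xmm0, [rdi]             ; unaligned load that stays inside the page
        ret

.near_page_end:
        sub     eax, 4096 - 16          ; k = 1..15: where the string starts within
                                        ; the last 16 bytes of the page
        mov     rcx, rdi
        or      rcx, 4095               ; address of the last byte of the page
        movdqa  xmm0, [rcx - 15]        ; aligned load of the last 16 bytes of the page
        lea     rcx, [shuf_window]
        movdqu  xmm1, [rcx + rax]       ; control = k, k+1, ..., 15, then k bytes of -1
        pshufb  xmm0, xmm1              ; move the wanted bytes to the bottom, zero the rest
        ret
```

At _start on x86-64 Linux, argc is at [rsp] and the pointer to the first argument is at [rsp + 16], so after checking argc, mov rdi, [rsp + 16] followed by call load16_safe would leave the start of the string in xmm0. No manual zero padding is needed for PCMPISTRI or PCMPISTRM, because they treat the first 0 byte as the end of the string and ignore whatever follows it. One limitation of this sketch: in the fallback path, a string that really does continue into the next page will look truncated by the zero fill, so a caller would have to handle that case separately.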