I am trying to use some SSE4.2 instructions in string matching algorithms, coded in C++.
I do not understand how to use these instructions to match smaller patterns, and was hoping somebody could help me out with that.
In the code example, I am trying to find the pattern "ant" within the packed string "i am an antelope". I would expect the result to be a mask of all zeros except for a 1 at index 8.
This is my current code, which includes nmmintrin.h for the SSE4.2 intrinsics:
void print128_num(__m128i var)
{
    uint8_t *val = (uint8_t*) &var;
    printf("Text: %i %i %i %i %i %i %i %i %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5],
           val[6], val[7], val[8], val[9], val[10], val[11],
           val[12], val[13], val[14], val[15]);
}
int main(){
    __m128i s = _mm_set_epi8('e','p','o','l','e','t','n','a',' ','n','a',' ','m','a',' ','i');
    __m128i p = _mm_set_epi8(0,0,0,0,0,0,0,0,0,0,0,0,0,'t','n','a');
    print128_num(s);
    print128_num(p);
    __m128i res = _mm_cmpestrm(s, 16, p, 3, 0);
    print128_num(res);
    return 0;
}
I added all the zeros because the initializing function won't accept fewer arguments. I realize this is probably wrong, but I didn't know how else to do it and made several quite desperate attempts.
Anyway this is how I compiled: g++ -g sse4test.cpp -o sse4test -std=c++11 -msse4.2
and this is my output:
Text: 105 32 97 109 32 97 110 32 97 110 116 101 108 111 112 101
Text: 97 110 116 0 0 0 0 0 0 0 0 0 0 0 0 0
Text: 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I really do not understand the last line.
Any help would be very much appreciated.
There are two problems with your code. First, you have the source and the pattern reversed in the call to _mm_cmpestrm: the pattern goes first. Second, you are passing 0 as the last argument, which is a set of flags specifying the operating mode.
A mode of zero comes out as _SIDD_CMP_EQUAL_ANY, described as "For each character c in A, determine whether any character in B is equal to c."
For a substring search the mode should be specified as _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED | _SIDD_BIT_MASK.
With these changes the last line of the output begins "0 1": bit 8 of the mask is set, which means a match at the 9th character (index 8).
BTW: You can load from strings by using _mm_loadu_si128((__m128i*)(str)); instead of using _mm_set_epi8.