I was doing the integration task with the FPU before; now I'm struggling to do the same with SSE.
My main problem is that, when I was using the FPU stack, there was the fsin instruction, which could be used on the number at the top of the stack (st0).
Now I want to calculate the sine of all four single-precision numbers in XMM0, or calculate it somewhere else and move the results into XMM0. I'm using AT&T syntax.
I think the second idea is actually possible, but I don't know how.
Does anybody know how to do it?
Three options:
1. Use an existing library that computes sin on SSE vectors.
2. Write your own vector sin function using SSE.
3. Store the vector to memory, use fsin to compute the sine of each element, and load the results. Assuming that your stack is 16-byte aligned and has 16 bytes of free space at (%rsp), something like this:
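    movaps  %xmm0, (%rsp)        # spill the four floats to the stack
    mov     $3, %rcx             # start at the last element
0:  flds    (%rsp,%rcx,4)        # push element rcx onto the x87 stack
    fsin                         # st0 = sin(st0)
    fstps   (%rsp,%rcx,4)        # pop the result back into its slot
    sub     $1, %rcx
    jns     0b                   # loop while rcx >= 0
    movaps  (%rsp), %xmm0        # load the four results back into xmm0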
(1) is almost certainly your best bet performance-wise, and is also the easiest. If you have significant experience writing vector code and know a priori that the arguments fall into some range, you may be able to get better performance with (2). Using fsin will work, but it's ugly, slow, and not particularly accurate, if that matters.
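To give a rough idea of what (2) can look like, here is a minimal sketch written in C with SSE intrinsics rather than raw AT&T assembly. The function name sin4_poly is made up for illustration. It assumes every lane is already in [-pi/2, pi/2] and does no range reduction. It uses a plain degree-7 Taylor polynomial, which is only good to about 2e-4 near the ends of that range; a production routine would more likely use minimax coefficients and proper argument reduction.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ... */

/* Hypothetical helper: approximate sin(x) for four packed floats.
 * Assumption: every lane lies in [-pi/2, pi/2]; no range reduction is done. */
static __m128 sin4_poly(__m128 x)
{
    /* Taylor coefficients of sin: x - x^3/3! + x^5/5! - x^7/7! */
    const __m128 c3 = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5 = _mm_set1_ps( 1.0f / 120.0f);
    const __m128 c7 = _mm_set1_ps(-1.0f / 5040.0f);

    __m128 x2 = _mm_mul_ps(x, x);

    /* Horner evaluation in x^2: p = c3 + x2*(c5 + x2*c7) */
    __m128 p = _mm_add_ps(c5, _mm_mul_ps(x2, c7));
    p = _mm_add_ps(c3, _mm_mul_ps(x2, p));

    /* sin(x) ~= x + x^3 * p */
    return _mm_add_ps(x, _mm_mul_ps(_mm_mul_ps(x, x2), p));
}
```

Each intrinsic here corresponds to a single packed instruction (mulps, addps), so the same sequence can be written by hand in AT&T syntax with the coefficients stored as 16-byte aligned constants.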