I'm wondering if simply compiling my MSVC project with SSE/SSE2 enabled will have any effect. For example, I do vector normalization and dot products, but I write them with plain math, not any specific function. Is there something like sse_dot() and sse_normalize() that I should be using to actually take advantage of SSE, or will the compiler figure it out?
As I understand it, the /arch:SSE2 compiler option makes the compiler use scalar (not vector) SSE2 instructions in place of the normal x87 FPU code. I don't think it will do any vectorisation. Scalar SSE2 code is generally quicker than x87 FPU code, though.
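To illustrate, plain C++ math like the following needs no source changes to benefit from the compiler option. Built with something like `cl /O2 /arch:SSE2` for 32-bit x86, the compiler will typically emit scalar instructions such as mulss, addss, sqrtss and divss for it instead of x87 fmul/fadd/fsqrt, but each instruction still handles one float at a time. The `Vec3` type and the function names are just a hypothetical stand-in for the "plain math" in the question, and the exact code generated depends on the compiler version and flags.

```cpp
#include <cmath>

// Hypothetical plain-C++ vector type, standing in for the question's "plain math".
struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
    // Three multiplies and two adds on individual floats. Under /arch:SSE2
    // these typically become scalar mulss/addss, not packed (vector) ops.
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 normalize(const Vec3& v)
{
    float len = std::sqrt(dot(v, v));                 // float overload; typically sqrtss
    return Vec3{ v.x / len, v.y / len, v.z / len };   // no zero-length check
}
```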
To use the vector unit you need to either use intrinsics directly (xmmintrin.h for SSE, emmintrin.h for SSE2) or use third-party libraries that do. If you're just doing simple vector/matrix work for rendering, the Bullet SDK has an SSE-optimised vector math library that's not bad. IIRC the DirectX/XNA Math library is SSE-optimised too.
If neither of those takes your fancy, Google should turn up a number of alternatives.