So I've been working through chapter 12 of the picoCTF primer and noticed a difference between my assembly of the program and picoCTF's at the end of the main function, where the stack canary is checked.
Theirs is xor rdx,QWORD PTR fs:0x28
and mine is sub rdx,QWORD PTR fs:0x28
I have an AMD processor, and my assembly uses the sub instruction to check for equality, but theirs uses xor. I understand that both do the same thing, but why the difference? Isn't xor more efficient, and does it even depend on the processor?
Older GCC used xor, GCC10 and later use sub after I suggested that optimization: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568
It's not dependent on -mtune=znver3 or anything, just GCC version.
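To see the version dependence for yourself, a minimal sketch like the following should be enough. The function name copy_and_measure and the file name canary.c are made up for illustration; the only real requirement is a local array, so that GCC inserts a canary check when stack protection is enabled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical test function (not from the picoCTF primer).
 * The local char array is what makes GCC emit a stack canary
 * when stack protection is enabled.
 *
 * One way to inspect the epilogue, assuming the file is canary.c:
 *   gcc -O2 -fstack-protector-strong -masm=intel -S canary.c
 * Then look just before the ret for the instruction that reads
 * QWORD PTR fs:0x28. Per the answer above, it should be sub with
 * GCC 10 or later and xor with older GCC, whatever CPU you have.
 */
size_t copy_and_measure(const char *src)
{
    char buf[64];

    /* Copy at most 63 bytes and terminate explicitly. */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    return strlen(buf);
}
```

The register that holds the saved canary can differ from rdx depending on the function and compiler version; the thing to compare is the mnemonic.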
Intel Sandybridge-family can macro-fuse sub/jcc into a single uop, but can't for xor.
On other CPUs, sub and xor are equal in performance for this, so it's a win on that family of Intel CPUs with no downside anywhere else.
AMD Zen 3 and later can fuse sub or xor.
Earlier AMD can only fuse test and cmp.
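As for "it does the same thing": both instructions leave zero in the register (and set ZF) exactly when the saved canary equals the value at fs:0x28, so the following jne behaves the same either way. A small C model of that, with made-up function names, assuming 64-bit unsigned values:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models xor reg, fs:0x28: the result is zero only if every bit matches. */
bool canary_ok_xor(uint64_t saved, uint64_t master)
{
    return (saved ^ master) == 0;
}

/* Models sub reg, fs:0x28: unsigned subtraction wraps around,
   so the result is zero only if the two values are equal. */
bool canary_ok_sub(uint64_t saved, uint64_t master)
{
    return (saved - master) == 0;
}
```

Since the check only cares about zero versus non-zero, the compiler can pick whichever instruction fuses better with the branch.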