[SOLVED] Why does my AMD processor use a sub instruction instead of xor to verify the stack canary?


I've been working through chapter 12 of the picoCTF primer and noticed a difference between my assembly of the program and picoCTF's at the end of main, where the stack canary is checked. Theirs has xor rdx,QWORD PTR fs:0x28 and mine has sub rdx,QWORD PTR fs:0x28.

I have an AMD processor, and my assembly uses sub to check for equality, while theirs uses xor. I understand both do the same thing, but why the difference? Isn't xor more efficient, and is this even caused by the processor?
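Why either instruction works as an equality check can be shown at the C level. This is a minimal sketch with hypothetical function names: on x86, both sub and xor set ZF when their result is zero, and that result is zero exactly when the two operands are equal, so either can feed the jne that branches to the failure path calling __stack_chk_fail. An optimizing compiler will likely turn both C functions into the same plain cmp, so this illustrates the arithmetic, not GCC's instruction choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XOR form (hypothetical helper): a ^ b is zero exactly when every bit matches. */
bool equal_via_xor(uint64_t a, uint64_t b) {
    return (a ^ b) == 0;
}

/* SUB form (hypothetical helper): unsigned subtraction wraps modulo 2^64 in C,
   and a - b is zero modulo 2^64 exactly when a == b. */
bool equal_via_sub(uint64_t a, uint64_t b) {
    return (a - b) == 0;
}

/* Checks that both forms agree with == on sample pairs, including
   values near the wraparound point. */
void check_equivalence(void) {
    const uint64_t samples[] = {0, 1, 0x28, UINT64_MAX, UINT64_MAX - 1,
                                0xdeadbeefcafef00dULL};
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint64_t a = samples[i], b = samples[j];
            assert(equal_via_xor(a, b) == (a == b));
            assert(equal_via_sub(a, b) == (a == b));
        }
    }
}
```

Calling check_equivalence() from a main function runs the comparison; it aborts on the first pair where either form disagrees with ==.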

Solution

Older GCC versions used xor; GCC 10 and later use sub, after I suggested that optimization: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568
It doesn't depend on your CPU or on -mtune=znver3 or similar options, only on the GCC version.
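To check which form a given GCC emits, here is a minimal sketch (the file name canary.c and the function greet are hypothetical): a function with a local char array, which -fstack-protector-strong instruments with a canary. Compiling it with `gcc -O2 -fstack-protector-strong -masm=intel -S canary.c` and looking near the end of greet for the operand QWORD PTR fs:0x28 should show sub on GCC 10 and later and xor on older releases. The fs:0x28 canary location assumes an x86-64 Linux (glibc) target.

```c
/* canary.c (hypothetical file name)
 * Build to assembly with:
 *   gcc -O2 -fstack-protector-strong -masm=intel -S canary.c
 * then inspect the epilogue of greet() in canary.s. */
#include <stdio.h>
#include <string.h>

/* The local array makes -fstack-protector-strong store a canary in the
   frame on entry and re-check it (against fs:0x28 on x86-64 Linux)
   before returning. */
size_t greet(const char *name) {
    char buf[64];
    /* snprintf writes into buf and truncates to fit; buf is then passed to
       puts, so the array stays live in the frame. */
    snprintf(buf, sizeof buf, "Hello, %s!", name);
    puts(buf);
    return strlen(buf);
}
```

With GCC 10 or later the epilogue should contain something like sub rdx,QWORD PTR fs:0x28 followed by a jne; the exact register and labels may differ between versions and optimization levels.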

Intel Sandybridge-family CPUs can macro-fuse sub/jcc into a single uop, but not xor/jcc.

On other CPUs, sub and xor perform the same here, so the change is a win on that family of Intel CPUs with no downside anywhere else.

AMD Zen 3 and later can fuse either sub or xor with a following jcc.
Earlier AMD CPUs can only fuse test and cmp.