From the GCC documentation
On the Intel x86, the force_align_arg_pointer attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the runtime stack. This supports mixing legacy codes that run with a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and the alternate prologue requires a scratch register; this lowers the number of registers available if used in conjunction with the regparm attribute. The force_align_arg_pointer attribute is incompatible with nested functions; this is considered a hard error.
Specifically, I want to know: what are a prologue, an epilogue, and SSE compatibility?
From the GCC manual:
void TARGET_ASM_FUNCTION_PROLOGUE (FILE *file, HOST_WIDE_INT size)
The prologue is responsible for setting up the stack frame, initializing the frame pointer register, saving registers that must be saved, and allocating size additional bytes of storage for the local variables. file is a stdio stream to which the assembler code should be output.
On machines that have “register windows”, the function entry code does not save on the stack the registers that are in the windows, even if they are supposed to be preserved by function calls; instead it takes appropriate steps to “push” the register stack, if any non-call-used registers are used in the function.
On machines where functions may or may not have frame-pointers, the function entry code must vary accordingly; it must set up the frame pointer if one is wanted, and not otherwise. To determine whether a frame pointer is wanted, the macro can refer to the variable frame_pointer_needed. The variable's value will be 1 at run time in a function that needs a frame pointer.
void TARGET_ASM_FUNCTION_EPILOGUE (FILE *file, HOST_WIDE_INT size)
If defined, a function that outputs the assembler code for exit from a function. The epilogue is responsible for restoring the saved registers and stack pointer to their values when the function was called, and returning control to the caller. This macro takes the same arguments as the macro TARGET_ASM_FUNCTION_PROLOGUE, and the registers to restore are determined from regs_ever_live and CALL_USED_REGISTERS in the same way.
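To make the two terms concrete, here is a small C function annotated with the kind of prologue and epilogue a compiler typically emits around it. The function name sum_of_squares is made up for this example, and the assembly in the comments is only a typical unoptimized x86-64 shape (AT&T syntax); the exact instructions depend on the compiler version, target, and flags. Compiling with gcc -S -O0 shows what your own compiler produces.

```c
/* Hypothetical example function; the comments sketch where the
 * compiler-generated prologue and epilogue sit around the body. */
int sum_of_squares(int a, int b)
{
    /* Prologue (typical, unoptimized x86-64):
     *   pushq %rbp          save the caller's frame pointer
     *   movq  %rsp, %rbp    set up this function's frame pointer
     *   subq  $N, %rsp      reserve N bytes for locals, when needed
     */
    int squares[2];

    squares[0] = a * a;
    squares[1] = b * b;

    int result = squares[0] + squares[1];

    /* Epilogue (typical, unoptimized x86-64):
     *   leave               restore %rsp and %rbp (or just popq %rbp)
     *   ret                 return control to the caller
     */
    return result;
}
```

Everything between the two comments is the function body proper; the prologue and epilogue are the bookkeeping the compiler wraps around it.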
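SSE (Streaming SIMD Extensions) is an x86 instruction set extension that adds a set of 128-bit CPU registers. Each register can be packed with four 32-bit scalars, after which a single operation can be performed on all four elements simultaneously. In contrast, it may take four or more operations in regular assembly to do the same thing. Aligned SSE loads and stores (for example movaps) require 16-byte-aligned addresses, which is why modern code keeps the stack 16-byte aligned.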
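A minimal sketch of the "four elements at once" idea, using the SSE intrinsics from xmmintrin.h. The function name add4 is hypothetical, and a plain scalar fallback is included as an assumption for targets where SSE is not available. The unaligned load and store intrinsics are used here so the example works with any pointers; the aligned variants (_mm_load_ps, _mm_store_ps) would require 16-byte-aligned addresses.

```c
#if defined(__SSE__)
#include <xmmintrin.h>

/* Adds four pairs of floats with a single SSE addition. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);            /* load 4 floats from a */
    __m128 vb = _mm_loadu_ps(b);            /* load 4 floats from b */
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* 4 additions at once  */
}
#else
/* Scalar fallback: the same work takes four separate additions. */
void add4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}
#endif
```

Both versions compute the same result; the SSE version does it with one packed add instead of four scalar ones.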