Structure padding clarification for 32-bit and 64-bit architecture


Understanding Struct Alignment and Padding in 32-bit and 64-bit Architectures

Code:

#include <stdio.h>
#include <stddef.h>

struct struct_a {
    int x;
    float y;
    char a;
    double z;
};

int main(void) {
    printf("sizeof struct %zu\n", sizeof(struct struct_a));
    /* offsetof yields a size_t, so %zu is the matching conversion */
    printf("Offset of x: %zu\n", offsetof(struct struct_a, x));
    printf("Offset of y: %zu\n", offsetof(struct struct_a, y));
    printf("Offset of a: %zu\n", offsetof(struct struct_a, a));
    printf("Offset of z: %zu\n", offsetof(struct struct_a, z));
    return 0;
}

Observed Output:

Compiled for 64-bit (default or -m64 option):

sizeof struct 24
Offset of x: 0
Offset of y: 4
Offset of a: 8
Offset of z: 16

Compiled for 32-bit (-m32 option):

sizeof struct 20
Offset of x: 0
Offset of y: 4
Offset of a: 8
Offset of z: 12

Questions:

  1. Why is double placed at offset 12 in the 32-bit version when it's an 8-byte type?

    • I expected double to be aligned to an 8-byte boundary, but in 32-bit mode, it's aligned at 12 instead. I understand that a 32-bit system processes data in 4-byte chunks per cycle—does this influence the alignment of double?
  2. How should I explain structure padding in a job interview?

    • Given that compiler behavior may differ across platforms, what is the best way to frame my answer when asked about struct alignment and padding? Should I focus on general rules (like aligning to the largest type) or explain that alignment can vary depending on architecture and compiler optimizations?

I have checked related Stack Overflow discussions, but their examples are too complicated for me to follow as a beginner.


Edit:

I made some code changes and compiled with both -m32 and -m64 on my Ubuntu system.

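// Uses struct_a from the code above, with the char member renamed to c.
a.x = 0xFFFFFF; // the most significant byte should show 0x00, the other three 0xff
a.c = 'A';
// The casts below are type punning; memcpy would be the strictly conforming way.
*(uint32_t *)&a.y = 0xFFFFFFFF;  // Force 0xff bytes for float
*(uint64_t *)&a.z = 0xFFFFFFFFFFFFFFFF;  // Force 0xff bytes for double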

Memory layout from GDB for -m32 compiled binary

$1 = {x = 16777215, y = -nan(0x7fffff), c = 65 'A', z = -nan(0xfffffffffffff)}
(gdb) x/20bx &a
0x56559008 <a>:         0xff    0xff    0xff    0x00    0xff    0xff    0xff    0xff
0x56559010 <a+8>:       0x41    0x00    0x00    0x00    0xff    0xff    0xff    0xff
0x56559018 <a+16>:      0xff    0xff    0xff    0xff
// 0x56559008 x, 0x5655900c y, 0x56559010 c, < 3 byte padding >, 0x56559014 z.
(gdb) p &a.x
$6 = (int *) 0x56559008 <a>
(gdb) p &a.y
$7 = (float *) 0x5655900c <a+4>
(gdb) p &a.c
$8 = 0x56559010 <a+8> "A"
(gdb) p &a.z
$9 = (double *) 0x56559014 <a+12>

I also tried declaring a as an array of 2 elements, and the members are still not strictly aligned to multiples of their own size.

(gdb) x/40bx &a
0x56559040 <a>: 0xff    0xff    0xff    0x00    0xff    0xff    0xff    0xff
0x56559048 <a+8>:       0x41    0x00    0x00    0x00    0xff    0xff    0xff    0xff
0x56559050 <a+16>:      0xff    0xff    0xff    0xff    0xff    0xff    0xff    0x00
0x56559058 <a+24>:      0xff    0xff    0xff    0xff    0x41    0x00    0x00    0x00
0x56559060 <a+32>:      0xff    0xff    0xff    0xff    0xff    0xff    0xff    0xff
(gdb) p &a[0].z
$2 = (double *) 0x5655904c <a+12>
(gdb) p &a[1].x
$3 = (int *) 0x56559054 <a+20>

Memory layout from GDB for -m64 compiled binary

$1 = {x = 16777215, y = -nan(0x7fffff), c = 65 'A', z = -nan(0xfffffffffffff)}
(gdb) x/24bx &a
0x555555558010 <a>:     0xff    0xff    0xff    0x00    0xff    0xff    0xff    0xff
0x555555558018 <a+8>:   0x41    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x555555558020 <a+16>:  0xff    0xff    0xff    0xff    0xff    0xff    0xff    0xff
// 0x555555558010 x, 0x555555558014 y, 0x555555558018 c, < 7 byte padding >, 0x555555558020 z  
(gdb) p &a.x
$2 = (int *) 0x555555558010 <a>
(gdb) p &a.y
$3 = (float *) 0x555555558014 <a+4>
(gdb) p &a.c
$4 = 0x555555558018 <a+8> "A"
(gdb) p &a.z
$5 = (double *) 0x555555558020 <a+16>

I then did the same for an array, declaring a[2] and assigning the same values to both elements.

    a[0].x = 0xFFFFFF; a[1].x = 0xFFFFFF;
    a[0].c = 'A'; a[1].c = 'A';
    *(uint32_t *)&a[0].y = 0xFFFFFFFF;  // Force 0xff bytes for float
    *(uint32_t *)&a[1].y = 0xFFFFFFFF;  // Force 0xff bytes for float
    *(uint64_t *)&a[0].z = 0xFFFFFFFFFFFFFFFF;  // Force 0xff bytes for double
    *(uint64_t *)&a[1].z = 0xFFFFFFFFFFFFFFFF;  // Force 0xff bytes for double
// GDB output
(gdb) p a
$1 = {{x = 16777215, y = -nan(0x7fffff), c = 65 'A', z = -nan(0xfffffffffffff)}, {x = 16777215, y = -nan(0x7fffff), c = 65 'A', z = -nan(0xfffffffffffff)}}
(gdb) x/48bx &a
0x555555558040 <a>:     0xff    0xff    0xff    0x00    0xff    0xff    0xff    0xff
0x555555558048 <a+8>:   0x41    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x555555558050 <a+16>:  0xff    0xff    0xff    0xff    0xff    0xff    0xff    0xff
0x555555558058 <a+24>:  0xff    0xff    0xff    0x00    0xff    0xff    0xff    0xff
0x555555558060 <a+32>:  0x41    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x555555558068 <a+40>:  0xff    0xff    0xff    0xff    0xff    0xff    0xff    0xff

Solution

  • Why is double placed at offset 12 in the 32-bit version when it's an 8-byte type?

    Because the ABI specifies it that way.

    The System V i386 ABI specification (https://www.uclibc.org/docs/psABI-i386.pdf, table 2.1 on page 8) gives double an alignment of 4. The char member occupies offset 8, and the next multiple of 4 after that is 12.

    The System V AMD64 ABI, on the other hand, specifies an alignment of 8 for double (https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf, table 3.1 on page 12), so there the next suitable offset is 16.

    The compiler lays out structures according to the platform's ABI specification. Compilers targeting the same platform follow the same specification, so that the code they generate can interoperate.

    "does this influence the alignment of double?"

    There may be a rationale for the i386 choice, but I wasn't able to find one. Judging by the question "Alignment of a struct with two doubles is 4 even though double is aligned to 8 (32bit)", it would be better to align doubles to 8 on i386. But the i386 ABI is what it is: it is very old, and compilers follow it so that their output stays compatible with existing code.