
How to prevent Atmel Studio gcc 6.3.1 from optimizing 4-byte memcmp() to a 4-byte direct comparison?


I'm running Atmel Studio with its bundled gcc 6.3.1 to build firmware for an Atmel/Microchip SAMV70 (ARM Cortex-M7) chip. My code compares a 4-byte input array to a 4-byte local array using memcmp(). Compiled with -O0 (optimizations disabled), it works fine. Compiled with -Os (optimize for size) or -O3 (maximum optimization), the compiler replaces the memcmp() call with a direct 4-byte comparison, which I verified by examining the disassembly. Unfortunately, the optimization also sometimes places the local 4-byte array at an unaligned starting address, so while memcmp() would work fine, the direct comparison triggers a HardFault due to the unaligned access.

In my opinion this is 100% a compiler optimization bug (possibly gcc, possibly something Atmel added), but I'm stuck with the provided compiler so updating isn't an option. So here's my actual question: Is there a way to keep optimizations enabled but disable this particular optimization? Otherwise I'm stuck forcing the local 4-byte arrays to be 4-byte aligned or finding some other workaround.

Compiler version: gcc version 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437] (Atmel build: 508)

Here's an example function that could trigger the fault:

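#include <stdbool.h>
#include <stdint.h>
#include <string.h>

bool example(uint8_t *input_data)
{
    /* Local reference pattern; with -Os/-O3 its placement is up to the compiler. */
    uint8_t local_data[4] = { 0x00, 0x01, 0x02, 0x03 };

    return (memcmp(input_data, local_data, 4) == 0);
}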

My code always passes in a 4-byte-aligned input_data, so that's not an issue, but once again it's bad form for the compiler optimizations to take that for granted.
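For reference, the "force the local array to be 4-byte aligned" workaround mentioned above could look roughly like the sketch below. The function name example_aligned is hypothetical, and the attribute is GCC-specific. Whether it also changes where the compiler places the constant data would need to be confirmed in the disassembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of example(): same logic, but the local array is
   explicitly 4-byte aligned so a word-sized load of it is always aligned. */
bool example_aligned(const uint8_t *input_data)
{
    /* GCC-specific attribute; C11 _Alignas(4) is an alternative spelling. */
    uint8_t local_data[4] __attribute__((aligned(4))) = { 0x00, 0x01, 0x02, 0x03 };

    /* Sanity check that the requested alignment took effect
       (compiled out when NDEBUG is defined). */
    assert(((uintptr_t)local_data % 4u) == 0u);

    return memcmp(input_data, local_data, 4) == 0;
}
```

This only addresses the local array; the caller remains responsible for the alignment of input_data.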


Solution

  • Answering my own question since Eugene didn't post an official answer:

    From the gcc ARM options documentation for -munaligned-access / -mno-unaligned-access:

    By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures.

    That means unaligned access is allowed by default for ARMv7-M. It turns out this makes sense, because from the ARMv7-M Architecture Reference Manual:

    The following data accesses support unaligned addressing, and only generate alignment faults when the CCR.UNALIGN_TRP bit is set to 1, see Configuration and Control Register:

    • Non halfword-aligned LDR{S}H{T} and STRH{T}.

    • Non halfword-aligned TBH.

    • Non word-aligned LDR{T} and STR{T}.

    Which means that ARMv7-M supports a limited set of unaligned accesses. However, it does not support all unaligned accesses:

    The following data accesses always generate an alignment fault:

    • Non halfword-aligned LDREXH and STREXH.

    • Non word-aligned LDREX and STREX.

    • Non word-aligned LDRD, LDMIA, LDMDB, POP, LDC, VLDR, VLDM, and VPOP.

    • Non word-aligned STRD, STMIA, STMDB, PUSH, STC, VSTR, VSTM, and VPUSH.

    And also:

    Accesses to Strongly Ordered and Device memory types must always be naturally aligned
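The quoted rules for a plain word-sized LDR/STR can be condensed into a small model, sketched below. This is a simplification for illustration, not a full description of the architecture. The function name and parameters are hypothetical, and treating an unaligned access to Strongly Ordered or Device memory as a fault reflects what was observed in this case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the quoted rules for a single
   word-sized LDR/STR. Not a complete description of the architecture. */
static bool word_access_faults(uint32_t address,
                               bool unalign_trp,
                               bool strongly_ordered_or_device)
{
    /* Word-aligned accesses never raise an alignment fault. */
    if ((address % 4u) == 0u) {
        return false;
    }

    /* Unaligned: faults if CCR.UNALIGN_TRP is set, or if the target memory
       type requires natural alignment (as observed in this case). */
    return unalign_trp || strongly_ordered_or_device;
}
```

With unalign_trp false and strongly_ordered_or_device true, an odd address is reported as faulting, which matches the failure described in this answer.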

    So here's the failure condition that prompted me to ask the initial question:

    1. gcc with optimization enabled was replacing a 4-byte memcmp() with a direct comparison, which is allowed by default, because unaligned accesses are allowed by default. So that is not a compiler bug.
    2. The area of flash memory that contained the data for that memcmp() was located in an MPU segment that was declared Strongly Ordered, which does not support unaligned access. Thus when the memcmp() was replaced with a direct compare, and the data fell on an unaligned address, the compare was triggering a HardFault.

    The fix, which Eugene got correct in his initial comment, is to add -mno-unaligned-access to the compiler options. In my case this still allowed the compiler to replace the memcmp() with a direct 4-byte comparison but it also forced the data to be 4-byte aligned, allowing the compare to succeed without triggering a fault condition.
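One way to confirm which assumption the compiler is building with is the ACLE macro __ARM_FEATURE_UNALIGNED. GCC for ARM defines it when unaligned access is enabled and leaves it undefined under -mno-unaligned-access. A minimal sketch follows; the function name is hypothetical.

```c
#include <stdbool.h>

/* Reports whether this translation unit was compiled with unaligned
   accesses assumed to be available. The function name is hypothetical;
   the macro itself comes from the ARM C Language Extensions. */
static bool compiler_assumes_unaligned_ok(void)
{
#if defined(__ARM_FEATURE_UNALIGNED)
    return true;   /* e.g. ARMv7-M defaults */
#else
    return false;  /* e.g. built with -mno-unaligned-access, or a non-ARM host */
#endif
}
```

On a build where unaligned word accesses must never be generated, the same macro could instead guard an #error directive so that a missing -mno-unaligned-access is caught at compile time.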