I implemented a decision tree with if-else statements involving unsigned 16-bit variables (features) and constants (thresholds), as shown below:
static inline int16_t tree(const uint16_t *input) {
    if ( input[16] <= ((uint16_t)0) ) {
        if ( input[15] <= ((uint16_t)71) ) {
            if ( input[10] <= ((uint16_t)44) ) {
                //...
However, when I disassemble the compiled file, each 16-bit constant seems to occupy 32 bits in ROM.
Disassembly of section .literal.tree:
00000000 <.literal.tree>:
0: 00000d3c
4: 0000138c
8: 000018e5
c: 00001867
10: 0000110c
// and goes on...
Question: Is it possible to make these literals use only 16 bits each on a 32-bit architecture?
My target is an ESP32 device. I ran xtensa-esp32s3-elf-objdump -dS tree.c.obj to look at the literals. I expected each one to occupy only 16 bits of memory, but they appear to be placed in 32-bit entries.
The compiler flags are buried inside the SDK build system (ESP-IDF), and the sdkconfig file is too large to share, but the toolchain is GCC-based. I was originally optimizing for speed and have now switched to optimizing for size with -Os (thanks, Eric Postpischil), which reduced the ROM occupied by this file by 1/3 and also reduced the average inference time. Awesome, but the thresholds are still stored as 32-bit literals, so the question remains.
No. On Xtensa (ESP32/ESP32-S3), constants that don’t fit in an instruction’s immediate field are materialized from a literal pool and fetched with L32R. A literal-pool entry is a 32-bit word, so each such constant costs 4 bytes even if the value would fit in 16 bits.
Why you’re seeing 4 bytes:
GCC emits L32R to load the constant into a register; L32R is a PC-relative 32-bit load from the pool. There’s no 16-bit “L16R” equivalent for literal pools on these cores. (Small values may be encoded with immediates like MOVI/ADDI, but once the value doesn’t fit, it becomes a pooled literal.)
What you can do instead (to actually use 16-bit storage):
Put thresholds in a table of uint16_t in .rodata (Flash) and load them at run time, instead of writing inline literals in expressions. That lets the linker pack them at 2 bytes each (modulo alignment), and the compiler can load them with 16-bit loads (l16ui) and then compare.
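As a rough sketch of the table idea: note that if you simply index a static const array with constant indices, an optimizing compiler is free to fold the values back into inline constants, so this version stores the whole tree as a node table and walks it in a loop. The tree_node_t layout, the leaf encoding, and the leaf labels are hypothetical and invented for illustration; only the three feature/threshold pairs come from the question.

```c
#include <stdint.h>

/* Hypothetical node layout (an assumption, not from the question):
 * 8 bytes per node, with the threshold stored in 16 bits. */
typedef struct {
    uint16_t feature;    /* index into input[] */
    uint16_t threshold;  /* take the left branch when input[feature] <= threshold */
    int16_t  left;       /* child node index, or -(label + 1) for a leaf */
    int16_t  right;      /* same encoding as left */
} tree_node_t;

/* Feature/threshold pairs are taken from the question; the leaf labels
 * (0..3) are made up so the example can run. */
static const tree_node_t nodes[] = {
    /* 0 */ { 16,  0,  1, -4 },  /* right: leaf with label 3 */
    /* 1 */ { 15, 71,  2, -3 },  /* right: leaf with label 2 */
    /* 2 */ { 10, 44, -1, -2 },  /* left: label 0, right: label 1 */
};

static inline int16_t tree(const uint16_t *input) {
    int16_t i = 0;  /* start at the root node */
    while (i >= 0) {
        const tree_node_t *n = &nodes[i];
        /* The threshold is read from the table rather than from an
         * inline constant in the instruction stream. */
        i = (input[n->feature] <= n->threshold) ? n->left : n->right;
    }
    return (int16_t)(-(i + 1));  /* decode the leaf label */
}
```

Whether this is a net win depends on the tree: the loop replaces straight-line compares with table loads and an extra indirection, so it is worth measuring both ROM size and inference time before committing to it.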