I'm measuring the performance of an ARM Cortex-R5F processor by running the CoreMark benchmark under different scenarios. One scenario places the stack in ATCM memory.
When compiling without the inline flag, the stack in TCM gives better results. When compiling with the inline flag, the stack in RAM gives better results.
How can this be explained, given that TCM is faster and closer to the processor?
There is no stack overflow in my program when the stack is placed in TCM.
Is your TCM actually faster than the L1 data cache? It isn't always: many designs have a single-cycle L1 data cache but two-cycle access to TCM.
The usual purpose of TCM is not performance (although that is nice) but predictability. You can't get cache misses in TCM, so real-time systems use it for timing-critical code and data sections.
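One way to check which memory is faster on your part is to time the same read loop over a buffer in TCM and over a buffer in cached RAM. The following is only a rough sketch: `walk_buffer` and `time_walk` are hypothetical helper names, `clock()` stands in for whatever cycle counter your platform provides, and placing each buffer in a particular memory (linker script or section attribute) is toolchain-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum every word of buf, `passes` times over.
 * Reading through a volatile pointer forces one real load per element,
 * so the compiler cannot optimize the memory accesses away. */
uint32_t walk_buffer(const uint32_t *buf, size_t words, unsigned passes)
{
    const volatile uint32_t *p = buf;
    uint32_t sum = 0;

    for (unsigned pass = 0; pass < passes; ++pass) {
        for (size_t i = 0; i < words; ++i) {
            sum += p[i];
        }
    }
    return sum;
}

/* Return the elapsed clock() ticks for the walk and store the sum in
 * *sum_out. Assumption: clock() is available; on bare metal, replace it
 * with your platform's cycle counter. */
clock_t time_walk(const uint32_t *buf, size_t words, unsigned passes,
                  uint32_t *sum_out)
{
    clock_t start = clock();
    *sum_out = walk_buffer(buf, words, passes);
    return clock() - start;
}
```

Call `time_walk` once with a buffer located in TCM and once with an equally sized buffer in cached RAM, using the same `words` and `passes`. If the RAM run is not slower once the cache is warm, the stack result in the question is less surprising.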