I want to test the performance of a userspace program on Linux running on x86. To measure it, I need to flush specific cache lines to memory (making sure those lines are invalidated so that the next access to them is a cache miss).
I've already seen suggestions to use cacheflush(2), which is supposed to be a system call, yet g++ complains that it is not declared. Also, I cannot use clflush_cache_range, which apparently can be invoked only from kernel code. So far I have tried the following code:
static inline void clflush(volatile void *__p)
{
asm volatile("clflush %0" : "+m" (*(volatile char __force *)__p));
}
But this gives the following error upon compilation:
error: expected primary-expression before ‘volatile’
Then I changed it as follows:
static inline void clflush(volatile void *__p)
{
asm volatile("clflush %0" :: "m" (__p));
}
It compiled successfully, but the timing results did not change. I suspect the compiler removed it as an optimization. Does anyone have any idea how I can solve this problem?
The second one flushes the memory containing the pointer __p itself, which is on the stack, which is why it doesn’t have the effect you want.
The problem with the first one is that it uses the macro __force, which is defined in the Linux kernel and is unneeded here. (See: What does the __attribute__((force)) do?)
If you remove __force, it will do what you want.
(You should also change it to avoid using the variable name __p, which is a reserved identifier.)
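Putting those changes together, a minimal sketch might look like the following. It assumes GCC or Clang on x86-64. The names flush_line and time_load are made up for illustration, and the rdtscp-based timing helper is not part of the original question; it is just one rough way to check whether the flush had any effect. The "+m" operand on the dereferenced pointer is what tells the compiler that the pointed-to memory (rather than the pointer variable) is the thing being flushed.

```cpp
#include <cassert>
#include <cstdint>
#include <x86intrin.h>  // __rdtscp, _mm_mfence, _mm_lfence (GCC/Clang on x86)

// Flush the cache line that contains *p. The operand is the pointed-to
// byte, not the pointer variable itself, and __force is gone.
static inline void flush_line(volatile void *p)
{
    assert(p != nullptr);
    asm volatile("clflush %0" : "+m" (*(volatile char *)p));
}

// Hypothetical helper: rough cost, in TSC ticks, of one load from *p.
// The numbers are noisy, so compare many samples rather than one.
static inline uint64_t time_load(volatile char *p)
{
    unsigned aux;
    _mm_mfence();                     // let an earlier clflush complete first
    uint64_t start = __rdtscp(&aux);
    char v = *p;                      // the load being measured
    uint64_t end = __rdtscp(&aux);    // waits for earlier instructions to execute
    _mm_lfence();
    (void)v;
    return end - start;
}
```

Calling time_load(buf) once to warm the line, then flush_line(buf), then time_load(buf) again should usually show a noticeably larger tick count on the second measurement, although the exact values depend on the machine. If you would rather not write inline assembly at all, the _mm_clflush intrinsic from <emmintrin.h> issues the same instruction.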