I am trying to improve the performance of the threaded application with real-time deadlines. It is running on Windows Mobile and written in C / C++. I have a suspicion that high frequency of thread switching might be causing tangible overhead, but can neither prove it or disprove it. As everybody knows, lack of proof is not a proof of opposite :).
Thus my question is twofold:
If exists at all, where can I find any actual measurements of the cost of switching thread context?
Without spending time writing a test application, what are the ways to estimate the thread switching overhead in the existing application?
Does anyone know a way to find out the number of context switches (on / off) for a given thread?
While you said you don't want to write a test application, I did this for a previous test on an ARM9 Linux platform to find out what the overhead is. It was just two threads that would boost::thread::yield() (or, you know) and increment some variable, and after a minute or so (without other running processes, at least none that do something), the app printed how many context switches it could do per second. Of course this is not really exact, but the point is that both threads yielded the CPU to each other, and it was so fast that it just didn't make sense any more to think about the overhead. So, simply go ahead and just write a simple test instead of thinking too much about a problem that may be non-existent.
Other than that, you might try like 1800 suggested with performance counters.
Oh, and I remember an application running on Windows CE 4.X, where we also have four threads with intensive switching at times, and never ran into performance issues. We also tried to implement the core threading thing without threads at all, and saw no performance improvement (the GUI just responded much slower, but everything else was the same). Maybe you can try the same, by either reducing the number of context switches or by removing threads completely (just for testing).