grand-central-dispatchosx-yosemitensthreadmachusleep

Delaying Code Execution on OSX 10.10


I've encountered a very strange issue that affects my code running only on OSX 10.10 systems. I've seen this anomaly occur on over 25 OSX 10.10 systems running my code, whereas the exact same code did not exhibit this behavior before upgrading (10.7). Furthermore, this issue is not 100% reproducible in that it occurs randomly ~0-5% of the time. While testing the code, nothing else critical or CPU exhaustive is occurring on the machine. Even if something else was happening, the very fact that the delays I'm experiencing are soooo ridiculously long make that conclusion seem suspicious. Anyhow, without further ado, take a look at the following NSLog prints:

12:00:05.766 MyApp[59957:6540517] Time_To_Wait: 679000000, Fire_Date: 270946738287700, Cur_Date: 270946059302734
12:00:26.446 MyApp[59957:6540517] Resume...

The times are in nanoseconds. According to the NSLog timestamps, we actually ended up waiting 20.68 seconds, rather than the desired 0.679 secs. Now, let's take a look at the code:

NSLog(@"Time_To_Wait: %lld, Fire_Date: %lld, Cur_Date: %lld", time_to_wait, fire_date, mach_absolute_time());
mach_wait_until(fire_date);
NSLog(@"Resume...");

In case you're wondering what mach_wait is, it's a high resolution timer available default. Just include

#include <mach/mach.h>
#include <mach/mach_time.h>

But that doesn't matter because I've experienced the exact same issue if I replace mach_wait_until with:

Doesn't matter what method of delay I try to use, I use NSLog to print my 'delay' value to the console as a sanity check to ensure it's the correct # of ms, then fire one of the above. ~95% of the time it behaves according to specification. But occasionally randomly, I'll get some whack delay like 10-20 seconds. What gives? Is this some Yosemite kernel issue? Again, I've never experienced this w/ the same code running on 10.7. And I've tested all of the above methods for delaying the execution of code.


UPDATE:

Following comment advice, I went ahead and sudo dtruss -f -e sudo -u USER MyApp 2> ~/myapps.log. As usual, most of the mach_wait's and sleeps behaved the way they were supposed to. So to make things easier to identify, I added in a check where if the actual sleep duration is > 3x what it's supposed to be, it prints out DELAY ISSUE. Ran my program and did a search for delay issue. Listed below are the most frequent system calls (dtruss output) that occur between the log print immediately preceding and succeeding the mach_wait statements:

There are about ~250+ of the above two calls. Probably a few more workq than __semwait's. The semwait's all usually take about 30000 'units' of time to complete, whatever that is. Dunno how dtruss times. These make up the bulk of the calls.

These three system calls took the longest # of units of time. And they were each appeared once in the output during our ~20sec wait gap.

Not sure what to do with the above =X


Solution

  • See comments above

    The problem appears to be connected to Apple's Timer Coalescing, a new feature in Mac OS X 10.9 Mavericks that rejigs the firing times of timers by up to a few milliseconds to merge them, thus allowing the processor to make fewer power-state transitions and to remain idle for longer. The benefit is greatly reduced power consumption.

    A white paper by Apple on the topic that mentions it briefly is here.

    The solution that was retained in the comments is to disable TC, as follows:

    sudo sysctl -w kern.timer.coalescing_enabled=0