c++macoskernel-extensionxnu

Kext OSDynamicCast failed during OSObject::free


I've got IOKit driver that derives from IOService base class, and its raw pointer is delivered to some event callback function from kauth framework that may be called very frequently.

In order to extract this instance out of the pointer I uses the safe method OSDynamicCast, and I make sure that during driver teardown, I disable the kauth calls and flush all existing calls BEFORE freeing the driver. However, sometimes I still get kernel panic on the OSDynamicCast :

frame #0: [inlined] OSMetaClass::checkMetaCast(check=0xffffff802b28d3f0)
frame #1: [inlined] OSMetaClassBase::metaCast(OSMetaClass const*) const
frame #2: kernel`OSMetaClassBase::safeMetaCast

When I disabled and flushed the kauth calls before the OSObject::free on the IOService::stop callback, the problem hasn't repeats itself (at least after dozens of tries).

Perhaps anyone have an idea if there's any memory being freed during the period between ::stop and ::free that trigger this panic ?

Here's a small code that emphasis my design when It trigger panic.

kauth_callback(kauth_cred_t credential, 
               void *idata, /* This is the RAW pointer for my IOService based instance */
               kauth_action_t action,
               uintptr_t arg0, 
               uintptr_t arg1, 
               uintptr_t arg2, 
               uintptr_t arg3)
{
   ...
   // taking shared_lock with mutex 
   auto my_inst = OSDynamicCast(com_my_driver, reinterpret_cast<OSObject *>(idata));
   ...
}


void com_my_driver::free(IOService *provider) 
{
     kauth_unlisten_scope(my_listener_);
     // taking unique lock with mutex to make sure no outstanding kauth calls.
     super::free(provider); //calling OSObject free
}

And if I move the logic from ::free to ::stop it works :

void com_my_driver::stop(IOService *provider) 
{
     kauth_unlisten_scope(my_listener_);
     // taking unique lock with mutex to make sure no outstanding kauth calls.
     super::stop(provider); // Calling IOService::stop()
}

Solution

  • There is an inherent race condition in kauth_unlisten_scope. Your mutex solution almost certainly doesn't completely resolve the issue because your code in the callback can actually run after kauth_unlisten_scope() returns - in other words, the kauth callback hasn't locked the mutex yet.

    All you can do is sleep for a while after kauth_unlisten_scope() returns. Hopefully after a second or so, all kauth callbacks have successfully completed.

    If you want to be extra careful, you can also add a global boolean flag which is normally true, but gets set to false just before you unregister your kauth listener. You can test the flag on entering your callback, before locking the mutex etc. If that flag is false, return immediately from the callback. This at least prevents any access to dynamically allocated memory; however, it still doesn't 100% solve the problem in principle because the global variable will of course disappear when the kext is unloaded.

    Apple has known about the problem in the ~7 years I've been using the kauth APIs; they haven't fixed it in that time, and as they're planning on phasing out kexts altogether over the next few years, I don't think this is going to change.

    Side notes:

    Don't use reinterpret_cast<> to cast from opaque void* to a concrete pointer type. static_cast<> is designed for that purpose.

    Additionally, you can use a static cast instead of a dynamic one assuming you always pass objects of type com_my_driver to kauth_listen_scope(). In fact, you should be performing a static_cast<> to the static type that originally degraded to void*, which I suspect isn't OSObject in your case. If that's different from the dynamic type you're expecting, cast the result of that to the derived type.

    E.g. BAD:

    //startup
    com_my_driver* myobj = …;
    // com_my_driver* implicitly degrading to void*
    kauth_listen_scope("com_my_driver", kauth_callback, myobj);
    // …
    int kauth_callback(kauth_cred_t credential, 
        void *idata,
        kauth_action_t action, uintptr_t arg0, uintptr_t arg1, uintptr_t arg2, uintptr_t arg3)
    {
        // the void* came from a com_my_driver*, not an OSObject*!
        auto my_inst = static_cast<com_my_driver*>(static_cast<OSObject *>(idata));
    }
    

    BETTER:

    int kauth_callback(kauth_cred_t credential, 
        void *idata,
        kauth_action_t action, uintptr_t arg0, uintptr_t arg1, uintptr_t arg2, uintptr_t arg3)
    {
        auto my_inst = static_cast<com_my_driver*>(idata);
    }
    

    It's a fairly minor point, and assuming you aren't doing anything with multiple inheritance, which you shouldn't be doing in IOKit anyway, it'll compile down to the same code in practice, but it's skirting undefined behaviour. Additionally, the incorrect code is more confusing to read, and if you're using OSDynamicCast, less efficient - and efficiency is extremely important in kauth callbacks.

    Indeed so much so that I'd be wary of even locking a mutex on the "hot" path, which you suggest you are doing. This means you are literally single-threading all file I/O across all user processes in the whole system. Don't ship that kind of code to customers. Consider using a RW-Lock instead and only locking for reading in the general case, with write-locks reserved for when they're truly necessary.