objective-c core-services javascript-automation

JXA: Accessing CFString constants from CoreServices


JXA, with its built-in ObjC bridge, exposes enumerations and constants from the Foundation framework automatically via the $ object; e.g.:

$.NSUTF8StringEncoding  // -> 4

However, there are also useful CFString constants in lower-level APIs that aren't automatically imported, namely the kUTType* constants in CoreServices that define frequently-used UTI values, such as kUTTypeHTML for UTI "public.html".

While you can import them with ObjC.import('CoreServices'), their string values aren't (readily) accessible, presumably because their type is CFString[Ref]:

ObjC.import('CoreServices') // import kUTType* constants; ObjC.import('Cocoa') works too
$.kUTTypeHTML  // returns an [object Ref] instance - how do you get its string value?

I have yet to find a way to get at the string at the heart of what's returned: ObjC.unwrap($.kUTTypeHTML) doesn't work, and neither does ObjC.unwrap($.kUTTypeHTML[0]) (nor .deepUnwrap()).

I wonder:


Solution

  • While I don't understand all implications, the following seems to work:

    $.CFStringGetCStringPtr($.kUTTypeHTML, 0) // -> 'public.html'
    
    // Alternative, with explicit UTF-8 encoding specification
    $.CFStringGetCStringPtr($.kUTTypeHTML, $.kCFStringEncodingUTF8) // ditto
    

    The kUTType* constants are defined as CFStringRef, and CFStringGetCStringPtr returns a CFString object's internal C string in the specified encoding, if it can be extracted "with no memory allocations and no copying, in constant time" - or NULL otherwise.

    With the built-in constants, it seems that a C string (rather than NULL) is always returned, which - by virtue of C data types mapping onto JXA data types - is directly usable in JavaScript:

     $.CFStringGetCStringPtr($.kUTTypeHTML, 0) === 'public.html' // true
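
Since CFStringGetCStringPtr() is allowed to return NULL, it may be worth wrapping the call so that a missing string fails loudly instead of silently propagating. The following is only a sketch: cfStringConstant is a hypothetical helper name, the bridge object is passed in as a parameter (under JXA you would pass $), and I'm assuming that a NULL result surfaces in JS as something other than a string.

```javascript
// Sketch: read a CFString constant through an injected bridge object.
// `bridge` is `$` under JXA; `cfStringConstant` is a hypothetical helper name.
function cfStringConstant(bridge, name, encoding) {
  var ref = bridge[name];
  if (ref === undefined || ref === null) {
    throw new Error('Unknown constant: ' + name);
  }
  // 0x08000100 is the numeric value of kCFStringEncodingUTF8.
  var enc = (encoding === undefined) ? 0x08000100 : encoding;
  var s = bridge.CFStringGetCStringPtr(ref, enc);
  // Assumption: a NULL pointer does not arrive as a JS string.
  if (typeof s !== 'string') {
    throw new Error('No internal C string available for ' + name);
  }
  return s;
}

// Only runs under JXA (osascript -l JavaScript), where ObjC and $ exist.
if (typeof ObjC !== 'undefined') {
  ObjC.import('CoreServices');
  console.log(cfStringConstant($, 'kUTTypeHTML'));
}
```

Passing the bridge in explicitly is a design choice made purely so the helper can be exercised with a stub outside of osascript; inside a JXA script you could just as well reference $ directly.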
    

    For background information (as of OS X 10.11.1), read on.


    JXA doesn't natively recognize CFString objects, even though they can be "toll-free bridged" to NSString, a type that JXA does recognize.

    You can verify that JXA does not know the equivalence of CFString and NSString by executing $.NSString.stringWithString($.kUTTypeHTML).js, which should return a copy of the input string, but instead fails with -[__NSDictionaryM length]: unrecognized selector sent to instance.

    Not recognizing CFString is our starting point: $.kUTTypeHTML is of type CFString[Ref], but JXA doesn't return a JS string representation of it, only [object Ref].

    Note: The following is in part speculative - do tell me if I'm wrong.

    Not recognizing CFString has another side effect, namely when invoking CF*() functions that accept a generic type (or Cocoa methods that accept a toll-free bridged CF* type that JXA is unaware of):
    In such cases, if the argument type doesn't exactly match the invoked function's parameter type, JXA apparently implicitly wraps the input object in a CFDictionary instance whose only entry has the key "type", with the associated value containing the original object.[1]

    Presumably, this is why the above $.NSString.stringWithString() call fails: it is being passed the CFDictionary wrapper rather than the CFString instance.

    Another case in point is the CFGetTypeID() function, which expects a CFTypeRef argument: i.e., any CF* type.

    Since JXA doesn't know that it's OK to pass a CFStringRef argument as-is as the CFTypeRef parameter, it mistakenly performs the above-mentioned wrapping, and effectively passes a CFDictionary instance instead:

    $.CFGetTypeID($.kUTTypeHTML) // -> !! 18 (CFDictionary), NOT 7 (CFString)
    

    This is what houthakker experienced in his solution attempt.

    For a given CF* function you can bypass the default behavior by using ObjC.bindFunction() to redefine the function of interest:

    // Redefine CFGetTypeID() to accept any type as-is:
    ObjC.bindFunction('CFGetTypeID', ['unsigned long', [ 'void *']])
    

    Now, $.CFGetTypeID($.kUTTypeHTML) correctly returns 7 (CFString).

    Note: The redefined $.CFGetTypeID() returns a JS Number instance, whereas the original returns a string representation of the underlying number (CFTypeID value).
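
Because the two variants of $.CFGetTypeID() differ in their return type, code that consumes the result may want to normalize it first. Here is a small sketch; describeTypeID and the lookup table are hypothetical names of mine, and the table contains only the two IDs observed above on OS X 10.11.1. Type ID values are not guaranteed to be stable across OS releases, so treat the table as illustrative only.

```javascript
// Type IDs as observed in this answer (OS X 10.11.1); they may differ elsewhere.
var OBSERVED_CF_TYPE_IDS = { 7: 'CFString', 18: 'CFDictionary' };

// Accepts the ID as a JS Number (redefined function) or as a numeric string
// (original function) and returns a readable name where one is known.
function describeTypeID(rawID) {
  var id = Number(rawID);
  if (!isFinite(id)) {
    throw new TypeError('Not a numeric type ID: ' + rawID);
  }
  return OBSERVED_CF_TYPE_IDS[id] || ('unknown (' + id + ')');
}
```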

    Generally, if you want to know the specific type of a given CF* instance informally, use CFShow(), e.g.:

    $.CFShow($.kUTTypeHTML) // -> '{\n    type = "{__CFString=}";\n}'
    

    Note: CFShow() returns nothing and instead prints directly to stderr, so you can't capture the output in JS.
    You may redefine CFShow with ObjC.bindFunction('CFShow', ['void', [ 'void *' ]]) so as not to show the wrapper dictionary.

    For natively recognized CF* types - those that map onto JS primitives - you'll see the specific type directly (e.g., CFBoolean for false); for unknown - and therefore wrapped - instances, you'll see the wrapper structure as above - read on for more.


    [1] Running the following gives you an idea of the wrapper object being generated by JXA when passing an unknown type:

    // Note: CFShow() prints a description of the type of its argument
    //  directly to stderr.
    $.CFShow($.kUTTypeHTML) // -> '{\n    type = "{__CFString=}";\n}'
    
    // Alternative that *returns* the description as a JS string:
    $.CFStringGetCStringPtr($.CFCopyDescription($.kUTTypeHTML), 0) // -> (see above)
    

    Similarly, using the known-to-JXA equivalence of NSDictionary and CFDictionary,

    ObjC.deepUnwrap($.NSDictionary.dictionaryWithDictionary( $.kUTTypeHTML ))
    

    returns {"type":"{__CFString=}"}, i.e., a JS object with property type whose value is at this point - after an ObjC-bridge call roundtrip - a mere string representation of what presumably was the original CFString instance.
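
If the wrapper always has this shape, the type name can be pulled out of it with plain JS string handling. This is a sketch under that assumption: typeNameFromWrapper is a hypothetical helper, and the "{__Name=}" format is inferred solely from the single sample shown above, so it can only be expected to work for wrapped (i.e., unknown-to-JXA) types.

```javascript
// Extracts e.g. 'CFString' from { type: '{__CFString=}' }.
// Returns undefined if the input doesn't look like the wrapper shown above.
function typeNameFromWrapper(wrapper) {
  if (!wrapper || typeof wrapper.type !== 'string') {
    return undefined;
  }
  // Skip the opening brace and any leading underscores, capture up to '='.
  var match = /^\{_*([A-Za-z0-9]+)=/.exec(wrapper.type);
  return match ? match[1] : undefined;
}

// Only runs under JXA (osascript -l JavaScript), where ObjC and $ exist.
if (typeof ObjC !== 'undefined') {
  ObjC.import('CoreServices');
  var unwrapped = ObjC.deepUnwrap(
    $.NSDictionary.dictionaryWithDictionary($.kUTTypeHTML)
  );
  console.log(typeNameFromWrapper(unwrapped));
}
```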


    houthakker's solution attempt also contains a handy snippet of code to obtain the type name of a CF* instance as a string.

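    If we refactor it into a function and apply the necessary redefinition of CFGetTypeID(), we get the function below. HOWEVER, it relies on a hack to return a value consistently, and a random character occasionally appears at the end of the returned string; see the caveats in its comments.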

    If anyone has an explanation for why the hack is needed and where the random characters come from, please let me know. The issues may be memory-management related, as both CFCopyTypeIDDescription() and CFStringCreateExternalRepresentation() return an object that the caller must release, and I don't know whether/how/when JXA does that.

    /* 
      Returns the type name of the specified CF* (CoreFoundation) type instance.
      CAVEAT:
       * A HACK IS EMPLOYED to ensure that a value is consistently returned for
         those CF* types that correspond to JS primitives, such as CFNumber, 
         CFBoolean, and CFString:
         THE CODE IS CALLED IN A TIGHT LOOP UNTIL A STRING IS RETURNED.
         THIS SEEMS TO WORK WELL IN PRACTICE, BUT CAVEAT EMPTOR.
         Also, ON OCCASION A RANDOM CHARACTER APPEARS AT THE END OF THE STRING.
       * Only pass in true CF* instances, as obtained from CF*() function
         calls or constants such as $.kUTTypeHTML. Any other type will CRASH the
         function. 
    
      Example:
        getCFTypeName($.kUTTypeHTML) // -> 'CFString'  
    */
    function getCFTypeName(cfObj) {
    
      // Redefine CFGetTypeID() so that it accepts unknown types as-is
      // Caution:
      //  * ObjC.bindFunction() always takes effect *globally*.
      //  * Be sure to pass only true CF* instances from then on, otherwise
      //    the function will crash.
      ObjC.bindFunction('CFGetTypeID', [ 'unsigned long', [ 'void *' ]])
    
      // Note: Ideally, we'd redefine CFCopyDescription() analogously and pass 
      // the object *directly* to get a description, but this is not an option:
      //   ObjC.bindFunction('CFCopyDescription', ['void *', [ 'void *' ]])
      // doesn't work, because, since we're limited to *C* types, we can't describe
      // the *return* type in a way that CFStringGetCStringPtr() - which expects
      // a CFStringRef - would then recognize ('Ref has incompatible type').
    
      // Thus, we must first get a type's numerical ID with CFGetTypeID() and then
      // get that *type*'s description with CFCopyTypeIDDescription().
      // Unfortunately, passing the resulting CFString to $.CFStringGetCStringPtr()
      // does NOT work: it yields NULL - no idea why.
      // 
      // Using $.CFStringCreateExternalRepresentation(), which yields a CFData
      // instance, from which a C string pointer can be extracted with
      // CFDataGetBytePtr(), works:
      //  - reliably with non-primitive types such as CFDictionary
      //  - only INTERMITTENTLY with the equivalent types of JS primitive types
      //    (such as CFBoolean, CFString, and CFNumber) - why??
      //    Frequently, and unpredictably, `undefined` is returned.
      // !! THUS, THE FOLLOWING HACK IS EMPLOYED: THE CODE IS CALLED IN A TIGHT
      // !! LOOP UNTIL A STRING IS RETURNED. THIS SEEMS TO WORK WELL IN PRACTICE,
      // !! BUT CAVEAT EMPTOR.
      //    Also, sometimes, WHEN A STRING IS RETURNED, IT MAY CONTAIN A RANDOM
      //    EXTRA CHAR. AT THE END.
      do {
        var data = $.CFStringCreateExternalRepresentation(
                null, // use default allocator
                $.CFCopyTypeIDDescription($.CFGetTypeID(cfObj)), 
                0x08000100, // kCFStringEncodingUTF8
                0 // loss byte: n/a here
            ); // returns a CFData instance
        var s = $.CFDataGetBytePtr(data) // declared with var to avoid an implicit global
      } while (s === undefined)
      return s
    }
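
One weakness of the tight loop in getCFTypeName() above is that it never terminates if a string is never returned. A bounded variant of the same hack could look like the sketch below; retryUntilDefined is a hypothetical helper name, and the attempt limit is an arbitrary value the caller chooses. It does not explain or fix the underlying problem, it only caps the number of attempts.

```javascript
// Calls attempt() up to maxAttempts times and returns the first result
// that is neither undefined nor null; throws if none is obtained.
function retryUntilDefined(attempt, maxAttempts) {
  for (var i = 0; i < maxAttempts; i++) {
    var result = attempt();
    if (result !== undefined && result !== null) {
      return result;
    }
  }
  throw new Error('No value after ' + maxAttempts + ' attempts');
}

// Under JXA, the body of the do/while loop above could be moved into the
// callback, e.g.:
//   return retryUntilDefined(function () {
//     var data = $.CFStringCreateExternalRepresentation(
//       null, $.CFCopyTypeIDDescription($.CFGetTypeID(cfObj)), 0x08000100, 0)
//     return $.CFDataGetBytePtr(data)
//   }, 1000)
```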