objective-c nsdata c-strings nsstringencoding

Can stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: return NSUTF16StringEncoding or NSUTF32StringEncoding?


I'd like to know if calling stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: can return NSUTF16StringEncoding, NSUTF32StringEncoding or any of their variants?

I'm asking because of this documentation note on cStringUsingEncoding::

Special Considerations

UTF-16 and UTF-32 are not considered to be C string encodings, and should not be used with this method—the results of passing NSUTF16StringEncoding, NSUTF32StringEncoding, or any of their variants are undefined.

So I understand that creating a C string with UTF-16 or UTF-32 is unsupported, but I'm not sure whether string encoding detection with stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: can return UTF-16 or UTF-32.

An example scenario (adapted from SSZipArchive.m):

// name is a null-terminated C string built with `fread` from stdio.h:
char *name = (char *)malloc(size_name + 1);
// Read at most size_name bytes so one slot stays free for the terminator.
size_t bytesRead = fread(name, 1, size_name, file);
name[bytesRead] = '\0';

// dataName is the data object of name (only the bytes actually read)
NSData *dataName = [NSData dataWithBytes:(const void *)name length:bytesRead];

// stringName is the string object of dataName
NSString *stringName = nil;
NSStringEncoding encoding = [NSString stringEncodingForData:dataName encodingOptions:nil convertedString:&stringName usedLossyConversion:NULL];

In the above code, can encoding be NSUTF16StringEncoding, NSUTF32StringEncoding or any of their variants?


Platforms: macOS 10.10+, iOS 8.0+, watchOS 2.0+, tvOS 9.0+.


Solution

  • Yes, if the string is encoded using one of those encodings. The notes about C strings are specific to C strings. An NSString is not a C string, and the method you're describing doesn't work on C strings; it works on arbitrary data that may be encoded in a wide variety of ways.

    As an example:

    #import <Foundation/Foundation.h>
    
    int main(int argc, const char * argv[]) {
        @autoreleasepool {
            NSData *data = [@"test" dataUsingEncoding:NSUTF16StringEncoding];
            NSStringEncoding encoding = [NSString stringEncodingForData:data
                                                        encodingOptions:nil
                                                        convertedString:nil
                                                    usedLossyConversion:nil];
            NSLog(@"%lu == %lu", (unsigned long)encoding,
                                 (unsigned long)NSUTF16StringEncoding);
        }
        return 0;
    }
    // Output:   10 == 10
    

    That said, in your specific example, if name really is what the comment says it is, "a null-terminated C string," then it cannot be UTF-16, because UTF-16 is not a usable C string encoding: C strings are terminated by the first \0 byte, and \0 bytes are very common in UTF-16 (every ASCII character contains one). Without seeing more code, however, I would not gamble on whether that comment is accurate.
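
    To make the \0 point concrete, here is a minimal sketch, assuming ARC. The helper name CStringLengthOfUTF16Bytes is made up for illustration and is not part of Foundation or SSZipArchive. It encodes a string as UTF-16 and then measures the bytes the way a C string function would:

    ```objective-c
    #import <Foundation/Foundation.h>
    #include <string.h>

    // Hypothetical helper for illustration: encode `s` as UTF-16LE (no BOM),
    // then measure the resulting bytes with strlen, as C string code would.
    static size_t CStringLengthOfUTF16Bytes(NSString *s) {
        NSMutableData *bytes =
            [[s dataUsingEncoding:NSUTF16LittleEndianStringEncoding] mutableCopy];
        // Append a two-byte terminator so strlen always finds a zero byte in bounds.
        [bytes appendBytes:"\0\0" length:2];
        return strlen((const char *)bytes.bytes);
    }
    ```

    For @"test" the encoded bytes are 74 00 65 00 73 00 74 00, so strlen stops at the second byte and reports 1 even though the payload is 8 bytes long.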

    If your real question here is "given an arbitrary c-string-safe encoding, is it possible that stringEncodingForData: will return a not-c-string-safe encoding," then the answer is "yes, it could, and it's definitely not promised that it won't even if it doesn't today." If you need to prevent that, I recommend using NSStringEncodingDetectionSuggestedEncodingsKey and ...UseOnlySuggestedEncodingsKey to force it to be an encoding you can handle. (You could also use ...DisallowedEncodingsKey to prevent specific multi-byte encodings, but that wouldn't be as robust.)
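
    A minimal sketch of that suggestion, assuming ARC. The helper name DetectCStringSafeEncoding and the particular list of suggested encodings are my own assumptions for illustration; substitute whichever C-string-safe encodings your data can actually be in:

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not from the question or SSZipArchive): detect the
    // encoding of `data`, considering only the encodings listed below.
    static NSStringEncoding DetectCStringSafeEncoding(NSData *data,
                                                      NSString **outString) {
        NSDictionary *options = @{
            // Candidate encodings; this particular list is an assumption.
            NSStringEncodingDetectionSuggestedEncodingsKey: @[
                @(NSUTF8StringEncoding),
                @(NSShiftJISStringEncoding),
                @(NSWindowsCP1252StringEncoding),
            ],
            // Restrict detection to the candidates above.
            NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES,
        };
        // Returns 0 if no encoding could be determined.
        return [NSString stringEncodingForData:data
                               encodingOptions:options
                               convertedString:outString
                           usedLossyConversion:NULL];
    }
    ```

    With these options the result should be one of the listed encodings, or 0 when detection fails, so the caller only has to handle a known set and a failure case.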