objective-c nscharacterset

NSArray from NSCharacterSet


Currently I build an array of the letters of the alphabet like this:

[[NSArray alloc]initWithObjects:@"A",@"B",@"C",@"D",@"E",@"F",@"G",@"H",@"I",@"J",@"K",@"L",@"M",@"N",@"O",@"P",@"Q",@"R",@"S",@"T",@"U",@"V",@"W",@"X",@"Y",@"Z",nil];

I know that these letters are also available from

[NSCharacterSet uppercaseLetterCharacterSet]

How can I make an array out of it?


Solution

  • The following code creates an array containing all characters of a given character set. It also works for characters outside the Basic Multilingual Plane (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).

    NSCharacterSet *charset = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *array = [NSMutableArray array];
    // Unicode has 17 planes (0 through 16) of 0x10000 code points each.
    for (int plane = 0; plane <= 16; plane++) {
        // Skip planes in which the set has no members.
        if ([charset hasMemberInPlane:plane]) {
            UTF32Char c;
            for (c = plane << 16; c < (plane+1) << 16; c++) {
                if ([charset longCharacterIsMember:c]) {
                    UTF32Char c1 = OSSwapHostToLittleInt32(c); // To make it byte-order safe
                    // Build a one-character string from the UTF-32 code unit.
                    NSString *s = [[NSString alloc] initWithBytes:&c1 length:4 encoding:NSUTF32LittleEndianStringEncoding];
                    [array addObject:s];
                }
            }
        }
    }
    

    For uppercaseLetterCharacterSet this gives an array of 1467 elements (the exact count can vary with the system's Unicode data). Note that characters > U+FFFF are stored as a UTF-16 surrogate pair in NSString, so U+10400, for example, is actually stored as the 2 characters "\uD801\uDC00".
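
    The surrogate-pair point can be checked directly. The following is a minimal Swift sketch, not part of the original answer; the constant name deseret is only illustrative:

    ```swift
    import Foundation

    // U+10400 DESERET CAPITAL LETTER LONG I lies outside the Basic Multilingual Plane.
    let deseret = "\u{10400}"

    // One Unicode scalar ...
    print(deseret.unicodeScalars.count)       // 1
    // ... but two UTF-16 code units, which is what NSString's length counts.
    print(NSString(string: deseret).length)   // 2
    // The two code units are the high and low surrogate.
    print(deseret.utf16.map { String($0, radix: 16, uppercase: true) })   // ["D801", "DC00"]
    ```

    So an array element built from such a character has an NSString length of 2 even though it represents a single character.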

    Swift 2 code can be found in other answers to this question. Here is a Swift 3 version, written as an extension method:

    extension CharacterSet {
        func allCharacters() -> [Character] {
            var result: [Character] = []
            for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
                for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                    if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                        result.append(Character(uniChar))
                    }
                }
            }
            return result
        }
    }
    

    Example:

    let charset = CharacterSet.uppercaseLetters
    let chars = charset.allCharacters()
    print(chars.count) // 1521
    print(chars) // ["A", "B", "C", ...]
    

    (Note that some characters may not be present in the font used to display the result.)
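
    If only the 26 letters from the question are wanted, the same membership test can be limited to the ASCII range. This is a rough sketch under stated assumptions: members(of:in:) is a hypothetical helper, not a Foundation API, and compactMap requires Swift 4.1 or later.

    ```swift
    import Foundation

    // Hypothetical helper (not part of Foundation): returns the members of a
    // character set whose scalar values fall inside the given closed range.
    func members(of set: CharacterSet, in range: ClosedRange<UInt32>) -> [String] {
        return range.compactMap { value -> String? in
            // Skip values that are not valid scalars or not in the set.
            guard let scalar = Unicode.Scalar(value), set.contains(scalar) else {
                return nil
            }
            return String(scalar)
        }
    }

    // Restrict the search to ASCII (U+0000 through U+007F).
    let asciiUppercase = members(of: .uppercaseLetters, in: 0...0x7F)
    print(asciiUppercase.count)      // 26
    print(asciiUppercase.joined())   // ABCDEFGHIJKLMNOPQRSTUVWXYZ
    ```

    Scanning 128 code points instead of all 17 planes is also much cheaper when the non-ASCII letters are not needed.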