python unicode emoji chr ord

Printing out all Unicode emojis to a file


It's possible to print an emoji from its hex code with the u'\uXXXX' escape pattern in Python, e.g.

>>> print(u'\u231B')
⌛

However, if I have a list of hex codes like 231B, just concatenating the strings won't work:

>>> print(u'\u' + ' 231B')
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

The chr() fails too:

>>> chr('231B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type str)

The first part of my question: given a hex code, e.g. 231A, how do I get the emoji as a str?

My goal is to get the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt by reading the hex codes in the first column.

Some entries are ranges, e.g. 231A..231B. The second part of my question: given a hex code range, how do I iterate through it to get each emoji as a str? For 2648..2653 it is possible to do range(2648, 2653+1), but if the hex code contains a letter, e.g. 1F232..1F236, using range() this way is not possible.


Thanks @amadan for the solutions!!

TL;DR

To write the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt to a file:

import requests

response = requests.get('https://unicode.org/Public/emoji/13.0/emoji-sequences.txt')

# Write UTF-8 explicitly so the output does not depend on the platform default.
with open('emoji.txt', 'w', encoding='utf8') as fout:
    for line in response.content.decode('utf8').split('\n'):
        # Skip blank lines and comment lines.
        if line.strip() and not line.startswith('#'):
            # The first ';'-separated column holds the code point(s).
            hexa = line.split(';')[0].strip().split('..')
            if len(hexa) == 1:
                # One code point, or a space-separated sequence forming one emoji.
                ch = ''.join(chr(int(h, 16)) for h in hexa[0].split(' '))
                print(ch, file=fout)
            else:
                # A range such as 231A..231B: one emoji per code point, end inclusive.
                start, end = hexa
                for cp in range(int(start, 16), int(end, 16) + 1):
                    print(chr(cp), file=fout)
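
The parsing step in the script above can also be tried without a network request. The following is only a sketch: the function name parse_emoji_line is hypothetical, and the sample lines are hand-written to mirror the layout of emoji-sequences.txt rather than copied from it.

```python
def parse_emoji_line(line):
    """Return the emoji strings for one data line, or [] for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return []
    # The first ';'-separated column holds the code point(s).
    field = line.split(';')[0].strip()
    if '..' in field:
        # A range: one emoji per code point, end inclusive.
        start, end = field.split('..')
        return [chr(cp) for cp in range(int(start, 16), int(end, 16) + 1)]
    # A single code point, or a space-separated sequence forming one emoji.
    return [''.join(chr(int(h, 16)) for h in field.split())]


# Hand-written sample lines (assumed layout: code points ; type ; description).
sample = [
    '# comment line',
    '',
    '231A..231B    ; Basic_Emoji ; watch..hourglass done',
    '23F0          ; Basic_Emoji ; alarm clock',
    '1F1E6 1F1E8   ; RGI_Emoji_Flag_Sequence ; flag: Ascension Island',
]
emojis = [e for line in sample for e in parse_emoji_line(line)]
print(emojis)
```

Only the first column is split on '..', so a description that happens to contain '..' does not affect the result.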

Solution

  • Convert hex string to number, then use chr:

    chr(int('231B', 16))
    # => '⌛'
    

    or directly use a hex literal:

    chr(0x231B)
    

    To use a range, again, you need an int, either converted from a string or using a hex literal:

    ''.join(chr(c) for c in range(0x2648, 0x2654))
    # => '♈♉♊♋♌♍♎♏♐♑♒♓'
    

    or

    ''.join(chr(c) for c in range(int('2648', 16), int('2654', 16)))
    

    (NOTE: you'd get something very different from range(2648, 2654), because those numbers are read as decimal, not hex!)
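
As a rough illustration of the points above, the conversion can be wrapped in a small generator that takes the two hex strings of a range such as 1F232..1F236 and includes the end point. The name emoji_range is hypothetical, not something from the question or from a library.

```python
def emoji_range(start_hex, end_hex):
    """Yield one character per code point from start_hex to end_hex, inclusive."""
    for cp in range(int(start_hex, 16), int(end_hex, 16) + 1):
        yield chr(cp)


# Hex codes containing letters work, because int(..., 16) parses them.
squares = list(emoji_range('1F232', '1F236'))

# The zodiac range from the question, 2648..2653.
zodiac = ''.join(emoji_range('2648', '2653'))

# Reading 2648 as decimal gives code point U+0A58, not U+2648.
decimal_char = chr(2648)

print(squares, zodiac, hex(ord(decimal_char)))
```

The + 1 is there because range() excludes its stop value, while the ranges in the data file include both ends.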