python-3.x, dictionary, machine-learning, text-analysis, multilabel-classification

Reading multiple labels as a list or tuple in a dict with the ID as the key, i.e. {id: (cat1, cat2, ...)}


I am building a multilabel text classification model. Below is a snippet of my labels.txt file. I want to convert these records into a dictionary that maps each ID to its categories as a tuple or a list, i.e. {id: (cat1, cat2)}. The records are not separated by blank lines. I am stuck on how to convert this kind of data into a dictionary.

B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B

Solution

  • If category lines are always indented with spaces and ID lines are not, you can use the indentation to tell them apart: loop over the lines, remember the most recent ID, and append each category line to that ID's list in a dict:

    r = '''B0027DQHA0
      Movies & TV, TV
      Music, Classical
    0756400120
      Books, Literature & Fiction, Anthologies & Literary Collections, General
      Books, Literature & Fiction, United States
      Books, Science Fiction & Fantasy, Science Fiction, Anthologies
      Books, Science Fiction & Fantasy, Science Fiction, Short Stories
    B0000012D5
      Music, Blues
      Music, Pop
      Music, R&B'''
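    d = {}
    for line in r.splitlines():
        if line.startswith(' '):
            # Indented line: a category path for the most recently seen ID.
            d.setdefault(current_id, []).append(line.strip())
        elif line.strip():
            # Unindented, non-empty line: the start of a new record.
            current_id = line.strip()
    print(d)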
    

    This outputs:

    {'B0027DQHA0': ['Movies & TV, TV', 'Music, Classical'], '0756400120': ['Books, Literature & Fiction, Anthologies & Literary Collections, General', 'Books, Literature & Fiction, United States', 'Books, Science Fiction & Fantasy, Science Fiction, Anthologies', 'Books, Science Fiction & Fantasy, Science Fiction, Short Stories'], 'B0000012D5': ['Music, Blues', 'Music, Pop', 'Music, R&B']}
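The question asks for tuples and for data read from a file, so the same idea can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: the name `parse_labels` is hypothetical, and it assumes that any line starting with whitespace is a category line, that every other non-empty line is an ID, and that blank lines can be ignored.

```python
from collections import defaultdict


def parse_labels(lines):
    """Map each unindented ID line to a tuple of the indented lines below it.

    `lines` can be any iterable of strings, including an open text file.
    """
    labels = defaultdict(list)
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines carry no information
        if line[0].isspace():
            if current_id is None:
                raise ValueError("category line appears before any ID")
            labels[current_id].append(line.strip())
        else:
            current_id = line.strip()
            labels[current_id]  # register the ID even if no categories follow
    # Freeze each list into a tuple, as requested in the question.
    return {key: tuple(values) for key, values in labels.items()}


# A shortened version of the data from the question.
sample = [
    "B0027DQHA0\n",
    "  Movies & TV, TV\n",
    "  Music, Classical\n",
    "B0000012D5\n",
    "  Music, Blues\n",
]
result = parse_labels(sample)
print(result)
# {'B0027DQHA0': ('Movies & TV, TV', 'Music, Classical'), 'B0000012D5': ('Music, Blues',)}
```

Because the function only iterates over lines, it should also accept an open file, for example `with open('labels.txt', encoding='utf-8') as f: labels = parse_labels(f)`, assuming the file is UTF-8 encoded. Each tuple element is still a full category path such as 'Music, Blues'; whether to split those paths further depends on how the classifier should treat the hierarchy.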