python python-3.x string dictionary

Is there a cleaner way to clean up my string in Python?


I'm trying to build a dict from my sample string, but the source I'm getting is VERY dirty.

My working Python snippet:

    import re

    dict(item.split(':', 1) for item in re.sub(' ', '', re.sub('"', '', ','.join(
        list(filter(None, re.sub('\r', '', text_subsection.split('text')[1]).split('\n')))))).split(','))

An example of what text_subsection contains:

\r\n; Count of Something: 3\r\ntext\r\n"Key1: 9999999, Key2: mnkhkljh213, Key3: 593, Key4: 66666"\r\n"Key5 something: sample, Desc: , Date: 4/28/2025, Time: 4:15 PM"\r\n"ANOTHERKEY: 622523, KEY1: 9999999, KEY6: 160305, KEY7: 0, KEY8: 10, KEY11: 1, DATE: 4/28/2025, TIME: 16:15:50"\r\n

notes:

yes, key1 comes in as "Key1" AND "KEY1"

yes, date comes in as "Date" and "DATE"

yes, time comes in as "Time" and "TIME", and its value contains multiple ":"

yes, key5 has a space in the name

The logic that later uses this dictionary is fine with these duplicate keys.


Solution

  • You can merge the two outer re.sub calls, drop the list() call, and fold the inner re.sub into the split:

    dict(item.split(':', 1) for item in re.sub(' |"', '', ','.join(filter(None, text_subsection.split('text')[1].split('\r\n')))).split(','))
    

    re.sub(' |"', '', ...) removes all spaces and double quotes.

    join expects an iterable. It doesn't have to be a list.

    If \r and \n only ever appear together as \r\n, removing \r and then splitting on \n gives the same result as splitting on \r\n.
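
If the one-liner is still hard to read, the same steps can be spread over a small helper. This is only a sketch: the name parse_subsection is made up for illustration, the sample is the string from the question, and the character class [ "] is used as an equivalent spelling of ' |"'. It assumes, as above, that lines are always separated by \r\n and that the marker 'text' appears only once before the data.

```python
import re

def parse_subsection(text_subsection):
    """Build a dict from the quoted 'Key: value' lines after the 'text' marker."""
    # Everything after the first 'text' marker holds the data lines.
    body = text_subsection.split('text')[1]
    # Drop empty lines, then join the rest into one comma-separated string.
    joined = ','.join(filter(None, body.split('\r\n')))
    # Remove every space and double quote in a single pass.
    cleaned = re.sub(r'[ "]', '', joined)
    # Split on the first colon only, so values such as 16:15:50 stay intact.
    return dict(item.split(':', 1) for item in cleaned.split(','))

sample = ('\r\n; Count of Something: 3\r\ntext\r\n'
          '"Key1: 9999999, Key2: mnkhkljh213, Key3: 593, Key4: 66666"\r\n'
          '"Key5 something: sample, Desc: , Date: 4/28/2025, Time: 4:15 PM"\r\n'
          '"ANOTHERKEY: 622523, KEY1: 9999999, KEY6: 160305, KEY7: 0, KEY8: 10, '
          'KEY11: 1, DATE: 4/28/2025, TIME: 16:15:50"\r\n')

result = parse_subsection(sample)
print(result['TIME'])  # 16:15:50
```

As with the original snippet, spaces are removed everywhere, so the key "Key5 something" ends up as "Key5something" and the value "4:15 PM" as "4:15PM".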