I'm trying to build a dict from a sample string, but the source I'm getting is VERY dirty.
My working Python snippet:
dict(item.split(':', 1) for item in re.sub(' ', '', re.sub('"', '', ','.join(
list(filter(None, re.sub('\r', '', text_subsection.split('text')[1]).split('\n')))))).split(',')
)
An example of what is in text_subsection:
\r\n; Count of Something: 3\r\ntext\r\n"Key1: 9999999, Key2: mnkhkljh213, Key3: 593, Key4: 66666"\r\n"Key5 something: sample, Desc: , Date: 4/28/2025, Time: 4:15 PM"\r\n"ANOTHERKEY: 622523, KEY1: 9999999, KEY6: 160305, KEY7: 0, KEY8: 10, KEY11: 1, DATE: 4/28/2025, TIME: 16:15:50"\r\n
notes:
yes, key1 comes in as "Key1" AND "KEY1"
yes, date comes in as "Date" and "DATE"
yes, time comes in as "Time" and "TIME", and its value contains multiple ":"
yes, key5 has a space in the name
I'm fine with these key dupes in the logic that later uses this dictionary
You can merge the two outer re.sub calls, remove the list, and merge the inner re.sub with the split:
dict(item.split(':', 1) for item in re.sub(' |"', '', ','.join(filter(None, text_subsection.split('text')[1].split('\r\n')))).split(','))
re.sub(' |"', '', ...) removes all spaces and double quotes.
join expects an iterable. It doesn't have to be a list.
If \n always comes with an \r, removing \r and splitting by \n is the same as splitting by \r\n.
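To check that the simplification behaves the same as the original, here is a minimal sketch that runs both versions on the sample from the question. The function names parse_original and parse_merged are made up for this illustration, and it assumes every line break in the input is \r\n, as in the sample.

```python
import re

# Sample input copied from the question.
text_subsection = (
    '\r\n; Count of Something: 3\r\ntext\r\n'
    '"Key1: 9999999, Key2: mnkhkljh213, Key3: 593, Key4: 66666"\r\n'
    '"Key5 something: sample, Desc: , Date: 4/28/2025, Time: 4:15 PM"\r\n'
    '"ANOTHERKEY: 622523, KEY1: 9999999, KEY6: 160305, KEY7: 0, KEY8: 10, '
    'KEY11: 1, DATE: 4/28/2025, TIME: 16:15:50"\r\n'
)


def parse_original(text):
    """The snippet from the question, unchanged apart from the wrapper."""
    return dict(item.split(':', 1) for item in re.sub(' ', '', re.sub('"', '', ','.join(
        list(filter(None, re.sub('\r', '', text.split('text')[1]).split('\n')))))).split(','))


def parse_merged(text):
    """The simplified version: one re.sub, no list(), split on '\\r\\n'."""
    # Everything after the first 'text' marker, split into lines.
    lines = text.split('text')[1].split('\r\n')
    # filter(None, ...) drops the empty strings; join accepts the iterator directly.
    joined = ','.join(filter(None, lines))
    # Remove every space and double quote in a single pass.
    cleaned = re.sub(' |"', '', joined)
    # maxsplit=1 keeps values such as '16:15:50' intact.
    return dict(item.split(':', 1) for item in cleaned.split(','))


result = parse_merged(text_subsection)
print(result == parse_original(text_subsection))
print(result['TIME'], result['Time'])
```

Because the spaces are stripped everywhere, the key "Key5 something" ends up as "Key5something" and the value "4:15 PM" as "4:15PM", in both versions.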