Dataframe column with quoted CSV to named dataframe columns

I am pulling some JSON formatted log data out of my SEIM and into a pandas dataframe. I am able to easily convert the JSON into multiple columns within the dataframe, but there is a "message" field in the JSON that contains a quoted CSV, like this.

# dummy data
dfMyData = pd.DataFrame({"_raw": [\
            """{"timestamp":1691096387000,"message":"20230803 20:59:47,ip-123-123-123-123,mickey,321.321.321.321,111111,10673010,type,,'I am a, quoted, string, with commas,',0,,","logstream":"Blah1","loggroup":"group 1"}""",
            """{"timestamp":1691096386000,"message":"20230803 21:00:47,ip-456-456-456-456,mouse,654.654.654.654,222222,10673010,type,,'I am another quoted string',0,,","logstream":"Blah2","loggroup":"group 2"}"""
            ]})
# Column names for the _raw.message field that is generated.
MessageColumnNames =  ["Timestamp","dest_host","username","src_ip","port","number","type","who_knows","message_string","another_number","who_knows2","who_knows3"]
# Convert column to json object/dict
dfMyData['_raw'] = dfMyData['_raw'].map(json.loads)
# convert JSON into columns within the dataframe
dfMyData = pd.json_normalize(dfMyData.to_dict(orient='records'))

I've seen this done before with str.split() to split on columns and then concat it back to the original dataframe, however the str.split method doesn't handle quoted values within the CSV. pd.read_csv can handle the quoted CSV correctly, but I can't figure out how to apply it across the dataframe and expand the output of that into new dataframe columns.

Additionally, when I split dfMyData['_raw.message'] out into new columns, I'd also like to supply a list of column names for the data and have the new columns be created with those names.

Anyone know of an easy way to split a quoted CSV string in a dataframe column into new named columns within the dataframe?

Solution

Update

Actually all you need is to pass your data into a csv-reader, which in turn is an appropriate data type for pandas.DataFrame:

pd.DataFrame(csv.reader(dfMyData['_raw.message'], quotechar="'"), columns=columns)

Previous answer

We can try to convert the data into a csv and read them back with appropriate parameters:

import csv
from tempfile import TemporaryFile

seq = dfMyData.iloc[:,1]      # column of interest in the original data
columns = [*'ABCDEFGHIJKL']   # custom names of future data columns

with TemporaryFile() as file:

    seq.to_csv(
        file, 
        sep='\N{unit separator}',
        header=False,
        index=False,
        quoting=csv.QUOTE_NONE
    )

    file.seek(0)    # read data from the start

    df = pd.read_csv(
        file, 
        header=None,
        names=columns,
        quotechar="\'"
    )

print(df)

Notes:

quoting=csv.QUOTE_NONE to avoid \" at the ends of each line
sep='\N{unit separator}' to avoid confusion with commas
quotechar="\'" when reading back because of a specific quoting inside the lines
since we are dumping a sequence without indexes, the '\N{unit separator}' delimiter will never make it into the final data

Transformed data: