pandasexport-to-csvkdb+qpython

Saving a kdb table to a dataframe then saving the dataframe to a csv. null and string values outputting to csv incorrectly?


I am saving a kdb table to a dataframe then saving the dataframe to a csv. This works, however, the csv file and if i print(dataframe); null values are showing as " b" ", and all other string values are showing as " b'STRING' ".

Running Python 3.7, pandas 0.24.2 and qpython 2.0.0.

df = pandas.DataFrame(qpython query)
df.to_csv(path_or_buf="",
          sep=",", na_rep='',
          float_format=None,
          columns=None,
          header=True, index=False,
          index_label=None, mode='w+', compression=None, quoting=None, quotechar='"',
          line_terminator="\n", chunksize=50, tupleize_cols=None, date_format=None,
          doublequote=True,
          escapechar=None, decimal='.', encoding='utf-8')

I expected the KDB table to output to the csv correctly, with nulls being an empty column and strings just showing the string, without " b'STRING' ".

Any advice or help would be greatly appreciated. If anyone needs any more information, I'd be happy to provide.

Example in csv:

Null cells show as : b"

Cells containing strings show as:" b'Euro' " when in fact should just show "Euro"


Solution

  • I would have expected strings in kdb to be handled fine, as QPYTHON should convert null strings to python null strings. Null symbols, however, are converted to _QNULL_SYM. In this case, I think the 'b' prefix indicates a byte literal. You can try to decode the byte objects before saving to a csv

    Normally in python I would do something along the following

    df['STRINGCOL'] = df['STRINGCOL'].apply(lambda s: s.decode('utf-8'))
    

    I don't have much experience with QPYTHON but I believe using qnull() will convert the null to a pythonic value.

    df['STRINGCOL'] = df['STRINGCOL'].apply(lambda s: qnull(s))