Tags: python, pandas, dataframe, csv

Dataframe of dataframes: writing and reading


I have a set of images. In each image, a program finds objects with attributes X and type. The number of objects varies from image to image, so for each image I have a df_objects with N_objects rows and two columns, X and type.

I then build a df_images with one row per image and the columns time and objects, where each entry in objects is the df_objects above. This works very well inside the program. Naturally, I also want to store this structure, so I tried DataFrame.to_csv.

I then read it back with pd.read_csv. At first it seems to work: from the read df_images I can, for example, print the df_objects of image 1. But not quite: df_objects["type"] is not accepted and raises an error:

TypeError: string indices must be integers

This happens even though the code is identical to the code that works on the original DataFrame. See the code below. Thanks!

import pandas as pd

df1 = pd.DataFrame({"X":(1.1,1.2),"type":("a_1","b_1")})
print(' df1')
print(df1)
df2 = pd.DataFrame({"X":(2.1,2.2,2.3),"type":("a_2","b_2","c_2")})
print(' df2')
print(df2)
print('  ')
dfT = pd.DataFrame ({"time":(6,7),"dff":(df1,df2)})
df1_test = dfT["dff"][0]
print(' df1_test')
print(df1_test)
df2_test = dfT["dff"][1]
print(' df2_test')
print(df2_test)
print('  ')
type_list_evt_1 = df1_test["type"]
print(' type_list_evt_1')
print(type_list_evt_1)
print('  ')

dfT.to_csv(path_or_buf="test_dff.csv", index=False)

read_dfT = pd.read_csv('test_dff.csv')

df1_read = read_dfT["dff"][0]
print(' df1_read')
print(df1_read)
df2_read = read_dfT["dff"][1]
print(' df2_read')
print(df2_read)
print('  ')
# This is the line that raises: TypeError: string indices must be integers
type_list_evt_1_read = df1_read["type"]
print(' type_list_evt_1_read')
print(type_list_evt_1_read)

I would like the DataFrame read back to behave exactly like the one that was written.
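The cause can be checked directly by comparing the type of a cell before and after the CSV round trip. The sketch below is a minimal reproduction, not the question's exact code: it reuses the two small frames, fills the object column one element at a time with NumPy so that each cell holds a DataFrame object, and writes to an in-memory buffer instead of test_dff.csv.

```python
import numpy as np
import pandas as pd
from io import StringIO

df1 = pd.DataFrame({"X": [1.1, 1.2], "type": ["a_1", "b_1"]})
df2 = pd.DataFrame({"X": [2.1, 2.2, 2.3], "type": ["a_2", "b_2", "c_2"]})

# Object column whose cells are the per-image DataFrames themselves
cells = np.empty(2, dtype=object)
cells[0] = df1
cells[1] = df2
dfT = pd.DataFrame({"time": [6, 7], "dff": cells})

# CSV round trip through an in-memory buffer instead of a file
buf = StringIO()
dfT.to_csv(buf, index=False)
buf.seek(0)
read_dfT = pd.read_csv(buf)

original_cell = dfT["dff"][0]
read_cell = read_dfT["dff"][0]
print(type(original_cell))  # <class 'pandas.core.frame.DataFrame'>
print(type(read_cell))      # <class 'str'>

# Indexing a str with a str raises the same TypeError as in the question
raised = False
try:
    read_cell["type"]
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The cell comes back as the printed text of df1, not as a DataFrame, so ["type"] is applied to a string.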


Solution

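  • to_csv writes each nested DataFrame as its printed text representation, so read_csv gives you back plain strings, and indexing a string with "type" raises the TypeError. If you prefer a format that is easy to inspect and edit, you can use JSON: store each df_objects as a JSON string in the main DataFrame and convert it back to a DataFrame after reading.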

    For example:

    import pandas as pd
    import json
    from io import StringIO
    
    df1 = pd.DataFrame({"X": [1.1, 1.2], "type": ["a_1", "b_1"]})
    df2 = pd.DataFrame({"X": [2.1, 2.2, 2.3], "type": ["a_2", "b_2", "c_2"]})
    
    # Serialize each df_objects to a JSON string; orient='split' keeps columns, index and data
    df1_json = df1.to_json(orient='split')
    df2_json = df2.to_json(orient='split')
    
    df_images = pd.DataFrame({
        "time": [6, 7],
        "objects": [df1_json, df2_json]
    })
    
    # Save as CSV and read it back; the objects column now holds plain JSON strings
    df_images.to_csv("df_images.csv", index=False)
    read_df_images = pd.read_csv("df_images.csv")
    
    # Parse each JSON string back into a DataFrame
    read_df_images["objects"] = read_df_images["objects"].apply(lambda x: pd.read_json(StringIO(x), orient='split'))
    
    df1_read = read_df_images["objects"][0]
    print("df1_read")
    print(df1_read)
    
    type_list_evt_1_read = df1_read["type"]
    print("type_list_evt_1_read")
    print(type_list_evt_1_read)
    
    

    Hope this helps.
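If the file does not need to be human-readable, a different technique, pickling, keeps the nested DataFrames intact, so what is read back behaves exactly like what was written. A minimal sketch, assuming a temporary file named df_images.pkl (hypothetical) and the same two small frames:

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"X": [1.1, 1.2], "type": ["a_1", "b_1"]})
df2 = pd.DataFrame({"X": [2.1, 2.2, 2.3], "type": ["a_2", "b_2", "c_2"]})

# Object column holding the per-image DataFrames
cells = np.empty(2, dtype=object)
cells[0] = df1
cells[1] = df2
df_images = pd.DataFrame({"time": [6, 7], "objects": cells})

# Hypothetical file name inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "df_images.pkl"
    df_images.to_pickle(path)
    read_df_images = pd.read_pickle(path)

df1_read = read_df_images["objects"][0]
print(df1_read["type"].tolist())  # ['a_1', 'b_1']

# Raises AssertionError if the round-tripped frame differs from the original
pd.testing.assert_frame_equal(df1_read, df1)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code, and the file is not meant to be inspected or edited by hand.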