The following table, my_table,

idx  speed  duration
0    420    50
1    380    40
2    390    45

is stored in a database file my_db.db. Loading this table as a list of dictionaries appears straightforward: the code
import sqlite3

conn = sqlite3.connect(folder_path + 'my_db.db')
conn.row_factory = sqlite3.Row  # rows behave like mappings, so dict(row) works
cur = conn.cursor()
cur.execute('SELECT * FROM my_table')
list_of_objs = cur.fetchall()
list_of_dicts = [dict(obj) for obj in list_of_objs]
cur.close()
conn.close()
yields
list_of_dicts = [
{'idx': 0, 'speed': 420, 'duration': 50},
{'idx': 1, 'speed': 380, 'duration': 40},
{'idx': 2, 'speed': 390, 'duration': 45}
]
However, I would instead like to load the data as a dictionary of lists:
dict_of_lists = {
    'idx': [0, 1, 2],
    'speed': [420, 380, 390],
    'duration': [50, 40, 45]
}
Is there a straightforward way to do this?
(Part of my motivation for the dictionary-of-lists representation is that it seems like it would require less RAM than a list of dictionaries when the table has many rows. But does it, or does Python somehow recognize and avoid the redundancy of keys in the list-of-dictionaries representation?)
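A minimal sketch to check this empirically, using tracemalloc from the standard library (the row count and values below are made up for illustration):

import tracemalloc

N = 100_000  # hypothetical row count

tracemalloc.start()
list_of_dicts = [{'idx': i, 'speed': 420, 'duration': 50} for i in range(N)]
lod_bytes, _ = tracemalloc.get_traced_memory()  # bytes allocated by this representation
tracemalloc.stop()

tracemalloc.start()
dict_of_lists = {'idx': list(range(N)), 'speed': [420] * N, 'duration': [50] * N}
dol_bytes, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(lod_bytes, dol_bytes)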
Similar to the previous answer, using fetchmany(), but several times faster; just the juicy parts:
import collections

data: dict[str, list] = collections.defaultdict(list)
while chunk := cursor.fetchmany(CHUNK_SIZE):
    # zip(*chunk) transposes the chunk of rows into one tuple per column
    for values, col_name in zip(zip(*chunk), columns):
        data[col_name].extend(values)
This is roughly 5× faster, as it avoids a repeated dictionary lookup for every single column of every single row, and it replaces the per-row Python loop with zip and list.extend, which iterate in C.
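For completeness, a self-contained sketch of the same approach under the question's setup; here columns is taken from cursor.description, and CHUNK_SIZE is an arbitrary tuning constant:

import collections
import sqlite3

CHUNK_SIZE = 10_000  # arbitrary; larger chunks mean fewer fetch calls but more peak memory

conn = sqlite3.connect('my_db.db')
cur = conn.cursor()
cur.execute('SELECT * FROM my_table')

# column names come from the cursor's description of the result set
columns = [desc[0] for desc in cur.description]

data: dict[str, list] = collections.defaultdict(list)
while chunk := cur.fetchmany(CHUNK_SIZE):
    # transpose each chunk of rows into per-column tuples and append them
    for values, col_name in zip(zip(*chunk), columns):
        data[col_name].extend(values)

cur.close()
conn.close()

If a plain dict is preferred over the defaultdict, dict(data) at the end converts it.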