
(Very) Large QVD file to pandas DataFrame


I tried to load a QVD file into a pandas DataFrame using this tool, as shown in the script below. It works, but it is not optimized, and it only provides a way to get rows by index, which is why I was forced to use a for-loop.

As a result, the run time grows quickly as the number of rows increases. I found that the qvd.getRow() call accounts for most of that cost, but I couldn't find any other way to parse the QVD file. I'm looking for a similar tool that is more efficient, especially in terms of time, as I'm dealing with some files of ~1M records.


from qvdfile.qvdfile import QvdFile
import pandas as pd

qvd = QvdFile("file.qvd")

# Column names come from the keys of the first row.
cols = list(qvd.getRow(0).keys())
df = pd.DataFrame(columns=cols)

# Rows can only be fetched one at a time by index, hence the loop.
for r in range(int(qvd.attribs["NoOfRecords"])):
    df = pd.concat([df, pd.DataFrame([qvd.getRow(r)], columns=cols)], ignore_index=True)


Solution

  • I think this project should fix your performance issue: https://pypi.org/project/qvd/

    I was able to read 750k rows, 55 columns in about 15 seconds.

    pip install qvd
    
    from qvd import qvd_reader
    
    df = qvd_reader.read('test.qvd')
    print(df)
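
If switching libraries is not an option, part of the slowdown in the question's script likely comes from calling pd.concat once per row, which copies the growing DataFrame every time. A minimal sketch of the usual workaround is to collect the rows in a list and build the DataFrame once. The helper name rows_to_dataframe and the stand-in fake_get_row are hypothetical and used only for illustration; with the question's reader you would pass qvd.getRow and int(qvd.attribs["NoOfRecords"]) instead. This does not make getRow itself any faster.

```python
import pandas as pd


def rows_to_dataframe(get_row, n_rows):
    """Build a DataFrame from a row-by-index accessor (hypothetical helper)."""
    # Appending dicts to a list is cheap compared with growing a DataFrame.
    rows = [get_row(i) for i in range(n_rows)]
    # Construct the DataFrame once instead of calling pd.concat per row.
    return pd.DataFrame.from_records(rows)


# Stand-in accessor so the sketch runs without a QVD file (assumption:
# the real accessor returns one dict per row, as getRow does in the question).
def fake_get_row(i):
    return {"id": i, "value": i * 2}


df = rows_to_dataframe(fake_get_row, 5)
print(df)
```

With the question's reader, the call would look like rows_to_dataframe(qvd.getRow, int(qvd.attribs["NoOfRecords"])).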