I'm trying to turn my wide (100K+ columns) 2D numpy data into a Vaex DataFrame. Reading through the documentation, I see two relevant functions, but both give me a single column x, where each row is a numpy array. What I expected was for Vaex to recognize that I want each column of the numpy array to become its own separate column in the Vaex DataFrame.
vaex.from_arrays(x=2d_numpy_matrix)
gives me:
x
---
0 np.array(1,2,3)
1 np.array(4,5,6)
when I wanted:
0 | 1 | 2 (Column header)
---
1 | 2 | 3
4 | 5 | 6
My workaround is vaex.from_pandas(pd.DataFrame(2d_numpy_matrix)), but this is embarrassingly slow. Is there a more CPU-time efficient way to do this?
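vaex.from_arrays takes one keyword argument per column, so you can unpack a dictionary comprehension that maps each header to a 1D array. Note that data below holds one column per row; a matrix stored row by row would need to be transposed (data.T) first: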
import numpy as np
import vaex
headers = np.array(['1','2','3'])
data = np.array([[1,4],[2,5],[3,6]])
df = vaex.from_arrays(**{header: column for header, column in zip(headers, data)})
This yields:
>>> df
# 1 2 3
0 1 2 3
1 4 5 6
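For a matrix laid out as in the question (one row per record), the same idea can be sketched as below. This is only a rough sketch: matrix_to_columns, to_vaex, and the "col_" prefix are hypothetical names introduced here, not part of Vaex. The prefix is an assumption that keeps column names usable as identifiers in expressions; drop it if plain numeric names work for you.

```python
import numpy as np


def matrix_to_columns(matrix, prefix="col_"):
    """Map each column of a 2D array to a name -> 1D array entry."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2:
        raise ValueError("expected a 2D array")
    # Iterating over matrix.T walks the columns of the original matrix;
    # each item is a view onto the original data, not a copy.
    return {f"{prefix}{i}": column for i, column in enumerate(matrix.T)}


def to_vaex(matrix, prefix="col_"):
    # Imported here so matrix_to_columns can be used without vaex installed.
    import vaex

    # from_arrays accepts one keyword argument per column.
    return vaex.from_arrays(**matrix_to_columns(matrix, prefix))


matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns
columns = matrix_to_columns(matrix)
```

Calling to_vaex(matrix) should then give a DataFrame with one column per matrix column, without going through an intermediate pandas DataFrame. Whether that is fast enough for 100K+ columns is worth measuring on your own data.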