
Does the Vaex DataFrame support data generation?


I have a dataset with the following schema:

BIKE_ID   REGN_NUMBER  ENGINE_NUMBER  CHASSIS_NUMBER  BUYED_YEAR
1         XN67TY567    34567ABGN65    145089          2011
2         XN67TM567    34567ABGT65    145085          2011
3         XN67TM569    34567VBGT65    1450867         2013
...       ...          ...            ...             ...
2870763   XN56RTMN     34786VHGT65    14501236        2016

Now I would like to generate IDs from 2,870,764 up to about 32,870,764, i.e. around 30 million new rows. In pandas we could use the following:

val = 2870764
df3['POLICY_ID'] = range(val, val + 30000000)
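To make the arithmetic in that snippet concrete, here is a small plain-Python sketch using the numbers from the question (the name n_new is introduced here for illustration). Because range's stop value is exclusive, range(val, val + 30000000) yields exactly 30 million IDs, the last one being 32,870,763. The range object itself is lazy and tiny; the memory cost appears only when a DataFrame column materializes it, at 8 bytes per int64 value.

```python
import sys

# Numbers from the question: first new ID and how many IDs are wanted.
val = 2870764
n_new = 30_000_000  # hypothetical name for the count of new IDs

# range is lazy: it stores only start/stop/step, not 30 million integers.
ids = range(val, val + n_new)

first_id = ids[0]
last_id = ids[-1]  # stop is exclusive, so this is val + n_new - 1
count = len(ids)

# Materializing the same IDs as an int64 column costs 8 bytes per value.
int64_bytes = count * 8

print(first_id, last_id, count)                   # 2870764 32870763 30000000
print(int64_bytes // 1_000_000, "MB as int64")    # 240 MB as int64
print(sys.getsizeof(ids), "bytes for the range object itself")
```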

Because the data is so large, pandas can't generate it, so I tried to do the same thing in Vaex. However, Vaex throws this error:

ValueError: range(2870764, 5870764) is not of string or Expression type, but <class 'range'>

Can anyone suggest how to do this in Vaex?


Solution

  • Yes. Vaex has a function called vrange that does exactly what you're looking for: it creates a virtual column, equivalent to numpy.arange, that uses no memory.

    Example:

    import vaex
    
    df = vaex.example()
    df
    

    Here is a DataFrame with 330,000 rows (the example dataset at the time of writing). We can generate a new column, POLICY_ID, using vaex.vrange:

    df["POLICY_ID"] = vaex.vrange(0, len(df), dtype="i8")  # vrange defaults to float64 ('f8'); i8 gives integer IDs
    

    vrange docs: https://vaex.io/docs/api.html#vaex.vrange