I have a dataset with the following schema:
BIKE_ID | REGN_NUMBER | ENGINE_NUMBER | CHASSIS_NUMBER | BUYED_YEAR |
---|---|---|---|---|
1 | XN67TY567 | 34567ABGN65 | 145089 | 2011 |
2 | XN67TM567 | 34567ABGT65 | 145085 | 2011 |
3 | XN67TM569 | 34567VBGT65 | 1450867 | 2013 |
. | . | . | . | . |
. | . | . | . | . |
2870763 | XN56RTMN | 34786VHGT65 | 14501236 | 2016 |
Now I would like to generate IDs from 2,870,764 up to about 32,870,764, i.e. around 30 million rows. In pandas we can use the method below:
val = 2870764
df3['POLICY_ID'] = range(val, val + 30000000)
But the data is too large for pandas to generate, so is there a way to solve this problem in Vaex?
When I try, Vaex throws an error:
ValueError: range(2870764, 5870764) is not of string or Expression type, but <class 'range'>
Could anyone suggest whether this can be done in Vaex?
Yes, vaex has a function called vrange that does exactly what you're looking for, with no memory usage.
Example:
import vaex
df = vaex.example()
df
Here is a dataframe with 330,000 rows (using the example dataset at the time of writing). We can generate a new column, POLICY_ID, using vaex.vrange:
df["POLICY_ID"] = vaex.vrange(0, len(df))
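To start the IDs at the offset from the question instead of at 0, the same call can be given a different start value. Below is a minimal sketch, not a definitive implementation. It assumes that vaex.vrange accepts a dtype argument (the documented default is a float type, so integer IDs are requested explicitly) and that the new column must have exactly as many entries as the dataframe has rows. The helper name add_policy_id is hypothetical and not part of vaex.

```python
def add_policy_id(df, first_id):
    """Attach a POLICY_ID column counting up from first_id, one ID per row."""
    # Imported inside the helper so that defining it does not require vaex.
    import vaex

    # vrange is a virtual range, so no array of IDs is allocated up front.
    # Assumption: dtype="int64" is accepted and yields integer IDs.
    df["POLICY_ID"] = vaex.vrange(first_id, first_id + len(df), dtype="int64")
    return df
```

For example, add_policy_id(df, 2870764) would number the rows of df from 2870764 upward.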
vrange docs: https://vaex.io/docs/api.html#vaex.vrange