I'm using the dataset from https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe/downloads/515k-hotel-reviews-data-in-europe.zip/1 and I don't understand why subsetting the data frame barely reduces its object size:
df = read.csv('Hotel_Reviews.csv')
object.size(df)
200503848 bytes
object.size(df[sample(1:nrow(df),500),])
157225848 bytes
By taking about 0.1% of the rows, I only reduced the object to roughly 78% of its original size. I don't understand why...
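OK, after looking into it more closely, it seems this happens because the text columns of my data frame were read in as factors, and a subset of a factor keeps every level of the original, including the ones that no longer occur in the subset. Reading the strings as plain character vectors avoids this: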
df = read.csv('Hotel_Reviews.csv',stringsAsFactors = FALSE)
object.size(df)
210584168 bytes
object.size(df[sample(1:nrow(df),500),])
394464 bytes
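To see the effect in isolation, here is a minimal sketch on made-up data rather than the Kaggle file. The `toy` data frame and its `id` and `review` columns are hypothetical stand-ins for a text-heavy column that was read in as a factor. It also shows `droplevels()`, which looks like an alternative fix if you want to keep the factors.

```r
# Hypothetical toy data: 10,000 distinct long strings stored as a factor,
# standing in for a review-text column (not the real Hotel_Reviews.csv).
set.seed(1)
n <- 10000
toy <- data.frame(
  id = seq_len(n),
  review = factor(paste("review text number", seq_len(n), strrep("x", 50)))
)

# Take a tiny subset of rows, as in the question.
small <- toy[sample(nrow(toy), 10), ]

# The subset still carries all 10,000 levels of the original factor,
# so most of the memory is still there.
nlevels(small$review)
object.size(small)

# droplevels() removes the levels that no longer occur in the subset.
small_dropped <- droplevels(small)
nlevels(small_dropped$review)
object.size(small_dropped)
```

Comparing the two `object.size()` calls should show the same pattern as above: the subset only becomes small once the unused levels are gone. As far as I know, since R 4.0.0 `read.csv` defaults to `stringsAsFactors = FALSE`, so this mostly bites on older versions or when factors are created explicitly.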