
Reducing a data frame's memory size by subsetting it in R


I'm using the dataset from https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe/downloads/515k-hotel-reviews-data-in-europe.zip/1, and I don't understand why subsetting the data frame barely reduces its size in memory:

df = read.csv('Hotel_Reviews.csv')
object.size(df)

200503848 bytes

object.size(df[sample(1:nrow(df),500),])

157225848 bytes

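By keeping only about 0.1% of the rows (500 out of roughly 515,000), I only reduced the object to about 78% of its original size. Why doesn't the size shrink roughly in proportion to the number of rows?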
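Since the Kaggle file is not bundled here, the following is a minimal sketch that reproduces the same pattern on a hypothetical synthetic data frame. The assumption (not taken from the original data) is a single free-text column stored as a factor, which is what `read.csv()` produced by default before R 4.0.0.

```r
# Hypothetical stand-in for Hotel_Reviews.csv: one free-text column,
# every value unique, stored explicitly as a factor.
set.seed(42)
n <- 100000
df <- data.frame(review = factor(sprintf("review text number %d", seq_len(n))))

# Keep 500 random rows; drop = FALSE keeps a one-column data frame.
sub <- df[sample(nrow(df), 500), , drop = FALSE]

full_size <- as.numeric(object.size(df))
sub_size  <- as.numeric(object.size(sub))
ratio <- sub_size / full_size

print(ratio)              # still close to 1
print(nlevels(sub$review)) # every level of the full column is still attached
```

The ratio stays high because the 500-row subset still stores the complete `levels` vector of the original factor.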


Solution

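  • After looking into it more deeply, the cause is that my data frame was made of factors. Before R 4.0.0, read.csv() converted string columns to factors by default (stringsAsFactors = TRUE), and subsetting a factor keeps all of its levels, including the ones that are no longer used. The 500-row subset therefore still carried every unique string of the full dataset. Reading the strings as character vectors instead solves it: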

    df = read.csv('Hotel_Reviews.csv', stringsAsFactors = FALSE)
    object.size(df)
    

    210584168 bytes

    object.size(df[sample(1:nrow(df),500),])
    

    394464 bytes
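If you would rather keep the columns as factors, base R's `droplevels()` removes unused levels from every factor column of a data frame after subsetting. The sketch below uses the same hypothetical synthetic data as a stand-in for the Kaggle file; it is an alternative to `stringsAsFactors = FALSE`, not what the answer above used.

```r
# Hypothetical synthetic data: one factor column with all-unique values.
set.seed(42)
n <- 100000
df <- data.frame(review = factor(sprintf("review text number %d", seq_len(n))))

# Subsetting rows keeps all n levels attached to the factor.
sub <- df[sample(nrow(df), 500), , drop = FALSE]

# droplevels() on a data frame drops unused levels in each factor column.
sub_dropped <- droplevels(sub)

print(nlevels(sub$review))          # all n levels
print(nlevels(sub_dropped$review))  # only the 500 sampled values
print(object.size(sub))
print(object.size(sub_dropped))
```

Because `sample()` draws without replacement here and every value is unique, the dropped subset keeps exactly 500 levels, and its size falls accordingly.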