r csv square-bracket multiple-value

Multiple values in square bracket csv in R


I have a csv file like this:

1556891503.326399;16384;340;48188.23529411765;[1618.377685546875, 1620.2911376953125, 1620.1904296874998, 1619.9386596679685, 1620.391845703125, 1620.2911376953125, 1620.794677734375, 1618.1762695312498, 1620.8450317382812, 1621.0968017578125, 1620.3414916992188, 1620.7443237304685, 1620.391845703125, 1620.9457397460935,...]; 155689433.326399;16384;340;48188.23529411765;[1618.377685546875, 1620.2993876953125, 1620.1904296874998, 1619.9386596679685, 1620.391845703125, 1620.2911376953125..];...

There are 5 features and the last one is a large block of sensor data held in a single cell, enclosed in square brackets and separated by commas. I would like to compute the mean, mode, sd, etc. of this sensor data, but I don't know how to strip the brackets and analyze the values. I tried processing it as a string, but the data is huge and the processing time is long. Is there an easier way?


Solution

  • It is probably not the best or prettiest way, but the method below will work.

    It is not clear how your file is formatted, but because you said "There are 5 features and the last one is", I assume your data looks like this:

     df1 <- data.frame(V1=c(1556891503,155689433),V2=c(16384,16384),V3=c(340,340),V4=c(12,12),V5=c("[12,12,12,23]","[8,8,8,8]"))
    
               V1    V2  V3 V4            V5
     1 1556891503 16384 340 12 [12,12,12,23]
     2  155689433 16384 340 12     [8,8,8,8]
    

    You can read that csv using read.csv with sep=";", then strip the square brackets from the 5th column with gsub:

    df <- read.csv("myFile.csv",sep= ";", header = FALSE, stringsAsFactors = FALSE)
    df$V5 <- gsub("\\[","",df$V5)
    df$V5 <- gsub("\\]","",df$V5)
    

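    You can then split the 5th column on commas with strsplit and convert each piece to numeric. The pattern ",\\s*" accepts commas with or without a following space. lapply is used instead of sapply because sapply simplifies its result to a matrix whenever every row holds the same number of values, and that matrix cannot be stored as a list column.

    df$V6 <- strsplit(df$V5,split = ",\\s*")
    df$V6 <- lapply(df$V6, as.numeric)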
    

    and calculate your statistics

    df$mean <- sapply(df$V6, function(x) mean(unlist(x)))
    df$sd <- sapply(df$V6, function(x) sd(unlist(x)))
    
         mean        sd
    1 1620.201 0.8779917
    2 1619.915 0.7689437
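
The question also asks for the mode, which the steps above do not cover. Base R has no function for the statistical mode (the built-in mode() reports an object's storage type), so the sketch below defines a small helper for it. This is only one possible approach, not a definitive implementation: stat_mode is a hypothetical name, the two sample rows are made up to mimic the layout in the question, and read.csv(text = ...) stands in for reading the real file.

```r
# Two sample rows in the question's layout; the values are invented for illustration.
lines <- c(
  "1556891503.326399;16384;340;48188.23;[1618.5, 1620.25, 1620.25, 1619.0]",
  "155689433.326399;16384;340;48188.23;[1618.0, 1620.0, 1620.0, 1622.0]"
)

# Replace text = lines with the path to the real file when using actual data.
df <- read.csv(text = lines, sep = ";", header = FALSE, stringsAsFactors = FALSE)

# Strip both brackets in one pass, then split on commas with optional spaces.
# lapply() always returns a list, one numeric vector per row.
clean <- gsub("\\[|\\]", "", df$V5)
readings <- lapply(strsplit(clean, ",\\s*"), as.numeric)

# Hypothetical helper: the most frequent value (the first one wins on ties).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# vapply() checks that each statistic is a single number per row.
df$mean   <- vapply(readings, mean, numeric(1))
df$sd     <- vapply(readings, sd, numeric(1))
df$median <- vapply(readings, median, numeric(1))
df$mode   <- vapply(readings, stat_mode, numeric(1))

print(df[, c("mean", "sd", "median", "mode")])
```

With real sensor readings that carry many decimal places, exact repeats may be rare, so it may make sense to round the values before taking the mode.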