[SOLVED] Multiple values in square bracket csv in R

I have a csv file like this:

1556891503.326399;16384;340;48188.23529411765;[1618.377685546875, 1620.2911376953125, 1620.1904296874998, 1619.9386596679685, 1620.391845703125, 1620.2911376953125, 1620.794677734375, 1618.1762695312498, 1620.8450317382812, 1621.0968017578125, 1620.3414916992188, 1620.7443237304685, 1620.391845703125, 1620.9457397460935,...]; 155689433.326399;16384;340;48188.23529411765;[1618.377685546875, 1620.2993876953125, 1620.1904296874998, 1619.9386596679685, 1620.391845703125, 1620.2911376953125..];...

There are 5 features; the last one is a large block of sensor data stored in a single cell, enclosed in square brackets and separated by commas. I would like to compute the mean, mode, sd, etc. of this sensor data, but I don't know how to strip the brackets and analyze it. I tried manipulating it as a string, but the data is huge and the processing takes a long time. Is there an easier way?

Solution

It is probably not the best or prettiest way, but the method below will work.

It is not clear how your file is formatted, but because you said "There are 5 features and the last one is", I assume your data looks like this:

 df1 <- data.frame(V1=c(1556891503,155689433),V2=c(16384,16384),V3=c(340,340),V4=c(12,12),V5=c("[12,12,12,23]","[8,8,8,8]"))

           V1    V2  V3 V4            V5
 1 1556891503 16384 340 12 [12,12,12,23]
 2  155689433 16384 340 12     [8,8,8,8]

You can read that csv using read.csv with sep=";" and then strip the square brackets from the fifth column:

df <- read.csv("myFile.csv",sep= ";", header = FALSE, stringsAsFactors = FALSE)
# remove both the opening and the closing bracket in one pass
df$V5 <- gsub("\\[|\\]", "", df$V5)

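You can then split the 5th column on commas with strsplit and convert each piece to numeric. Using lapply instead of sapply keeps the result a list even when every row holds the same number of values (sapply would simplify that case to a matrix, which cannot be assigned as a list column):

# split on a comma followed by optional whitespace
df$V6 <- strsplit(df$V5, split = ",\\s*")
# one numeric vector per row
df$V6 <- lapply(df$V6, as.numeric)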
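
As a minimal, self-contained sketch of the read-and-parse steps above, the following passes two made-up rows through the text argument of read.csv instead of reading a file. It assumes one record per line with fields separated by ";", which the question does not confirm. parse_bracketed is a hypothetical helper name, not a base R function.

```r
# Two made-up sample rows in the same layout as the question's file
# (assumption: one record per line, fields separated by ";").
sample_text <- "1556891503.3;16384;340;48188.2;[1618.5, 1620.25, 1620.0]
155689433.3;16384;340;48188.2;[1618.5, 1620.5]"

df <- read.csv(text = sample_text, sep = ";", header = FALSE,
               stringsAsFactors = FALSE)

# Hypothetical helper: strip the brackets, split on commas, convert to numeric.
# lapply always returns a list, so rows of different lengths are fine.
parse_bracketed <- function(x) {
  stripped <- gsub("\\[|\\]", "", x)
  lapply(strsplit(stripped, ",\\s*"), as.numeric)
}

df$V6 <- parse_bracketed(df$V5)
str(df$V6)
```

Wrapping the steps in one function keeps the original V5 column untouched, which may be convenient if the raw strings are needed again later.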

and calculate your statistics

df$mean <- sapply(df$V6, mean)
df$sd <- sapply(df$V6, sd)

     mean        sd
1 1620.201 0.8779917
2 1619.915 0.7689437
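
The question also asks for the mode, which base R does not provide as a statistical function (the built-in mode() reports an object's storage mode). A rough sketch of one way to add it is shown here, using the small made-up vectors from the df1 example rather than the real sensor data. stat_mode is a hypothetical helper name, and for continuous sensor readings an exact-value mode may not be very informative without rounding or binning first.

```r
# Made-up readings standing in for the parsed list column (df$V6 in the answer).
readings <- list(c(12, 12, 12, 23), c(8, 8, 8, 8))

# Hypothetical helper: most frequent value; ties resolve to the value seen first.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# vapply checks that every result is a single number, unlike sapply.
stats <- data.frame(
  mean = vapply(readings, mean, numeric(1)),
  sd   = vapply(readings, sd, numeric(1)),
  mode = vapply(readings, stat_mode, numeric(1))
)
print(stats)
```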