I have a CSV file like this:
1556891503.326399;16384;340;48188.23529411765;[1618.377685546875, 1620.2911376953125, 1620.1904296874998, 1619.9386596679685, 1620.391845703125, 1620.2911376953125, 1620.794677734375, 1618.1762695312498, 1620.8450317382812, 1621.0968017578125, 1620.3414916992188, 1620.7443237304685, 1620.391845703125, 1620.9457397460935,...]; 155689433.326399;16384;340;48188.23529411765;[1618.377685546875, 1620.2993876953125, 1620.1904296874998, 1619.9386596679685, 1620.391845703125, 1620.2911376953125..];...
There are 5 features; the last one is a large block of sensor data stored in a single cell, enclosed in square brackets and separated by commas. I would like to compute the mean, mode, standard deviation, etc. of this sensor data, but I don't know how to strip the brackets and analyze it. I tried modifying it as a string, but the data is huge and the processing takes a long time. Is there an easier way?
This is probably not the best or prettiest way, but the method below will work.
It is not clear how your file is formatted, but because you said "There are 5 features and the last one is", I assume your data looks like this:
df1 <- data.frame(V1=c(1556891503,155689433),V2=c(16384,16384),V3=c(340,340),V4=c(12,12),V5=c("[12,12,12,23]","[8,8,8,8]"))
V1 V2 V3 V4 V5
1 1556891503 16384 340 12 [12,12,12,23]
2 155689433 16384 340 12 [8,8,8,8]
You can read that CSV using read.csv with sep=";" and then remove the square brackets from the 5th column with gsub:
df <- read.csv("myFile.csv",sep= ";", header = FALSE, stringsAsFactors = FALSE)
df$V5 <- gsub("\\[","",df$V5)
df$V5 <- gsub("\\]","",df$V5)
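To try the reading and bracket-stripping step without the real file, here is a minimal sketch. It assumes a hypothetical two-row sample passed through the text argument of read.csv in place of myFile.csv, and it folds the two gsub calls into one pattern; the values are made up for illustration.

```r
# Hypothetical two-row sample standing in for myFile.csv
txt <- "1556891503.326399;16384;340;12;[12, 12, 12, 23]\n155689433.326399;16384;340;12;[8, 8, 8, 8]"
df <- read.csv(text = txt, sep = ";", header = FALSE, stringsAsFactors = FALSE)

# "\\[|\\]" matches either bracket, so a single gsub() call removes both
df$V5 <- gsub("\\[|\\]", "", df$V5)
print(df$V5)
```

Because the field separator is ";", the commas inside the brackets are left alone and the whole sensor list arrives as one string per row.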
You can then split the 5th column using strsplit(df$V5,split = ", ") and convert it to numeric:
df$V6 <- strsplit(df$V5,split = ", ")
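# lapply always returns a list (one numeric vector per row);
# sapply would collapse equal-length vectors into a matrix and break the assignment
df$V6 <- lapply(df$V6, as.numeric)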
Finally, calculate your statistics:
df$mean <- sapply(df$V6, function(x) mean(unlist(x)))
df$sd <- sapply(df$V6, function(x) sd(unlist(x)))
mean sd
1 1620.201 0.8779917
2 1619.915 0.7689437
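The question also asks for the mode. Base R has no built-in function for the statistical mode (mode() returns the storage type of an object), so one possible approach is a small helper. The sketch below assumes a hypothetical helper named stat_mode and made-up readings standing in for the parsed column; for continuous sensor values where nearly every reading is unique, the mode may not be very informative.

```r
# Hypothetical helper: most frequent value; on ties, the value seen first wins
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up readings standing in for the parsed list column df$V6
readings <- list(c(12, 12, 12, 23), c(8, 8, 8, 8))

modes   <- sapply(readings, stat_mode)
medians <- sapply(readings, median)
print(modes)
print(medians)
```

The same sapply pattern works for any function that takes a numeric vector and returns one number, such as min, max or var.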