rimporttime-seriesdata-wranglingraw-data

How do I wrangle messy, raw data and import into R?


I have raw, messy data for time series containing around 1400 observations. Here is a snippet of what it looks like:

[new Date('2021-08-24'),1.67,1.68,0.9,null],[new Date('2021-08-23'),1.65,1.68,0.9,null],[new Date('2021-08-22'),1.62,1.68,0.9,null] ... etc

I want to pull the date and its respective value to form a tsibble in R. So, from the above values, it would be like

Date y-variable
2021-08-24 1.67
2021-08-23 1.65
2021-08-22 1.62

Notice how only the first value is to be paired with its respective date - I don't need the other values. Right now, the raw data has been copied and pasted into a word document and I am unsure about how to approach data wrangling to import into R.

How could I achieve this?


Solution

  • #replace the text conncetion with a file connection if desired, the file should be a txt then
    input <- readLines(textConnection("[new Date('2021-08-24'),1.67,1.68,0.9,null],[new Date('2021-08-23'),1.65,1.68,0.9,null],[new Date('2021-08-22'),1.62,1.68,0.9,null]"))
    
    #insert line breaks
    input <- gsub("],[", "\n", input, fixed = TRUE)
    
    #remove "new Date"
    input <- gsub("new Date", "", input, fixed = TRUE)
    
    #remove parentheses and brackets
    input <- gsub("[\\(\\)\\[\\]]", "", input, perl = TRUE)
    
    #import cleaned data
    DF <- read.csv(text = input, header = FALSE, quote = "'")
    DF$V1 <- as.Date(DF$V1)
    print(DF)
    #          V1   V2   V3  V4   V5
    #1 2021-08-24 1.67 1.68 0.9 null
    #2 2021-08-23 1.65 1.68 0.9 null
    #3 2021-08-22 1.62 1.68 0.9 null