rarrayscsvdata-cube

Extract 3 dimensional data from csv file in R


I am trying to extract my data from a csv file into R. The data is currently formatted in the following:

,"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=...

,"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, ...

,"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=...

Each of these lines is data for a different region. I want to have it in this format:

Region [1]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]


Region [2]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]


Region [3]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]

How can I do this?


Solution

  • Maybe there is a package that can parse this. However, you could do some data transformation using the tidyverse package.

    You can read in your data with readLines():

    dat <- readLines("test.txt")
    

    Which in this case looks like this:

    dat <- c(",\"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=2}]\"", 
    "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, NDVI=3}]\"", 
    "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=4}]\""
    )
    

    Then you can do some data transformations with a for loop and store the result in a list.

    library(tidyverse)
    dat <- c(",\"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=2}]\"", 
             "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, NDVI=3}]\"", 
             "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=4}]\""
    )
    
    l <- list()
    counter <- 1
    for (line in dat){
      if(nchar(line)>0){
        line <- as.data.frame(line) %>%
            # We need to remove some unwanted strings
          mutate(line = str_replace_all(line, 
                                        c("\\\""="", ",\\["= "", "\\]" = ""))) %>% 
            # The lines can be separated into rows where the string "}, {" occurs
                   separate_rows(line, line, sep = "\\}, \\{") %>% 
            # again removing some unwanted strings
          mutate(line = str_replace_all(line, c("\\{"="", "\\}"=""))) %>% 
            # add a unique identifier for each observation
          mutate(observation = row_number()) %>% 
            # separete the rows where a "," occurs
          separate_rows(line, line, sep =",") %>% 
          separate(., line, into = c("category", "value"), sep = "=") %>% 
            # put it into the long format
          pivot_wider(names_from = category, values_from = value)
        l[[counter]] <- line
        counter <- counter+1
      }
    }
    
    l
    #> [[1]]
    #> # A tibble: 2 x 3
    #>   observation time                ` NDVI`           
    #>         <int> <chr>               <chr>             
    #> 1           1 2014-01-01T00:00:00 0.3793765496776215
    #> 2           2 2014-02-01T00:00:00 2                 
    #> 
    #> [[2]]
    #> # A tibble: 2 x 3
    #>   observation time                ` NDVI`           
    #>         <int> <chr>               <chr>             
    #> 1           1 2014-01-01T00:00:00 0.4071076986818826
    #> 2           2 2014-02-01T00:00:00 3                 
    #> 
    #> [[3]]
    #> # A tibble: 2 x 3
    #>   observation time                ` NDVI`           
    #>         <int> <chr>               <chr>             
    #> 1           1 2014-01-01T00:00:00 0.3412131556625801
    #> 2           2 2014-02-01T00:00:00 4
    

    Created on 2020-03-24 by the reprex package (v0.3.0)