rtime-seriespivot-tabledata-wranglingnoaa

How to create an time series in R for the .data format?


I´m having some difficulties to read and create an time series objects in R for this datasets:

SOI: https://psl.noaa.gov/data/correlation/soi.data ONI: https://psl.noaa.gov/data/correlation/oni.data

By seen the data we have in the first column the years and in the cols the months (Jan to Dec)

I expect to have something like this for SOI in R:

YearMonth  SOI
  <mth>    <dbl>
Jan 1948   -99.99
Feb 1948   -99.99
...         ...
Sep 2021   -1.3
Oct 2021   -99.99
Nov 2021   -99.99
Dec 2021   -99.99

And in the same way for ONI:

YearMonth  ONI
  <mth>    <dbl>
Jan 1950   -1.53
Feb 1950   -1.34
...         ...
Aug 2021   -0.46
Sep 2021   -99.90
Oct 2021   -99.90
Nov 2021   -99.90
Dec 2021   -99.90

I believe that the arrangement of this dataset may be the source of my difficulty, as I am not being able to correctly pivot this data.

I know that here in the stack we have good R users that will definitely help me with the best practice.


Solution

  • This function seems to work for the two links shared but if there are more standard ways to get the data you can use that since they'll be more reliable.

    library(dplyr)
    library(tidyr)
    
    read_data <- function(link) {
      read.table(link, skip = 1, fill = TRUE) %>%
        slice(-(grep('-99.9', V1):n())) %>%
        mutate(across(.fns = as.numeric)) %>%
        pivot_longer(cols = -V1) %>%
        mutate(name = month.abb[match(name, unique(name))]) %>%
        unite(YearMonth, V1, name, sep = ' ')
    }
    
    d1 <- read_data('https://psl.noaa.gov/data/correlation/soi.data')
    d1
    
    # A tibble: 888 x 2
    #  YearMonth value
    #   <chr>     <dbl>
    # 1 1948 Jan  -100.
    # 2 1948 Feb  -100.
    # 3 1948 Mar  -100.
    # 4 1948 Apr  -100.
    # 5 1948 May  -100.
    # 6 1948 Jun  -100.
    # 7 1948 Jul  -100.
    # 8 1948 Aug  -100.
    # 9 1948 Sep  -100.
    #10 1948 Oct  -100.
    # … with 878 more rows
    

    For the second link -

    d2 <- read_data('https://psl.noaa.gov/data/correlation/oni.data')
    d2
    # A tibble: 864 x 2
    #   YearMonth value
    #   <chr>     <dbl>
    # 1 1950 Jan  -1.53
    # 2 1950 Feb  -1.34
    # 3 1950 Mar  -1.16
    # 4 1950 Apr  -1.18
    # 5 1950 May  -1.07
    # 6 1950 Jun  -0.85
    # 7 1950 Jul  -0.54
    # 8 1950 Aug  -0.42
    # 9 1950 Sep  -0.39
    #10 1950 Oct  -0.44
    # … with 854 more rows