r ggplot2 time-series linegraph

Treating Date and Time for a River Discharge Graph


I downloaded discharge data for a river from a government website. The date and time are stored as ISO 8601 strings with a UTC offset (see the data sample further down).

This is my code:

library(ggplot2)
ggplot(CHEM_RESULTS, aes(x= `Date and Time`, y=`Discharge (cumec)`,  group = 1)) +
  geom_line( color="powderblue", size=1, alpha=0.9, linetype=1)

This is the graph it produced (see graph).

Data sample:

head(CHEM_RESULTS)

Date and Time                   Discharge (cumec)
<chr>                           <dbl>
2024-03-05T00:00:01.000+10:00   3.202
2024-03-05T00:35:01.000+10:00   3.124
2024-03-05T01:00:01.000+10:00   3.040
2024-03-05T01:30:01.000+10:00   2.956
2024-03-05T02:00:01.000+10:00   2.919
2024-03-05T03:00:01.000+10:00   2.867

I think the bar on the x axis appears because the date and time strings are so long and there are so many entries (1896), so the labels overlap instead of being readable. Not every time stamp needs to be shown, but some date/time points are needed to provide context. Reformatting the date/time values as the government supplied them may also be challenging, given how many entries there are.

I also need to overlay other water quality data on the graph, e.g. pH at 4 sites over 4 different time periods. Once I put these points on the graph, will they be highlighted? That would be useful for showing only the necessary date and time information.

Any help on how to approach this is greatly appreciated.

Thank you!

I tried to make a line graph of river discharge, but I get a bar on the x axis instead of readable time stamps.


Solution

  • I've built a data set with the same structure, using discharge from the Montjean station between 2022-01-01 and 2023-12-31 (source: GRDC).

    First lines (head(data)):

    # A tibble: 6 × 2
      `Date and Time`               `Discharge (cumec)`
      <chr>                                       <dbl>
    1 2022-01-02T09:00:00.000+10:00               1928.
    2 2022-01-03T09:00:00.000+10:00               2042.
    3 2022-01-04T09:00:00.000+10:00               2161.
    4 2022-01-05T09:00:00.000+10:00               2274.
    5 2022-01-06T09:00:00.000+10:00               2227.
    6 2022-01-07T09:00:00.000+10:00               2052.
    

    To reproduce your error :

    ### Packages
    library(dplyr)
    library(lubridate)
    library(ggplot2)
    
    ### Plot the graph without parsing the dates or setting breaks on the x axis
    ggplot(data, aes(x= `Date and Time`, y=`Discharge (cumec)`, group = 1)) +
      geom_line(color="powderblue", linewidth=1, alpha=0.9, linetype=1)
    

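    Output: an unreadable x axis. Because `Date and Time` is a character column, ggplot2 treats it as discrete and draws one tick label per distinct value, so the labels overlap into a solid bar.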
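
The `<chr>` tag in the tibble header is the clue: the time stamps are stored as text. A small check, using two time stamps copied from the question (the object names `raw` and `parsed` are only for illustration), shows the class before and after parsing with lubridate's `ymd_hms()`:

```r
library(lubridate)

# Two time stamps copied from the question's data sample
raw <- c("2024-03-05T00:00:01.000+10:00", "2024-03-05T00:35:01.000+10:00")
class(raw)     # "character": ggplot2 places this on a discrete axis

# "Etc/GMT-10" is the fixed UTC+10 zone (the sign in these zone names is inverted)
parsed <- ymd_hms(raw, tz = "Etc/GMT-10")
class(parsed)  # "POSIXct" "POSIXt": a continuous date-time

# The gap between readings is now a real time difference, not two unrelated labels
difftime(parsed[2], parsed[1], units = "mins")
```

Once the column is a date-time, ggplot2 can space the readings by elapsed time and choose a small number of axis labels.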

    To fix this :

    ### Parse the first column as POSIXct date-times.
    ### "Etc/GMT-10" is the fixed UTC+10 zone (the sign in these zone names is inverted).
    data <- data %>%
      mutate(`Date and Time` = ymd_hms(`Date and Time`, tz = "Etc/GMT-10"))
    
    ### Plot the graph with ggplot2, using `scale_x_datetime()` and `date_breaks`
    ggplot(data, aes(x = `Date and Time`, y = `Discharge (cumec)`, group = 1)) +
      scale_x_datetime(date_breaks = "3 months", date_labels = "%b %Y",
                       limits = range(data$`Date and Time`), expand = c(0, 0)) +
      geom_line(color = "powderblue", linewidth = 1, alpha = 0.9, linetype = 1)
    

    Output: the x axis now shows one readable label every three months.
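
For the question about overlaying water quality samples, one possible approach (a sketch under assumptions, not the only way) is to add a `geom_point()` layer with its own data frame and to pass the sampling times to the `breaks` argument of `scale_x_datetime()`, so that only those moments are labelled. The discharge rows below are copied from the question. The `samples` object, the site names and the choice of rows 2 and 5 are invented for illustration, and no pH values are assumed.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Six rows copied from the question's data sample
CHEM_RESULTS <- data.frame(
  `Date and Time` = c(
    "2024-03-05T00:00:01.000+10:00", "2024-03-05T00:35:01.000+10:00",
    "2024-03-05T01:00:01.000+10:00", "2024-03-05T01:30:01.000+10:00",
    "2024-03-05T02:00:01.000+10:00", "2024-03-05T03:00:01.000+10:00"
  ),
  `Discharge (cumec)` = c(3.202, 3.124, 3.040, 2.956, 2.919, 2.867),
  check.names = FALSE  # keep the column names with spaces as they are
) %>%
  mutate(`Date and Time` = ymd_hms(`Date and Time`, tz = "Etc/GMT-10"))

# HYPOTHETICAL sampling events: rows 2 and 5 and the site names are placeholders
samples <- CHEM_RESULTS %>%
  slice(c(2, 5)) %>%
  mutate(site = c("Site A", "Site B"))

p <- ggplot(CHEM_RESULTS, aes(x = `Date and Time`, y = `Discharge (cumec)`)) +
  geom_line(color = "powderblue", linewidth = 1) +
  # Points drawn from `samples` sit on the discharge line at the sampling times
  geom_point(data = samples, aes(color = site), size = 3) +
  # Label only the sampling times instead of regular date breaks
  scale_x_datetime(breaks = samples$`Date and Time`, date_labels = "%d %b %H:%M")

p
```

With the full 1896-row data set, regular breaks such as `date_breaks = "1 week"` may read better than per-sample labels if the sampling times fall close together.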