r time-series forecasting intermittent

How do I transform half-hourly data that does not span the whole day into a time series in R?


This is my first question on Stack Overflow, so apologies if it is poorly put.

I am currently developing a project where I predict how much a person drinks each day. I currently have data that looks like this:

1-day sample.

The menge column represents how much water a person actually drank in each 30-minute interval (so the first value is the amount drunk from 8:00 until just before 8:30, and so on). This is a 1-day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.

I am trying to forecast the Time Series for each day. For example, given the first one or two time steps, we would predict the whole day and then we know how much in total the person has drunk until 8 PM. I am trying to model this data as a Time Series object in R (Google Colab), in order to use Croston's Method for the forecasting. Using the ts() function, what should I set the frequency to knowing that:

  1. The data is half-hourly
  2. The data is from 8:00 till 20:00 each day (Does not span the whole day)

Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.


Solution

  • When using the ts() function, the frequency defines the number of (usually regularly spaced) observations within a given time period. In your example, the observations are every 30 minutes between 8 AM and 8 PM, and the time period is 1 day. Choosing 1 day as the period assumes that the pattern within each day is of most interest here; you could also use 1 week.

    Within each day of your data (8 AM to 8 PM) you have 12 hours × 2 = 24 half-hourly observations, so a suitable frequency for this data is 24.

    You can also pad the data with 0 values, but this isn't necessary and would complicate the model. If you padded the data so that it has an observation for every half-hour of the day, the frequency would then be 48.
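
As a rough sketch of both options, the snippet below builds a ts object with frequency 24 and, for comparison, a zero-padded version with frequency 48. The data are simulated and the names (menge, drink_ts, padded_ts) are placeholders, not your actual columns. The Croston step assumes the forecast package is installed and is skipped otherwise.

```r
# Simulated stand-in for the real data: 3 days, 24 half-hourly values per day
# (08:00 to 19:30), mostly zeros, which is typical of intermittent series.
set.seed(1)
n_days      <- 3
obs_per_day <- 24                      # 12 hours x 2 half-hours
menge <- rpois(n_days * obs_per_day, lambda = 0.5) * 100

# Option 1: keep only the observed window; one "period" is one 8 AM to 8 PM day.
drink_ts <- ts(menge, frequency = obs_per_day)

# Option 2 (not required): pad each day to a full 48 half-hours.
# 00:00-07:30 is 16 half-hours, 20:00-23:30 is 8 half-hours.
day_matrix <- matrix(menge, nrow = obs_per_day)   # one column per day
padded <- rbind(matrix(0, 16, n_days),
                day_matrix,
                matrix(0, 8, n_days))
padded_ts <- ts(as.vector(padded), frequency = 48)

# Croston's method via the forecast package, if it is available.
if (requireNamespace("forecast", quietly = TRUE)) {
  fc <- forecast::croston(drink_ts, h = obs_per_day)
  print(fc$mean)
}
```

Note that Croston's method returns a flat forecast (the same expected demand rate for every future step), so it does not reproduce the within-day shape regardless of the frequency chosen.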