Tags: r, ggplot2, probability-density, ecdf, cumulative-distribution-function

Create and plot a cumulative probability density function with a custom number and size of bins for stock price ROC in R


I want to import daily stock price data into R for any ticker and examine one historical time segment of it. From this segment, I want to convert the prices into daily rate-of-change (ROC) percentages. Next, I want to take this ROC series and create a cumulative probability density function that lets me set a custom number of bins and a custom size limit for each bin, for example 22 bins with a 0.3% limit. Then I want to plot this CPDF as either a histogram or a scatterplot. The final step would be to do this for two different segments of the same stock and plot them next to each other for visual inspection. I have started some code for the ticker SPY, but I cannot get it to work.

library(quantmod)
library(tidyquant)
library(tidyverse)

# using tidyquant to import a ticker
spy <- tq_get("spy")
spy010422 <- tq_get("spy", get ="stock.prices", from ='2022-01-04', to = '2022-01-24')
str(spy010422)
# getting ROC between prices in the series
spy010422.rtn = ROC(spy010422$close, n = 1, type = c("discrete"), na.pad = TRUE)
str(spy010422.rtn)

# trying to use ggplot and tibble to create an ECDF function
spy010422.rtn %>%
  tibble() %>%
  ggplot() +
  stat_ecdf(aes(.))

# another attempt at running ECDF on the ROC series
spy010422.rtn %>%
  ggplot(spy010422.rtn) +
  stat_ecdf(aes(close))

# trying to set the number of bins and bin size for the ECDF
spy010422.rtn %>%
  mutate(rounded = round(close/.3, 0) *.3,
         bin = min_rank(rounded)) %>%
  ggplot(aes(close, bin)) +
  geom_line()

# next time segment of the ticker spy to compare this to
spy020222 <- tq_get("spy", get ="stock.prices", from ='2022-02-02', to = '2022-02-24')

Solution

  • I couldn't understand exactly what you wanted to plot. Normally a CPDF is just a continuous line and doesn't have bins to customise. Also, "plot this CPDF as either a histogram or a scatterplot" is a strange phrase to me, as one normally plots the histogram or scatterplot of the variable, not of the CPDF of the variable. Given that, I made a function that plots the histogram of the ROC of the ticker, and you can comment on whether that was what you wanted.

    The function takes a list of dates in the format list(c(from1, to1), c(from2, to2), ...) (you can add as many intervals as you want) and loops over each interval in this list with the purrr::map function. For each iteration, it creates the histogram, customizing the bins argument. After the loop, the graphs are bound together in one figure using the ggpubr::ggarrange function (you must run install.packages("ggpubr") if you don't have the package installed).

    library(quantmod)
    library(tidyquant)
    library(tidyverse)
    
    gg.roc.hist = function(ticker, dates, bins = 30){
      map(dates, function(interval){ # loop over each c(from, to) interval in the 'dates' list
          df = tq_get(ticker, get = "stock.prices", from = interval[1], to = interval[2]) # get the prices
          df$roc = ROC(df$close, n = 1, type = "discrete", na.pad = TRUE) # add a column with the daily ROC
          
          ggplot(df, aes(x = roc)) + 
            geom_histogram(bins = bins, na.rm = TRUE) + # histogram with a custom number of bins; the leading NA is dropped silently
            labs(title = paste0(interval[1], " to ", interval[2]))}) %>%
        ggpubr::ggarrange(plotlist = .) # bind the graphs together into one figure
    }
    

    Running:

    gg.roc.hist('spy', list(c('2022-01-04','2022-01-24'), c('2022-02-02', '2022-02-24')), 22)
    

    Yields this graph:

    [Figure: histograms of the daily ROC for each date interval, combined into one figure]
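
If what the question is after is a cumulative distribution evaluated on a fixed grid of bins (say 22 bins of 0.3% each), one possible reading can be sketched as below. This is only an illustration under stated assumptions, not part of the answer's function: the helper name `binned_cdf`, the choice to centre the bins on zero, and the choice to clamp returns beyond the outermost bin edges into the first or last bin are all hypothetical. It also uses simulated returns in place of `tq_get()` output so that it runs without a network connection.

```r
library(ggplot2)

# Hypothetical helper: cumulative probability of a return series on a grid
# of `n_bins` equal-width bins centred on zero (an assumption, not the
# answer's method).
binned_cdf <- function(roc, n_bins = 22, bin_width = 0.003) {
  roc <- roc[is.finite(roc)]                 # drop the leading NA produced by ROC()
  half <- n_bins * bin_width / 2
  breaks <- seq(-half, half, length.out = n_bins + 1)
  # Assumption: returns beyond the outer edges are clamped into the end bins
  clamped <- pmin(pmax(roc, breaks[1]), breaks[n_bins + 1])
  counts <- table(cut(clamped, breaks = breaks, include.lowest = TRUE))
  data.frame(
    upper = breaks[-1],                      # upper edge of each bin
    cum_prob = cumsum(as.vector(counts)) / length(roc)
  )
}

# Simulated daily returns standing in for two segments of the same ticker
set.seed(42)
seg_a <- rnorm(250, mean = 0, sd = 0.010)
seg_b <- rnorm(250, mean = 0, sd = 0.015)

cdf_df <- rbind(
  transform(binned_cdf(seg_a), segment = "segment A"),
  transform(binned_cdf(seg_b), segment = "segment B")
)

# One panel per segment, placed next to each other for visual comparison
p <- ggplot(cdf_df, aes(x = upper, y = cum_prob)) +
  geom_step() +
  geom_point() +
  facet_wrap(~ segment) +
  labs(x = "daily ROC (upper bin edge)", y = "cumulative probability")
print(p)
```

With real data, the simulated vectors would be replaced by the `roc` column computed inside `gg.roc.hist`, for example `binned_cdf(df$roc)`. If the unbinned curve is enough, `stat_ecdf()` applied to a data frame with a named `roc` column gives the continuous line described at the start of the answer.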