r ggplot2 visualization candlestick-chart tidyquant

Abnormal graph when plotting my own data with geom_candlestick


I have some 5-minute price data on stock securities and, as a proof of concept, I have summarised them into daily price tables.

Take a two-week period for a security coded 603666: if the underlying security is bought and sold daily, as in the table below, then tidyquant::geom_candlestick() works nicely.

[Images: daily price table for 603666 and the resulting candlestick chart]

The problem happens when a security has very low trading volume or is traded only infrequently, as in the table below, where you see lots of zeros. In that case geom_candlestick() plots something weird: I expected just a horizontal line, since open = high = low = 0 on the days with no trading volume, but it gave me a bar plot. Is it because the close price has been recorded and carried over as $78.49, when in fact it should be zero on the zero-volume days?

A follow-up question on geom_candlestick(): can I overlay volume data on top of it? Say the left y-axis is "close price"; I would like to add a right y-axis so I can plot bars for the trading volume, and highlight only the big buy or sell volumes.

Thank you very much. Somehow, in a world of ChatGPT, I still like StackOverflow.

for the above data:

dput(daily.close.data)

structure(list(symbol = c(127654L, 127654L, 127654L, 127654L, 
127654L, 127654L, 127654L, 127654L, 127654L, 127654L, 127654L, 
127654L, 127654L, 127654L, 127654L, 127654L, 127654L), date = structure(c(18753, 
18754, 18757, 18758, 18759, 18760, 18761, 18764, 18766, 18767, 
18768, 18771, 18772, 18773, 18774, 18775, 18778), class = "Date"), 
    UpdateTime = new("Period", .Data = c(44, 40, 13, 5, 10, 41, 
    43, 8, 7, 35, 13, 34, 8, 2, 38, 20, 2), year = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), day = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), hour = c(15, 
    15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 
    15), minute = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
    40, 40, 40, 40, 40, 40)), PreCloPrice = c(78.3, 78.3, 78.49, 
    78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 
    78.49, 78.49, 78.49, 78.49, 78.49), OpenPrice = c(0, 78.49, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78.49, 0, 78.49), HighPrice = c(0, 
    78.49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78.49, 0, 78.49
    ), LowPrice = c(0, 78.49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 78.49, 0, 78.49), LastPrice = c(0, 78.49, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 78.49, 0, 78.49), close = c(78.3, 
    78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 
    78.49, 78.49, 78.49, 78.49, 78.49, 78.49, 78.49), volume = c(0L, 
    49L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 143L, 
    0L, 96L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = 
c(NA, -17L), groups = structure(list(date = structure(c(18753, 
18754, 18757, 18758, 18759, 18760, 18761, 18764, 18766, 18767, 
18768, 18771, 18772, 18773, 18774, 18775, 18778), class = "Date"), 
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
        10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L), ptype = integer(0), class = 
c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -17L), .drop = TRUE, class = 
c("tbl_df", 
"tbl", "data.frame")))

The plotting code is pretty standard:

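    library(tidyverse)  # ggplot2, dplyr (%>%), stringr (str_remove_all)
    library(tidyquant)  # geom_candlestick(), theme_tq()

    # `name` is the source file name, defined elsewhere
    daily.close.data %>%
      ggplot(aes(x = date, y = close)) +
      geom_candlestick(aes(open = OpenPrice, high = HighPrice,
                           low = LowPrice, close = close)) +
      labs(title = paste0(str_remove_all(name, ".csv"), " Candlestick Chart"),
           subtitle = "From sample market",
           y = "Closing Price", x = "") +
      theme_tq()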

[Images: daily price table for 127654 and the resulting candlestick chart]


Solution

  • I think you would get the expected result when open = low = high = close. Here, even though open = low = high = 0 on the no-trade days, the variable close still holds a nonzero value (about 78.49), so each candle body spans from 0 up to the close, and that is why bars are plotted instead.

    If we use the variable LastPrice instead of close and add a small jitter (just for the sake of plotting), we obtain the following figure. Note that on the days without trades all the values stay close to 0, whereas on the few days with trades all the values stay close to 80:

    nrows <- nrow(daily.close.data)
    pcols <- c('OpenPrice', 'HighPrice', 'LowPrice', 'LastPrice')
    daily.close.data[,pcols] <- daily.close.data[,pcols] + 
                                matrix(rnorm(nrows*length(pcols)), nrow=nrows) # jitter
    
    daily.close.data %>%
      ggplot(aes(x = date, y = close)) +
      geom_candlestick(aes(open = OpenPrice, high = HighPrice, low = LowPrice, 
                           close = LastPrice))
    

    [Image: candlestick chart produced by the code above]
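If flat candles at the carried-over price are what you want for the no-trade days, another option is to make open = high = low = close on those days before plotting. The block below is only a minimal sketch of that idea, not a tested recipe for the full data set. It assumes that a zero volume marks a no-trade day, and it uses a small stand-in data frame named `daily` (a hypothetical name) built from the first four rows of the dput output in the question.

```r
# Stand-in for daily.close.data: first four rows of the question's dput output.
daily <- data.frame(
  date      = as.Date(c(18753, 18754, 18757, 18758), origin = "1970-01-01"),
  OpenPrice = c(0, 78.49, 0, 0),
  HighPrice = c(0, 78.49, 0, 0),
  LowPrice  = c(0, 78.49, 0, 0),
  close     = c(78.3, 78.49, 78.49, 78.49),
  volume    = c(0L, 49L, 0L, 0L)
)

# Assumption: volume == 0 means the security did not trade that day.
no_trade <- daily$volume == 0

# On no-trade days, replace the zero open/high/low with the carried-over close,
# so that open = high = low = close for those rows.
for (col in c("OpenPrice", "HighPrice", "LowPrice")) {
  daily[[col]][no_trade] <- daily$close[no_trade]
}

# Build the plot only if the plotting packages are installed.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("tidyquant", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(daily, aes(x = date, y = close)) +
    tidyquant::geom_candlestick(aes(open = OpenPrice, high = HighPrice,
                                    low = LowPrice, close = close)) +
    tidyquant::theme_tq()
  # print(p) draws the chart
}
```

With all four prices equal, the candle body and wick on a no-trade day should have zero height, which matches the horizontal line the question expected. If you would rather not show those days at all, filtering out the rows with zero volume before plotting is the simpler alternative.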