rlagcross-correlation

Wrong values collected from custom CCF function in R


This is a follow up question to what was asked here... Quadrant plot based on ccf value outputs in R

The original question and this one use the same data...

df <- structure(list(Date = structure(c(16222, 16617, 14518, 15156, 
15918, 17075, 15522, 16679, 16010, 15187, 15461, 16283, 17379, 
15553, 17410, 15553, 16191, 16314, 14549, 15979), class = "Date"), 
    Commonname = c("Black Sea Bass", "Pinfish", "Pigfish", "Pinfish", 
    "Silver Perch", "Black Sea Bass", "Pigfish", "Pinfish", "Pigfish", 
    "Silver Perch", "Silver Perch", "Black Sea Bass", "Pinfish", 
    "Pinfish", "Silver Perch", "Pigfish", "Black Sea Bass", "Silver Perch", 
    "Silver Perch", "Black Sea Bass"), CPUE = c(1.25513090974505, 
    9.41478783154444, 1.63667465565289, 3.13779141143018, 4.26313144106683, 
    2.32564938844104, 2.70394855189782, 8.49969670589948, 1.7329255861366, 
    2.0845409179642, 0.269832703723692, 1.21288437532366, 11.8739506505966, 
    8.55504246458105, 2.21256794002004, 4.51336797979511, 1.47695928524315, 
    1.10425042966867, 0.632732705722451, 1.59167844861806), Discharge = c(14.8521616, 
    5.23042759111111, 1.42663083211115, 0.184551018105263, 48.9156538971429, 
    2.29765846588235, 33.25524992, 4.06629248, 1.610659584, 0.21166808, 
    0.0607489749333333, 2.22029454545455, 12.90821328, 31.9696672, 
    8.05754544, 32.7267690105263, 43.493472128, 6.77337856, 1.10646621744, 
    4.37803470545454)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Now using the following code and filtering for positive lags only...


library(tidyverse)

fccf = function(data) ccf(data$CPUE,data$Discharge, lag.max = 5, plot = FALSE)
facf = function(acf) tibble(aacf = acf$acf[,,1], lag = acf$lag[,,1])

test.df <- df %>% group_by(Commonname) %>%
  nest() %>%  #Step 1
  mutate(ccf = map(data, ~fccf(.x))) %>% #Step 2
  mutate(acf = map(ccf, ~facf(.x))) %>%  #Step 3
  unnest(acf)

#removes the nested columns
test.df <- test.df[,c("Commonname", "aacf", "lag")]

#removes negative lags
test.df <- subset(test.df, lag > 0)
test.df

I get different results from what the ccf for each individual shows.

For example the plot for Black Sea Bass is completely different that what these custom functions pull out for values....

df <- df %>% 
  filter(Commonname == "Black Sea Bass")
ccf(df$Discharge, df$CPUE, lag.max = 5)

CCF plot for Black Sea Bass

The ccf plot shows a value of almost 1.0 at the 2nd positive lag step, but the custom functions pull a value of about 0.01

If someone can help me figure out what's causing the differences, or how to get these custom functions to pull the correlations seen on the plot i would be most grateful. Thanks.


Solution

  • Oh, everything is fine! The difference, however, is that in your first query to the ccf function, you entered data$CPUE as x and data$Discharge as y. So I did the fccf function too. Now, however, you have swapped arguments! Note that you are calling ccf(df$Discharge, df$CPUE, lag.max = 5). Hence the difference.

    fccf = function(data) ccf(data$Discharge, data$CPUE, lag.max = 5, plot = FALSE)
    facf = function(acf) tibble(lag = acf$lag[,,1], acf = acf$acf[,,1])
    
    df %>% nest_by(Commonname) %>% 
      mutate(ccf = list(fccf(data)),
             acf = list(facf(ccf))) %>% 
      select(c(-ccf, -data)) %>% 
      unnest(acf) %>% 
      ggplot(aes(lag, acf)) +
      geom_segment(aes(x=lag, xend=lag, y=0, yend=acf)) +
      geom_point(size=2, color="red", fill=alpha("orange", 0.3), alpha=0.7, shape=21, stroke=2)+
      facet_wrap(vars(Commonname), 2, 2)
    

    enter image description here

    Compare the chart (facet for Black Sea Bass) above with what you get.

    df2 = df %>% filter(Commonname == "Black Sea Bass") 
    acc1 = ccf(df2$Discharge, df2$CPUE, lag.max = 5)
    

    enter image description here