Failed to set trace monitor for X. Variable X not found

I'm trying to implement a Bayesian model for football score prediction (English Premier League). I assume the number of scores follows the Poisson distribution with parameter lambda and this is the general formula:

log(home_lambda) = mu + hfa + home_attack - away_def

log(away_lambda) = mu + away_attack - home_def

where mu and hfa are the intercept and home field advantage respectively. And this is my data (380 x 8)

So basically I have 380x2 equations and 42 parameters (mu, hfa, attack ability of 20 teams and defence ability of 20 teams). Take the first match as an example:

FTHG[1] ~ dpois(home_lambda[1]) # FTHG is full time home goal
log(home_lambda[1]) = mu + hfa + att_Crystal_Palace - att_Arsenal

FTAG[1] ~ dpois(lam_a[1]) # FTAG is full time away goal
log(away_lambda[1]) = mu + att_Arsenal - def_Crystal_Palace

Below is my entire code

bayes_mod_string = " model{
  
  for (i in 1:380){
  
    FTHG[i] ~ dpois(lam_h[i])
    log(lam_h[i]) = mu + hfa + h_att[i] - a_def[i]
    FTAG[i] ~ dpois(lam_a[i])
    log(lam_a[i]) = mu + a_att[i] - h_def[i]
    
    h_att[i] ~ dnorm(0, 1/1e6) # the attack and defence ability of each team follows a Normal distribution
    a_def[i] ~ dnorm(0, 1/1e6)
    a_att[i] ~ dnorm(0, 1/1e6)
    h_def[i] ~ dnorm(0, 1/1e6)

  }
  
  mu ~ dnorm(0, 1/1e6)
  hfa ~ dnorm(0, 1/1e6)
  
}"

data_jags = as.list(test_set)

params = c('mu', 'hfa', unique(test_set$h_att), unique(test_set$a_def)) # 42 in total

bayes_mod_string = jags.model(textConnection(bayes_mod_string), 
                              data = data_jags, n.chains = 3)
update(bayes_mod_string, 1e3)

bayes_mod_sim = coda.samples(model = bayes_mod_string,
                             variable.names = params, n.iter = 1e3)
bayes_mod_csim = as.mcmc(do.call(rbind, bayes_mod_sim))

The error message is

Warning: Failed to set trace monitor for att_Crystal_Palace Variable att_Crystal_Palace not found

Not only with the "att_Crystal_Palace" parameter but also with all parameters.

I tried typing out each team attack and defence ability separately, putting them in and out the for loop

att_Crystal_Palace  ~ dnorm(0, 1/1e6)
  att_Fulham  ~ dnorm(0, 1/1e6)
  att_Bournemouth  ~ dnorm(0, 1/1e6)
  att_Leeds  ~ dnorm(0, 1/1e6)
  att_Newcastle  ~ dnorm(0, 1/1e6)
  att_Tottenham  ~ dnorm(0, 1/1e6)
  att_Everton  ~ dnorm(0, 1/1e6)
  att_Leicester  ~ dnorm(0, 1/1e6)  
  att_Man_United  ~ dnorm(0, 1/1e6)
  att_West_Ham  ~ dnorm(0, 1/1e6)
  att_Aston_Villa  ~ dnorm(0, 1/1e6)
  att_Arsenal  ~ dnorm(0, 1/1e6)
  att_Brighton  ~ dnorm(0, 1/1e6)
  att_Man_City  ~ dnorm(0, 1/1e6)
  att_Southampton  ~ dnorm(0, 1/1e6)
  att_Wolves  ~ dnorm(0, 1/1e6)
  att_Brentford  ~ dnorm(0, 1/1e6)
  att_Nottm_Forest  ~ dnorm(0, 1/1e6)
  att_Chelsea  ~ dnorm(0, 1/1e6)
  att_Liverpool ~ dnorm(0, 1/1e6)
  
  def_Crystal_Palace  ~ dnorm(0, 1/1e6)
  def_Fulham  ~ dnorm(0, 1/1e6)
  def_Bournemouth  ~ dnorm(0, 1/1e6)
  def_Leeds  ~ dnorm(0, 1/1e6)
  def_Newcastle  ~ dnorm(0, 1/1e6)
  def_Tottenham  ~ dnorm(0, 1/1e6)
  def_Everton  ~ dnorm(0, 1/1e6)
  def_Leicester  ~ dnorm(0, 1/1e6)  
  def_Man_United  ~ dnorm(0, 1/1e6)
  def_West_Ham  ~ dnorm(0, 1/1e6)
  def_Aston_Villa  ~ dnorm(0, 1/1e6)
  def_Arsenal  ~ dnorm(0, 1/1e6)
  def_Brighton  ~ dnorm(0, 1/1e6)
  def_Man_City  ~ dnorm(0, 1/1e6)
  def_Southampton  ~ dnorm(0, 1/1e6)
  def_Wolves  ~ dnorm(0, 1/1e6)
  def_Brentford  ~ dnorm(0, 1/1e6)
  def_Nottm_Forest  ~ dnorm(0, 1/1e6)
  def_Chelsea  ~ dnorm(0, 1/1e6)
  def_Liverpool ~ dnorm(0, 1/1e6)

This time the error is

RUNTIME ERROR: Compilation error on line 50. Attempt to redefine node def_Liverpool[1]

What the problem actually is and how can I tackle it? If you'd like to get the data, it's here https://www.football-data.co.uk/englandm.php, season 2022/2023

Solution

Your primary problem is here

log(lam_h[i]) = mu + hfa + h_att[i] - a_def[i]

h_att[i] resolves to, say, att_Arsenal, as you say, but you don't define a node (variable) called att_Arsenal. You have similar problems for h_def, a_att and a_def.

The easiest way to deal with this number of predictor variables is to use arrays rather than stand-alone scalars. I've used data from the 2023/24 season from the link you provided. My input data looks like this:

df
# A tibble: 39 × 7
   Div   Date       Time  HomeTeam AwayTeam  FTHG  FTAG
   <chr> <chr>      <chr>    <int>    <int> <int> <int>
 1 E0    11/08/2023 20:00        6       13     0     3
 2 E0    12/08/2023 12:30        1       16     2     1
 3 E0    12/08/2023 15:00        3       19     1     1
 4 E0    12/08/2023 15:00        5       12     4     1
 5 E0    12/08/2023 15:00        9       10     0     1
 6 E0    12/08/2023 15:00       17        8     0     1
 7 E0    12/08/2023 17:30       15        2     5     1
 8 E0    13/08/2023 14:00        4       18     2     2
 9 E0    13/08/2023 16:30        7       11     1     1
10 E0    14/08/2023 20:00       14       20     1     0
# … with 29 more rows

where I have indexed both HomeTeam and AwayTeam alphabetically. So, Arsenal are 1 and Wolves are 20.

Then I can write my model string as

bayes_mod_string = " 
data {
  mu ~ dnorm(0, 1/1e6)
  hfa ~ dnorm(0, 1/1e6)
  for (j in 1:nTeams) {
    h_att[j] ~ dnorm(0, 1/1e6) 
    a_def[j] ~ dnorm(0, 1/1e6)
    a_att[j] ~ dnorm(0, 1/1e6)
    h_def[j] ~ dnorm(0, 1/1e6)
    log(lam_h[j]) = mu + hfa + h_att[j] - a_def[j]
    log(lam_a[j]) = mu + a_att[j] - h_def[j]
  }
}

model {
  for (i in 1:nMatches) {
    FTHG[i] ~ dpois(lam_h[home[i]])
    FTAG[i] ~ dpois(lam_a[away[i]])
  }
}
"

Note the double-indexing of lam_a and lam_h in the model block.

I update the model and obtain samples with

library(rjags)

bayes_model = jags.model(textConnection(bayes_mod_string), 
                              data = data_jags, n.chains = 3)
data_jags <- list(
  nTeams = 20,
  nMatches = nrow(df),
  FTHG = df$FTHG,
  FTAG = df$FTAG,
  home = df$HomeTeam,
  away = df$AwayTeam
)

update(bayes_model, 1e3)

samples <- jags.samples(
  model = bayes_model,
  variable.names = c('mu', 'hfa', paste0("h_att[", 1:20, "]"), paste0("a_def[", 1:20, "]")), 
  n.iter = 1e3
)

Note that I've changed coda.samples to jags.samples.

I omit printing samples because the output is rather long.

Personally, I prefer runjags to rjags. You can use the model string directly rather than passing through a text connection or writing to a file, the model fitting code is more compact and I feel the output is both more compact and more informative.

library(runjags)

run.jags(
  bayes_mod_string, 
  data = data_jags,
  monitor = c('mu', 'hfa', paste0("h_att[", 1:20, "]"), paste0("a_def[", 1:20, "]"))
)
Compiling rjags model...
Calling the simulation using the rjags method...
Note: the model did not require adaptation
Burning in the model for 4000 iterations...
  |**************************************************| 100%
Running the model for 10000 iterations...
  |**************************************************| 100%
Simulation complete
Calculating summary statistics...
Note: The monitored variables 'mu', 'hfa', 'h_att[1]', 'h_att[2]', 'h_att[3]', 'h_att[4]', 'h_att[5]', 'h_att[6]', 'h_att[7]', 'h_att[8]', 'h_att[9]',
'h_att[10]', 'h_att[11]', 'h_att[12]', 'h_att[13]', 'h_att[14]', 'h_att[15]', 'h_att[16]', 'h_att[17]', 'h_att[18]', 'h_att[19]', 'h_att[20]', 'a_def[1]',
'a_def[2]', 'a_def[3]', 'a_def[4]', 'a_def[5]', 'a_def[6]', 'a_def[7]', 'a_def[8]', 'a_def[9]', 'a_def[10]', 'a_def[11]', 'a_def[12]', 'a_def[13]',
'a_def[14]', 'a_def[15]', 'a_def[16]', 'a_def[17]', 'a_def[18]', 'a_def[19]' and 'a_def[20]' appear to be non-stochastic; they will not be included in the
convergence diagnostic
Finished running the simulation

JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
                                                                         
          Lower95 Upper95    Mean SD Mode MCerr MC%ofSD SSeff Lag.10 psrf
mu         827.99  827.99  827.99  0   --    --      --    --     --   --
hfa        1122.6  1122.6  1122.6  0   --    --      --    --     --   --
h_att[1]   287.62  287.62  287.62  0   --    --      --    --     --   --
h_att[2]  -696.89 -696.89 -696.89  0   --    --      --    --     --   --
h_att[3]   332.89  332.89  332.89  0   --    --      --    --     --   --
h_att[4]   941.39  941.39  941.39  0   --    --      --    --     --   --
h_att[5]   795.37  795.37  795.37  0   --    --      --    --     --   --
<further output omitted>

IMPORTANT The lack of variability in the output indicates a problem with either the model specification or my implementation. Personally, I'd have fitted a hierarchical model. I don't have time to explore further right now, but at least I've given you clues about how to proceed.