I'm trying to find out why the values of the exp_purch variable differ between the two approaches below.
This version, using dplyr, seems to work:
library(dplyr)

data3 <- tibble(
  customer = c(1, 2, 3),
  frequency = c(30, 32, 36),
  recency = c(72, 71, 74),
  T = c(74, 72, 77),
  monetary_value = c(35.654, 47.172187, 30.603611)
)

a <- 0.6866195
b <- 2.959643
r <- 0.2352725
alpha <- 4.289764

log_div_mean <- function(customer, dt) {
  data <- dt
  log_div_ <- (r + data$frequency[customer]) *
    log((alpha + data$T[customer]) / (alpha + data$recency[customer])) +
    log(a / (b + max(data$frequency[customer], 1) - 1))
  xd <- 1 / (1 + exp(-(-log_div_)))
  return(xd)
}

data3 %>% mutate(exp_purch = log_div_mean(customer, data3))
When I do the same calculation outside dplyr, however, the results differ:
customer <- 2
log_div_ <- (r + data3$frequency[customer]) *
  log((alpha + data3$T[customer]) / (alpha + data3$recency[customer])) +
  log(a / (b + max(data3$frequency[customer], 1) - 1))
xd <- 1 / (1 + exp(-(-log_div_)))
xd
It looks like the dplyr code is using the last customer id for all three rows.
Here is a simple base R implementation; see my comment for details.
I will leave it to you to work out a version that works well with {dplyr} syntax. The data masking is different: "dplyr-style" is close to subset().
If you need assistance, do not hesitate to comment.
Data
data3 = data.frame(
  customer = c(1, 2, 3),
  frequency = c(30, 32, 36),
  recency = c(72, 71, 74),
  TX = c(74, 72, 77),
  monetary_value = c(35.654, 47.172187, 30.603611))
Implementation of log_div_mean() (do you have a reference for the calculation?)
log_div_mean = \(.data,                          # data
                 .x, .y, .z,                     # columns of interest
                 a = .6866195, b = 2.959643,     # default values
                 r = .2352725, alpha = 4.289764  # which can be overwritten
                 ) {
  .u = .data[[.x]]
  r1 = r + .u
  r2 = log((alpha + .data[[.y]]) / (alpha + .data[[.z]]))
  r3 = log(a / (b + max(c(.u, 1)) - 1))  # typo in your max?
  rr = r1 * r2 + r3
  1 / (1 + exp(rr))
}
where we use the argument naming convention (.data, .x, .y, ...) common in the {tidyverse}.
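A likely source of the discrepancy in the question is the max() call rather than the indexing. In an ungrouped mutate(), `customer` refers to the whole column, so `data$frequency[customer]` is the full frequency vector, and max() reduces all of its arguments to a single number. The short sketch below, using only the frequency values from the question, shows the difference between max() and the element-wise pmax():

```r
# Values taken from the question's data
frequency <- c(30, 32, 36)
customer  <- c(1, 2, 3)   # inside an ungrouped mutate(), `customer` is the whole column

# max() pools all of its arguments and returns a single number
max(frequency[customer], 1)    # 36

# With a scalar index, as in the stand-alone code, only one row is seen
max(frequency[2], 1)           # 32

# pmax() compares element by element and keeps one value per row
pmax(frequency[customer], 1)   # 30 32 36
```

If that reading is right, only the last log() term is affected: every row is computed with the largest frequency (36), which happens to belong to the last customer.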
Application
> log_div_mean(.data = data3, .x = "frequency", .y = "TX", .z = "recency")
[1] 0.9619502 0.9730688 0.9340070
Correct results?
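To help answer that, here is a minimal sketch of a fully vectorised variant, assuming the intent is a per-customer maximum, i.e. pmax() instead of max(). The function name `exp_purch_vec` and the argument name `T_obs` are hypothetical; the parameter defaults are copied from the question. The result is cross-checked against the question's scalar calculation, one customer at a time:

```r
# Hypothetical helper; takes plain vectors, so it works in base R and inside mutate()
exp_purch_vec <- function(frequency, recency, T_obs,
                          a = 0.6866195, b = 2.959643,
                          r = 0.2352725, alpha = 4.289764) {
  log_div <- (r + frequency) *
    log((alpha + T_obs) / (alpha + recency)) +
    log(a / (b + pmax(frequency, 1) - 1))   # pmax(): row-wise maximum (assumed intent)
  1 / (1 + exp(log_div))
}

data3 <- data.frame(
  customer  = c(1, 2, 3),
  frequency = c(30, 32, 36),
  recency   = c(72, 71, 74),
  TX        = c(74, 72, 77)
)

# Base R application of the vectorised helper
data3$exp_purch <- with(data3, exp_purch_vec(frequency, recency, TX))

# Cross-check: the question's scalar formula, evaluated per customer
per_row <- sapply(seq_len(nrow(data3)), function(i) {
  ld <- (0.2352725 + data3$frequency[i]) *
    log((4.289764 + data3$TX[i]) / (4.289764 + data3$recency[i])) +
    log(0.6866195 / (2.959643 + max(data3$frequency[i], 1) - 1))
  1 / (1 + exp(ld))
})
all.equal(data3$exp_purch, per_row)   # TRUE
```

With dplyr loaded, the same helper could be called as `mutate(data3, exp_purch = exp_purch_vec(frequency, recency, TX))`, since mutate() passes whole columns to the function.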