rdplyrinventory-management

Aged inventory schedule using dplyr, cumsum and lag


This is my first stackoverflow post after years of passive browsing. I am stuck on this problem and it's driving me nuts! Thanks so much for helping.

I have a data frame of inventory supply and demand bucketed by age. I have collected this data across time for many products. Demand for inventory of a certain age can be met with supply of the same age or younger. I am trying to calculate how much demand can be met with supply within each age bucket, filled youngest to oldest.

The data frame will be large (10^7 rows) so I am trying to do this with dplyr, mutate, lag, and cumsum versus a loop, which I suspect will be slow.

Here is a sample group within my data set (product and date groupings omitted):

library(dplyr)

Inventory <- data.frame(
  Age = c(90, 120, 270, 365, Inf),
  Demand = c(0, 5000, 25, 5000, 10),
  Supply = c(4000, 50, 4000, 300, 0))

View(Inventory)

The result I am expecting is:

Result <- Inventory
Result$Start = c(0, 4000, 0, 3975, 0)
Result$In = c(4000, 50, 4000, 300, 0)
Result$Out = c(0, 4050, 25, 4275, 0)
Result$End = c(4000, 0, 3975, 0, 0)
Result$Short = c(0, 950, 0, 725, 10)

View(Result)

I applied standard inventory calculations above:

I am having no luck using dplyr, but I think there is a solution using clever combinations of max, min, lag, and cumsum.


Solution

  • If speed becomes an issue over a large number of rows then potentially the fastest way to deal with iterating calculations is via Rcpp.

    You essentially need a cummulative-sum-but-floored-at-zero function, which adds the Supply - Demand outcome of each day to the last total and zeros it if it's negative. Here's a cumnominus trial function which gives the right table and can be used in dplyr:

    library(dplyr)
    
    cumnominus <- Rcpp::cppFunction("NumericVector cumnominus(NumericVector x) {
      int n = x.size();
      
      NumericVector sumout(n);
      
      sumout[0] = (x[0] < 0) ? 0 : x[0];
      
      for(int i = 1; i < n; i++) {
          
          sumout[i] = (x[i] < 0) ? 0 : x[i] + sumout[i - 1];
        
        }
      
      return sumout;
    }")
    
    Inventory |> 
      mutate(In = Supply,
             End = cumnominus(Supply - Demand),
             Start = lag(End, default = 0),
             Short = pmax(0, Demand - (Start + Supply)),
             Out = pmin(Demand, Start + In)) |> 
      select(Age, Demand, Supply, Start, In, Out, End, Short)
    
    #>   Age Demand Supply Start   In  Out  End Short
    #> 1  90      0   4000     0 4000    0 4000     0
    #> 2 120   5000     50  4000   50 4050    0   950
    #> 3 270     25   4000     0 4000   25 3975     0
    #> 4 365   5000    300  3975  300 4275    0   725
    #> 5 Inf     10      0     0    0    0    0    10
    
    Result
    #>   Age Demand Supply Start   In  Out  End Short
    #> 1  90      0   4000     0 4000    0 4000     0
    #> 2 120   5000     50  4000   50 4050    0   950
    #> 3 270     25   4000     0 4000   25 3975     0
    #> 4 365   5000    300  3975  300 4275    0   725
    #> 5 Inf     10      0     0    0    0    0    10
    

    As a bit of a test against an R-only loop on a dataframe of 5m rows it takes around 0.05s compared to R-loop of 8.5s:

    cumnominus_r <- function(x) {
      out_sum <- integer(length(x))
      out_sum[1] <- max(0, x[1])
      for (i in 2:length(x)) {
        out_sum[i] <- ifelse(x[i] < 0, 0, out_sum[i - 1] + x[i])
      }
      
      out_sum
    }
    
    big_df <- tibble(
      Demand = sample(seq(1000, 6000, 500), 5000000, replace = TRUE),
      Supply = sample(seq(1000, 6000, 500), 5000000, replace = TRUE)
    ) 
    
    
    bench::mark(
      Rcpp_fun = big_df |> 
        mutate(End = cumnominus(Supply - Demand)),
      R_only_fun = big_df |> 
        mutate(End = cumnominus_r(Supply - Demand))
    )
    
    
    #> # A tibble: 3 × 6
    #>   expression       min   median `itr/sec` mem_alloc `gc/sec`
    #>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
    #> 1 Rcpp_fun     43.15ms  52.95ms    16.1      77.7MB     8.94
    #> 2 R_only_fun     8.59s    8.59s     0.116    95.4MB    20.8