roptimizationtransport

transport problem between classes - identify (optimise) flow of individual entities


I have a large table of cells (rows) that belong to a certain class. In a next step, the distribution of classes has to change and I want to identify which cells need to be moved between classes minimising the costs that occur by moving cells between classes.

library(tidyverse)
library(transport)

set.seed(19)

## classes: 1 - 5
## overall cost matrix for transport
costs <- matrix(expand_grid(a = 1:5, b = 1:5) %>% mutate(cost = abs(b - a)) %>% 
    pull(cost), ncol = 5, nrow =5)

## dataset with initial classes and individual transport costs into other classes
dat   <- tibble(class = sample(rep(1:5, each = 1000), 200)) %>% 
    bind_cols(t(apply(., 1, function(x) abs(x - 1:5)*runif(5))) %>% 
    as_tibble() %>% setNames(paste0("cl", 1:5)))

The dataset with initial classes and costs to move the row into the other classes (cl1-5).

head(dat)
# A tibble: 6 × 6
  class   cl1   cl2   cl3   cl4   cl5
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     2 0.298 0     0.353 1.52  2.57 
2     5 0.129 1.12  0.986 0.867 0    
3     3 0.940 0.446 0     0.811 0.670
4     5 0.759 2.82  1.65  0.582 0    
5     3 1.39  0.134 0     0.438 1.39 
6     5 2.89  0.492 0.870 0.954 0  

I first solved the mass transport problem, identifying the number of cells that need to be moved from a certain class into another:

## mass distribution of classes
prob0 <- dat %>% group_by(class) %>% summarise(p = n()/nrow(dat))

## new distribution of classes
prob1 <- tibble(class = sample(rep(1:5, each = 1000), 200)) %>% 
   group_by(class) %>% summarise(p = n()/nrow(dat))

## Solving transport problem of masses
trans <- transport(prob0$p, prob1$p, costs) %>% filter(from!=to) %>%
    left_join(dat %>% group_by(class) %>% 
    summarise(rows = n()), by = join_by("from"=="class")) %>%
    mutate(nrRows = round(rows*mass,0))
trans
  from to mass rows nrRows
1    1  4 0.03   39      1
2    3  4 0.06   46      3
3    5  4 0.04   48      2

With this, I know how many cells should be moved from the initial class into a new class.

QUESTION: What method can I use to efficiently find the best rows to move according to the costs (minimisation costs for all changes of classes) that occur by moving the row?

I wonder of that problem can be solved directly without calculating the mass flow between classes, i.e. moving rows directly minimising the costs until the desired class distribution is achieved?


Solution

  • As already mentioned you can model it as binary transportation problem where sum of each row must be 1 and sum of each column #rows * wanted distribution.

    m <- as.matrix(dat[,-1])
    p0 <- prob1[["p"]]
    
    res <- lpSolve::lp.transport(
      cost.mat = m,
      direction = "min",
      row.signs = rep("==", nrow(m)),
      row.rhs = rep(1, nrow(dat)),
      col.signs = rep("==", ncol(m)),
      col.rhs = nrow(dat) * p0
    )
    
    res[["objval"]] # cost of changes
    
    new_groups <- apply(res[["solution"]], 1, function(x) which(x == 1)) # new group for each row
    
    table(dat[["class"]], new_groups) # change matrix
    
    change_rows <- which(new_groups != dat[["class"]])
    if(length(change_rows) > 0)
      changes <- data.frame(
        row = change_rows,
        old_group = dat[["class"]][change_rows],
        new_group = new_groups[change_rows]
      )
    

    This obviously wont work if it is impossible to fit new distribution - nrow(dat) * p0 must be integer vector. In other cases you can use mco::nsga2 to minimize cost of group change and distribution error.