r desctools

Computing the vector of values given the expected Gini indices


I am using DescTools to compute Gini indices, a measure of inequality, and that works fine. But I can't figure out how to do the inverse: what values would I need so that the Gini indices are equal across groups?

data

For reproducibility, here is the data I am working with:

# setup
set.seed(123)
library(DescTools)
library(dplyr)

# data
df <-
  structure(list(share = c(
    1.0927902450891e-05, 1.15255254587552e-05,
    1.17490961074116e-05, 2.94139776697196e-05, 0.00011539470233412,
    1.9005230595808e-05, 1.30015962776165e-05, 2.78830621259284e-05,
    3.60539655756737e-06, 3.52621581472531e-06, 2.08516461722044e-06,
    3.71562392174051e-06, 5.9923585443842e-06, 1.81981353418487e-06,
    4.34979294985559e-06, 3.02671726234962e-06, 2.12453772387389e-06,
    2.11908550914134e-06, 1.00308086256127e-06, 1.80107488148927e-06,
    2.60305223492859e-06, 6.26982073798782e-07, 9.59182708805635e-07,
    2.94622403616777e-06, 6.90271043800262e-07, 2.93824099499653e-07,
    8.21549067353436e-07, 2.72552493097834e-07, 7.89679523466669e-07,
    3.48883857629005e-07, 8.09840547160032e-07, 2.15137191096772e-07,
    1.64298848805113e-06, 3.97217885926968e-08, 7.77111892663095e-07,
    6.98248286041764e-07, 6.63616790078154e-07, 2.27849808697301e-07,
    7.89749220781519e-07, 6.66388374298488e-07
  ), share_hr = c(
    19488,
    18316, 16035, 6052, 1025, 6318, 17448, 5086, 30818, 13213, 58788,
    15319, 8972, 136088, 35123, 6874, 79538, 75868, 152369, 138806,
    72289, 131665, 241332, 53906, 633809, 236347, 616133, 276469,
    604729, 168079, 562280, 277543, 376314, 541400, 543215, 182714,
    523227, 182869, 454487, 479647
  ), mode = structure(c(
    1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L
  ), .Label = c(
    "mode-1", "mode-2", "mode-3", "mode-4"
  ), class = "factor")), row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))

problem

I can now compute Gini indices (as a measure of inequality) for each mode using DescTools:

df %>%
  dplyr::group_by(mode) %>%
  dplyr::summarise(Gini = DescTools::Gini(x = share, n = share_hr)) %>%
  dplyr::ungroup(.)

#> # A tibble: 4 x 2
#>   mode    Gini
#>   <fct>  <dbl>
#> 1 mode-1 0.229
#> 2 mode-2 0.208
#> 3 mode-3 0.264
#> 4 mode-4 0.261

But I also want to compute the reverse:
What should the values in the share column be so that the modes no longer differ in inequality (that is, all Gini indices are identical)? Note that share_hr must remain unchanged.

#> # A tibble: 4 x 2
#>   mode    Gini
#>   <fct>  <dbl>
#> 1 mode-1  0.25
#> 2 mode-2  0.25
#> 3 mode-3  0.25
#> 4 mode-4  0.25

Is there any way to do this using the DescTools package or any other package?


Solution

  • If I understand correctly, you have a target Gini coefficient and are looking for values that produce it. If so, I'm fairly sure this cannot work as posed. The Gini coefficient is a single scalar, computed as a ratio of areas under the Lorenz curve, so it does not map back to a unique vector of values: infinitely many vectors yield the same coefficient.

    What you can do is invert the Lorenz curve, as in the following example:

    d.frm <- filter(as.data.frame(df), mode=="mode-1")
    
    # find specific function values using predict
    lx <- with(d.frm, Lc(x = share, n = share_hr))
    plot(lx)
    
    # get interpolated function value at p=0.45
    (y0 <- predict(lx, newdata=0.45))
    abline(v=0.45, h=y0$L, lty="dotted")
    
    # and for the inverse question use approx
    (y0 <- approx(x=lx$L, y=lx$p, xout=0.6))
    abline(h=0.6, v=y0$y, col="red")
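
To illustrate the non-uniqueness point, here is a minimal base-R sketch on toy data (not the question's data). The helper `gini_w` is hypothetical and not part of DescTools; it computes the population Gini index from the Lorenz curve, so its numbers need not match `DescTools::Gini` exactly (which applies a small-sample correction by default). The sketch first shows that rescaling changes the vector but not the index, then picks one of the infinitely many solutions by restricting the search to the one-parameter family `x^a` and solving for `a` with `uniroot`. The choice of that family is an assumption made purely for illustration.

```r
# Hypothetical helper (not part of DescTools): population Gini index of
# values x observed with frequencies n, computed from the Lorenz curve.
gini_w <- function(x, n = rep(1, length(x))) {
  o <- order(x)
  x <- x[o]
  n <- n[o]
  p <- cumsum(n) / sum(n)          # cumulative population share
  L <- cumsum(n * x) / sum(n * x)  # cumulative share of the total
  # 1 minus twice the area under the Lorenz curve (trapezoid rule)
  1 - sum(diff(c(0, p)) * (L + c(0, head(L, -1))))
}

x <- c(1, 2, 4, 8, 16)  # toy values, not the question's data
n <- c(5, 4, 3, 2, 1)   # toy frequencies, kept fixed throughout

# 1) Rescaling gives a different vector with the same Gini index.
all.equal(gini_w(x, n), gini_w(10 * x, n))

# 2) Pin down one solution by searching only within the family x^a.
target <- 0.25
f <- function(a) gini_w(x^a, n) - target
a_hat <- uniroot(f, interval = c(0.01, 5), tol = 1e-10)$root
x_new <- x^a_hat

gini_w(x_new, n)  # approximately equal to target
```

Applied per mode with `share_hr` as `n`, the same idea would return one vector of shares per group that hits the common target, but it is only one arbitrary choice among many; any other monotone family would give different values with the same index.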