r mean binning exponential-distribution

Binning an unevenly distributed column in R


I have a column in R with an uneven distribution, similar to an exponential distribution. I want to normalize the data and then bin it into successive buckets.

I found the following links, which help with normalizing the data but say nothing about binning it into categories:

Normalizing data in R

Standardize data columns in R

Example of what the unevenly distributed column looks like (the real data has many more rows):

dat <- data.frame(Id = c(1,2,3,4,5,6,7,8),
                  Qty = c(1,1,1,2,3,13,30,45))

I want the column binned into 5 categories, which might look like this:

dat <- data.frame(Id = c(1,2,3,4,5,6,7,8),
                  Qty = c(1,1,1,2,3,13,30,45),
                  Binned_Category = c(1,1,1,1,2,3,4,5))

The Binned_Category values above are only a sample; the real values for this data may differ. I just want to show what the output should look like.


Solution

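  • This will help: bin the column on its quantiles with findInterval.

    num_bins <- 5
    # unique() drops duplicate break points caused by tied values
    breaks <- unique(quantile(dat$Qty, probs = seq(0, 1, 1/num_bins)))
    dat$Binned_Category <- findInterval(dat$Qty, breaks)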
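As a fuller sketch of the quantile approach above, here it is applied to the sample data from the question. The `Log_Bin` column is not part of the original answer: it is an assumed alternative for the "normalize, then bin" idea, using a log transform (which requires strictly positive values) followed by equal-width bins from `cut()`. The column names `Binned_Category` and `Log_Bin` are only illustrative.

```r
dat <- data.frame(Id = 1:8,
                  Qty = c(1, 1, 1, 2, 3, 13, 30, 45))
num_bins <- 5

# Quantile break points; unique() removes duplicates produced by tied values
breaks <- unique(quantile(dat$Qty, probs = seq(0, 1, 1 / num_bins)))

# Each value gets the index of the last break point that is <= the value
dat$Binned_Category <- findInterval(dat$Qty, breaks)

# Assumed alternative: log-transform, then cut into equal-width bins
# (labels = FALSE returns integer bin codes instead of a factor)
dat$Log_Bin <- cut(log(dat$Qty), breaks = num_bins, labels = FALSE)

print(dat)
```

On this sample, `Binned_Category` comes out as 1 1 1 2 2 3 4 5. Note that `findInterval` puts the maximum value (and any ties with it) in a final interval of its own unless `rightmost.closed = TRUE` is passed, and that dropping tied break points with `unique()` changes how many intervals there are, so the number of categories can differ from `num_bins` on other data.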