Rescaling the sum of 3 variables in R to equal exactly 1


I have a data frame, shown below, with 3 columns, each representing the proportion of time spent in a single activity.

df <- data.frame(ID = c(1, 2, 3, 4),
                 time_1 = c(0.2500, 0.2501, 0.2499, 0.2500),
                 time_2 = c(0.5000, 0.5000, 0.5001, 0.5001),
                 time_3 = c(0.2501, 0.2499, 0.5001, 0.2498),
                 sum_time = c(1.0001, 1.0000, 1.0001, 0.9999))

  ID    time_1   time_2   time_3   sum_time
  1     0.2500   0.5000   0.2501   1.0001
  2     0.2501   0.5000   0.2499   1.0000
  3     0.2499   0.5001   0.5001   1.0001
  4     0.2500   0.5001   0.2498   0.9999

I intend to extract the compositional means of this data, but I cannot do so unless every value of sum_time equals exactly 1.

I have tried rounding to fewer decimal places using round(data$time_1, digits = 3); however, this returns values of 0.999 and 1.001 in the instances that do not already equal 1.

I have also tried writing a function that, when the sum is either 1.0001 or 0.9999, subtracts or adds 0.0001 to one of the variables, since the time difference in minutes is insignificant. However, I cannot get this function to work.

scale_compositions <- function(x){
  
if(df$sum_time== 1.0001) {df$time_1 - 0.0001}
if(df$sum_time == 0.9999) {df$time_1 + 0.0001}  
  
}

scale_compositions(x)

Ideally, I would rescale the rows whose sum is 1.0001 or 0.9999 so that each of the time_ values is increased or reduced by an appropriate amount, keeping the proportions as accurate as possible, but I have been unable to figure this out so far. I have been experimenting with the rescale functions in various R packages, so far to no avail.

Given the insignificance of the 0.0001 to the overall time being investigated, it is unlikely that removing or adding that value to ensure every proportion is equal to 1 will meaningfully impact results (although this will be tested) and I am happy to do that for the time being.

Any assistance would be greatly appreciated.


Solution

  • I hope I haven't misunderstood your question but could this work?

    df <- data.frame(
      ID = c(1, 2, 3, 4),
      time_1 = c(0.2500, 0.2501, 0.2499, 0.2500),
      time_2 = c(0.5000, 0.5000, 0.5001, 0.5001),
      time_3 = c(0.2501, 0.2499, 0.2500, 0.2498)
    )
    
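    # Sum the three proportions in each row
    df$sum_time <- rowSums(df[, c("time_1", "time_2", "time_3")])
    # Overwrite with the sums rounded to 3 decimal places, so each shows as 1
    df$sum_time <- round(rowSums(df[, c("time_1", "time_2", "time_3")]), 3)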
    df
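
    Note that the snippet above only rounds sum_time; the time_ columns themselves are unchanged. If the goal is for the three proportions in each row to sum to 1, one possible approach (a minimal sketch, not necessarily what your compositional-means function expects) is to divide each proportion by its row total. The name time_cols and the tolerance check are my own additions for illustration.

```r
df <- data.frame(
  ID = c(1, 2, 3, 4),
  time_1 = c(0.2500, 0.2501, 0.2499, 0.2500),
  time_2 = c(0.5000, 0.5000, 0.5001, 0.5001),
  time_3 = c(0.2501, 0.2499, 0.2500, 0.2498)
)

# Hypothetical helper name: the columns that make up the composition
time_cols <- c("time_1", "time_2", "time_3")

# Divide every proportion by its row total, so the relative sizes
# of the three activities within each row are preserved
row_totals <- rowSums(df[, time_cols])
df[, time_cols] <- df[, time_cols] / row_totals

# Recompute the totals; they now equal 1 up to floating-point error
df$sum_time <- rowSums(df[, time_cols])

# Compare with a tolerance rather than ==, because floating-point
# sums are not guaranteed to be exactly 1
isTRUE(all.equal(df$sum_time, rep(1, nrow(df))))
df
```

    This spreads the 0.0001 discrepancy across all three columns in proportion to their size, rather than adding or subtracting it from time_1 alone. If a downstream function tests for exact equality with 1, it may still reject some rows because of floating-point representation, so it is worth checking how that function validates its input.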