r paste anonymize

Anonymize data for each distinct row in R


Example

Value

15   
15   
15   
4   
37   
37   
37  

There are three distinct values but 7 rows. Below is what I want, since I want to anonymize my data. I keep getting the error "replacement has 3 rows, data has 7".

This is the code I'm using:

final_df$Value <- paste("Value",seq(1:length(unique(final_df$Value))))

Value

Value 1
Value 1   
Value 1   
Value 2   
Value 3   
Value 3   
Value 3  

Solution

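  • Create a function that does the job:

    anon <- function(x) {
        # lengths of the consecutive runs of equal values
        rl <- rle(x)$lengths
        # number the runs and repeat each number for the length of its run
        paste("Value", rep(seq_along(rl), rl))
    }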
    

    Call the function:

    anon(final_df$Value)
    

    Result:

    # [1] "Value 1" "Value 1" "Value 1" "Value 2" "Value 3" "Value 3" "Value 3"
    

    Generalization (apply it to every column of a data frame):

    df1 <- mtcars
    df1[] <- lapply(df1, anon)
    names(df1)    <- paste0("V", seq_along(names(df1)))
    rownames(df1) <- NULL
    
    df1
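
One caveat about the function above: rle() numbers consecutive runs, so it gives the desired labels only when equal values sit next to each other, as they do in the example. If the same value can reappear further down a column, a variant built on match() and unique() labels by distinct value instead. This is a minimal sketch under that assumption; anon_distinct, its prefix argument, and the toy data frame are names made up here for illustration, not part of the original question or answer.

```r
# Hypothetical helper: label each element by the position of its value
# among the distinct values, in order of first appearance.
anon_distinct <- function(x, prefix = "Value") {
    paste(prefix, match(x, unique(x)))
}

# Data from the question: same labels as anon()
anon_distinct(c(15, 15, 15, 4, 37, 37, 37))
# [1] "Value 1" "Value 1" "Value 1" "Value 2" "Value 3" "Value 3" "Value 3"

# Non-adjacent repeats: 15 gets the same label both times
anon_distinct(c(15, 4, 15))
# [1] "Value 1" "Value 2" "Value 1"

# Column-wise on a small made-up data frame
toy <- data.frame(id = c(15, 4, 15), grp = c("a", "b", "a"),
                  stringsAsFactors = FALSE)
toy[] <- lapply(toy, anon_distinct)
```

With anon() the second call would return "Value 1" "Value 2" "Value 3", because each run is numbered separately. Which behaviour is right depends on whether repeated values should stay linkable after anonymizing.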