r replace indexing

Replacing occurrences of a number in multiple columns of data frame with another value in R


ETA: the point of the question below, by the way, is to avoid iterating through my entire set of column vectors, in case that was going to be proposed as a solution (i.e. just doing what is known to work, one column at a time).


There are plenty of examples of replacing values in a single vector of a data frame in R with some other value.

And also of how to replace all values of NA with something else.

What I'm looking for is analogous to the last question, but replacing one value with another rather than replacing NA. I'm having trouble generating a data frame of logical values mapped to my actual data frame for cases where multiple columns meet a criterion, or simply applying the approaches from the first two questions to more than one column.

An example:

data <- data.frame(name = rep(letters[1:3], each = 3), var1 = rep(1:9), var2 = rep(3:5, each = 3))

data
  name var1 var2
1    a    1    3
2    a    2    3
3    a    3    3
4    b    4    4
5    b    5    4
6    b    6    4
7    c    7    5
8    c    8    5
9    c    9    5

And say I want all of the values of 4 in var1 and var2 to be 10.

I'm sure this is elementary and I'm just not thinking through it properly. I have been trying things like:

data[data[, 2:3] == 4, ]

That doesn't work, but if I do the same with data[, 2] instead of data[, 2:3], things work fine. It seems that logical tests (like is.na()) work across multiple rows/columns, but numerical comparisons don't play as nicely?


Solution

  • you want to search through the whole data frame for any value that matches the one you're trying to replace. this works the same way as a logical test like is.na(); for example, to replace all missing values with 10:

    data[ is.na( data ) ] <- 10
    

    you can also replace all 4s with 10s.

    data[ data == 4 ] <- 10
    

    at least i think that's what you're after?

    and let's say you wanted to ignore the first column (since it's all letters):

    # identify which columns contain the values you might want to replace
    data[ , 2:3 ]
    
    # subset it with extended bracketing..
    data[ , 2:3 ][ data[ , 2:3 ] == 4 ]
    # ..those were the values you're going to replace
    
    # now overwrite 'em with tens
    data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10
    
    # look at the final data
    data