I have a dataframe with names I have a second dataframe with a dictionary of names and the sex of those names I want to check if the name is in the dictionary if it is then add the sex from the dictionary table to the names dataframe
My code looks like below
# Sets everything to -1, 1 for male, 0 for female
train$sex <- "-1"
train$sex[toupper(train$fname) == nam_dict$Name]<-nam_dict$Sex
I am getting the following error
Error in train$sex[toupper(train$fname) == nam_dict$Name] <- nam_dict$Sex :
NAs are not allowed in subscripted assignments
In addition: Warning message:
In toupper(train$fname) == nam_dict$Name :
longer object length is not a multiple of shorter object length
I have a work around - i think - where i can split the dictionary into male and female and simply replace the <-nam_dict$Sex portion of the code with the character 'F' or 'M' depending on the dictionary
I just thought there would be a better way
So your problem can be solved by a quick and simple implementation of match
.
First, here's a quick reproducible example
(train <- data.frame(fname = c("Alex", "Jennifer", "David", "Alice")))
# fname
# 1 Alex
# 2 Jennifer
# 3 David
# 4 Alice
(nam_dict <- data.frame(Name = c("alice", "alex"), Sex = 0:1))
# Name Sex
# 1 alice 0
# 2 alex 1
A possible solution
train$sex <- nam_dict$Sex[match(tolower(train$fname), tolower(nam_dict$Name))]
train
# fname sex
# 1 Alex 1
# 2 Jennifer NA
# 3 David NA
# 4 Alice 0
Al bit more advanced solution (if you''ll ever work with huge data sets) is to try data.table
s binary join which allows you to update your data by reference, chose which columns to update while everything is done in a lightning speed.
First we will convert both data sets to data.table
class and make both columns lower case, then we will key
them by the column we want join by, finally we do a left join to train
while creating the sex
column by reference and pulling the data from i
(Sex column in nam_dict
)
library(data.table)
setDT(train)[, fname := tolower(fname)]
setDT(nam_dict)[, Name := tolower(Name)]
setkey(train, fname) ; setkey(nam_dict, Name)
train[nam_dict, sex := i.Sex]
train
# fname sex
# 1: alex 1
# 2: alice 0
# 3: david NA
# 4: jennifer NA