I have a data set in R with a column yr_renovated that contains either 0 or an integer (e.g. 1998) for the year a house was renovated. How would I create a factor variable with the levels yes and no indicating whether the house was renovated?
head(House_Data$yr_renovated,n=20)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 1998 0 0 0 0 0 0
I was thinking of something along the lines of
levels(renovated)[levels(renovated) <= 0] <- "no"
levels(renovated)[levels(renovated) > 0] <- "yes"
but I saw this used online and I don't know exactly how it works. I also realised that if I make a mistake in the assignment of levels, let's say
levels(renovated)[levels(renovated) <= 0] <- "yes"
levels(renovated)[levels(renovated) > 0] <- "yes"
levels(renovated)[levels(renovated) <= 0] <- "no"
the last assignment will not override the first one, and my only level will be yes. How would I remove that first wrongly assigned level?
no no no no no no no no no no no no yes no no no no no no no
Levels: no yes
This is what the final answer should look like, or, if using table():
renovated
no yes
5762 238
but sometimes it will give me this result
renovated
Yes
6000
Excuse my rookie knowledge of R; we haven't worked much with R during our statistics module at university so far.
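You can do it using factor() and assigning the desired labels. The comparison yr_renovated == 0 yields a logical vector whose factor levels are FALSE and TRUE, in that order, so the first label ("Yes") applies to the renovated houses and the second ("No") to the rest: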
yr_renovated <- c(0, 0, 1998, 0, 2010, 0)
renovated <- factor(yr_renovated == 0, labels=c("Yes", "No"))
table(renovated)
#> renovated
#> Yes No
#> 2 4
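If the lowercase no/yes order from the question is wanted, one possible variation is to test yr_renovated > 0 and spell out both levels and labels, so the mapping does not depend on the default ordering. The sketch below also illustrates the second part of the question: giving two levels the same label merges them, and the factor alone no longer holds enough information to undo that, so the simplest fix is to rebuild it from the original column. The sample vector is hypothetical and stands in for House_Data$yr_renovated; the name broken is only for illustration.

```r
# Hypothetical sample data standing in for House_Data$yr_renovated
yr_renovated <- c(0, 0, 1998, 0, 2010, 0)

# Explicit mapping: FALSE (year is 0) -> "no", TRUE (year > 0) -> "yes"
renovated <- factor(yr_renovated > 0,
                    levels = c(FALSE, TRUE),
                    labels = c("no", "yes"))
table(renovated)  # 4 "no", 2 "yes"

# A wrong assignment: relabelling "no" as "yes" merges the two levels
broken <- renovated
levels(broken)[levels(broken) == "no"] <- "yes"
nlevels(broken)  # 1, every observation is now "yes"

# The merged factor cannot be split again, so rebuild it from the source column
renovated <- factor(yr_renovated > 0,
                    levels = c(FALSE, TRUE),
                    labels = c("no", "yes"))
```

This would also explain the single-level table in the question: once both levels carry the same label, later assignments to levels() have nothing left to distinguish.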