
How to create a variable in R based on another dataset of different length


I'm trying to create a variable STATE in my dataset. It already exists in another dataset that has a different number of rows than mine.

Both objects have a state coding variable, GESTFIPS. So I just want R to check whether the GESTFIPS values match and then create the variable STATE in my dataset accordingly.

I tried:

> state_1865_base$STATE[state_1865_base$GESTFIPS==urate2$GESTFIPS] < - 
+ urate2$STATE[state_1865_base$GESTFIPS==urate2$GESTFIPS] 

And got the error message:

Error in -urate2$STATE[state_1865_base$GESTFIPS == urate2$GESTFIPS] : 
  invalid argument to unary operator
In addition: Warning messages:
1: In state_1865_base$GESTFIPS == urate2$GESTFIPS :
  longer object length is not a multiple of shorter object length
2: In state_1865_base$GESTFIPS == urate2$GESTFIPS :
  longer object length is not a multiple of shorter object length

My dataset looks like this (132990 obs. of 117 variables):

'data.frame':    132990 obs. of  117 variables:
 $ IDENTIFIER          : chr  "20030100013280" "20030100013344" "20030100013352" "20030100013848" ...
 $ AGE                 : num  60 41 26 36 51 32 44 21 33 39 ...
 $ MALE                : num  1 0 0 0 1 0 0 0 0 0 ...
 $ BLACK               : num  1 0 0 1 0 0 0 0 0 1 ...
 $ MARRIED             : num  1 1 1 1 1 0 1 0 1 1 ...
 $ NUM_CHILD           : num  0 2 0 2 2 1 1 1 3 4 ...
 $ HV_CHILD            : num  0 1 0 1 1 1 1 1 1 1 ...
 $ AGE_YOUNGEST        : num  NA 0 NA 9 14 2 9 14 3 4 ...
 $ CHILD_4             : num  0 1 0 0 0 1 0 0 1 0 ...
 $ CHILD_5             : num  0 1 0 0 0 1 0 0 1 1 ...
 $ GRADE               : num  17 13 13 12 17 16 12 13 13 13 ...
 $ SPOUSE_EMP          : num  0 1 0 1 0 1 1 NA 1 0 ...
 $ SPOUSE_WORKHOURS    : num  NA 50 NA 40 NA 40 50 NA 40 NA ...
 $ WORKING             : num  1 1 1 0 1 1 1 1 1 1 ...
 $ UNEMP               : num  0 0 0 1 0 0 0 0 0 0 ...
 $ RETIRED             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ DISABLED            : num  0 0 0 0 0 0 0 0 0 0 ...
 $ STUDENT             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ HOMEMAKER           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ WORK_PART           : num  1 1 1 0 0 0 0 0 0 0 ...
 $ HH_INCOME_03        : num  660 200 200 NA NA ...
 $ WAGE_03             : num  22 6.67 16.67 NA NA ...
 $ WAGE_03_ALT         : num  22 NA 12.5 NA NA NA NA 9.5 14 12 ...
 $ YEAR                : num  2003 2003 2003 2003 2003 ...
 $ DATASET             : num  2003 2003 2003 2003 2003 ...
 $ INTERVIEW_DAY       : num  5 6 6 4 4 4 1 2 6 4 ...
 $ INTERVIEW_DATE      : Date, format: "2003-01-03" "2003-01-04" "2003-01-04" "2003-01-02" ...
 $ GESTFIPS            : num  6 6 6 13 21 21 22 26 27 34 ...
[list output truncated]

This is the dataset urate2, where the states are stored (204 obs. of 6 variables):

STATE GESTFIPS NOBS TWOYEAR      UNEMP   URATE
   AL        1  434       1 0.05392952 5.19585
   AL        1  288       2 0.02666941 3.63750
   AL        1  266       3 0.03848163 4.24585
   AL        1  248       4 0.11545039 9.59580
   AK        2   62       1 0.07917716 7.52915
   AK        2   41       2 0.12782212 6.70415
   AK        2   38       3 0.00000000 6.25835

Solution

  • state_1865_base$STATE <- urate2$STATE[match(state_1865_base$GESTFIPS, urate2$GESTFIPS)] should work.

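    EDIT: My original, incorrect answer was:

    It looks as though you are using < - for assignment. If you use <- instead, I think that your code will work.

    Replacing < - with <- does remove the "invalid argument to unary operator" error, because R parses < - as "less than" followed by a unary minus. It does not fix the comparison, though: == compares the two GESTFIPS columns element by element and recycles the shorter one, which is why R warns about the object lengths. match() instead looks up each GESTFIPS value of the long dataset in the short one.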
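To make the match() lookup concrete, here is a minimal sketch on toy data. The two data frames reuse the names from the question, but their contents are made up for illustration (including the code 99, which is assumed to have no entry in the lookup table). The last lines show merge() as a possible alternative, not as part of the original answer.

```r
# Toy stand-ins for the two data frames in the question (values are made up).
state_1865_base <- data.frame(
  IDENTIFIER = c("a", "b", "c", "d", "e"),
  GESTFIPS   = c(6, 1, 2, 1, 99),  # 99 is a hypothetical code with no match
  stringsAsFactors = FALSE
)

urate2 <- data.frame(
  STATE    = c("AL", "AL", "AK", "AK", "CA"),
  GESTFIPS = c(1, 1, 2, 2, 6),
  TWOYEAR  = c(1, 2, 1, 2, 1),     # several rows per state, as in the question
  stringsAsFactors = FALSE
)

# For each GESTFIPS in the long table, match() gives the position of the
# first row in urate2 with the same code, or NA if there is none.
idx <- match(state_1865_base$GESTFIPS, urate2$GESTFIPS)

# Index the STATE column with those positions; the result has one entry
# per row of state_1865_base, so the assignment lengths always agree.
state_1865_base$STATE <- urate2$STATE[idx]
print(state_1865_base)

# Alternative: merge() against a de-duplicated lookup table. Without
# unique(), every row would be repeated once per matching urate2 row.
lookup <- unique(urate2[, c("GESTFIPS", "STATE")])
merged <- merge(state_1865_base[, c("IDENTIFIER", "GESTFIPS")], lookup,
                by = "GESTFIPS", all.x = TRUE)
```

Because match() only uses the first matching row, it is safe here only as long as STATE is the same on every urate2 row that shares a GESTFIPS code, which appears to be the case in the excerpt shown. Rows whose code is missing from urate2 get NA. Note that merge() sorts the result by the key by default, so the match() version is the simpler choice when the original row order matters.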