rnorm

Adding a column according to norm data in R


I have a longitudinal dataset in the long form with the length of around 2800, with around 400 participants in total. Here's a sample of my data.

#    ID  wave score sex age edu 
#1  1001 1   28     1 69  12
#2  1001 2   27     1 70  12
#3  1001 3   28     1 71  12
#4  1001 4   26     1 72  12
#5  1002 1   30     2 78   9
#6  1002 3   30     2 80   9
#7  1003 1   30     2 65  16
#8  1003 2   30     2 66  16
#9  1003 3   29     2 67  16
#10 1003 4   28     2 68  16
#11 1004 1   22     2 85   4
#12 1005 1   20     2 60   9
#13 1005 2   18     1 61   9
#14 1006 1   22     1 74   9
#15 1006 2   23     1 75   9
#16 1006 3   25     1 76   9
#17 1006 4   19     1 77   9

I want to create a new column "cutoff" with values "Normal" or "Impaired" because my outcome data, "score" has a cutoff score indicating impairment according to norm. The norm consists of different -1SD measures(the cutoff point) according to Sex, Edu(year of education), and Age.

Below is what I'm currently doing, checking an excel file myself and putting in the corresponding cutoff score according to the three conditions. First of all, I am not sure if I am creating the right column.

data$cutoff <- ifelse(data$sex==1 & data$age<70
                  & data$edu<3
                  & data$score<19.91, "Impaired", "Normal")
data$cutoff <- ifelse(data$sex==2 & data$age<70
                  & data$edu<3
                  & data$score<18.39, "Impaired", "Normal")

Additionally, I am wondering if I can import an excel file stating the norm, and create a column according to the values in it.

The excel file has a structure as shown below.

#      Sex  Male                      Female            
#60-69 Edu(yr)  0-3 4-6 7-12   13>=   0-3   4-6 7-12    13>=
#Age   Number   22  51  119    72     130   138 106     51
#      Mean   24.45 26.6 27.06 27.83  23.31 25.86   27.26   28.09
#      SD     3.03  1.89    1.8 1.53  3.28  2.55    1.85    1.44
#     -1.5SD' 19.92 23.27   23.76   24.8    18.53   21.81   23.91   25.15
#70-79 Edu(yr)  0-3 4-6 7-12   13>=   0-3   4-6 7-12    13>=
....

I have created new columns "agecat" and "educat," allocating each ID into a group of age and education used in the norm. Now I want to make use of these columns, matching it with rows and columns of the excel file above. One of the motivations is to create a code that can be used for further research using the test scores of my data.


Solution

  • I think your ifelse statements should work fine, but I would definitely import the Excel file rather than hardcoding it, though you may need to structure it a bit differently. I would structure it just like a dataset, with columns for Sex, Edu, Age, Mean, SD, -1.5SD, etc., import it into R, then do a left outer join on Sex+Edu+Age:

    merge(x = long_df, y = norm_df, by = c("Sex", "Edu(yr)", "Age"), all.x = TRUE)
    

    Then you can compare the columns directly.