I have the following dataset from a crossover design study with participant_id
, treatment_arm
, and date_of_treatment
as follows:
participant_id | treatment_arm | date_of_treatment |
---|---|---|
1 | A | Jan 1 2022 |
1 | B | Jan 2 2022 |
1 | C | Jan 3 2022 |
2 | C | Jan 4 2022 |
2 | B | Jan 5 2022 |
2 | A | Jan 6 2022 |
So for participant_id
1, based on the order of the date_of_treatment
, the sequence would be ABC. For participant_id
2, it would be CBA.
Based on the above, I want to create column seq
as follows:
participant_id | treatment_arm | date_of_treatment | seq |
---|---|---|---|
1 | A | Jan 1 2022 | ABC |
1 | B | Jan 2 2022 | ABC |
1 | C | Jan 3 2022 | ABC |
2 | C | Jan 4 2022 | CBA |
2 | B | Jan 5 2022 | CBA |
2 | A | Jan 6 2022 | CBA |
How do I go about creating the column using the 3 variables participant_id
, treatment_arm
, and date_of_treatment
in datastep?
You could use a double DoW Loop
data want;
do until (last.participant_id);
set have;
length seq :$3.;
by participant_id;
seq = cats(seq, treatment_arm);
end;
do until (last.participant_id);
set have;
by participant_id;
output;
end;
run;
Remember to change the length of seq
should there be more than 3 treatments for each participant.
participant_id treatment_arm date_of_treatment seq
1 A 01JAN2022 ABC
1 B 02JAN2022 ABC
1 C 03JAN2022 ABC
2 C 04JAN2022 CBA
2 B 05JAN2022 CBA
2 A 06JAN2022 CBA