I have the following dataset from a crossover design study with participant_id, treatment_arm, and date_of_treatment as follows:
| participant_id | treatment_arm | date_of_treatment |
|---|---|---|
| 1 | A | Jan 1 2022 |
| 1 | B | Jan 2 2022 |
| 1 | C | Jan 3 2022 |
| 2 | C | Jan 4 2022 |
| 2 | B | Jan 5 2022 |
| 2 | A | Jan 6 2022 |
So for participant_id 1, based on the order of the date_of_treatment, the sequence would be ABC. For participant_id 2, it would be CBA.
Based on the above, I want to create column seq as follows:
| participant_id | treatment_arm | date_of_treatment | seq |
|---|---|---|---|
| 1 | A | Jan 1 2022 | ABC |
| 1 | B | Jan 2 2022 | ABC |
| 1 | C | Jan 3 2022 | ABC |
| 2 | C | Jan 4 2022 | CBA |
| 2 | B | Jan 5 2022 | CBA |
| 2 | A | Jan 6 2022 | CBA |
How do I go about creating the column using the 3 variables participant_id, treatment_arm, and date_of_treatment in datastep?
You could use a double DoW Loop
data want;
do until (last.participant_id);
set have;
length seq :$3.;
by participant_id;
seq = cats(seq, treatment_arm);
end;
do until (last.participant_id);
set have;
by participant_id;
output;
end;
run;
Remember to change the length of seq should there be more than 3 treatments for each participant.
participant_id treatment_arm date_of_treatment seq
1 A 01JAN2022 ABC
1 B 02JAN2022 ABC
1 C 03JAN2022 ABC
2 C 04JAN2022 CBA
2 B 05JAN2022 CBA
2 A 06JAN2022 CBA