I am trying to separate medication names from dosages in a string variable. My end goal is to create two variables, one for medication change and the other for dosage change. Here is a small example of my data:
frame create test
frame change test
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end
I have tried to install and use the strkeep package, but it would split "ACETAMINOPHEN 500 MG TABLET" into "500" and "ACETAMINOPHENMGTABLET".
I used moss from SSC to find the first instance of a space followed by a number.
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end
moss med_name, match("( [0-9])") regex
gen wanted1 = substr(med_name, 1, _pos1 - 1)
gen wanted2 = substr(med_name, _pos1, .)
l wanted?, sep(0)
     +------------------------------------------------------------+
     |                     wanted1                        wanted2 |
     |------------------------------------------------------------|
  1. | NITROFURANTOIN MACROCRYSTAL                 100 MG CAPSULE |
  2. |               ACETAMINOPHEN                  500 MG TABLET |
  3. |                    APIXABAN                    5 MG TABLET |
  4. |                  ATOVAQUONE    500 MG/5 ML ORAL SUSPENSION |
  5. |                  ATOVAQUONE    750 MG/5 ML ORAL SUSPENSION |
  6. |                  ATOVAQUONE    750 MG/5 ML ORAL SUSPENSION |
     +------------------------------------------------------------+
This would be frustrated by any drug name in which some word begins with a numeral, because the split would then fall inside the name rather than at the dosage.
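If such names turn up, one possible refinement is to require that the number be followed by a unit. The following is only a sketch using Stata's built-in Unicode regex functions (Stata 14 or later) rather than moss. The unit list (MG, MCG, G, ML), the variable names med, dose, med_change and dose_change, and the assumption that the observations are already sorted in the order in which changes should be detected are all mine, not part of the original question.

```stata
* Sketch under stated assumptions; adjust the unit list to fit the real data.
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end

* Group 1: the shortest leading text (the name).
* Group 2: the first number followed by an assumed unit, plus the rest of the string.
local re "^(.*?) +([0-9][0-9.]* *(MG|MCG|G|ML)\b.*)"
gen med  = ustrregexs(1) if ustrregexm(med_name, "`re'")
gen dose = ustrregexs(2) if ustrregexm(med_name, "`re'")

* Flag changes relative to the previous observation (assumes a meaningful sort order).
gen byte med_change  = med != med[_n-1] if _n > 1
gen byte dose_change = med == med[_n-1] & dose != dose[_n-1] if _n > 1

list med dose med_change dose_change, sep(0)
```

Observations that do not match the pattern are left with empty med and dose, which makes them easy to find and check by hand. With real data the change flags would presumably need to be computed within patient, for example under a by prefix on a patient identifier, which the example data do not contain.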