I have a dataset in Stata which looks as follows
ID Product Region Sales
001 JSF045 East~West~North 45~78~14
002 JSF056 East~West 56~35
003 GWW1 West~North~South~South East 11~16~45~36
I want to split the Region and Sales variables into separate variables (separation at ~). Each row will have different number of regions and sales figures but for every row, a region will correspond to a sales figure. I present below how I want the final data to look.
I want the final data to look like as below
ID Product Region Sales
001 JSF045 East 45
001 JSF045 West 78
001 JSF045 North 14
002 JSF056 East 56
002 JSF056 West 35
003 GWW1 West 11
003 GWW1 North 16
003 GWW1 South 45
003 GWW1 South East 36
* Example generated by -dataex-. For more info, type help dataex
clear
input str3 ID str6 Product str27 Region str11 Sales
"001" "JSF045" "East~West~North" "45~78~14"
"002" "JSF056" "East~West" "56~35"
"003" "GWW1" "West~North~South~South East" "11~16~45~36"
end
split Region, parse("~")
split Sales, parse("~") destring
drop Region Sales
reshape long Region Sales, i(ID) j(which)
drop if missing(Region)
list, sepby(ID)
+--------------------------------------------+
| ID which Product Region Sales |
|--------------------------------------------|
1. | 001 1 JSF045 East 45 |
2. | 001 2 JSF045 West 78 |
3. | 001 3 JSF045 North 14 |
|--------------------------------------------|
4. | 002 1 JSF056 East 56 |
5. | 002 2 JSF056 West 35 |
|--------------------------------------------|
6. | 003 1 GWW1 West 11 |
7. | 003 2 GWW1 North 16 |
8. | 003 3 GWW1 South 45 |
9. | 003 4 GWW1 South East 36 |
+--------------------------------------------+