Say that I have these data:
clear all
set obs 2
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2
tempfile data
save `data'
I can split these into three parts:
use `data', clear
split title, p(" - ")
And I can split them into two parts, discarding the third part:
use `data', clear
split title, p(" - ") limit(2)
Is there an off-the-shelf solution to split into only two parts, but to group everything after the first splitting character (dash in this case) into the second variable? In R, I would use separate with the extra="merge" option (see tidyr separate only first n instances).
In other words, for the first observation, I would like title1 to be dog and title2 to be cat - horse.
I realize that this is possible using custom code (see Stata split string into parts), but I am hoping for a simple command along the lines of Stata's split or R's separate to accomplish my goal.
This isn't at present an option in the official split command. (Full disclosure: I was the previous author.)
You could just write your own command. This one needs more generality and more error checks, but it does what I think you want with your data example. One detail to settle: is trimming spaces desired? The code here trims them.
clear all
set obs 2
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2
gen title1 = trim(substr(title, 1, strpos(title, "-") - 1))
gen title2 = trim(substr(title, strpos(title, "-") + 1, .))
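* split2: split a string variable in two at the first occurrence of parse()
program split2
    syntax varname(string), parse(str) [suffixes(numlist int min=2 max=2)]
    * default suffixes give new variables <varname>1 and <varname>2
    if "`suffixes'" == "" local suffixes "1 2"
    tokenize "`suffixes'"
    * everything before the first separator
    gen `varlist'`1' = trim(substr(`varlist', 1, strpos(`varlist', "`parse'") - 1))
    * everything after the first separator
    gen `varlist'`2' = trim(substr(`varlist', strpos(`varlist', "`parse'") + strlen("`parse'"), .))
end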
split2 title, parse("-") suffixes(3 4)
list
     +--------------------------------------------------------------------------------+
     |                    title    title1           title2    title3           title4 |
     |--------------------------------------------------------------------------------|
  1. |        dog - cat - horse       dog      cat - horse       dog      cat - horse |
  2. | chicken - frog - ladybug   chicken   frog - ladybug   chicken   frog - ladybug |
     +--------------------------------------------------------------------------------+
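As one example of the extra error checking mentioned above: where the separator does not occur at all, strpos() returns 0, so the substr() calls no longer cut at a meaningful position. A minimal sketch of one way to guard against that, assuming you want an unsplit string kept whole in the first new variable and the second left empty. The third observation and the names pos, part1 and part2 are made up for illustration.

```stata
clear
input str30 title
"dog - cat - horse"
"chicken - frog - ladybug"
"goldfish"
end

* position of the first separator; 0 if there is none
gen pos = strpos(title, "-")

* before the first separator, or the whole string if no separator (assumed behaviour)
gen part1 = trim(cond(pos > 0, substr(title, 1, pos - 1), title))

* after the first separator, or empty if no separator (assumed behaviour)
gen part2 = trim(cond(pos > 0, substr(title, pos + 1, .), ""))

drop pos
list
```

With this guard, goldfish stays in part1 and part2 is empty for that observation. Whether that is the right convention depends on your data.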
Note also the egen function ends() and its head and tail options. Using that would need two calls, as it generates just one variable at a time.
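A sketch of the two egen calls, assuming a single dash as the punctuation and the trim option to remove the spaces around it. I have not run this against every edge case, so check the result on your own data.

```stata
clear
input str30 title
"dog - cat - horse"
"chicken - frog - ladybug"
end

* head: whatever precedes the first occurrence of the punctuation
egen title1 = ends(title), punct(-) head trim

* tail: whatever follows the first occurrence of the punctuation
egen title2 = ends(title), punct(-) tail trim

list
```

For the example data this should give dog and cat - horse in the first observation, the same as the substr() approach.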