If I have a dataset like this one:
Company Year Puzzles Racquets Choices Business
Allegra 2012 10 5 3 F
Allegra 2013 . 4 . G
Allegra 2014 . . .
Sapota 2012 7 . 8
Sapota 2013 . 14 . A
Sapota 2014 . . .
Genvin 2012 2 5 4 F
Genvin 2013 . . .
Genvin 2014 . . .
I can fill in the missing values for Puzzles with the previous value as follows:
bysort Company Year (Puzzles) : replace Puzzles = Puzzles[_n-1] if missing(Puzzles)
Where applicable, I can do the forward fill too:
bysort Company Year (Puzzles) : replace Puzzles = Puzzles[_n+1] if missing(Puzzles)
I know how to fill in missing values with one specific column. But, how do I do this if my dataset has over 300 unique variables (and the variable names have no discernible pattern)?
I want to replace missing values with previous values by group for all my columns.
My variables can be integers or strings.
So in my toy model, the final dataset will look like:
Company Year Puzzles Racquets Choices Business
Allegra 2012 10 5 3 F
Allegra 2013 10 4 3 G
Allegra 2014 10 4 3 G
Sapota 2012 7 14 8 A
Sapota 2013 7 14 8 A
Sapota 2014 7 14 8 A
Genvin 2012 2 5 4 F
Genvin 2013 2 5 4 F
Genvin 2014 2 5 4 F
Please use dataex
to give Stata data examples. The tag Wiki gives very detailed advice, including that.
That just calls for a loop over variables. Here is some technique.
* Example generated by -dataex-. For more info, type help dataex
clear
input str7 Company int Year byte(Puzzles Racquets Choices) str1 Business
"Allegra" 2012 10 5 3 "F"
"Allegra" 2013 . 4 . "G"
"Allegra" 2014 . . . ""
"Sapota" 2012 7 . 8 ""
"Sapota" 2013 . 14 . "A"
"Sapota" 2014 . . . ""
"Genvin" 2012 2 5 4 "F"
"Genvin" 2013 . . . ""
"Genvin" 2014 . . . ""
end
ds Company Year, not
local varlist `r(varlist)'
foreach v of local varlist {
bysort Company (Year) : replace `v' = `v'[_n-1] if missing(`v')
}
What is gained for research goals is not so clear.
Key detail: Backward fill and forward fill are not symmetric in Stata. For more discussion, see e.g. this paper.