statafillna

Fill in missing values by group for all the variables in the dataset


If I have a dataset like this one:

Company     Year   Puzzles    Racquets   Choices  Business   

Allegra      2012   10         5            3        F
Allegra      2013   .          4            .        G
Allegra      2014   .          .            .        
Sapota       2012   7          .            8        
Sapota       2013   .          14           .        A
Sapota       2014   .          .            .
Genvin       2012   2          5            4        F
Genvin       2013   .          .            . 
Genvin       2014   .          .            .

I can fill in the missing values for Puzzles with the previous value as follows:

bysort Company Year (Puzzles) : replace Puzzles = Puzzles[_n-1] if missing(Puzzles)

Where applicable, I can do the forward fill too:

bysort Company Year (Puzzles) : replace Puzzles = Puzzles[_n+1] if missing(Puzzles)

I know how to fill in missing values with one specific column. But, how do I do this if my dataset has over 300 unique variables (and the variable names have no discernible pattern)?
I want to replace missing values with previous values by group for all my columns. My variables can be integers or strings.

So in my toy model, the final dataset will look like:

Company     Year   Puzzles    Racquets   Choices  Business    

Allegra      2012   10         5            3        F
Allegra      2013   10         4            3        G 
Allegra      2014   10         4            3        G
Sapota       2012   7          14           8        A
Sapota       2013   7          14           8        A 
Sapota       2014   7          14           8        A
Genvin       2012   2          5            4        F
Genvin       2013   2          5            4        F 
Genvin       2014   2          5            4        F

Solution

  • Please use dataex to give Stata data examples. The tag Wiki gives very detailed advice, including that.

    That just calls for a loop over variables. Here is some technique.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 Company int Year byte(Puzzles Racquets Choices) str1 Business
    "Allegra" 2012 10  5 3 "F"
    "Allegra" 2013  .  4 . "G"
    "Allegra" 2014  .  . . "" 
    "Sapota"  2012  7  . 8 "" 
    "Sapota"  2013  . 14 . "A"
    "Sapota"  2014  .  . . "" 
    "Genvin"  2012  2  5 4 "F"
    "Genvin"  2013  .  . . "" 
    "Genvin"  2014  .  . . "" 
    end
    
    ds Company Year, not 
    local varlist `r(varlist)' 
    
    
    foreach v of local varlist { 
        bysort Company (Year) : replace `v' = `v'[_n-1] if missing(`v') 
    }
    

    What is gained for research goals is not so clear.

    Key detail: Backward fill and forward fill are not symmetric in Stata. For more discussion, see e.g. this paper.