stata, panel-data, multiple-databases, stata-macros, wide-format-data

Is there a way to extract year range from wide data?


I have a series of wide panel datasets, and in each of them I want to generate a set of new variables. For example, Dataset1 contains the variables Car2009, Car2010, and Car2011. From these, I want to create HadCar2009, which is 1 if Car2009 is non-missing and 0 if it is missing, and similarly HadCar2010, and so on. This is simple to do for one dataset, but I want to do it for multiple datasets that cover different ranges of years. For example, Dataset2 has the variables Car2005, Car2006, and Car2008.

These are all very large datasets (I have about 60 of them), so I would rather not reshape them to long format either.

For now, this is what I tried:

forval j = 1/2{
  use Dataset`j', clear
  forval i=2005/2011{
     capture gen HadCar`i' = .
     capture replace HadCar`i' = 1 if !missing(Car`i')
     capture replace HadCar`i' = 0 if missing(Car`i')
  }
  save Dataset`j', replace
}

This works, but I am reluctant to use capture, because perhaps some datasets have a variable called car2008 instead of Car2008, and this would be an error I would like the program to stop at.

Also, the ranges of years across my 60-odd datasets are different. Ideally, I would like to somehow get this range in a local (perhaps somehow using describe? I'm not sure) and then just generate these variables using that local with a simple for loop.

But I'm not sure I can do this in Stata.


Solution

  • Your inner loop could be rewritten from

    forval i = 2005/2011 {
        capture gen HadCar`i' = .
        capture replace HadCar`i' = 1 if !missing(Car`i')
        capture replace HadCar`i' = 0 if missing(Car`i')
    }
    

    to

    foreach v of varlist Car???? {
        gen Had`v' = !missing(`v')
    }
    

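    This relies on the fact that in Stata true-or-false expressions evaluate to 1 or 0 directly, so one gen statement does the work of the gen and the two replace statements. With capture gone, any error stops the loop.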

    https://www.stata-journal.com/article.html?article=dm0099

    https://www.stata-journal.com/article.html?article=dm0087

    https://www.stata.com/support/faqs/data-management/true-and-false/

    Because Stata variable names are case-sensitive, this code ignores variables beginning with car; there are other ways to check for their existence. If a dataset contains no variables matching Car????, the loop will stop with an error message. A loop over ?ar???? would catch both car???? and Car???? (but just possibly other variables too).
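
One way to combine these points, and to get the years into a local as the question asks, is sketched below. It is only a sketch under stated assumptions: the toy data are hypothetical, the macro names lower, carvars and years are illustrative, and the block should be run in one go from a do-file so that the locals stay defined. It uses unab to expand the wildcard, a captured unab to test for lowercase names, and confirm integer number to reject matches that are not years (Car???? matches any four characters, so something like Car_old would otherwise slip through).

```stata
* Toy data standing in for one wide dataset (hypothetical values)
clear
input Car2005 Car2006 Car2008
1 . 3
. . 2
4 5 .
end

* Stop with an error if any lowercase car???? variables are present
capture unab lower : car????
if _rc == 0 {
    display as error "unexpected lowercase variables: `lower'"
    exit 198
}

* Expand the wildcard into a local; unab itself errors if nothing matches
unab carvars : Car????

* Strip the prefix so that only the years remain, here 2005 2006 2008
local years : subinstr local carvars "Car" "", all

foreach y of local years {
    * Error out if the suffix is not an integer, e.g. for Car_old
    confirm integer number `y'
    generate byte HadCar`y' = !missing(Car`y')
}
```

To apply this to all the files, everything after the toy data could sit between the use and save lines of the outer loop in the question. The years local then adapts to whatever Car variables each dataset happens to contain.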