append stata frames

Stata combining datasets in memory


I'm using census data across multiple years. Each year's dataset has the same structure and content (i.e. the same questions are asked of respondents and their answers are laid out in the same way across years). I want to import multiple datasets, apply changes to each, and then combine them.

To illustrate what I mean, I imported the 2018 data into a frame named cps18 and removed three specific rows. I then used frame change, imported the 2019 data into a frame named cps19, and applied similar changes. When I run append using cps18, the console returns "file cps18 not found."

From what I can find online, it seems that the append command is used for combining datasets on the disk with the frame in memory. But what if I have two or more frames in memory? Is there a way to combine them?


Solution

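  • Hello, past self! The append command only reads datasets from disk, so instead of keeping each year in its own frame you can write a loop that performs the same actions across multiple files and accumulates the result in a temporary file. Let's say you have cps17.csv, cps18.csv, and cps19.csv. We can write a loop to import each file, make the necessary changes, and then append: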

      cwf default // switch back to the default frame
    
      clear
      tempfile cpsdata
      save `cpsdata', emptyok // start from an empty dataset to append onto
        
      forvalues y = 17/19 {
        import delimited cps`y'.csv, clear // clear replaces the data in memory
            
        generate newvar1 = oldvar1 + oldvar2 // sample change
        destring year, replace // sample change
            
        append using `cpsdata', force // add the years processed so far
        save `cpsdata', replace
      }
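
If the frames are already in memory, as in the question, one possible workaround is to save one frame to a temporary file with the frame prefix and then append that file from the other frame. This is only a minimal sketch: the toy data, the variable names year and id, and the tempfile name cps18file are made up for illustration and stand in for the real CPS frames.

```stata
frames reset // drop all frames and start again from an empty default frame

// Toy stand-ins for the two frames in the question (hypothetical data)
frame create cps18
frame cps18 {
    set obs 3
    generate year = 2018
    generate id = _n
}

frame create cps19
frame cps19 {
    set obs 2
    generate year = 2019
    generate id = _n
}

// Write cps18 to a temporary file without leaving the current frame
tempfile cps18file
frame cps18: save `cps18file'

// Switch to cps19 and append the saved copy of cps18
frame change cps19
append using `cps18file'

count // number of observations from both frames combined
```

Run this as a do-file so the local macro behind the tempfile stays defined for the whole sketch. With real data you would skip the toy setup and keep only the tempfile, frame cps18: save, frame change, and append lines.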