dataframe stata data-manipulation data-wrangling

Cross-sectional to panel data (Stata) "repeated time values within panel"


I am relatively new to Stata. I have a Reddit dataset in cross-sectional format, where each row represents one Reddit post by a username. Some usernames post several times per day, while others post only once or twice in the entire dataset.

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str36 username int date
 6 "(crash )" 19013
end
format %td date

I am interested in running a Heckman selection model, so I am trying to convert the data into a panel format. I created an ID variable per username as shown below:

egen id = group(username)

I then ran this to declare the data as a panel, following the guideline here:

xtset id date

I receive the following error: "repeated time values within panel". I am not sure how to solve this. I believe it should not be a problem in my case, because it is typical for social media users to post several times within the same day, which is the time unit in this dataset.

If I run the same code without the date variable (xtset id), it works without any errors, but my understanding is that I need to use both variables for a panel format.


Solution

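  • You could use a timestamp as the time variable instead of the date, since it distinguishes posts made on the same day. There is usually one available in session data. Just make sure to store it as a double: Stata datetimes are counted in milliseconds since 01jan1960, and a float cannot hold values that large without rounding.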

    . clear
    
    . input byte id int date double ts
    
               id      date          ts
      1. 1 0 0
      2. 1 0 1000
      3. 1 0 2000
      4. end
    
    . format %td date
    
    . format %tc ts
    
    . list, clean noobs
    
        id        date                   ts  
         1   01jan1960   01jan1960 00:00:00  
         1   01jan1960   01jan1960 00:00:01  
         1   01jan1960   01jan1960 00:00:02  
    
    . xtset id ts
    
    Panel variable: id (strongly balanced)
     Time variable: ts, 01jan1960 00:00:00 to 01jan1960 00:00:02, but with gaps
             Delta: .001 seconds
    
    . xtset id date
    repeated time values within panel
    r(451);
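
    If the timestamp arrives as a string rather than as a number, it has to be converted first. The following is a minimal sketch, assuming a hypothetical string variable created in "YYYY-MM-DD hh:mm:ss" form (the name and the format are illustrative, not taken from the question's data):

    ```stata
    clear
    input byte id str19 created
    1 "2012-01-21 09:15:02"
    1 "2012-01-21 09:15:40"
    1 "2012-01-21 18:02:11"
    end

    * clock() returns milliseconds since 01jan1960 00:00:00;
    * generate the result as a double so the values are not rounded
    generate double ts = clock(created, "YMDhms")
    format %tc ts

    * the daily date can still be derived from the timestamp if needed
    generate int date = dofc(ts)
    format %td date

    * each post now has its own time value within the panel
    xtset id ts
    ```

    This only works if no user has two posts with exactly the same timestamp; otherwise xtset will report the same error.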
    

    Alternatively, collapse the data to the user x date level (one row per user per day) if your analysis permits it.
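
    A rough sketch of the collapse route, reusing the toy data from above with one extra day added. The variable n_posts is a name chosen here for illustration, and counting posts is only one possible summary; which statistics to keep depends on the model:

    ```stata
    clear
    input byte id int date double ts
    1 0 0
    1 0 1000
    1 0 2000
    1 1 86400000
    end
    format %td date

    * keep one row per user-day and record how many posts it contained;
    * (count) counts the nonmissing values of ts within each id-date cell
    collapse (count) n_posts = ts, by(id date)

    * id-date pairs are now unique, so the daily date works as the time variable
    xtset id date
    list, clean noobs
    ```

    Note that collapse drops every variable not named in the command, so any post-level covariates would need their own summary statistic (for example a mean or a maximum) in the same call.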