date validation stata reformatting

Problem displaying a reformatted string as a four-digit year in Stata 17


I turned to the Stata video "Data management: How to create a date variable from a date stored as a string" by Chuck Huber to make sure my date variable was formatted properly. However, I cannot get the reformatted variable (school_year2) to display as a year (e.g. 2018).

Can someone let me know what I may be missing here?

Thank you,

.do file

gen school_year2 = date(school_year,"Y")
format %ty school_year2
list school_year school_year2 in 1/10

     +---------------------+
     | school~r   school~2 |
     |---------------------|
  1. |     2016    2.0e+04 |
  2. |     2016    2.0e+04 |
  3. |     2016    2.0e+04 |
  4. |     2016    2.0e+04 |
  5. |     2016    2.0e+04 |
     |---------------------|
  6. |     2016    2.0e+04 |
  7. |     2016    2.0e+04 |
  8. |     2016    2.0e+04 |
  9. |     2016    2.0e+04 |
 10. |     2016    2.0e+04 |
     +---------------------+

. end of do-file


Solution

  • Because you are using the date() function, the underlying value is still the number of days since 1 Jan 1960. So keep a %td format, since you are working with days here, not years. You can then choose to display only the year with %tdCCYY, where CC stands for the century and YY for the two-digit year within it. But remember that the underlying data point is still the day 1 Jan 2016, not the number 2016.

    clear
    input str4 school_year
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    end
    
    gen school_year2 = date(school_year,"Y")
    format %tdCCYY school_year2
    list school_year school_year2 in 1/10
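
    Why the original attempt printed 2.0e+04 can be checked directly. A minimal sketch, using only the built-in functions date(), mdy() and year(): the stored value is a day count (56 years of 365 days plus 14 leap days = 20454), and read as a year under %ty that number is far beyond any four-digit year, which appears to be why Stata falls back to a general numeric display.

    ```stata
    * date() with the mask "Y" returns 1 Jan of that year, as days since 1 Jan 1960
    display date("2016", "Y")
    * -> 20454

    * the same day built from month, day and year
    display mdy(1, 1, 2016)
    * -> 20454

    * the same number shown under a full daily format, then year-only
    display %td date("2016", "Y")
    * -> 01jan2016
    display %tdCCYY date("2016", "Y")
    * -> 2016

    * year() extracts the calendar year as an ordinary number
    display year(date("2016", "Y"))
    * -> 2016
    ```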
    

    If the year is all you want to work with, use the year() function to extract the year from the date. The example below shows the steps, which you can play around with.

    clear
    input str4 school_year
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    "2016"
    end
    
    gen school_year2 = date(school_year,"Y")
    gen school_year3 = year(school_year2)
    format %tdCCYY school_year2
    format %ty school_year3
    list in 1/10
    

    Note that in the last example, all three values look the same. But the first variable is a string with the text "2016", the second is a date stored as the number of days since 1 Jan 1960 with only its year displayed, and the last is a number holding the number of years from year 0, displayed as a year (which in this case looks the same as its underlying number would).
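
    One way to see the difference is to swap the display formats for a plain numeric one, so the stored numbers show through. This is a small sketch with two made-up rows (the "2017" row is added here only for contrast and is not from the question):

    ```stata
    clear
    input str4 school_year
    "2016"
    "2017"
    end

    gen school_year2 = date(school_year, "Y")
    gen school_year3 = year(school_year2)

    * a general numeric format shows what is actually stored
    format %9.0g school_year2 school_year3
    list
    * school_year2 holds 20454 and 20820 (days since 1 Jan 1960)
    * school_year3 holds 2016 and 2017 (plain year numbers)

    * describe confirms that school_year is a string and the other two are numeric
    describe
    ```

    Arithmetic behaves accordingly: adding 1 to school_year2 moves the date forward one day, while adding 1 to school_year3 moves it forward one year.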