datesasformatproc-sqldatastep

Convert character date to SAS date


My data is currently structured in the following way (dummy data below), where the date is formated as such example: 3/7/20 (M/D/YY) but I need the data in the following month-year form: 032020 (i.e. I need format mmyyn6.). I have tried a number of different things to get in this format, but nothing has worked.

Current data structure:

DATA HAVE;
    INPUT GROUP $ DATE $ COUNT_CUMU;
    DATALINES;
A 3/7/20 2
A 3/8/20 8
A 3/9/20 16
RUN;

These solutions don't work, and give me extraneous numbers.

DATA WANT1;
    SET HAVE;
    MONTH_YEAR = INPUT(DATE,mmyyn6.);
    FORMAT DATE MMDDYY8.;
RUN;

PROC SQL;
    CREATE TABLE WANT2 AS
    SELECT *,
           INPUT(DATE, ANYDTDTM.) AS MONTH_YEAR FORMAT=mmyyn6.
    FROM HAVE;
QUIT;

This solution works, but is not the format I need it in.

PROC SQL;
    CREATE TABLE WANT3 AS
    SELECT *,
           INPUT(DATE, ANYDTDTM.) AS MONTH_YEAR FORMAT=DTMONYY7.
    FROM HAVE;
QUIT;

Thank you for any advise or code you can share.


Solution

  • It is easy to do what you asked for.

    Use the MMDDYY informat to convert the strings into date values. Note that the INPUT() function does not care if you use a width on the informat that is larger than the length of the string being read, so use the maximum width the informat supports.

    You can use the MMDDYYN format to display dates without any separator character.

    You can use MMYYN format to display only the month and year. But in that case you might always want to change the date values to the first of the month.

    And it works for the example data you provided.

    DATA HAVE;
        INPUT GROUP $ DATE $ COUNT_CUMU;
    DATALINES;
    A 3/7/20 2
    A 3/8/20 8
    A 3/9/20 16
    ;
    
    data want;
      set have;
      real_date = input(date,mmddyy10.);
      format real_date yymmdd10. ;
      month_year = intnx('month',real_date,0);
      year_month = month_year;
      format month_year  mmyyn6. year_month yymmn6. ;
    run;
    

    Results:

                              COUNT_                  month_     year_
    Obs    GROUP     DATE      CUMU      real_date     year      month
    
     1       A      3/7/20       2      2020-03-07    032020    202003
     2       A      3/8/20       8      2020-03-08    032020    202003
     3       A      3/9/20      16      2020-03-09    032020    202003
    

    If it does not work for you then you need to show examples of the input strings that do not work. Or explain how having a date value that is displayed using the MMDDYYN format does not work for you.

    PS You should avoid using only two digits to record or display years. Look up Y2K problem. You should also avoid either MDY or DMY ordering of date digits to avoid confusing 50% of your audience. If you want to use only digits then use YMD order (YYMMDD or YYMMDDN format).