sasdatasetmultiple-columnsloaddata

SAS: how to load a text file with a lot of spaces in between columns


I have a text file downloaded from the BLS website that has a lot of spaces in between columns.

Code:

data unemployment;
infile 'P:\Projects\la.data.2.AllStatesU.txt' dsd firstobs=2;
input   @1  series_id : $20.
        @32 year
        @36 period : $3.
        @51 value   
        @57 footnote_codes : $1.;


run;

But I get a mess of errors

NOTE: Invalid data for year in line 2 32-53.
NOTE: Invalid data for value in line 2 51-53.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--

3   CHAR  LAUST010000000000003          .1976.M02.         7.7. 53
    ZONE  44555333333333333333222222222203333043302222222223230
    NUMR  C15340100000000000030000000000919769D0290000000007E79

The period column has the first two characters right, but the year and everything else is wrong. How do I fix this?

Snapshot of text file:

text file

code output:

code output


Solution

  • The file from that website

    filename bls url 
      "https://download.bls.gov/pub/time.series/la/la.data.2.AllStatesU"
    ;
    

    has tab characters in it. That is shown in the example you posted of line 3 from the SAS LOG.

    You can either tell the INFILE statement to expand the tabs into spaces and read it as fixed column format.

    data unemployment;
      infile bls expandtabs firstobs=2 truncover;
      input
        series_id $ 1-20
        year 33-36
        period $ 41-43
        value ?? 50-60
        footnote_codes $ 65
      ;
    run;
    

    Or tell it that the tab character is the delimiter.

    data unemployment;
      infile bls dlm='09'x dsd firstobs=2 truncover;
      input
        series_id :$20.
        year 
        period :$3.
        value ?? 
        footnote_codes :$1.
      ;
    run;
    

    Note: The ?? modifier for VALUE is because the file has a hyphen to represent missing values in that field. The ?? input modifier will tell the data step to not flag those as data errors.