importsasfixed-width

How to import different datasets with different variables from the same text file in SAS


I have a large text file with fixed columns that I need to import to SAS. The same file contains information at three levels: firms, associates and economic variables.

The first character of every observation tells which level the values in the row are about (1 is about firms, 2 about associates...). According to the layout, each level has different variables of different lengths.

This is a small example of the data I have to import. The first 4 variable are the same but afterwards each level has its own variables with different lengths. The variable "type" identifies which level the observation is about. And all the other variables are bunched up in "other":

DATA WORK.test;
INFILE DATALINES;
INPUT
    type $ 1-1
    full_registry $ 2-2
type_update $ 3-3
cnpj $ 4-17
other $ 18-1200

;

DATALINES;
1F 000000000001911BANCO DO BRASIL SA     DIRECAO GERAL        022005110300                                                                                                                                                                 
2F 000000000001912MARCIO HAMILTON FERREIRA    000***923641**100000020101117                                                                                       
2F 000000000001912NILSON MARTINIANO MOREIRA   000***491386**100000020101117                                                                                                                          
2F 000000000001912WALTER MALIENI JUNIOR       000***718468**100000020101117                                                                                                                          
2F 000000000001912PAULO ROGERIO CAFFARELLI    000***415907**100000020120327                                                                                                                          
2F 000000000001912ADRIANO MEIRA RICCI         000***295868**100000020180412                                                                                                                          
2F 000000000001912MARCOS RENATO COLTRI        000***890627**100000020180418                                                                                                                           
2F 000000000001912CARLA NESI                  000***370266**100000020170614                                                                                                                           
6F 000000000001916499999                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
1F 000000000002722BANCO DO BRASIL SA     MANAUS (AM)          022005110300                                                                                                                        
1F 000000000003532BANCO DO BRASIL SA     SANTOS - SANTOS (SP) 022005110300                                                                                                                              
1F 000000000004342BANCO DO BRASIL SA     CAMPOS EST.UNIF.     022005110300                                                                                                                          
1F 000000000005152BANCO DO BRASIL SA     MACAPA-EST.UNIF.     022005110300                                                                                                                          
;;;;
run;

What's the best way to read this file?

I thought of two possibilities:

I. Read the txt three times with the three layouts for each level. Throw away the the observations that were read wrong. I'm not sure if this will work.

II. Read the first 4 variables and put the rest of the line on a string variable, like I did in this example. Then use a bunch of substr() to separate the several variables. I'm not sure if this is the smartest way to do this.

Am I'm missing a much simpler way to do this?

I'm using Enterprise version 7.13.


Solution

  • You can use the single trailing @ at the end of an input statement to hold the current line for further input statements in the same data step. E.g.

    DATA want;
    INFILE DATALINES;
    /*Always input the first 4 variables*/
    INPUT type $ 1-1 full_registry $ 2-2 type_update $ 3-3 cnpj $ 4-17 @;
    select(type);
    when('1') input var1; /*fill this in with input rules for type 1 records*/
    when('2') input var2; /*fill this in with input rules for type 2 records*/
    otherwise input; /*do not input any other variables - just ignore the row*/
    end;
    DATALINES;
    1F 000000000001911BANCO DO BRASIL SA     DIRECAO GERAL        022005110300                                                                                                                                                                 
    2F 000000000001912MARCIO HAMILTON FERREIRA    000***923641**100000020101117                                                                                       
    2F 000000000001912NILSON MARTINIANO MOREIRA   000***491386**100000020101117                                                                                                                          
    2F 000000000001912WALTER MALIENI JUNIOR       000***718468**100000020101117                                                                                                                          
    2F 000000000001912PAULO ROGERIO CAFFARELLI    000***415907**100000020120327                                                                                                                          
    2F 000000000001912ADRIANO MEIRA RICCI         000***295868**100000020180412                                                                                                                          
    2F 000000000001912MARCOS RENATO COLTRI        000***890627**100000020180418                                                                                                                           
    2F 000000000001912CARLA NESI                  000***370266**100000020170614                                                                                                                           
    6F 000000000001916499999                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
    1F 000000000002722BANCO DO BRASIL SA     MANAUS (AM)          022005110300                                                                                                                        
    1F 000000000003532BANCO DO BRASIL SA     SANTOS - SANTOS (SP) 022005110300                                                                                                                              
    1F 000000000004342BANCO DO BRASIL SA     CAMPOS EST.UNIF.     022005110300                                                                                                                          
    1F 000000000005152BANCO DO BRASIL SA     MACAPA-EST.UNIF.     022005110300                                                                                                                          
    ;;;;
    run;