importsassas-macro

SAS Applying Dataset of column names to a proc import


I have a very large dataset that I'm reading in, it is 2400 columns long. When I try and read this in via Proc Import it cuts off the variable names so instead I thought I could read in the column names first, then apply these to the data when I'm reading it in.

I've been working with the below code, the issue I'm having is it doesn't delimitate by the csv commas. So it doesn't read in properly.

data var_names;
    length var $300;
    infile 'mylocation/import_file.CSV' delimiter=',' obs=1  lrecl=40423;
    input var @@ ;
run;

data var_names;
set var_names;
    translated=compress(var, , 'kad');
run;


data _NULL_;
file "mylocation\read_file.sas";
set var_names end=fine;
if _N_=1 then do;
put "data imported_data;";
put "    length        ";
end;
put "       " translated       ;
if fine then
put "   $300;";
run;    


data _NULL_;
file "mylocation\read_file.sas" mod;
set var_names end=fine;
if  _N_=1 then do;
    put "infile 'mylocation\import_file.CSV' delimiter=',' firstobs=2  lrecl=40423;";
    put "input                                                                                        ";
end;
put "      " translated;
if fine then do; 
    put "    ;";
    put "run;";
end;
run;

%include "mylocation\read_file.sas";

Solution

  • You have to use the DSD option to properly read a delimited text file, like a CSV file.

    Alice,F,13,,84
    Barbara,F,,65.3,98
    

    Otherwise SAS will treat multiple adjacent delimiters as a single delimiter. Just like it would treat spaces in an ordinary text file that has extra spaces added in front of short values to make neat columns.

    Alice   F 13    .  84
    Barbara F  . 65.3  98
    

    So change your generated INFILE statement:

    put "infile 'mylocation\import_file.CSV' dsd firstobs=2 lrecl=40423;";
    

    The default delimiter when using the DSD option is already a comma.

    PROC IMPORT cannot handle header lines that are longer than 32,767 bytes. If you want a tool that can analyze a delimited text file and create SAS code to read it try this macro : https://github.com/sasutils/macros/blob/master/csv2ds.sas

    If you are forced to stick with proc import then read the names separately and generate RENAME pairs for the names that are different than what PROC IMPORT generated.

    Example:

    filename csv 'myfilename.csv' lrecl=1000000;
    proc import file=csv out=want replace dbms=csv; run;
    data newnames ; infile csv dsd obs=1; input newname :$32. @@; run;
    proc transpose data=want(obs=1) out=oldnames name=oldname; var _all_; run;
    filename rename temp;
    data renames; 
      merge oldnames newnames end=eof; 
      *no by statement; 
      file rename; 
      if _n_=1 then put 'rename';
      if upcase(oldname) ne upcase(newname) then 
         put oldname :$quote. +(-1) 'n=' newname :$quote. +(-1) 'n' 
      ;
      if eof then put ';' ;
    run;
    proc dataset lib=WORK nolist;
      modify want;
    %include rename / source2;
    run;quit;