SyntaxHighlighter

Friday, May 22, 2015

Reading multiple CSV files containing a header row

SAS allows you to read in multiple files with a wildcard which works out great for files that do not contain a header row. The firstobs = 2 can be used on a single file being processed but that only works on the first file of a concatenated set of files to be processed.

The way to get around this limitation is to use the EOV=variable option on the INFILE statement. It is important to understand that EOV is set to 1 when it encounters the first row of the 2nd and subsequent files. However, EOV stays set to 1 so you need to reset it to 0 in your code. Also the EOV=variable is treated as an automatic variable such as _N_ and _ERROR_ and it not written to the data set.

The below code uses prior blog entries split DSN to CSV and delete external files using wildcards in the creation of the sample code. The key to testing for the first row of a new file is the trailing @ on the input @ line. If that sets EOV = 1 then EOV is reset to 0 and the row is deleted. Otherwise that row is re-read on the next input statement.

%splitdsntocsv( 
    dsn          = sashelp.class
  , byvar        = sex
  , outputprefix = c:\temp\temp_
  , outputsuffix = csv
  , header       = Y )

filename fileref 'c:\temp\temp_*.csv' ;
data class2 ;
  infile fileref
    dsd
    truncover
    firstobs = 2 
    eov      = eov ;

  input @ ;

  if eov then do ;
    /* EOV = 1 after first row of next file and needs to be reset to 0 */
    eov = 0 ;
    delete ;
  end ;

  input name $ sex $ age height weight ;
run ;
filename fileref clear ;

%deletefiles( filename = c:\temp\temp_?.csv )

3 comments:

  1. Just wanted to say thanks - was able to use your code to solve my problem with header rows and reading in multiple files.

    ReplyDelete
  2. Thanks Lorraine, glad this post was of some value to you...

    ReplyDelete
  3. Thanks Tom for sharing such a wonderful code. I am glad that I came across this blog.
    One help that I would request is can you please emphasis in the statement "input @" and explain what it does and is it necessary to include in all such merge where EOV option is used.

    ReplyDelete