Saturday, April 9, 2016

Remove CRLF inside Double Quotes

Have you ever encountered a carriage return (CR) or line feed (LF) character inside of double quotes in a comma separated values (CSV) file? As soon as SAS sees either of these characters it interprets it as the end of line and will read the rest of the line as the next line.

If you run the sample file created below and open it in an editor such as NotePad++ you can see the offending character by clicking on the "Show All Characters" button.

%let fname = c:\temp\test.csv ;

/* Create sample CSV file with LF inside of double quotes */
data _null_ ;
  file "&fname." ;
  /* byte(10) is a line feed (LF) same as '0A'x */
   x = cat( '"All Good"', ',"Line ', byte( 10 ), ' Feed"'   ) ;
   put "Column1, Column2" ;
   put x ;
run ;

data test1 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

The below macro was written based on this SAS tech support entry. Sharedbuffers are used to read and write to the same file. Instead of reading the entire line all at once, it reads the data one byte at a time as a binary stream using the recfm = n option. The code uses the logical NOT operator (^) to toggle the value (from true to false or vice versa) when finding the first and next double quote to determine when to convert a CR or LF character to a blank. This technique limits the removal of CRLF characters to only those that occur inside of double quotes and not the true end of line.

%macro removeQuotedCRLF( filename = ) ;
  %if not %sysfunc( fileexist( "&filename." ) ) %then %do ;
    %put %str(E)RROR: The file: &filename was not found. ;
    %return ;
  %end ;

  data _null_ ;
    infile "&filename." recfm = n sharebuffers ;
    file "&filename." recfm = n ;
    retain opendoublequote 0 ;
    input byte $char1. ;

    if byte = '"' then opendoublequote = ^( opendoublequote ) ;
    if opendoublequote then do ;
           if byte = '0D'x then put " " ;  /* CR */
      else if byte = '0A'x then put " " ;  /* LF */
    end ;
  run ;
%mend ;

%removeQuotedCRLF( filename = &fname. )

data test2 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

No comments:

Post a Comment