Friday, April 22, 2016

SQL Pass-Through Derived Table Join to SAS dataset

I have used the SAS/Access Interface to OLE DB for years. In the past I would capture data from a relational database management system (RDBMS) into a SAS data set and then join that remote data to a local SAS data set using another proc or data step.

Having to read the data more than once seemed inefficient, so I attempted to reference the pass-through data as a derived table (aka in-line view) and join that data directly to a SAS data set. Sure enough, as the below code using SQL Server Express shows this can be done.

In the simplified example below, a SAS data set named SASCAT is created containing the groupname and an abbreviated name (abbrname). After that, an OLEDB connection is made to the freely downloadable SQL Server Express database.

The derived table consists of the SQL statements contained between the parenthesis starting with the FROM CONNNECTION TO OLEDB keywords. That derived table is referenced as SQL which is then joined to the SAS data set sascat which is referenced as SAS and combined as an inner join using the ON clause.

%macro dbconnect( server= ) ;
  /* DSN-less OLEDB connection */
  connect to oledb
   (init="Provider=SQLNCLI11;Integrated Security=SSPI;Data Source=&server.") ;
%mend ;

%macro dbdisconnect() ;
  disconnect from oledb ;
  %if &sysdbrc ne 0 %then %put %str(E)RROR: %superq( sysdbmsg ) ;
%mend ;

data sascat ;
    groupname & $50.
    abbrname  : $4. ;
  datalines ;
  Executive General and Administration  Exec
  Inventory Management  Imgt
  Manufacturing  Mfg
  Quality Assurance  QA
run ;

proc sql ;
  %dbconnect( server = localhost\SQLExpress ) 
    create table results as
      select        sas.abbrname
                  , sql.count

      from          connection to oledb
        select      groupname
                  , count( groupname ) as count
        from        adventureworks2014.humanresources.department 
        group by    groupname
      ) sql

      join          work.sascat sas
        on          sql.groupname = sas.groupname 
quit ;

Saturday, April 9, 2016

Remove CRLF inside Double Quotes

Have you ever encountered a carriage return (CR) or line feed (LF) character inside of double quotes in a comma separated values (CSV) file? As soon as SAS sees either of these characters it interprets it as the end of line and will read the rest of the line as the next line.

If you run the sample file created below and open it in an editor such as NotePad++ you can see the offending character by clicking on the "Show All Characters" button.

%let fname = c:\temp\test.csv ;

/* Create sample CSV file with LF inside of double quotes */
data _null_ ;
  file "&fname." ;
  /* byte(10) is a line feed (LF) same as '0A'x */
   x = cat( '"All Good"', ',"Line ', byte( 10 ), ' Feed"'   ) ;
   put "Column1, Column2" ;
   put x ;
run ;

data test1 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

The below macro was written based on this SAS tech support entry. Sharedbuffers are used to read and write to the same file. Instead of reading the entire line all at once, it reads the data one byte at a time as a binary stream using the recfm = n option. The code uses the logical NOT operator (^) to toggle the value (from true to false or vice versa) when finding the first and next double quote to determine when to convert a CR or LF character to a blank. This technique limits the removal of CRLF characters to only those that occur inside of double quotes and not the true end of line.

%macro removeQuotedCRLF( filename = ) ;
  %if not %sysfunc( fileexist( "&filename." ) ) %then %do ;
    %put %str(E)RROR: The file: &filename was not found. ;
    %return ;
  %end ;

  data _null_ ;
    infile "&filename." recfm = n sharebuffers ;
    file "&filename." recfm = n ;
    retain opendoublequote 0 ;
    input byte $char1. ;

    if byte = '"' then opendoublequote = ^( opendoublequote ) ;
    if opendoublequote then do ;
           if byte = '0D'x then put " " ;  /* CR */
      else if byte = '0A'x then put " " ;  /* LF */
    end ;
  run ;
%mend ;

%removeQuotedCRLF( filename = &fname. )

data test2 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;