SyntaxHighlighter

Wednesday, November 9, 2016

options printerpath =

I was doing a relatively simple ODS PDF proc report using options = landscape; and using ^{thispage} of ^{lastpage}. Here is a snippet of the original code:

ods listing close ;
options orientation = landscape nodate nonumber ;

ods escapechar = '^' ;
ods pdf file = "c:\temp\x.pdf" notoc  ;
 
title f = arial "My Title" ;
footnote color = black h = 7pt 
  j = l 'Page ^{thispage} of ^{lastpage}'
  j = r "Produced: %sysfunc(date(), mmddyy10.)" ;

proc report data = report nowd 
  style( report header column ) = [ font_size = 8pt fontfamily = arial ]
  style( column ) = [ cellheight = .18in ] ;

  columns ....

I was told that "there are certain situations where SAS changes the default values for the system options LINESIZE and PAGESIZE and then persists these changes from one SAS session to the next. If SAS sets the LINESIZE and PAGESIZE values too small, the following will show up in the log:"

ERROR: Current settings for paper size, orientation and margins
       imply that the printable area has zero size.  Change one
       or more of those settings and try again.
ERROR: Internal error trying to set up "page i of n"
       processing.  Could be caused by out of memory, out of
       disk space, or internal coding error. Check for other
       error messages above indicating probable user error or
       resource constraints; if there are none, or if they make
       no sense, please report this message and provide a test
       program to SAS Institute. Output will continue without
       PAGEOF support.

Martin from SAS technical support suggested I try this usage note: http://support.sas.com/kb/41/196.html or in the alternative SAS Research and Development suggested I add the following line to my code which worked like a charm.

options printerpath=PDF; 

Friday, July 15, 2016

ValidLibref Macro

Let's say you have a fairly long SAS program and near the top you assign a libref but keyed in the folder incorrectly.

It becomes frustrating to wait for the program to finish only to find out you have a bad library reference or libref. As a result, I wrote the short validlibref() macro that tests for a valid libref ( expecting a zero value ). If the libref is not valid then an ERROR message is written to the SAS log and the %abort cancel statement will stop the program from continuing but will not shut down SAS in interactive mode.

%macro validlibref( libref ) ;
  %if %sysfunc( libref( &libref. ) ) ne 0 %then %do ;
    %put %str(E)RROR: Invalid libref: &libref., terminating program. ;
    %abort cancel ;
  %end ;
%mend ;

%validlibref( sashlp )  /* used the invalid libref of SASHLP vs SASHELP */

/* these statements will not execute if there is not a valid libref */
data x ;
  x = 1 ;
run ;

46   %macro validlibref( libref ) ;
47     %if %sysfunc( libref( &libref. ) ) ne 0 %then %do ;
48       %put %str(E)RROR: Invalid libref: &libref., terminating program. ;
49       %abort cancel ;
50     %end ;
51   %mend ;
52
53   %validlibref( sashlp )
ERROR: Invalid libref: sashlp, terminating program.
ERROR: Execution canceled by an %ABORT CANCEL statement.
NOTE: The SAS System stopped processing due to receiving a CANCEL request.

Wednesday, June 8, 2016

Rules and Frame Style Attributes Visualized

Style attributes can be used to customize the appearance of SAS Output Delivery System (ODS) output. Styles can be assigned using proc template or style= options in procedures such as PROC REPORT displayed below.

Rules= control the internal borders while frame= control external borders. I have always found it difficult to visualize the combinations of rules= and frame= style attributes. As a result, I created the below code to display all of the combinations in a concise five page PDF file.

ods listing close ;
ods pdf file = "c:\temp\rulesandframes.pdf" startpage = no  ;
options nodate nonumber ;
ods escapechar = "~" noptitle ;

%macro rulesandframes ;
  %local 
    frames 
    frame 
    countframes 
    rules 
    rule 
    countrules 
    i 
    j ;

  %let frames = ABOVE BELOW BOX HSIDES LHS RHS VSIDES VOID ;
  %let countframes = %sysfunc( countw( &frames. ) ) ;
  %let rules = ALL COLS GROUPS NONE ROWS ;
  %let countrules  = %sysfunc( countw( &rules. ) ) ;

  %do i = 1 %to &countrules. ;
    %let rule = %scan( &rules., &i. ) ;
    title "RULES = &rule. ( internal borders )" ;

    %do j = 1 %to &countframes. ; 
      %let frame = %scan( &frames., &j. ) ;
      ods text = 
       "~{style[just=center]style(report) = [rules = &rule. frame = &frame.]}" ;
      proc report 
        nowd 
        data = sashelp.class( obs = 2 ) 
        style( report ) = [ rules = &rule. frame = &frame. ] ;
      run ;
    %end ;

    ods pdf startpage = now ;
  %end ;

%mend ;
%rulesandframes

ods pdf close ;
ods listing ;

Saturday, May 28, 2016

Compress Non-Printable Special Characters

Have you ever been bitten by flies that you feel but can't see? Those flies are often referred to as no-see-ums.

You can also get stung by non-printable and special characters (NPSC) in computer programming. Non-printable characters are those that have an ASCII decimal value of between 0 and 31 as well as 127 (DEL character) in the standard ASCII character set. Special characters are the high level ASCII decimal codes from 128 thru 255 inclusive.

Those pesky invisible characters can wreak havoc in the results of a process (see this paper for additional information). What I found out was that the COMPRESS() function with the KW (keep writable) modifier did not work in my case. I had the ASCII decimal 160 character (AO in hex) as the first character in a variable.

As a result, I wrote a function using PROC FCMP to eradicate those annoying characters and replace them with nulls. Below is the source code and a test program to remove the no-see-ums.

Be sure to assign a LENGTH to the returned variable as the function will otherwise return a string that is 32,767 characters in length. The default return length for proc fcmp is 33 characters so I set it to the maximum width to avoid truncation.

proc fcmp outlib = work.funcs.utilities ; 
  function compress_npsc( var $ ) $32767 ; 
    length npschars $256 ; 
    do i = 0 to 31, 127 to 255 ; 
      npschars = cats( npschars, byte( i ) ) ; 
    end ; 
    return( compress( var, npschars ) ) ; 
  endsub ; 
run ; 

options cmplib = ( work.funcs ) ; 

data test ; 
  length sc cc $4 ;
  sc = cat( byte( 160 ), 'ABC' ) ; 
  cc = compress_npsc( sc ) ; 
  rank_sc = rank( sc ) ;
  rank_cc = rank( cc ) ;
  hex_sc = put( sc, hex8. ) ;
  hex_cc = put( cc, hex8. ) ;
run ;

Friday, April 22, 2016

SQL Pass-Through Derived Table Join to SAS dataset

I have used the SAS/Access Interface to OLE DB for years. In the past I would capture data from a relational database management system (RDBMS) into a SAS data set and then join that remote data to a local SAS data set using another proc or data step.

Having to read the data more than once seemed inefficient, so I attempted to reference the pass-through data as a derived table (aka in-line view) and join that data directly to a SAS data set. Sure enough, as the below code using SQL Server Express shows this can be done.

In the simplified example below, a SAS data set named SASCAT is created containing the groupname and an abbreviated name (abbrname). After that, an OLEDB connection is made to the freely downloadable SQL Server Express database.

The derived table consists of the SQL statements contained between the parenthesis starting with the FROM CONNNECTION TO OLEDB keywords. That derived table is referenced as SQL which is then joined to the SAS data set sascat which is referenced as SAS and combined as an inner join using the ON clause.

%macro dbconnect( server= ) ;
  /* DSN-less OLEDB connection */
  connect to oledb
   (init="Provider=SQLNCLI11;Integrated Security=SSPI;Data Source=&server.") ;
%mend ;

%macro dbdisconnect() ;
  disconnect from oledb ;
  %if &sysdbrc ne 0 %then %put %str(E)RROR: %superq( sysdbmsg ) ;
%mend ;

data sascat ;
  input 
    groupname & $50.
    abbrname  : $4. ;
  datalines ;
  Executive General and Administration  Exec
  Inventory Management  Imgt
  Manufacturing  Mfg
  Quality Assurance  QA
  ;
run ;

proc sql ;
  %dbconnect( server = localhost\SQLExpress ) 
    create table results as
      select        sas.abbrname
                  , sql.count

      from          connection to oledb
      (
        select      groupname
                  , count( groupname ) as count
        from        adventureworks2014.humanresources.department 
        group by    groupname
        ;
      ) sql

      join          work.sascat sas
        on          sql.groupname = sas.groupname 
    ;
  %dbdisconnect()
quit ;

Saturday, April 9, 2016

Remove CRLF inside Double Quotes

Have you ever encountered a carriage return (CR) or line feed (LF) character inside of double quotes in a comma separated values (CSV) file? As soon as SAS sees either of these characters it interprets it as the end of line and will read the rest of the line as the next line.

If you run the sample file created below and open it in an editor such as NotePad++ you can see the offending character by clicking on the "Show All Characters" button.

%let fname = c:\temp\test.csv ;

/* Create sample CSV file with LF inside of double quotes */
data _null_ ;
  file "&fname." ;
  /* byte(10) is a line feed (LF) same as '0A'x */
   x = cat( '"All Good"', ',"Line ', byte( 10 ), ' Feed"'   ) ;
   put "Column1, Column2" ;
   put x ;
run ;


data test1 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

The below macro was written based on this SAS tech support entry. Sharedbuffers are used to read and write to the same file. Instead of reading the entire line all at once, it reads the data one byte at a time as a binary stream using the recfm = n option. The code uses the logical NOT operator (^) to toggle the value (from true to false or vice versa) when finding the first and next double quote to determine when to convert a CR or LF character to a blank. This technique limits the removal of CRLF characters to only those that occur inside of double quotes and not the true end of line.

%macro removeQuotedCRLF( filename = ) ;
  %if not %sysfunc( fileexist( "&filename." ) ) %then %do ;
    %put %str(E)RROR: The file: &filename was not found. ;
    %return ;
  %end ;

  data _null_ ;
    infile "&filename." recfm = n sharebuffers ;
    file "&filename." recfm = n ;
               
    retain opendoublequote 0 ;
    input byte $char1. ;

    if byte = '"' then opendoublequote = ^( opendoublequote ) ;
    if opendoublequote then do ;
           if byte = '0D'x then put " " ;  /* CR */
      else if byte = '0A'x then put " " ;  /* LF */
    end ;
  run ;
%mend ;

%removeQuotedCRLF( filename = &fname. )

data test2 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

Sunday, December 13, 2015

Parsing Multivalued columns using MOD function

One of the tenets of first normal form database design is to only allow atomic values in a column. However, that is not always the case in the real world.

I recently encountered a multi-valued column on a project and instead of writing a bunch of OR and SUBSTR(right of =) functions, I saw a pattern and used the FIND() [CHARINDEX() in SQL Server] and MOD() [% in SQL Server] functions. The use of the FIND() and MOD() functions was not only easier to maintain (less lines of code) but ran about twice as fast.

The below code emulates a multi-valued column populated via the SUBSTR(left of) to stuff the 'XY' values in the multi-valued column. The requirement is to find the value 'XY' only on an odd numbered positions (e.g. 1, 3, 5, etc). The FIND() function returns the starting point of the searched string and then the MOD function is used to determine if that position is an odd or even starting location. Modulo finds the remainder after division of one number by another. From Wikipedia:

Given two positive numbers, a (the dividend) and n (the divisor), a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n. For instance, the expression "5 mod 2" would evaluate to 1 because 5 divided by 2 leaves a quotient of 2 and a remainder of 1, while "9 mod 3" would evaluate to 0 because the division of 9 by 3 has a quotient of 3 and leaves a remainder of 0; there is nothing to subtract from 9 after multiplying 3 times 3.

In order to really see the performance differences in these approaches, I wrote a macro to create a million row SAS data set. After that, the two techniques were bench-marked. The FIND()/MOD() technique took 0.35 seconds vs 0.68 seconds using OR and SUBSTR().

data test( keep = row codes ) ; length row 3 codes $32 ; do row = 1 to 31 ; codes = '1234567890ABCDEFGHIJKLMNOPQRSTUV'; substr( codes, row, 2 ) = 'XY' ; output ; end ; run ; %macro loopdata ; %local setnotes loopdataset ; %let setnotes = %sysfunc( getoption( notes ) ) ; %let loopdataset = %sysfunc(ceil( %sysevalf( 1000000 / 31 ) ) ) ; options nonotes ; /* turn off notes to the SAS log */ data testlarge ; set /* create a millon row data set */ %do i = 1 %to &loopdataset. ; test %end ; ; run ; options &setnotes ; /* reset NOTES to previous value */ %mend ; %loopdata proc sql stimer ; create table sqlmod as select row , codes , case when mod( find( codes, 'XY', 'i' ), 2 ) then 1 else 0 end as modulo from testlarge ; create table sqlsubstr as select row , codes , case when substr( codes, 1, 2 ) = 'XY' or substr( codes, 3, 2 ) = 'XY' or substr( codes, 5, 2 ) = 'XY' or substr( codes, 7, 2 ) = 'XY' or substr( codes, 9, 2 ) = 'XY' or substr( codes, 11, 2 ) = 'XY' or substr( codes, 13, 2 ) = 'XY' or substr( codes, 15, 2 ) = 'XY' or substr( codes, 17, 2 ) = 'XY' or substr( codes, 19, 2 ) = 'XY' or substr( codes, 21, 2 ) = 'XY' or substr( codes, 23, 2 ) = 'XY' or substr( codes, 25, 2 ) = 'XY' or substr( codes, 27, 2 ) = 'XY' or substr( codes, 29, 2 ) = 'XY' or substr( codes, 31, 2 ) = 'XY' then 1 else 0 end as modulo from testlarge ; quit ;