Pages

SyntaxHighlighter

Wednesday, June 8, 2016

Rules and Frame Style Attributes Visualized

Style attributes can be used to customize the appearance of SAS Output Delivery System (ODS) output. Styles can be assigned using proc template or style= options in procedures such as PROC REPORT displayed below.

Rules= control the internal borders while frame= control external borders. I have always found it difficult to visualize the combinations of rules= and frame= style attributes. As a result, I created the below code to display all of the combinations in a concise five page PDF file.

ods listing close ;
ods pdf file = "c:\temp\rulesandframes.pdf" startpage = no  ;
options nodate nonumber ;
ods escapechar = "~" noptitle ;

%macro rulesandframes ;
  %local 
    frames 
    frame 
    countframes 
    rules 
    rule 
    countrules 
    i 
    j ;

  %let frames = ABOVE BELOW BOX HSIDES LHS RHS VSIDES VOID ;
  %let countframes = %sysfunc( countw( &frames. ) ) ;
  %let rules = ALL COLS GROUPS NONE ROWS ;
  %let countrules  = %sysfunc( countw( &rules. ) ) ;

  %do i = 1 %to &countrules. ;
    %let rule = %scan( &rules., &i. ) ;
    title "RULES = &rule. ( internal borders )" ;

    %do j = 1 %to &countframes. ; 
      %let frame = %scan( &frames., &j. ) ;
      ods text = 
       "~{style[just=center]style(report) = [rules = &rule. frame = &frame.]}" ;
      proc report 
        nowd 
        data = sashelp.class( obs = 2 ) 
        style( report ) = [ rules = &rule. frame = &frame. ] ;
      run ;
    %end ;

    ods pdf startpage = now ;
  %end ;

%mend ;
%rulesandframes

ods pdf close ;
ods listing ;

Saturday, May 28, 2016

Compress Non-Printable Special Characters

Have you ever been bitten by flies that you feel but can't see? Those flies are often referred to as no-see-ums.

You can also get stung by non-printable and special characters (NPSC) in computer programming. Non-printable characters are those that have an ASCII decimal value of between 0 and 31 as well as 127 (DEL character) in the standard ASCII character set. Special characters are the high level ASCII decimal codes from 128 thru 255 inclusive.

Those pesky invisible characters can wreak havoc in the results of a process (see this paper for additional information). What I found out was that the COMPRESS() function with the KW (keep writable) modifier did not work in my case. I had the ASCII decimal 160 character (AO in hex) as the first character in a variable.

As a result, I wrote a function using PROC FCMP to eradicate those annoying characters and replace them with nulls. Below is the source code and a test program to remove the no-see-ums.

Be sure to assign a LENGTH to the returned variable as the function will otherwise return a string that is 32,767 characters in length. The default return length for proc fcmp is 33 characters so I set it to the maximum width to avoid truncation.

proc fcmp outlib = work.funcs.utilities ; 
  function compress_npsc( var $ ) $32767 ; 
    length npschars $256 ; 
    do i = 0 to 31, 127 to 255 ; 
      npschars = cats( npschars, byte( i ) ) ; 
    end ; 
    return( compress( var, npschars ) ) ; 
  endsub ; 
run ; 

options cmplib = ( work.funcs ) ; 

data test ; 
  length sc cc $4 ;
  sc = cat( byte( 160 ), 'ABC' ) ; 
  cc = compress_npsc( sc ) ; 
  rank_sc = rank( sc ) ;
  rank_cc = rank( cc ) ;
  hex_sc = put( sc, hex8. ) ;
  hex_cc = put( cc, hex8. ) ;
run ;

Friday, April 22, 2016

SQL Pass-Through Derived Table Join to SAS dataset

I have used the SAS/Access Interface to OLE DB for years. In the past I would capture data from a relational database management system (RDBMS) into a SAS data set and then join that remote data to a local SAS data set using another proc or data step.

Having to read the data more than once seemed inefficient, so I attempted to reference the pass-through data as a derived table (aka in-line view) and join that data directly to a SAS data set. Sure enough, as the below code using SQL Server Express shows this can be done.

In the simplified example below, a SAS data set named SASCAT is created containing the groupname and an abbreviated name (abbrname). After that, an OLEDB connection is made to the freely downloadable SQL Server Express database.

The derived table consists of the SQL statements contained between the parenthesis starting with the FROM CONNNECTION TO OLEDB keywords. That derived table is referenced as SQL which is then joined to the SAS data set sascat which is referenced as SAS and combined as an inner join using the ON clause.

%macro dbconnect( server= ) ;
  /* DSN-less OLEDB connection */
  connect to oledb
   (init="Provider=SQLNCLI11;Integrated Security=SSPI;Data Source=&server.") ;
%mend ;

%macro dbdisconnect() ;
  disconnect from oledb ;
  %if &sysdbrc ne 0 %then %put %str(E)RROR: %superq( sysdbmsg ) ;
%mend ;

data sascat ;
  input 
    groupname & $50.
    abbrname  : $4. ;
  datalines ;
  Executive General and Administration  Exec
  Inventory Management  Imgt
  Manufacturing  Mfg
  Quality Assurance  QA
  ;
run ;

proc sql ;
  %dbconnect( server = localhost\SQLExpress ) 
    create table results as
      select        sas.abbrname
                  , sql.count

      from          connection to oledb
      (
        select      groupname
                  , count( groupname ) as count
        from        adventureworks2014.humanresources.department 
        group by    groupname
        ;
      ) sql

      join          work.sascat sas
        on          sql.groupname = sas.groupname 
    ;
  %dbdisconnect()
quit ;

Saturday, April 9, 2016

Remove CRLF inside Double Quotes

Have you ever encountered a carriage return (CR) or line feed (LF) character inside of double quotes in a comma separated values (CSV) file? As soon as SAS sees either of these characters it interprets it as the end of line and will read the rest of the line as the next line.

If you run the sample file created below and open it in an editor such as NotePad++ you can see the offending character by clicking on the "Show All Characters" button.

%let fname = c:\temp\test.csv ;

/* Create sample CSV file with LF inside of double quotes */
data _null_ ;
  file "&fname." ;
  /* byte(10) is a line feed (LF) same as '0A'x */
   x = cat( '"All Good"', ',"Line ', byte( 10 ), ' Feed"'   ) ;
   put "Column1, Column2" ;
   put x ;
run ;


data test1 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

The below macro was written based on this SAS tech support entry. Sharedbuffers are used to read and write to the same file. Instead of reading the entire line all at once, it reads the data one byte at a time as a binary stream using the recfm = n option. The code uses the logical NOT operator (^) to toggle the value (from true to false or vice versa) when finding the first and next double quote to determine when to convert a CR or LF character to a blank. This technique limits the removal of CRLF characters to only those that occur inside of double quotes and not the true end of line.

%macro removeQuotedCRLF( filename = ) ;
  %if not %sysfunc( fileexist( "&filename." ) ) %then %do ;
    %put %str(E)RROR: The file: &filename was not found. ;
    %return ;
  %end ;

  data _null_ ;
    infile "&filename." recfm = n sharebuffers ;
    file "&filename." recfm = n ;
               
    retain opendoublequote 0 ;
    input byte $char1. ;

    if byte = '"' then opendoublequote = ^( opendoublequote ) ;
    if opendoublequote then do ;
           if byte = '0D'x then put " " ;  /* CR */
      else if byte = '0A'x then put " " ;  /* LF */
    end ;
  run ;
%mend ;

%removeQuotedCRLF( filename = &fname. )

data test2 ;
  infile "&fname." dsd truncover firstobs = 2 ;
  input  col1 : $16. col2 : $16. ;
run ;

Sunday, December 13, 2015

Parsing Multivalued columns using MOD function

One of the tenets of first normal form database design is to only allow atomic values in a column. However, that is not always the case in the real world.

I recently encountered a multi-valued column on a project and instead of writing a bunch of OR and SUBSTR(right of =) functions, I saw a pattern and used the FIND() [CHARINDEX() in SQL Server] and MOD() [% in SQL Server] functions. The use of the FIND() and MOD() functions was not only easier to maintain (less lines of code) but ran about twice as fast.

The below code emulates a multi-valued column populated via the SUBSTR(left of) to stuff the 'XY' values in the multi-valued column. The requirement is to find the value 'XY' only on an odd numbered positions (e.g. 1, 3, 5, etc). The FIND() function returns the starting point of the searched string and then the MOD function is used to determine if that position is an odd or even starting location. Modulo finds the remainder after division of one number by another. From Wikipedia:

Given two positive numbers, a (the dividend) and n (the divisor), a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n. For instance, the expression "5 mod 2" would evaluate to 1 because 5 divided by 2 leaves a quotient of 2 and a remainder of 1, while "9 mod 3" would evaluate to 0 because the division of 9 by 3 has a quotient of 3 and leaves a remainder of 0; there is nothing to subtract from 9 after multiplying 3 times 3.

In order to really see the performance differences in these approaches, I wrote a macro to create a million row SAS data set. After that, the two techniques were bench-marked. The FIND()/MOD() technique took 0.35 seconds vs 0.68 seconds using OR and SUBSTR().

data test( keep = row codes ) ; length row 3 codes $32 ; do row = 1 to 31 ; codes = '1234567890ABCDEFGHIJKLMNOPQRSTUV'; substr( codes, row, 2 ) = 'XY' ; output ; end ; run ; %macro loopdata ; %local setnotes loopdataset ; %let setnotes = %sysfunc( getoption( notes ) ) ; %let loopdataset = %sysfunc(ceil( %sysevalf( 1000000 / 31 ) ) ) ; options nonotes ; /* turn off notes to the SAS log */ data testlarge ; set /* create a millon row data set */ %do i = 1 %to &loopdataset. ; test %end ; ; run ; options &setnotes ; /* reset NOTES to previous value */ %mend ; %loopdata proc sql stimer ; create table sqlmod as select row , codes , case when mod( find( codes, 'XY', 'i' ), 2 ) then 1 else 0 end as modulo from testlarge ; create table sqlsubstr as select row , codes , case when substr( codes, 1, 2 ) = 'XY' or substr( codes, 3, 2 ) = 'XY' or substr( codes, 5, 2 ) = 'XY' or substr( codes, 7, 2 ) = 'XY' or substr( codes, 9, 2 ) = 'XY' or substr( codes, 11, 2 ) = 'XY' or substr( codes, 13, 2 ) = 'XY' or substr( codes, 15, 2 ) = 'XY' or substr( codes, 17, 2 ) = 'XY' or substr( codes, 19, 2 ) = 'XY' or substr( codes, 21, 2 ) = 'XY' or substr( codes, 23, 2 ) = 'XY' or substr( codes, 25, 2 ) = 'XY' or substr( codes, 27, 2 ) = 'XY' or substr( codes, 29, 2 ) = 'XY' or substr( codes, 31, 2 ) = 'XY' then 1 else 0 end as modulo from testlarge ; quit ;

Monday, November 16, 2015

Full width title underline

SAS ODS styles provide opportunities to enhance the appearance of reports. To do this it is required to set an ODS escapechar to establish the inline formatting symbol. I like to use the tilde (~) symbol as in ODS escapechar = "~" ;.

Once the escapechar has been established, use it to set style properties using this syntax "~{style[ property = value ]text to display}" ;.

If you want to have a centered title with a thin red line underneath it spanning the full width of the page, use the width = 100% value (outputwidth= can also be used). The first example below uses the default width= value and is not what was desired. The second attempt works as expected.

options nodate nonumber ;
ods listing close ;
ods escapechar = "~" ;

ods pdf file = "c:\temp\partialline.pdf" ;
  title j = c "~{style[ borderbottmcolor = red 
                        borderbottomwidth = 0.5pt 
                        color = black]SASHELP.CLASS Report}" ;
  proc report data = sashelp.class( obs = 2 ) ;
  run ;
ods pdf close ;

ods pdf file = "c:\temp\fullline.pdf" ; title j = c "~{style[ borderbottmcolor = red borderbottomwidth = 0.5pt width = 100% color = black]SASHELP.CLASS Report}" ; proc report data = sashelp.class( obs = 2 ) ; run ; ods pdf close ;
ods listing ; UPDATE: 21Dec2015 - from SAS Tech support to create horizontal line across page in ODS PDF: ODS PDF TEXT = "~{style[ borderbottomcolor = red borderbottomwidth = 0.5pt width = 100%] }" ;

Thursday, September 3, 2015

dropColumns Macro

Sometimes you want to remove cluttered columns (i.e. columns where all rows have the same value) from a SAS data set and that is just what the below %dropColumns macro does. Back in 2012 I create a blog post, Drop Missing Columns that handled the removal of NULL values found in either character or numeric columns.

This macro is different in that it handles a single data type at a time, but supports both NULL as well as a fixed value passed as a parameter. So if the following parameters were used, the columns in red below will be removed. The notes in the SAS log that follow will inform you as to what transpired.

    %dropColumns( dsn=x, value=997, type = C )
  NOTE: Dropped 2 of 4 character columns containing "997" from x in 0.040 seconds.
  NOTE: Dropped columns: char1 char2
/**************************************************************************
*     Program: dropColumns.sas
*      Author: Tom Bellmer
*     Created: 03SEP2015 
*     Purpose: drop columns that have the same value in every row.
*       Usage: %dropcolumns( dsn=mydsn, value = 997, type = C )
*       Notes: Use value=, type=C or value=., type=N for NULLs
**************************************************************************/

%macro dropColumns( 
    dsn   =
  , value =
  , type  = C 
) ;

  %local lib mem varlist starttime droplist dropcount ;

  %if not %sysfunc( exist( &dsn. ) ) %then %do ;
    %put %str(E)RROR: The data set &dsn does not exist. ;
    %return ;
  %end ;

  %let type = %upcase( &type. ) ;
  %if &type ne C and &type. ne N %then %do ;
    %put %str(E)RROR: Invalid type = &type. ;
    %return ;
  %end ;

  %let starttime = %sysfunc( datetime() ) ;
  proc sql noprint ;
    %if %index( &dsn., . ) > 0 %then %do ;
      %let lib = %upcase( %scan( &dsn., 1 ) ) ;
      %let mem = %upcase( %scan( &dsn., 2 ) ) ;
    %end ;
    %else %do ;
      %let lib = WORK ;
      %let mem = %upcase( &dsn. ) ;
    %end ;
    select    name
      into    :varlist separated by ' ' 
      from    dictionary.columns
      where       libname = "&lib."
              and memname = "&mem." 
              and type    = ifc( "&type." = "C", "char", "num" ) ;
  quit ;

  %let type = %sysfunc( ifc( &type. = C, $, ) ) ;
  %if &type. = $ %then %let value = %sysfunc( quote( &value. ) ) ;

  data _null_  ;
    length droplist $32767 ;
    set &dsn. end = eof ;

    %if &sqlobs. > 0 %then %do ;
      array avars( * ) &type. &varlist. ;
      array adropvars ( &sqlobs. ) $1 _temporary_ ( &sqlobs. * 'Y' ) ;
      do i = 1 to &sqlobs. ;
        if avars( i ) ne &value. then adropvars( i ) = 'N' ;
      end ;

      if eof then do ;
        do i = 1 to &sqlobs. ;
          if adropvars( i ) = 'Y' then do ;
            dropcount + 1 ;
            droplist = catx( " ", droplist, vname( avars( i ) ) ) ;
          end ;
        end ;
      end ;

      call symputx( 'droplist', droplist, 'l' ) ;
      call symputx( 'dropcount', dropcount, 'l' ) ;
    %end ;
  run ;

  %if %symexist( dropcount )  %then %do ;
    data &dsn ;
      set &dsn( drop = &droplist ) ;
    run ;

    %put NOTE: Dropped &dropcount of %sysfunc( countw( &varlist. ) ) %sysfunc(ifc(&type=$,character,numeric)) columns containing &value. from &dsn. in %sysfunc( strip( %sysfunc( putn( %sysevalf( %sysfunc( datetime()) - &starttime.),9.3) ) ) ) seconds. ;
    %if &dropcount > 0 %then %do ;
      %put NOTE: Dropped columns: &droplist. ;
    %end ;
  %end ;
  %else %do ;
    %put NOTE: no columns were dropped. ;
  %end ;
%mend ;

/* EOF: dropColumns.sas  */