SyntaxHighlighter

Wednesday, November 20, 2013

The Need for Speed - %SCAN loop

Have you ever had to write a macro to parse words in a string using the %SCAN function and wondered what is the best, most efficient technique? The below code compares the time it takes to do this using three different techniques.

In my test, I repeated the 26 letters of the alphabet 10 times so that a total of 260 scans are performed as a benchmark. Review the below code and then see the bar chart of the average time it took after five attempts of each macro. It is clearly best to use the COUNTW() function once to detemine the number of times to loop.

data _null_ ;
  length string $1028 ;
  do i = 1 to 10 ;
    do j = 65 to 90 ;
      string = catx( " ", string, byte( j ) ) ;
     end ;
  end ;
  call symput( 'macvar', string ) ;
run ;

%let starttime = %sysfunc( datetime() ) ;
%macro good ;
  %local i retval ;
  %let i = 1 ;
  %do %while( %scan( &macvar., &i. ) ne ) ;
    %let i = %eval( &i + 1 ) ;
    %let retval = %scan( &macvar., &i. ) ;
  %end ;
%mend ;
%put %sysevalf( %sysfunc(datetime()) - &starttime. ) ;

%let starttime = %sysfunc( datetime() ) ;
%macro better ;
  %local i retval ;
  %let i = 1 ;
  %let retval = %scan( &macvar., &i. ) ;
  %do %while( &retval. ne ) ;
    %let i = %eval( &i. + 1 ) ;
    %let retval = %scan( &macvar., &i. ) ;
  %end ;
%mend ;
%better
%put %sysevalf( %sysfunc(datetime()) - &starttime. ) ;

%let starttime = %sysfunc( datetime() ) ;
%macro best ;
  %local i retval ;
  %let totalwords = %sysfunc( countw( &macvar ) ) ;
  %do i = 1 %to &totalwords ;
    %let retval = %scan( &macvar., &i. ) ;
  %end ;
%mend ;
%best
%put %sysevalf( %sysfunc(datetime()) - &starttime. ) ;

Click graph for a sharper image.

Tuesday, October 1, 2013

Last Business Friday of Month

The %lastBusinessFridayOfMonth macro (see below) can be used to produce a SAS data set containing the last business Friday of all months between a user supplied start and stop year. This code leverages the %holidays macro that can be found in one of my previous blog entries here.

I had a request to treat the Friday after Thanksgiving as a holiday. Data step code was added after the %holidays macro call to support this requirement. The holiday dates get loaded into a SAS hash object so that its .check() method can be used to determine if the last Friday of the month is a holiday.

The new in SAS 9.2 nwkdom() function was used to obtain the last Friday of each month. The nwkdom() function much easier to use than the intnx() function that would have been used prior to the release of SAS 9.2. If the last Friday is a holiday, that date is pushed back a week and tested again within the do while( holidays.check() = 0 ) loop.

A good example of a two consecutive Friday holdidays is December 31, 2010. Going back a week from December 31 puts us on December 24 or Christmas day so the last business Friday for December 2010 happens to be Decmber 17.

/**************************************************************************
*     Program: lastBusinessFridayOfMonth.sas
*      Author: Tom Bellmer
*     Created: 01OCT2013  
*     Purpose: determine last business friday of each month taking into 
*              consideration holidays.  If last friday is a holiday go 
*              back a week and try again using a hash object of holidays.
*       Usage: %lastBusinessFridayOfMonth( startyear=2000, stopyear=2100 )
***************************************************************************/

%macro lastBusinessFridayOfMonth( startyear =, stopyear = ) ;
  %holidays
    (   startyear = &startyear.
      , stopyear  = &stopyear.
      , outdsn    = work.holidays
      , view      = n 
    )

  data holidays ;
    set holidays( keep = date ) ;
    if month( date ) = 11 and day( date ) >= 20 then do ;
      date + 1 ; /* move Thanksgiving to a Friday */
      output ;
    end ;
    else output ;
  run ;

  data lastBusinessFridayOfMonth( keep = date ) ;
    if 0 then set work.holidays ;
    dcl hash holidays( dataset: 'work.holidays' ) ;
    holidays.definekey( 'date' ) ;
    holidays.definedone() ;

    do year = &startyear. to &stopyear. ;
      do month = 1 to 12 ;
        date = nwkdom( 5, 6, month, year ) ; *-- get last Friday of month ;
        do while( holidays.check() = 0 ) ;
          date = date - 7 ; *-- holiday found, go back a week ;
        end ;
        output ;
      end ;
    end ;
  run ;

%mend ;

/*EOF: lastBusinessFridayOfMonth.sas */

Tuesday, August 27, 2013

Getting around SQL reserved words

PROC SQL has some ANSI standard reserved words such as USER that cannot be used without renaming it or using the DQUOTE = ANSI option according to this web site

The first SQL query below fails because "the USER keyword will have a constant value for all rows in a query".  The second query works but requires the DQUOTE = ANSI option and you must double quote the reserved word.

I did not want to quote or rename a column. Through some trial and error I figured out that using a fully qualified table.column name was the solution that worked best for me.

data x ;
  do user = 'Billy', 'Joe', 'Bob' ;
    do i = 1 to 10 ;
      output ;
    end ;
  end ;
run ;

proc sql    ;
  create table x1 as
    select       user
               , count(*) as cnt
      from       x
      group by   user ;

  reset dquote = ansi ;
  create table x2 as
    select       "user"
               , count(*) as cnt
      from       x 
      group by   "user" ;

  create table x3 as
    select       x.user
               , count(*) as cnt
      from       x
      group by   x.user ;
quit ;

Tuesday, June 25, 2013

SYSPARM Name=Value Pairs

The -SYSPARM option adds flexibility when invoking a SAS program in batch mode. According to SAS documentation, the quoted string can be up to 200 characters in length and can obtained by using the &sysparm macro variable, or sysparm() function.

The below macro was created to parse comma seperated name=value pairs passed to SAS via a SYSPARM string such as the following batch job example:

"sas.exe" -sysin "myprog.sas" -sysparm "mode=dev,date=20130625"

%macro sysparmpairs() ;
  %local i pair name ;

    %let i = 1 ;
    %do %while( %scan( %bquote( &sysparm. ), &i  ) ne %str() ) ;
      %let pair = %scan( %bquote(&sysparm.), &i  ) ;
      %let name = %scan(&pair,1,=) ;
      %global &name ;
      
      %let &name = %scan( &pair, 2, = ) ;
      %let i = %eval( &i + 1 ) ;
    %end ;
%mend ;

%sysparmpairs()

%put _user_ ;

GLOBAL DATE 20130625
GLOBAL MODE dev

Friday, May 24, 2013

GetAttrn Macro

SAS macros can be written as functions that return a value. To write a macro as a function, you can only use macro language statements and all macro variables must be local. The %sysfunc() macro function is used extensively to extend what can otherwise be achieved. The line containing the macro value being returned should not include a semicolon.

Extracting attributes such the number of rows (nobs) or columns (nvars) from a SAS data set is a very common request. Those values and many more can be obtained by using the attrn() function but that requires the use of open() and close() functions.

Therefore wrapping the attrn() function inside of a SAS macro function makes a lot of sense. See the below code used to create the GetAttrn() macro function.

%macro getattrn( dsn =, attr = nobs ) ;
  %local dsid retval ;
  %let retval = . ;

  %let dsid = %sysfunc( open( &dsn. ) ) ;
  %if &dsid. %then %do ;
    %let retval = %sysfunc( attrn( &dsid., &attr. ) ) ;
    %let dsid   = %sysfunc( close( &dsid. ) ) ;
  %end ;

  &retval. 
%mend ;

Because macro processor events take place prior to program compilation you can take advantage of this knowledge and assign a dimension to an array within a data step.

data attribs( drop = dsid ) ;
  modified = put( %getattrn( dsn = sashelp.class, attr = modte ), datetime. ) ;
  obs      = %getattrn( dsn = sashelp.class ) ;
  vars     = %getattrn( dsn = sashelp.class, attr = nvars ) ;

  array atype[ %getattrn( dsn = sashelp.class, attr= nvar ) ] $1 ;

  dsid = open( 'sashelp.class' ) ;
  if dsid then do ;
    do _n_ = 1 to dim( atype ) ;
      atype[ _n_ ] = vartype( dsid, _n_ ) ;
    end ;
  end ;
run ; 

Friday, May 17, 2013

Holidays Macro

There are times when you need to account for holidays when performing date calculations. The below holidays macro creates a data set of holidays for the specified year range. It makes adjustments for holidays that fall on a weekend such as Independence Day in 2015 which is adjusted to July 3, 2015.


/**************************************************************************
*     Program: holidays.sas
*      Author: Tom Bellmer
*     Created: 17MAY2013  
*     Purpose: create a list of holidays for the specified years
*       Usage: %holidays( startyear = 2013, stopyear = 2014 )
*       Notes: use with a hash object to determine holiday adjustments
**************************************************************************/

%macro holidays
  ( 
      startyear = 2010
    , stopyear  = 2050
    , outdsn    = work.holidays
    , view      = y
  ) ;

  data &outdsn.  
    %if %lowcase( %substr( &view, 1, 1 ) ) = y %then / view = &outdsn. ;
  ;
    attrib 
      year    length   =   3  label = 'Year'
      date    length   =   4  label = 'Date'        format = date9.
      dayofweek length = $16  label = 'Day of Week'
      holiday length   = $32  label = 'Holiday'
    ;

    array aholidays[ 10, 2 ] $32 _temporary_ 
      ( 
          "NewYear",        "New Year's Day"
        , "MLK",            "Martin L King, Jr. birthday"
        , "USPresidents",   "President's birthdays"
        , "Memorial",       "Memorial Day"
        , "USIndependence", "Independence Day"
        , "Labor",          "Labor Day"
        , "Columbus",       "Columbus Day"
        , "VeteransUSG",    "Veterans Day"
        , "Thanksgiving",   "Thanksgiving Day"
        , "Christmas",      "Christmas"
      ) 
    ;

    do year = &startyear. to &stopyear. ;

      do _n_ = 1 to dim( aholidays, 1 ) ;
        date = holiday( aholidays[ _n_, 1 ], year ) ;
        /* adjust date forward on Sunday or back on Saturday */
        date = intnx( 'day', date
                 , choosen( weekday( date ), 1, 0, 0, 0, 0, 0, -1 ) ) ;
        dayofweek = put( date, downame.-l ) ;
        holiday = aholidays[ _n_, 2 ] ;
        output ;
      end ;

    end ;

  run ;

%mend ;

/* EOF: holidays.sas  */

Wednesday, April 17, 2013

STREXTRACT custom FCMP function

There are 906 functions supplied by SAS in release 9.2 TS2M3. You can verify this by using the SAS dictionary.functions table or sashelp.vfunc view.

Unfortunately, none of the standalone SAS supplied functions did what I desired when I needed to extract values between some nodes in XML. The good news is that SAS offers PROC FCMP so you can create your own custom functions.

The custom STREXTRACT function accepts as a parameter an input string that can be up to 32,766 characters in length. There is an issue in SAS release 9.2 that has been corrected in 9.3 that does not allow the standard SAS maximum value of 32,767 in proc fcmp when concatonating strings ( http://support.sas.com/kb/41/912.html ).

After the input string, supply the begin and end delimiters. Those delimiters are usually different in XML and HTML where a slash (/) signifies a closing tag.

The fourth and final parameter (occurance) can accept a postive integer value to extract a single node value or you can use a zero (0) to concatonate and return all the node values.

/**************************************************************************
*    Function: strextract
*      Author: Tom Bellmer
*     Created: 17APR2013  
* SAS Release: 9.2 TS2M3
*     Purpose: extracts strings between delimiters
*       Usage: x = strextract( str, '', '', 0 ) ;
**************************************************************************/

proc fcmp outlib = work.mysub.temp ;
  function strextract( string $, begindelimiter $, enddelimiter $, occurance ) $ ;
    length retval tempstr $32766 ;

    if occurance = 0 then upperlimit = count( string, begindelimiter ) ;
    else upperlimit = occurance ;

    startposition = 1 ;
    do i = 1 to upperlimit ;
      findstart     = find( string, begindelimiter, startposition ) ;
      findend       = find( string, enddelimiter,   findstart + 1 ) ;
      startposition = findend ;
      tempstr       = substr( string, findstart + lengthn( begindelimiter )
                        , findend - findstart - lengthn( begindelimiter ) ) ;
      if occurance = 0 then retval = catx( " ", retval, tempstr ) ;
    end ;
    if occurance ne 0 then retval = tempstr ;

    return( retval ) ;
  endsub ;
quit  ;

options cmplib = work.mysub ;
data demo( drop = xml str ) ;
  xml  = "< xml>< node>One< /node>< node>Two< /node>< node>Three< /node>< /xml>" ;
  str  = "|One|Two|Three|" ;
  xml0 = strextract( xml, '< node>', '< /node>', 0 ) ;
  str0 = strextract( str, '|', '|', 0 ) ;

  array axml[ 3 ] $8 ;
  array astr[ 3 ] $8 ;
  do _n_ = 1 to 3 ;
    axml[ _n_ ] = strextract( xml, '< node>', '< /node>', _n_ ) ;
    astr[ _n_ ] = strextract( str, '|', '|', _n_ ) ;
  end ; 
run ;

Tuesday, April 2, 2013

Truncated Record

You can do some pretty nice things using ODS PDF LAYOUT and its ODS TEXT= commands. In my case I was writing some long strings used as paragraphs that did automatic word wrapping when rendered. This all worked great in interactive mode, however, the same could not be said when I took the exact same code and tried to run it as a scheduled task.

The error I would get is TRUNCATED RECORD when running the code in batch. The solution is to have the Task Scheduler call a SAS program that does a %INCLUDE of the original code and contains a LRECL (logical record length) that is greater than the default of 256 bytes. So in the end my batch SAS call looks like this:

%include "e:\mypath\originalcode.sas" / source2 lrecl = 32767 ;

Problem solved! The source2 option writes the original included code to the SAS log and I used the maximum record length allowed of 32767.

Tuesday, January 22, 2013

Multi Server Join using SAS/Access ODBC

It is always best to do most of the processing on a server and only return the filtered results back to the requesting client. In this case I need two columns (state and county) otherwise I might have simply created a macro variable if only a single column were required.

The LIBNAME statement is used to bulk copy (BCP) 8 rows from DEV into the global (##) temp table. Those 8 rows are then joined to an 8 million row table and the entire process takes less than one second.

/* store data into a global (##) temporary table on SQL Server */
libname sqltemp 
  odbc 
  noprompt   = "Driver={SQL Server};Server=devserver;Trusted_Connection=Yes;" 
  bcp        = yes   
  connection = global ;

proc sql ;
  connect to odbc as prod 
    ( "Driver={SQL Server};Server=prodserver;Trusted_Connection=Yes;" ) ;

    /* populate data from prod into a dev SQL Server global temporary table */
    create table sqltemp.'##counties'n as
      select           *
        from           connection to prod
        ( select       *
            from       db.schema.table
            where          statecode  = &state. 
                       and countycode = &county. ) ;
  disconnect from prod ;

  connect to odbc as dev 
    ( "Driver={SQL Server};Server=devserver;Trusted_Connection=Yes;"  ) ;

    create table work.results as
      select            *
        from   connection to dev
        ( select         a.*
            from         db.schema.table a
            inner join   ##counties b        /* SQL Server temp table */
              on             a.state  = b.state
                         and a.county = b.county ) ;
  disconnect from dev ;
quit ;

libname sqltemp clear ;