Pages

SyntaxHighlighter

Wednesday, April 17, 2013

STREXTRACT custom FCMP function

There are 906 functions supplied by SAS in release 9.2 TS2M3. You can verify this by using the SAS dictionary.functions table or sashelp.vfunc view.

Unfortunately, none of the standalone SAS supplied functions did what I desired when I needed to extract values between some nodes in XML. The good news is that SAS offers PROC FCMP so you can create your own custom functions.

The custom STREXTRACT function accepts as a parameter an input string that can be up to 32,766 characters in length. There is an issue in SAS release 9.2 that has been corrected in 9.3 that does not allow the standard SAS maximum value of 32,767 in proc fcmp when concatonating strings ( http://support.sas.com/kb/41/912.html ).

After the input string, supply the begin and end delimiters. Those delimiters are usually different in XML and HTML where a slash (/) signifies a closing tag.

The fourth and final parameter (occurance) can accept a postive integer value to extract a single node value or you can use a zero (0) to concatonate and return all the node values.

/**************************************************************************
*    Function: strextract
*      Author: Tom Bellmer
*     Created: 17APR2013  
* SAS Release: 9.2 TS2M3
*     Purpose: extracts strings between delimiters
*       Usage: x = strextract( str, '', '', 0 ) ;
**************************************************************************/

proc fcmp outlib = work.mysub.temp ;
  function strextract( string $, begindelimiter $, enddelimiter $, occurance ) $ ;
    length retval tempstr $32766 ;

    if occurance = 0 then upperlimit = count( string, begindelimiter ) ;
    else upperlimit = occurance ;

    startposition = 1 ;
    do i = 1 to upperlimit ;
      findstart     = find( string, begindelimiter, startposition ) ;
      findend       = find( string, enddelimiter,   findstart + 1 ) ;
      startposition = findend ;
      tempstr       = substr( string, findstart + lengthn( begindelimiter )
                        , findend - findstart - lengthn( begindelimiter ) ) ;
      if occurance = 0 then retval = catx( " ", retval, tempstr ) ;
    end ;
    if occurance ne 0 then retval = tempstr ;

    return( retval ) ;
  endsub ;
quit  ;

options cmplib = work.mysub ;
data demo( drop = xml str ) ;
  xml  = "< xml>< node>One< /node>< node>Two< /node>< node>Three< /node>< /xml>" ;
  str  = "|One|Two|Three|" ;
  xml0 = strextract( xml, '< node>', '< /node>', 0 ) ;
  str0 = strextract( str, '|', '|', 0 ) ;

  array axml[ 3 ] $8 ;
  array astr[ 3 ] $8 ;
  do _n_ = 1 to 3 ;
    axml[ _n_ ] = strextract( xml, '< node>', '< /node>', _n_ ) ;
    astr[ _n_ ] = strextract( str, '|', '|', _n_ ) ;
  end ; 
run ;

Tuesday, April 2, 2013

Truncated Record

You can do some pretty nice things using ODS PDF LAYOUT and its ODS TEXT= commands. In my case I was writing some long strings used as paragraphs that did automatic word wrapping when rendered. This all worked great in interactive mode, however, the same could not be said when I took the exact same code and tried to run it as a scheduled task.

The error I would get is TRUNCATED RECORD when running the code in batch. The solution is to have the Task Scheduler call a SAS program that does a %INCLUDE of the original code and contains a LRECL (logical record length) that is greater than the default of 256 bytes. So in the end my batch SAS call looks like this:

%include "e:\mypath\originalcode.sas" / source2 lrecl = 32767 ;

Problem solved! The source2 option writes the original included code to the SAS log and I used the maximum record length allowed of 32767.