Wednesday, April 17, 2013

STREXTRACT custom FCMP function

There are 906 functions supplied by SAS in release 9.2 TS2M3. You can verify this by using the SAS dictionary.functions table or sashelp.vfunc view.
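A minimal sketch of that check, assuming a session where the dictionary tables are available through PROC SQL. The column alias function_count is only illustrative, and the number returned may differ from 906 depending on the release and the products installed.

```sas
/* Count the rows in the functions dictionary table. */
/* function_count is an illustrative alias, not a SAS-defined name. */
proc sql ;
  select count( * ) as function_count
    from dictionary.functions ;
quit ;
```

The same query should work against the sashelp.vfunc view, which can also be read from a DATA step.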

Unfortunately, none of the SAS-supplied functions on its own did what I needed: extracting the values between nodes in XML. The good news is that SAS offers PROC FCMP, so you can create your own custom functions.

The custom STREXTRACT function accepts as its first parameter an input string of up to 32,766 characters. An issue in SAS release 9.2, corrected in 9.3, prevents PROC FCMP from using the standard SAS maximum length of 32,767 when concatenating strings ( http://support.sas.com/kb/41/912.html ).

After the input string, supply the begin and end delimiters. These usually differ in XML and HTML, where a slash (/) marks a closing tag.

The fourth and final parameter (occurance) accepts a positive integer to extract a single node value, or zero (0) to concatenate all the node values, separated by spaces, and return them together.

/**************************************************************************
*    Function: strextract
*      Author: Tom Bellmer
*     Created: 17APR2013  
* SAS Release: 9.2 TS2M3
*     Purpose: extracts strings between delimiters
*       Usage: x = strextract( str, '', '', 0 ) ;
**************************************************************************/

proc fcmp outlib = work.mysub.temp ;
  function strextract( string $, begindelimiter $, enddelimiter $, occurance ) $ ;
    length retval tempstr $32766 ;

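    /* occurance = 0 means "return every value": loop once per begin delimiter */
    if occurance = 0 then upperlimit = count( string, begindelimiter ) ;
    else upperlimit = occurance ;

    startposition = 1 ;
    do i = 1 to upperlimit ;
      /* locate the next begin delimiter, then the end delimiter that follows it */
      findstart = find( string, begindelimiter, startposition ) ;
      findend   = 0 ;
      if findstart > 0 then
        findend = find( string, enddelimiter, findstart + lengthn( begindelimiter ) ) ;

      if findend > 0 then do ;
        /* resume at the end delimiter so it can double as the next begin delimiter */
        startposition = findend ;
        tempstr       = substr( string, findstart + lengthn( begindelimiter )
                          , findend - findstart - lengthn( begindelimiter ) ) ;
        if occurance = 0 then retval = catx( " ", retval, tempstr ) ;
      end ;
      else tempstr = "" ;  /* no delimiter pair left: skip the invalid SUBSTR call */
    end ;
    if occurance ne 0 then retval = tempstr ;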

    return( retval ) ;
  endsub ;
quit  ;

options cmplib = work.mysub ;
data demo( drop = xml str ) ;
  xml  = "< xml>< node>One< /node>< node>Two< /node>< node>Three< /node>< /xml>" ;
  str  = "|One|Two|Three|" ;
  xml0 = strextract( xml, '< node>', '< /node>', 0 ) ;
  str0 = strextract( str, '|', '|', 0 ) ;

  array axml[ 3 ] $8 ;
  array astr[ 3 ] $8 ;
  do _n_ = 1 to 3 ;
    axml[ _n_ ] = strextract( xml, '< node>', '< /node>', _n_ ) ;
    astr[ _n_ ] = strextract( str, '|', '|', _n_ ) ;
  end ; 
run ;
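
To inspect what the demo step produced, something along these lines should list the concatenated values (xml0, str0) next to the individual array elements. This is only a sketch; it assumes the demo data set above was created in the WORK library in the current session.

```sas
/* List the demo results: xml0 and str0 hold all values, */
/* axml1-axml3 and astr1-astr3 hold one value each.      */
proc print data = demo noobs ;
run ;
```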
