Pages

SyntaxHighlighter

Sunday, August 30, 2015

Column Splitter Take 2 - Merge

An anonymous person provided an alternative take on my %columnSplitter macro and I have to admit this one is easier to understand, has less code and runs about twice as fast. Congrats to whomever shared this alternative approach.

Here is the complete code using the merge statement:

%macro columnSplitter2( 
    columns = 3
  , indsn   =
  , outdsn  =
  , varname =  
) ;
 
  %local dsid nobs increment ;

  %if not %sysfunc( exist( &indsn. ) ) %then %do ;
    %put %str(E)RROR: invalid input data set name: &indsn. ;
    %return ;
  %end ;

  %if %isblank( &indsn. ) %then %do ;
    %put %str(E)RROR: invalid output data set name: &outdsn. ;
    %return ;
  %end ;

  %if %isblank( &varname. ) %then %do ;
    %put %str(E)RROR: invalid variable name: &varname. ;
    %return ;
  %end ;

  %let dsid      = %sysfunc( open( &indsn. ) ) ;
  %let nobs      = %sysfunc( attrn( &dsid., nobs ) ) ;
  %let dsid      = %sysfunc( close( &dsid. ) ) ;
  %let increment = %sysfunc( ceil( &nobs. / &columns. ) ) ;

  data &outdsn. ;
    merge
    %do i = 1 %to &columns ;
      &indsn( 
        keep     = &varname 
        rename   = ( &varname = &varname&i ) 
        firstobs = %eval( ( &i. - 1 ) * &increment. + 1 ) 
        obs      = %eval( &i. * &increment. ) )
    %end ;
    ;
  run ;

%mend ;

%columnSplitter2(columns=3,indsn=states,outdsn=statessplit3,varname=statename)

Friday, August 21, 2015

Column Splitter

I created a report that includes a one page table of contents. Having a single column of all 50 states runs off the page, so I wanted to create additional columns to spread the data horizontally. The %columnSplitter() macro was created to support this capability.

In the below code, several SAS Component Language (SCL) functions are used to process the data. The OPEN() function supports reading data sets in random access instead of sequential access mode. The dsid variable is a pointer to the data set and can be used with many other functions to determine data types, length and number of rows. The %isblank macro code can be found here.

An ARRAY is used to assign the number of columns to create. That value must be determined at compile time, hence the need for the DATA _NULL_ step prior to the assignment of the array subscript. The offset variable is the result by dividing the total number of logical rows by the desired number of columns. The key to the program is the use of the FETCHOBS() function to read a specified row number.

Based on the data type of the &varname variable, a getvarc()/getvarn() function is used to obtain the value of that row. After data for all columns is collected and output, the call missing( of colname[*] ) function is used to reset all array elements to missing. Finally, what gets opened must be closed using the CLOSE() function.

data states ;
  do statecode = 1 to 56 ;
    statename = fipnamel( statecode ) ;
    if statename not in ('Invalid Code', 'District of Columbia' ) then output ;
  end ;
run ;

%macro columnSplitter( 
    columns = 3
  , indsn   =
  , outdsn  =
  , varname =  
) ;

  %local length vartype varnum i j k ;

  %if not %sysfunc( exist( &indsn. ) ) %then %do ;
    %put %str(E)RROR: invalid input data set name: &indsn. ;
    %return ;
  %end ;

  %if %isblank( &indsn. ) %then %do ;
    %put %str(E)RROR: invalid output data set name: &outdsn. ;
    %return ;
  %end ;

  %if %isblank( &varname. ) %then %do ;
    %put %str(E)RROR: invalid variable name: &varname. ;
    %return ;
  %end ;

  data _null_ ;
    dsid = open( "&indsn." ) ;
    varnum = varnum( dsid, "&varname." ) ;
    call symputx( 'varnum', varnum ) ;
    vartype = vartype( dsid, varnum ) ;
    call symputx( 'vartype', vartype ) ;
    call symputx('length',cats(ifc(vartype='C','$',''),varlen(dsid,varnum)));
    dsid = close( dsid ) ;
  run ;

  data &outdsn. ( keep = colname: ) ; 
    array colname[ &columns. ] &length. ; 
    dsid = open( "&indsn." ) ;
    totalobs = attrn( dsid, 'nlobs' ) ;
    offset = ceil( totalobs / &columns. ) ;
    do i = 1 to offset ;
      k = 1 ;
      do j = 0 by offset while( k <= &columns. ) ;
        rc = fetchobs( dsid, i + j ) ;
        if ( i + j ) <= totalobs then do ;
          if "&vartype." = "C" then colname[k] = getvarc(dsid, &varnum.) ;
          else colname[ k ] = getvarn( dsid, &varnum. ) ;
        end ;
        k + 1 ;
      end ;
      output ;
      call missing( of colname[ * ] ) ;
    end ;

    dsid = close( dsid ) ;
    stop ;
  run ;
%mend ;

%columnSplitter(columns=3,indsn=states,outdsn=statessplit3,varname=statename)
%columnSplitter(columns=4,indsn=states,outdsn=statessplit4,varname=statename)