Wednesday, May 7, 2014

Matrix Validation

The requirement was to compare each cell of the matrix with its neighbor to the right and its neighbor below, and to flag the cell whenever that neighbor holds a smaller value. In the output data set, an "A" marks a decrease going across, a "D" marks a decrease going down, and a "B" marks a cell where the value decreases both across and down.

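To read one row ahead, I created a second data set, test2, which is read with firstobs = 2 so that each row of test is paired with the row below it. Because a data step stops as soon as any SET statement runs out of rows, test2 gets one extra blank row; without it, the last row of test would never be processed. Arrays are used to compare the values generically and to write out a flag for any offending cell.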

data 
  test
  test2
   ( rename = 
     ( x1 = y1 
       x2 = y2 
       x3 = y3 
       x4 = y4 
       x5 = y5 
     ) 
   ) ;

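  /* eof= jumps to the onemore label once the datalines are exhausted */
  infile datalines eof = onemore ;
  input x1 - x5 ;
  output ;                      /* writes the row to both test and test2 */
  return ;

  onemore:
    /* pad test2 with one all-missing row for the read-ahead */
    call missing( of _all_ ) ;
    output test2 ;
  return ;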

  datalines;
1 2 3 4 5
1 1 1 2 1
4 3 3 2 1
5 4 3 1 2
1 0 1 0 2
;
run ;


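data results( keep = ac: ) ;
  /* current row from test; nobs holds the number of rows in test */
  set test ( firstobs = 1 ) nobs = nobs  ;
  /* row below, read one observation ahead and renamed to y1 - y5 */
  set test2 ( firstobs = 2  ) ;
  
  array aa[ * ] x: ;               /* current row */
  array ab[ * ] y: ;               /* row below */
  array ac[ * ] $1 ac1 - ac5 ;     /* one flag per cell */

  do i = 1 to dim( aa )  ;
    /* across: skip the last column, which has no right-hand neighbor */
    if i < dim( aa ) then do ;
      if aa[ i ] > aa[ i + 1 ] then ac[ i ] = "A" ;
    end ;
    /* down: skip the last row, which is paired with the blank padding row */
    if _n_ < nobs then do ;
      if aa[ i ] > ab[ i ] then do ;
        if missing( ac[ i ] ) then ac[ i ] = "D" ;
        else ac[ i ] = "B" ;
      end ;
    end ;
  end ;
run ;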
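
As a cross-check on the flags the step above should produce, the same rule can be sketched outside SAS. The Python below is only an illustration and not part of the original program; the function name flag_decreases is made up for this example, and an empty string stands in for a blank flag.

```python
def flag_decreases(grid):
    """Flag each cell: 'A' if the right neighbor is smaller,
    'D' if the neighbor below is smaller, 'B' if both, '' otherwise."""
    n_rows = len(grid)
    flags = []
    for r, row in enumerate(grid):
        out = []
        for c, value in enumerate(row):
            # The last column has no right neighbor; the last row has no row below.
            across = c + 1 < len(row) and value > row[c + 1]
            down = r + 1 < n_rows and value > grid[r + 1][c]
            if across and down:
                out.append("B")
            elif across:
                out.append("A")
            elif down:
                out.append("D")
            else:
                out.append("")
        flags.append(out)
    return flags


# Same five rows as the datalines in the SAS step.
data = [
    [1, 2, 3, 4, 5],
    [1, 1, 1, 2, 1],
    [4, 3, 3, 2, 1],
    [5, 4, 3, 1, 2],
    [1, 0, 1, 0, 2],
]

for flag_row in flag_decreases(data):
    # Print "." for an unflagged cell so the columns stay aligned.
    print(" ".join(flag or "." for flag in flag_row))
```

For the sample data this prints ". D D D D", ". . . A .", "A . A B .", "B B B D ." and "A . A . .", one line per row, which is what ac1 - ac5 should contain in the results data set.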

Friday, May 2, 2014

Hash Object Throwdown: SetCur() vs Find_Next() methods

Is the SAS hash iterator's setcur() method faster than the hash object's find_next() method for extracting all the rows stored under a single key? In the code below, 5 million rows are created for each of the key values 'A', 'B' and 'C', and then every row for key 'B' is looked up and written out, once with each approach.

In this test, the hash object's find_next() method turned out to be about 25% faster than the iterator's setcur() method.

data input ;
  length
    key $1
    sat  5 ;
 
  do key = 'A', 'B', 'C' ;
    do sat = 1 to 5000000 ;
      output ;
    end ;
  end ;
run ;
 
data
  xiterator( keep = key sat )
  xhash( keep = key sat ) ;
  if 0 then set input ;
 
  dcl hash hh( dataset: 'input', ordered: 'a', multidata: 'y' ) ;
  dcl hiter hi( 'hh' ) ;
  hh.definekey( 'key' ) ;
  hh.definedata( 'key', 'sat' ) ;
  hh.definedone() ;
 
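  findthis = 'B' ;
 
  /* Iterator: position on the first 'B' item, then step with next().     */
  /* "by 0" leaves rc unchanged between passes; the while() ends the loop. */
  temp_start = datetime() ;
  do rc = hi.setcur( key: findthis ) by 0 while( rc = 0 and key = findthis ) ;
    output xiterator ;
    rc = hi.next() ;
  end ;
  temp_end = datetime() - temp_start ;
  put temp_end time10.4 ;
 
  /* Hash object: find() the first 'B' item, then walk with find_next(). */
  temp_start = datetime() ;
  do rc = hh.find( key: findthis ) by 0 while( rc = 0 ) ;
    output xhash ;
    rc = hh.find_next() ;
  end ;
  temp_end = datetime() - temp_start ;
  put temp_end time10.4 ;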
 
  stop ;
run ;

For more information, read this excellent paper on hash objects: Black Belt Hashigana