SyntaxHighlighter

Saturday, February 28, 2015

SAS vs SQL Update

I have always preferred to use SQL versus SAS syntax because of the wider usage of SQL for general data manipulation. However, I have heard that the SAS update statement is much more efficient than SQL for updates, so I wrote a test case to verify.

Sure enough, the SAS update statement is several orders of magnitude faster when updating roughly 5,000 rows from one table to a 100,000 row table.

data sample ;
  do id = 1 to 100000 ;
    lname = 'Smith' ;
    output ;
  end ;
run ;

data upd ;
  set sample ;
    if ranuni( 1 ) <= .05 then do ;
      lname = 'Jones' ;
      output ;
    end ;
run ;

data sample ;
  update sample upd  ;
    by id ;
run ;

proc sql ;
  update sample s
    set lname = 
      ( select   lname 
          from   upd u
          where  u.id = s.id 
      )  
      where id in
      ( select   id
          from   upd u 
      ) ;
quit ;

proc sql ;
  update sample s
    set lname = 
      ( select   lname 
          from   upd u
          where  u.id = s.id 
      )  
      where exists
      ( select   * 
          from   upd u 
          where  u.id = s.id 
      ) ; 
quit ;

See also: Update Statement, Updating a Table with Values from Another Table

2 comments:

  1. Thank you so much for sharing this wonderful information. I find this very useful and informative. keep up this good work.



    SAS training in Chennai

    ReplyDelete
  2. Thats a big difference!
    The SQL-code is explicit about which column is being updated, and which column the values come from. On the other hand, the data step is implicitly referring to all columns (I guess). How do I make the data step explicit about which columns are being updated from which source column?

    ReplyDelete