I have always preferred to use SQL versus SAS syntax because of the wider usage of SQL for general data manipulation. However, I have heard that the SAS update statement is much more efficient than SQL for updates, so I wrote a test case to verify.
Sure enough, the SAS update statement is several orders of magnitude faster when updating roughly 5,000 rows from one table to a 100,000 row table.
data sample ; do id = 1 to 100000 ; lname = 'Smith' ; output ; end ; run ; data upd ; set sample ; if ranuni( 1 ) <= .05 then do ; lname = 'Jones' ; output ; end ; run ; data sample ; update sample upd ; by id ; run ; proc sql ; update sample s set lname = ( select lname from upd u where u.id = s.id ) where id in ( select id from upd u ) ; quit ; proc sql ; update sample s set lname = ( select lname from upd u where u.id = s.id ) where exists ( select * from upd u where u.id = s.id ) ; quit ;
See also: Update Statement, Updating a Table with Values from Another Table
Thank you so much for sharing this wonderful information. I find this very useful and informative. keep up this good work.
ReplyDeleteSAS training in Chennai
Thats a big difference!
ReplyDeleteThe SQL-code is explicit about which column is being updated, and which column the values come from. On the other hand, the data step is implicitly referring to all columns (I guess). How do I make the data step explicit about which columns are being updated from which source column?