decomposing sql%rowcount for merge
The MERGE statement (AKA "UPSERT") released in Oracle 9i is possibly one of the most useful ETL-enabling technologies built into the Oracle kernel. For those who have missed Oracle's headlines for the last year and a half and are unaware of what the MERGE statement does, it simply enables us to either UPDATE or INSERT a row into a target table in one statement. You simply tell Oracle your rules for determining whether a target row should updated or inserted from the source, and Oracle does the rest.
Prior to 9i, the alternative in SQL was to perform two individual DML statements (one UPDATE and one INSERT, each with opposing predicates). The alternative in PL/SQL was even worse - either try to INSERT a row and if it failed with a DUP_VAL_ON_INDEX exception, then UPDATE the row instead, or try to UPDATE a row, only inserting in the event of a SQL%NOTFOUND.
In the days of excessive auditing, many ETL tools keep track of the number of inserts and number of updates being performed during batch data loads. Unfortunately, Oracle is as yet unable to provide individual rowcounts for the UPDATE and INSERT components of the MERGE statement (who's betting this will be a 10g feature?). Instead, we still get the SQL%ROWCOUNT attribute, which tells us the overall number of records merged.
What I am demonstrating in this short article is my attempt at decomposing SQL%ROWCOUNT into its component DML counts. To enable this I am simply maintaining package variables to keep track of either the number of updates, inserts or both. Incrementing the package counters is done by simply "piggy-backing" one of the UPDATE SET columns or one of the INSERT VALUES columns. This will become clearer when you see the code.
First, the setup. I am going to create two variations on the EMP table. EMP_SOURCE will be a full copy of the EMP table (all 14 rows of it!). EMP_TARGET is going to contain just eight of these rows. I am then going to MERGE EMP_SOURCE into EMP_TARGET, such that we will expect eight rows to be updated and six records to be inserted.
SQL> CREATE TABLE emp_source 2 AS 3 SELECT * FROM emp;
SQL> SELECT COUNT(*) FROM emp_source;
COUNT(*) ---------- 14
SQL> CREATE TABLE emp_target 2 AS 3 SELECT * FROM emp WHERE ROWNUM <= 8;
SQL> SELECT COUNT(*) FROM emp_target;
COUNT(*) ---------- 8
I'll now MERGE the EMP_SOURCE data into EMP_TARGET.
SQL> BEGIN 2 MERGE INTO emp_target et 3 USING ( SELECT * FROM emp_source ) es 4 ON ( et.empno = es.empno ) 5 WHEN MATCHED THEN 6 UPDATE 7 SET et.ename = es.ename 8 , et.sal = es.sal 9 , et.mgr = es.mgr 10 , et.deptno = es.deptno 11 WHEN NOT MATCHED THEN 12 INSERT 13 ( et.empno, et.ename, et.sal, et.mgr, et.deptno ) 14 VALUES 15 ( es.empno, es.ename, es.sal, es.mgr, es.deptno ); 16 17 DBMS_OUTPUT.PUT_LINE(TO_CHAR(SQL%ROWCOUNT) || ' rows merged.'); 18 END; 19 /
14 rows merged. PL/SQL procedure successfully completed.
An important caveat to note is that Oracle will generate an "ORA-30926: unable to get a stable set of rows in the source tables" error if there is either a many-to-one or many-to-many relationship between the source and target tables. This is not as serious as it sounds because you would normally have to MERGE a one-to-one or one-to-zero relationship as your join condition would be protected by the target's primary key.
As can be seen, the SQL%ROWCOUNT attribute provides us with the total MERGE count, but we have no idea of how many updates or inserts were performed, which doesn't help us when we have a production batch run to audit or debug. To enable us to keep track of this, I created a small package named ETL with three functions (plus two overloads) and one procedure. The source code for this is as follows.
CREATE OR REPLACE PACKAGE etl AS c_inserting CONSTANT PLS_INTEGER := 0; c_updating CONSTANT PLS_INTEGER := 1; FUNCTION merge_counter ( action_in IN PLS_INTEGER DEFAULT c_inserting ) RETURN PLS_INTEGER; FUNCTION get_merge_update_count RETURN PLS_INTEGER; FUNCTION get_merge_update_count ( merge_count_in IN PLS_INTEGER ) RETURN PLS_INTEGER; FUNCTION get_merge_insert_count RETURN PLS_INTEGER; FUNCTION get_merge_insert_count ( merge_count_in in PLS_INTEGER ) RETURN PLS_INTEGER; PROCEDURE reset_counters; END etl; / CREATE OR REPLACE PACKAGE BODY etl AS g_update_counter PLS_INTEGER NOT NULL := 0; g_insert_counter PLS_INTEGER NOT NULL := 0; /*----------- FUNCTION merge_counter -----------*/ FUNCTION merge_counter ( action_in IN PLS_INTEGER DEFAULT c_inserting ) RETURN PLS_INTEGER IS BEGIN CASE action_in WHEN c_updating THEN g_update_counter := g_update_counter + 1; WHEN c_inserting THEN g_insert_counter := g_insert_counter + 1; ELSE RAISE PROGRAM_ERROR; END CASE; RETURN 0; END merge_counter; /*----------- FUNCTION get_merge_update_count V1 -----------*/ FUNCTION get_merge_update_count RETURN PLS_INTEGER is BEGIN RETURN g_update_counter; END get_merge_update_count; /*----------- FUNCTION get_merge_update_count V2 -----------*/ FUNCTION get_merge_update_count ( merge_count_in IN PLS_INTEGER ) RETURN PLS_INTEGER IS BEGIN RETURN NVL( merge_count_in - g_insert_counter, 0 ); END get_merge_update_count; /*----------- FUNCTION get_merge_insert_count V1 -----------*/ FUNCTION get_merge_insert_count RETURN PLS_INTEGER IS BEGIN RETURN g_insert_counter; END get_merge_insert_count; /*----------- FUNCTION get_merge_insert_count V2 -----------*/ FUNCTION get_merge_insert_count ( merge_count_in IN PLS_INTEGER ) RETURN PLS_INTEGER IS BEGIN RETURN NVL( merge_count_in - g_update_counter, 0 ); END get_merge_insert_count; /*----------- FUNCTION reset_counters -----------*/ PROCEDURE reset_counters IS BEGIN g_update_counter := 0; g_insert_counter := 0; END reset_counters; END etl; /
Note that there is one function to set either the UPDATE or INSERT counter and two functions to retrieve INSERT or UPDATE counts (each with an overload - totalling four "get" functions). Finally, there is a small procedure to reset the counters.
decomposing sql%rowcount using the etl package
The following is an example of how we might "piggy-back" the earlier MERGE statement to decompose the SQL%ROWCOUNT.
SQL> BEGIN 2 MERGE INTO emp_target et 3 USING ( SELECT * FROM emp_source ) es 4 ON ( et.empno = es.empno ) 5 WHEN MATCHED THEN 6 UPDATE 7 SET et.ename = es.ename 8 , et.sal = es.sal 9 , et.mgr = es.mgr 10 , et.deptno = es.deptno 11 WHEN NOT MATCHED THEN 12 INSERT 13 ( et.empno 14 , et.ename 15 , et.sal 16 , et.mgr 17 , et.deptno 18 ) 19 VALUES 20 ( CASE etl.merge_counter(etl.c_inserting) 21 WHEN 0 THEN es.empno 22 END 23 , es.ename 24 , es.sal 25 , es.mgr 26 , es.deptno 27 ); 28 29 DBMS_OUTPUT.PUT_LINE( 30 TO_CHAR(SQL%ROWCOUNT) || ' rows merged.' 31 ); 32 DBMS_OUTPUT.PUT_LINE( 33 TO_CHAR(etl.get_merge_insert_count) || ' rows inserted.' 34 ); 35 DBMS_OUTPUT.PUT_LINE( 36 TO_CHAR(etl.get_merge_update_count( SQL%ROWCOUNT )) 37 || ' rows updated.' 38 ); 39 etl.reset_counters; 40 END; 41 /
14 rows merged. 6 rows inserted. 8 rows updated.
Some notes on the above PL/SQL block:
- Line 20. I have wrapped the INSERT of es.empno in a simple CASE expression. This CASE expression calls the ETL function MERGE_COUNTER, telling it that I require an INSERT counter to be incremented. This function ALWAYS returns 0, so my first test is for 0, which of course guarantees that the es.empno value will be preserved in the overall INSERT statement.
- Line 33. A simple call to the non-parameter overload of the ETL.GET_MERGE_INSERT_COUNT function returns me the value of the INSERT counter.
- Line 36. I chose not to maintain an UPDATE counter in this example (because I know my data and I always get more updates than inserts, so to be as efficient as possible given the fact that I'm forcing all these extra function executions, I'll not bother with an UPDATE count). Instead, I supplied the overloaded ETL.GET_MERGE_UPDATE_COUNT function with the SQL%ROWCOUNT and this returned me the total MERGE count minus the retained INSERT count. You have the option of course to keep both counters going in one statement, but the overloads in the ETL package make this unnecessary.
And that is my basic implementation of a MERGE INSERT and a MERGE UPDATE counter. There is a performance cost, naturally, but Oracle maintains that the cost of executing PL/SQL functions from SQL is continually falling. My tests in merging 146,000 rows, 99% of which were inserts showed very few identifiable resource costs other than CPU time, which was marginally increased as is to be expected. No doubt some of you will consider it a cost too much, but I would argue that in many cases, the extra cost of keeping a package variable updated will be marginal when set against a large, database-intensive MERGE statement. I will be happily using it in my ETL processing, until Oracle includes its own of course!
latest version of package
Note that since writing this article, a modified version of the MERGE counter package is available here.
The source code for the examples in this article can be downloaded from here.
Adrian Billington, August 2003Back to Top