introduction to 8i bulk pl/sql processing
This paper briefly introduces the bulk PL/SQL processing concepts and enhancements in Oracle 8i. Bulk or array processing has been available in Oracle for some time (using PRO*C or DBMS_SQL), but has only become available in native PL/SQL with the release of Oracle 8i.
The concept of bulk processing in PL/SQL is very simple - it enables PL/SQL and SQL to exchange "batches" of data, rather than a row at a time. Consider the following example:
BEGIN
   FOR rec IN ( SELECT value FROM table ) LOOP
      rec.value := rec.value * 1.5;
      INSERT INTO new_table VALUES (rec.value);
   END LOOP;
   COMMIT;
END;
Obviously this is an unrealistic methodology (though I have seen reams of code written in such a way - you would hopefully use a single INSERT..SELECT SQL statement to achieve the same), but it illustrates the point that row-by-row processing means the SQL engine and PL/SQL engine must exchange information to enable SQL and procedural logic to co-exist. This exchange of control between the PL/SQL and SQL engines is known as "context switching".
In the above example, there are several context switches. First, PL/SQL asks SQL for a row. Second, SQL passes a row through to PL/SQL. Third, PL/SQL re-assigns the record attribute and then passes over to SQL to insert a row into a table. Fourth, SQL then passes back to PL/SQL, which comes to the end of the loop and straight away requests another row from SQL. That's a lot of context switching just to process one row!
The point of bulk processing in 8i PL/SQL is, therefore, to reduce the context switching. It enables PL/SQL to receive a batch of records from the database to process in one pass, or enables PL/SQL to send a batch of records to the database in one pass. This enormous reduction in context switching contributes to (in some cases) massive performance increases. When coupled with DML (as in the example of "bad" PL/SQL above), additional high performance gains are achieved by the fact that bulk DML uses significantly fewer resources (such as REDO) than row-at-a-time DML.
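For reference, the set-based rewrite alluded to above is a single statement (using the same illustrative table names as the loop example):

```sql
-- The set-based equivalent of the row-by-row loop: one statement,
-- with no switching back and forth between the two engines...
INSERT INTO new_table
SELECT value * 1.5
FROM   table;

COMMIT;
```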
syntax and usage
- BULK COLLECT [LIMIT]
- FORALL
- RETURNING
bulk fetch (bulk collect)
This enables you to request, from the SQL engine, either a batch or all of the records returned by a cursor in your PL/SQL. This is best demonstrated by an example - in the following we will bulk collect all object names available to our user, loop through them to "process" the data and then finally report on how many records we processed.
SQL> DECLARE
  2
  3     /*
  4      * Declare an index-by table type and
  5      * variable to hold the fetched data...
  6      */
  7     TYPE typ_ibt_name IS TABLE OF dba_objects.object_name%TYPE
  8        INDEX BY BINARY_INTEGER;
  9     ibt_names typ_ibt_name;
 10
 11     /* We'll time it... */
 12     v_start_time PLS_INTEGER := DBMS_UTILITY.GET_TIME;
 13     v_elapsed    PLS_INTEGER;
 14
 15  BEGIN
 16
 17     /* Fetch all of the object names in one go... */
 18     SELECT object_name
 19     BULK COLLECT INTO ibt_names
 20     FROM   dba_objects;
 21
 22     /* Loop and "process" the data... */
 23     FOR i IN 1 .. ibt_names.COUNT LOOP
 24        NULL; --processing
 25     END LOOP;
 26
 27     /* How long did it take ? */
 28     v_elapsed := DBMS_UTILITY.GET_TIME - v_start_time;
 29
 30     DBMS_OUTPUT.PUT_LINE (
 31        'Fetched ' || TO_CHAR ( ibt_names.COUNT ) ||
 32        ' records in ' ||
 33        TO_CHAR ( ROUND ( v_elapsed/100,2 )) ||
 34        ' seconds.');
 35
 36  END;
 37  /
Fetched 25340 records in 1.86 seconds.

PL/SQL procedure successfully completed.
Some points to note are:
- Lines 7-9. Bulk processing in PL/SQL requires the use of index-by tables (PL/SQL-only arrays) or collections created on the database. In general there is little performance difference between index-by tables and collections, and in the example above, we declare an index-by table type to define our array variable.
- Line 19. The BULK COLLECT syntax is an extension of the implicit SELECT..INTO statement - we fetch straight into our index-by table. Note that the BULK COLLECT variant will not raise a NO_DATA_FOUND exception in the event of no rows being returned.
- Line 23. BULK COLLECT populates index-by tables densely, so array.COUNT is equal to the index of the last element. This means we can loop through the index-by table using (1 .. array.COUNT) as iterators as well as (array.FIRST .. array.LAST). The benefit of the (1 .. array.COUNT) method is that no exception is raised if the index-by table is empty. The (array.FIRST .. array.LAST) method will raise an exception on an empty index-by table, such that additional defensive coding is required (e.g. IF array.COUNT > 0 THEN ...).
- Lines 23-25. We still need to loop to process data in this example, but looping through an index-by table or collection is significantly faster than looping through a cursor as it does not require any context switches.
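The behaviour described in the notes above can be seen in a short sketch (the WHERE 1 = 0 predicate is contrived to return no rows):

```sql
DECLARE
   TYPE typ_ibt_name IS TABLE OF dba_objects.object_name%TYPE
      INDEX BY BINARY_INTEGER;
   ibt_names typ_ibt_name;
BEGIN
   /* Returns no rows, yet no NO_DATA_FOUND is raised... */
   SELECT object_name
   BULK COLLECT INTO ibt_names
   FROM   dba_objects
   WHERE  1 = 0;

   /* Safe on an empty array: the loop body simply never runs... */
   FOR i IN 1 .. ibt_names.COUNT LOOP
      NULL;
   END LOOP;

   /* This alternative would raise an exception here, because
    * FIRST and LAST are NULL for an empty index-by table:
    *
    *    FOR i IN ibt_names.FIRST .. ibt_names.LAST LOOP
    */
END;
/
```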
Prior to 8i, a row-by-row implementation would have been as follows.
SQL> DECLARE
  2
  3     /* We'll time it... */
  4     v_start_time PLS_INTEGER := DBMS_UTILITY.GET_TIME;
  5     v_elapsed    PLS_INTEGER;
  6
  7     v_counter PLS_INTEGER := 0;
  8
  9  BEGIN
 10
 11     /* Process object names one at a time... */
 12     FOR rec IN ( SELECT object_name FROM dba_objects ) LOOP
 13        v_counter := v_counter + 1;
 14     END LOOP;
 15
 16     /* How long did it take ? */
 17     v_elapsed := DBMS_UTILITY.GET_TIME - v_start_time;
 18
 19     DBMS_OUTPUT.PUT_LINE (
 20        'Fetched ' || TO_CHAR ( v_counter ) ||
 21        ' records in ' ||
 22        TO_CHAR ( ROUND ( v_elapsed/100,2 )) ||
 23        ' seconds.');
 24
 25  END;
 26  /
Fetched 25340 records in 5.18 seconds.

PL/SQL procedure successfully completed.
Despite requiring more code, we can see that the new BULK COLLECT syntax enables us to speed up the fetching of data (in the above example by almost 3 times). It also enables us to get much closer to the performance achievable in SQL (running SELECT COUNT(*) FROM dba_objects took 1.36 seconds).
limiting bulk fetches
If we need to process a large table, the LIMIT clause enables us to BULK COLLECT in batches, both to make DML more manageable and to avoid consuming too much memory in our index-by tables / collections. The optimal LIMIT size is dependent on the task at hand, but in DML-intensive applications I've always found LIMITs of between 500 and 1000 records to perform well.
If we extend our simple DBA_OBJECTS example above to use the sizeable DBA_SOURCE view instead, we will fetch all the lines of code currently compiled on our database and then report on how many we fetched. We'll assume in this instance that fetching all of the code in one go would be too memory-intensive for our system.
SQL> DECLARE
  2
  3     /*
  4      * Declare index-by table types and
  5      * variables to hold the fetched data...
  6      */
  7     TYPE typ_ibt_name IS TABLE OF dba_source.name%TYPE
  8        INDEX BY BINARY_INTEGER;
  9
 10     TYPE typ_ibt_line IS TABLE OF dba_source.line%TYPE
 11        INDEX BY BINARY_INTEGER;
 12
 13     TYPE typ_ibt_text IS TABLE OF dba_source.text%TYPE
 14        INDEX BY BINARY_INTEGER;
 15
 16     ibt_names typ_ibt_name;
 17     ibt_lines typ_ibt_line;
 18     ibt_text  typ_ibt_text;
 19
 20     v_counter PLS_INTEGER := 0;
 21
 22     /* Explicit cursor required... */
 23     CURSOR cur
 24     IS
 25        SELECT name, line, text
 26        FROM   dba_source
 27        ORDER  BY
 28               name, type, line;
 29
 30     /* Our "batch" size... */
 31     c_batch CONSTANT PLS_INTEGER := 500;
 32
 33  BEGIN
 34
 35     /* Loop the cursor... */
 36     OPEN cur;
 37     LOOP
 38
 39        /* Fetch in batches... */
 40        FETCH cur
 41        BULK COLLECT INTO ibt_names,
 42                          ibt_lines,
 43                          ibt_text LIMIT c_batch;
 44
 45        IF ibt_names.COUNT > 0 THEN
 46
 47           /* Do some "processing"... */
 48           v_counter := v_counter + ibt_names.COUNT;
 49
 50        END IF;
 51
 52        /* Critical placement of EXIT condition... */
 53        EXIT WHEN cur%NOTFOUND;
 54
 55     END LOOP;
 56     CLOSE cur;
 57
 58     DBMS_OUTPUT.PUT_LINE (
 59        'We fetched ' || TO_CHAR ( v_counter ) ||
 60        ' records ' || TO_CHAR ( c_batch ) ||
 61        ' records at a time.' );
 62
 63  END;
 64  /
We fetched 519155 records 500 records at a time.

PL/SQL procedure successfully completed.
Some points to note:
- Lines 16-18. Each column fetched requires its own index-by table or collection. We cannot BULK COLLECT straight into an index-by table/collection of records in 8i (except with workarounds using object and collection types). In production code, I would put these various index-by tables into a record.
- Lines 39-43. Each fetch of (in this case) 500 rows will re-populate the index-by tables, so there's no need to empty them each time around the loop.
- Line 53. If the cursor does not fetch a full LIMIT's worth of data (in this case 500 rows), cur%NOTFOUND will be TRUE and the EXIT WHEN condition will be satisfied. It is therefore critical that all processing is done before the exit condition - in the above example, the last FETCH returned 171 records, which was enough to satisfy cur%NOTFOUND. If you want to place an EXIT WHEN before the processing section, use EXIT WHEN ibt_names.COUNT = 0 instead.
bulk binding (forall)
This clause enables us to send batches of data from PL/SQL to the SQL engine. It is for DML only: the statement immediately following a FORALL clause must be an INSERT, UPDATE or DELETE.
In the following example, we'll extend the previous LIMIT example to simulate a staged load in a data warehouse batch run. A full copy of DBA_SOURCE will serve as our base target table and USER_SOURCE will serve as our new staged data. We'll BULK COLLECT several LIMITs of new source data and then UPDATE our target table with some dummy manipulation of the source record using bulk binding (FORALL).
SQL> CREATE TABLE target_table
  2  NOLOGGING
  3  AS
  4     SELECT owner, name, type, line, text
  5     FROM   dba_source;
SQL> CREATE INDEX ni1_target_table
  2  ON
  3     target_table ( owner, name, type, line )
  4  NOLOGGING
  5  COMPUTE STATISTICS;
SQL> DECLARE
  2
  3     /* Explicit cursor required... */
  4     CURSOR cur
  5     IS
  6        SELECT name, type, line, text
  7        FROM   user_source
  8        ORDER  BY
  9               name, type, line;
 10
 11     /*
 12      * Declare index-by table types
 13      * defined by the source data...
 14      */
 15     TYPE typ_ibt_name IS TABLE OF user_source.name%TYPE
 16        INDEX BY BINARY_INTEGER;
 17
 18     TYPE typ_ibt_type IS TABLE OF user_source.type%TYPE
 19        INDEX BY BINARY_INTEGER;
 20
 21     TYPE typ_ibt_line IS TABLE OF user_source.line%TYPE
 22        INDEX BY BINARY_INTEGER;
 23
 24     TYPE typ_ibt_text IS TABLE OF user_source.text%TYPE
 25        INDEX BY BINARY_INTEGER;
 26
 27     /*
 28      * We'll use a record type this time for "professional"
 29      * code...
 30      */
 31     TYPE typ_rec_source IS RECORD
 32     ( names typ_ibt_name
 33     , types typ_ibt_type
 34     , lines typ_ibt_line
 35     , text  typ_ibt_text
 36     );
 37     rec_source typ_rec_source;
 38
 39     v_fetched PLS_INTEGER := 0;
 40     v_updated PLS_INTEGER := 0;
 41
 42     /* Our "batch" size... */
 43     c_batch CONSTANT PLS_INTEGER := 500;
 44
 45  BEGIN
 46
 47     /* Loop the cursor... */
 48     OPEN cur;
 49     LOOP
 50
 51        /* Fetch in batches... */
 52        FETCH cur
 53        BULK COLLECT INTO rec_source.names,
 54                          rec_source.types,
 55                          rec_source.lines,
 56                          rec_source.text LIMIT c_batch;
 57
 58        IF rec_source.names.COUNT > 0 THEN
 59
 60           v_fetched := v_fetched + rec_source.names.COUNT;
 61
 62           /*
 63            * Update our target data with new source
 64            * data in batches...
 65            */
 66           FORALL i IN rec_source.names.FIRST ..
 67                       rec_source.names.LAST
 68              UPDATE target_table
 69              SET    text = UPPER(rec_source.text(i))
 70              WHERE  owner = USER
 71              AND    name = rec_source.names(i)
 72              AND    type = rec_source.types(i)
 73              AND    line = rec_source.lines(i);
 74
 75           v_updated := v_updated + SQL%ROWCOUNT;
 76
 77        END IF;
 78
 79        /* Critical placement of EXIT condition... */
 80        EXIT WHEN cur%NOTFOUND;
 81
 82     END LOOP;
 83     CLOSE cur;
 84
 85     DBMS_OUTPUT.PUT_LINE (
 86        'We fetched ' || TO_CHAR ( v_fetched ) ||
 87        ' source records and updated ' ||
 88        TO_CHAR ( v_updated ) ||' target records.' );
 89
 90  END;
 91  /
We fetched 2807 source records and updated 2807 target records.

PL/SQL procedure successfully completed.
Some points to note are:
- Lines 31-35. As with BULK COLLECT, FORALL does not support index-by tables/collections of records, hence each individual column being targeted in a FORALL DML statement will require its own index-by table/collection.
- Lines 66-67. FORALL is not a loop construct. The lower and upper bounds are integers that tell the SQL engine which range of elements in an index-by table or collection to use. In this example we are using all elements in our index-by tables from the FIRST to the LAST (using collection methods that return the offset of the FIRST and LAST elements in an index-by table or collection).
- Lines 66-73. FORALL in 8i either succeeds or fails in its entirety. There's no direct means of skipping any elements in the index-by table or collection which cause an exception in the DML.
- Line 75. SQL%ROWCOUNT reports the affected row count of each "batch" of DML.
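A contrived sketch of this all-or-nothing behaviour follows (the table T and its NOT NULL column N are assumptions for illustration only):

```sql
DECLARE
   TYPE typ_ibt_num IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
   ibt_nums typ_ibt_num;
BEGIN
   ibt_nums(1) := 1;
   ibt_nums(2) := NULL;  --will violate the assumed NOT NULL constraint
   ibt_nums(3) := 3;

   /* The FORALL stops at the first failing element; in 8i there
    * is no direct means of skipping it and carrying on with the
    * remaining elements... */
   FORALL i IN 1 .. 3
      INSERT INTO t ( n ) VALUES ( ibt_nums(i) );

EXCEPTION
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE ( 'FORALL failed: ' || SQLERRM );
END;
/
```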
capturing modified data (returning)
The RETURNING clause enables us to capture the data that was affected by a DML statement we executed. When used with BULK COLLECT, it can be quite powerful.
Consider the following example: we want to DELETE the records of deceased customers from our CUSTOMERS table, but keep a copy of the removed records in a DECEASED_CUSTOMERS table. The RETURNING clause enables us to get the information we need in one hit, as the following example shows.
SQL> CREATE TABLE customers
  2  NOLOGGING
  3  AS
  4     SELECT object_id AS customer_id
  5     ,      DECODE( MOD(ROWNUM,10), 0, 'Y', 'N')
  6               AS deceased_ind
  7     ,      DECODE( MOD(ROWNUM,10), 0, created, TO_DATE(NULL))
  8               AS date_deceased
  9     FROM   user_objects;
SQL> CREATE TABLE deceased_customers
  2  ( customer_id   INTEGER
  3  , date_deceased DATE
  4  );
SQL> DECLARE
  2
  3     TYPE typ_ibt_id IS TABLE OF customers.customer_id%TYPE
  4        INDEX BY BINARY_INTEGER;
  5
  6     TYPE typ_ibt_date IS TABLE OF customers.date_deceased%TYPE
  7        INDEX BY BINARY_INTEGER;
  8
  9     ibt_ids   typ_ibt_id;
 10     ibt_dates typ_ibt_date;
 11
 12  BEGIN
 13
 14     /*
 15      * DELETE deceased customers from customers table
 16      * but keep a record of the records for storing in
 17      * DECEASED_CUSTOMERS table...
 18      */
 19     DELETE FROM customers
 20     WHERE  deceased_ind = 'Y'
 21     RETURNING customer_id, date_deceased
 22     BULK COLLECT INTO ibt_ids, ibt_dates;
 23
 24     /* Did we delete any ? */
 25     DBMS_OUTPUT.PUT_LINE ( 'DELETE count = ' ||
 26                            TO_CHAR(SQL%ROWCOUNT) );
 27
 28     DBMS_OUTPUT.PUT_LINE ( 'RETURNING count = ' ||
 29                            TO_CHAR(ibt_ids.COUNT) );
 30
 31     /* Record them in DECEASED_CUSTOMERS table... */
 32     IF ibt_ids.COUNT > 0 THEN
 33
 34        FORALL i IN ibt_ids.FIRST .. ibt_ids.LAST
 35           INSERT INTO deceased_customers
 36           VALUES ( ibt_ids(i), ibt_dates(i) );
 37
 38        DBMS_OUTPUT.PUT_LINE ( 'INSERT count = ' ||
 39                               TO_CHAR(SQL%ROWCOUNT) );
 40
 41     END IF;
 42
 43  END;
 44  /
DELETE count = 9
RETURNING count = 9
INSERT count = 9

PL/SQL procedure successfully completed.
Some points to note are:
- Lines 21-22. We have "tacked on" a RETURNING clause to our DELETE statement to request a series of specific columns back from the records that are affected by the DML (in this case just CUSTOMER_ID and DATE_DECEASED). In addition to this, we have asked for the values for these particular columns to be returned in bulk using BULK COLLECT (note that RETURNING can also be used on single-row DML).
- Lines 34-36. We can now use bulk binding (FORALL) to INSERT our deceased customer records without having to re-apply any of our original criteria - all the data we require is already in memory. Prior to 8i, we would probably have run an INSERT..SELECT statement into the DECEASED_CUSTOMERS table, then run a DELETE statement against CUSTOMERS by referencing either the DECEASED_CUSTOMERS table or the original INSERT..SELECT criteria. Either method would probably have required an additional table scan, so by using RETURNING and BULK COLLECT we have avoided a lot of extra physical and logical I/O.
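As noted above, RETURNING also works on single-row DML, in which case we fetch straight INTO scalar variables rather than BULK COLLECT into arrays. A small sketch follows (the customer ID used is a hypothetical sample key):

```sql
DECLARE
   v_id   customers.customer_id%TYPE;
   v_date customers.date_deceased%TYPE;
BEGIN
   /* Single-row DML: RETURNING..INTO scalars, no BULK COLLECT... */
   UPDATE customers
   SET    deceased_ind  = 'Y'
   ,      date_deceased = SYSDATE
   WHERE  customer_id   = 1001  --hypothetical sample key
   RETURNING customer_id, date_deceased
   INTO   v_id, v_date;
END;
/
```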
bulk fetching, bulk binding and dynamic sql
As mentioned in the introduction to this article, the low-level DBMS_SQL package can be used to both fetch and perform DML in batches, although this forces everything down the dynamic SQL route - even code that could be written statically.
Oracle 8i introduced Native Dynamic SQL (NDS) to make dynamic SQL applications much simpler than DBMS_SQL, though there is a price to pay for this simplicity. One of the shortcomings of NDS in 8i is that there is no direct way of using BULK COLLECT or FORALL against dynamic SQL statements. Native Dynamic SQL and row-by-row processing are an "expensive" combination.
All is not lost, however, as there are workarounds available by using dynamic PL/SQL as an extension of NDS. For either a BULK COLLECT of a dynamic SQL statement or a FORALL of a dynamic DML statement, a run-time anonymous PL/SQL block can be generated and executed to perform these bulk processing tasks. I have not included the code in this article (which is intended to be an introduction to bulk processing, not an in-depth discussion), but several examples are available online.
As of Oracle 8i, I can see very few reasons NOT to make use of bulk fetching and binding. You should always lean towards performing SQL or DML in single statements, but for those times when PL/SQL simply MUST be used, you should no longer treat row-by-row processing as an acceptable implementation. The additional code required to utilise bulk processing is far outweighed by the benefits to be gained in both execution times and resource costs. As we move into Oracle 9i territory, there will be some significant enhancements to bulk processing which make its application almost limitless.
The source code for the examples in this article can be downloaded from here.
Adrian Billington, December 2001