pl/sql optimisation in 10g
Oracle has made several claims that PL/SQL is now faster "out of the box" in 10g, due largely to a re-development of its underlying algorithms. Oracle also provides a new parameter for us to control how much of this optimisation we wish to adopt when running PL/SQL in 10g.
In addition to this, various elements of the language have also undergone improvements. For example, all integer arithmetic is now machine-based. Floating point arithmetic is possible in Oracle for the first time and provides for some impressive performance gains in heavily-computational code (even without native compilation).
This article will, however, concentrate solely on the underlying PL/SQL optimisation rather than the performance gains with using new datatypes as described above. This article will be divided into three sections. The first section will briefly highlight the performance improvements in 10g PL/SQL. The second section demonstrates a specific optimisation dedicated to cursor-for-loops and the final section is a new addition that describes some potential issues (bugs) with the hidden optimisation.
about the examples in this article
Most of the original examples in this article compare the performance of fairly trivial PL/SQL blocks between 10g Release One (10.1.0.1) and 9i Release Two (9.2.0.5). The databases both reside on the same twin CPU Linux x86 server and the general setups are as comparable as two different releases of Oracle can be. The examples in the new section on optimisation issues have been tested on 10.1.0.5, 10.2.0.3 and 11.1.0.6.
plsql_optimize_level
For the first time, Oracle has provided a parameter to control how much optimisation we wish our PL/SQL modules to undergo. There are three levels of optimisation, as follows.
- level 0: no optimisation (simulates previous releases of Oracle);
- level 1: minor optimisation (some code re-write, removing unnecessary assignments or computations); and
- level 2: default optimisation (major code re-write and relocation during compilation for more significant performance improvements).
As stated above, level 2 is the default and it is unlikely that we would ever wish to set it otherwise. Should we wish to change the level we can either set it instance-wide in our spfile/pfile or for our session only with an ALTER SESSION statement.
simple optimisation
To begin, we'll simply compare a computational PL/SQL block in 10g and 9i and time it by the "wall-clock". We'll begin by running the block on 10g. This runs 1 million iterations of a meaningless calculation.
10g> set timing on 10g> DECLARE 2 c NUMBER := 1; 3 pi NUMBER := 3.142; 4 r NUMBER := DBMS_RANDOM.RANDOM; 5 BEGIN 6 FOR i IN 1 .. 1000000 LOOP 7 /* Use a random, variable calculation... */ 8 c := pi*(r**2) + (MOD(r,c)*pi+i); 9 END LOOP; 10 END; 11 /
PL/SQL procedure successfully completed. Elapsed: 00:00:02.13
Now we can run exactly the same block in 9i and see how long it takes to run.
9i> set timing on 9i> DECLARE 2 c NUMBER := 1; 3 pi NUMBER := 3.142; 4 r NUMBER := DBMS_RANDOM.RANDOM; 5 BEGIN 6 FOR i IN 1 .. 1000000 LOOP 7 /* Use a random, variable calculation... */ 8 c := pi*(r**2) + (MOD(r,c)*pi+i); 9 END LOOP; 10 END; 11 /
PL/SQL procedure successfully completed. Elapsed: 00:00:04.15
Oracle 10g is roughly twice as fast with a simple task such as that above (repeated runs gave the same results). This is impressive given that there are no obvious re-write optimisations available on a variable calculation such as this. In the following example, we shall see the difference that re-writes can make to our timings. We'll run a large loop that contains only a constant assignment. Remember that optimisation level 2 in 10g will relocate constant assignments that appear inside loops.
This time, we'll start by running the block on 9i, again with timing on.
9i> DECLARE 2 c PLS_INTEGER; 3 BEGIN 4 FOR i IN 1 .. 100000000 LOOP 5 c := 1; 6 END LOOP; 7 END; 8 /
PL/SQL procedure successfully completed. Elapsed: 00:00:16.56
Now we can compare the performance of 10g for the same block. This is at optimisation level 2.
10g> DECLARE 2 c PLS_INTEGER; 3 BEGIN 4 FOR i IN 1 .. 100000000 LOOP 5 c := 1; 6 END LOOP; 7 END; 8 /
PL/SQL procedure successfully completed. Elapsed: 00:00:01.94
This time, we've seen some impressive performance gains for no effort (though admittedly this is quite a contrived example). Using the PL/SQL Profiler, we can easily demonstrate the effects of the optimisation. We will run the previous example at optimisation levels 1 and 2 but with 10,000 loop iterations (because the PL/SQL Profiler writes its trace to tables). We will begin with level 1 as follows.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 1;
Session altered.
10g> DECLARE 2 c PLS_INTEGER; 3 BEGIN 4 DBMS_PROFILER.START_PROFILER('OPTIMIZE LEVEL 1'); 5 FOR i IN 1 .. 10000 LOOP 6 c := 1; 7 END LOOP; 8 DBMS_PROFILER.STOP_PROFILER(); 9 END; 10 /
PL/SQL procedure successfully completed. Elapsed: 00:00:00.03
The PL/SQL Profiler report for this example is as follows (the SQL is provided in the download at the end of this article).
RUN_COMMENT UNIT_NAME LINE# TOTAL_OCCUR TOTAL_TIME -------------------- -------------- ---------- ----------- ---------------------- OPTIMIZE LEVEL 1 <anonymous> 5 10001 511829296 OPTIMIZE LEVEL 1 <anonymous> 6 10000 507050756 OPTIMIZE LEVEL 1 <anonymous> 8 1 826641 3 rows selected.
We can see that line 6 (the assignment) occurs 10,000 times so clearly has not been compiled out. We will repeat the example at level 2 below.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;
Session altered.
10g> DECLARE 2 c PLS_INTEGER; 3 BEGIN 4 DBMS_PROFILER.START_PROFILER('OPTIMIZE LEVEL 2'); 5 FOR i IN 1 .. 10000 LOOP 6 c := 1; 7 END LOOP; 8 DBMS_PROFILER.STOP_PROFILER(); 9 END; 10 /
PL/SQL procedure successfully completed. Elapsed: 00:00:00.01
The Profiler report for this anonymous block under level 2 optimisation is as follows.
RUN_COMMENT UNIT_NAME LINE# TOTAL_OCCUR TOTAL_TIME -------------------- -------------- ---------- ----------- ---------------------- OPTIMIZE LEVEL 2 <anonymous> 5 10000 468979157 OPTIMIZE LEVEL 2 <anonymous> 8 1 617676 2 rows selected.
At optimisation level 2, the static assignment at line 6 has been removed from the loop.
pl/sql optimisation and cursor-for-loops
For most of us, the PL/SQL we write is a mix of some computational logic and heavy data-access. In many legacy systems, cursor-for-loops prevail, despite the SQL mantra being practiced by many developers in recent years. An online article by Jonathan Lewis (see the link at the end of this article) describes a cursor-for-loop optimisation included in the 10g compiler. At level 2 (default), static cursor-for-loops will be re-written during compilation as bulk fetches with a limit of 100 elements. This is quite a significant optimisation that can give 15-20% performance improvements to existing code.
To see the benefits of this optimisation, we'll run a simple cursor-for-loop on 10g with SQL trace enabled. We'll run it first at level 0 and then again at level 2. We'll compare the wall-clocks and then examine the trace files for detail.
10g> exec DBMS_MONITOR.SESSION_TRACE_ENABLE;
PL/SQL procedure successfully completed.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;
Session altered.
10g> set timing on 10g> DECLARE 2 i PLS_INTEGER := 0; 3 BEGIN 4 FOR r IN ( SELECT * FROM dba_source level_0 ) LOOP 5 i := i + 1; 6 END LOOP; 7 END; 8 /
PL/SQL procedure successfully completed. Elapsed: 00:00:22.14
Optimisation at level 0 behaves the same as previous versions of Oracle. To loop through the records in DBA_SOURCE (nearly 290,000 records on the test database) took over 22 seconds. We can run the same example back at level 2 (default) as follows.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;
Session altered.
10g> DECLARE 2 i PLS_INTEGER := 0; 3 BEGIN 4 FOR r IN ( SELECT * FROM dba_source level_2 ) LOOP 5 i := i + 1; 6 END LOOP; 7 END; 8 /
PL/SQL procedure successfully completed. Elapsed: 00:00:03.71
This time the loop completes in less than 4 seconds for truly impressive performance gains. We'll format the trace files with tkprof for evidence of why this is the case (note from the zero cache misses that this was not the first time these PL/SQL blocks ran. This was deliberate to avoid "first run interference" in the timings).
******************************************************************************** SELECT * FROM DBA_SOURCE LEVEL_0 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 289739 5.54 6.82 4894 290593 0 289738 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 289741 5.54 6.82 4894 290593 0 289738 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 57 (recursive depth: 1) ******************************************************************************** SELECT * FROM DBA_SOURCE LEVEL_2 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 2898 1.65 3.78 3711 8397 0 289738 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 2900 1.65 3.78 3711 8397 0 289738 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 57 (recursive depth: 1)
The statistics for level 2 demonstrate that Oracle is fetching in batches of 100. The CPU time (largely due to the reduced context switching) has dropped significantly.
There are two important points worth making about the cursor-for-loop optimisation. Firstly, it only applies to cursor-for-loops and not explicit "fetch loops". So PL/SQL of the following structure will not be optimised, nor will ref cursors (regardless of whether they are static or dynamic).
OPEN cursor_name; LOOP FETCH cursor_name INTO record_name; EXIT WHEN cursor_name%NOTFOUND; <<do some processing>> END LOOP; CLOSE cursor_name;
The second important point to note is that Oracle does not optimise any DML statements within the loop (i.e. it does not turn them into FORALL behind the scenes). We can demonstrate this quite easily. We'll create a copy of DBA_OBJECTS and re-run the previous example but insert each fetch into the demonstration table. We'll then look at the resulting trace file.
10g> CREATE TABLE t 2 NOLOGGING 3 AS 4 SELECT * 5 FROM dba_source 6 WHERE ROWNUM < 1;
Table created.
10g> exec DBMS_MONITOR.SESSION_TRACE_ENABLE;
PL/SQL procedure successfully completed.
10g> BEGIN 2 FOR r IN ( SELECT * FROM dba_source ) LOOP 3 INSERT INTO t VALUES r; 4 END LOOP; 5 END; 6 /
PL/SQL procedure successfully completed.
We can see from the following tkprof output that the fetches have been optimised as expected but the inserts have not.
******************************************************************************** SELECT * FROM DBA_SOURCE call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.01 0.01 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 2898 1.89 6.03 5341 8397 0 289738 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 2900 1.90 6.05 5341 8397 0 289738 ******************************************************************************** INSERT INTO T VALUES (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 289738 12.90 24.33 2 5976 334593 289738 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 289739 12.90 24.33 2 5976 334593 289738
optimisation bugs
We have seen that the PL/SQL compiler in 10g provides good performance gains without any additional effort by us and that one of the methods Oracle uses to achieve this is code re-organisation. However, the fact that the compiler is re-organising our code can raise issues, especially when the algorithms we use are complex or when we have "hidden" some logic (we will see an example of this below). Adrian Christie has provided two good examples of PL/SQL optimiser bugs he has discovered and we will investigate these below. Note that these bugs are reproducible on both major releases of 10g but are fixed in 11g.
example one: collections and loops
The first issue we will see is a surprisingly simple one. We will extend a collection variable to ten elements and loop through it, trimming ("popping") one element on each iteration. We therefore expect ten loop iterations. We will first run the example at PL/SQL optimisation level 0 (i.e. no optimisation), as follows.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;
Session altered.
10g> DECLARE 2 TYPE element_ntt IS TABLE OF PLS_INTEGER; 3 v_elements element_ntt := element_ntt(); 4 BEGIN 5 v_elements.EXTEND(10); 6 FOR i IN 1 .. v_elements.COUNT() LOOP 7 DBMS_OUTPUT.PUT_LINE(v_elements.COUNT()); 8 v_elements.TRIM(); 9 END LOOP; 10 END; 11 /
10 9 8 7 6 5 4 3 2 1 PL/SQL procedure successfully completed.
The simple output demonstrates the expected behaviour and this is repeated if we test using level 1 optimisation. We will now run the same example under level 2 optimisation, as follows.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;
Session altered.
10g> DECLARE 2 TYPE element_ntt IS TABLE OF PLS_INTEGER; 3 v_elements element_ntt := element_ntt(); 4 BEGIN 5 v_elements.EXTEND(10); 6 FOR i IN 1 .. v_elements.COUNT() LOOP 7 DBMS_OUTPUT.PUT_LINE(v_elements.COUNT()); 8 v_elements.TRIM(); 9 END LOOP; 10 END; 11 /
10 9 8 7 6 5 PL/SQL procedure successfully completed.
Oracle has given a different result this time. We know from our tests and the documentation that Oracle will re-organise, relocate and optimise our code, but it's difficult to see how it has generated just five loop iterations. If we profile the examples at level 0 and 2 (using DBMS_PROFILER) and compare the results, we see the following output (the query is available in the accompanying download).
RUN_COMMENT LINE# L0_OCCUR L2_OCCUR -------------------- ---------- ---------- ---------- Level 0 only 3 1 Both levels 5 1 1 Both levels 6 11 7 Both levels 7 10 6 Both levels 8 10 6 Both levels 10 1 1 6 rows selected.
According to these results, the initialisation of the collection variable on line 3 has been removed under level 2 optimisation. We can also see fewer loop operations under level 2, which seems to indicate that the collection has been initialised with too few elements. One of the workarounds we can use for this is to capture the initial collection count into a variable and use the variable to constrain the loop, as follows.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;
Session altered.
10g> DECLARE 2 TYPE element_ntt IS TABLE OF PLS_INTEGER; 3 v_elements element_ntt := element_ntt(); 4 v_count PLS_INTEGER; 5 BEGIN 6 v_elements.EXTEND(10); 7 v_count := v_elements.COUNT; 8 FOR i IN 1 .. v_count LOOP 9 DBMS_OUTPUT.PUT_LINE(v_elements.COUNT()); 10 v_elements.TRIM(); 11 END LOOP; 12 END; 13 /
10 9 8 7 6 5 4 3 2 1 PL/SQL procedure successfully completed.
We can only speculate the cause from these results. Either the collection was initialised with too few elements, the collection assignment was relocated or the loop boundary was incorrectly computed. It is more likely to be related to the initialisation of the collection, with the introduction of the v_count variable preventing Oracle from re-organising code in such a way that it incorrectly generates five elements.
example two: global variables and functions with side effects
The second example is slightly more complex, but it illustrates a common technique used in PL/SQL programs; namely "control-break logic" in a loop. With control-break logic, we continually compare a current record's value with the previous record to see if it has changed. In the example below, we will use the following logic in a small loop:
- capture the current time (actually just the seconds);
- sleep for two seconds;
- capture the current time;
- compare the two captured times and output a message to indicate if they are the same or different.
This is a contrived yet simple example of control-break logic. We can see clearly from these steps that the second time we will capture will be two seconds greater than the first. However, the code we will use to perform this logic will have some surprises and complexity. We will run the example at level 0 optimisation below and follow with a description of the code.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 0;
Session altered.
10g> DECLARE 2 3 TYPE record_type IS RECORD 4 ( attribute PLS_INTEGER ); 5 6 g_global record_type; 7 8 -------------------------------------------------- 9 FUNCTION function_with_side_effects 10 RETURN PLS_INTEGER IS 11 BEGIN 12 g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP); 13 RETURN EXTRACT(SECOND FROM SYSTIMESTAMP); 14 END function_with_side_effects; 15 16 -------------------------------------------------- 17 PROCEDURE optimisable_procedure IS 18 19 v_last_time PLS_INTEGER := g_global.attribute; 20 21 BEGIN 22 23 FOR i IN 1 .. 5 LOOP 24 25 DBMS_LOCK.SLEEP(2); 26 27 DBMS_OUTPUT.PUT( 28 'Last time=[' || v_last_time || ']; ' || 29 'This time=[' || function_with_side_effects() || ']' 30 ); 31 32 IF v_last_time != g_global.attribute THEN 33 DBMS_OUTPUT.PUT_LINE('; The time has changed.'); 34 v_last_time := g_global.attribute; 35 ELSE 36 DBMS_OUTPUT.PUT_LINE('; The time has not changed.'); 37 END IF; 38 39 END LOOP; 40 41 END optimisable_procedure; 42 43 -------------------------------------------------- 44 45 BEGIN 46 g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP); 47 optimisable_procedure(); 48 END; 49 /
Last time=[50]; This time=[52]; The time has changed. Last time=[52]; This time=[54]; The time has changed. Last time=[54]; This time=[56]; The time has changed. Last time=[56]; This time=[58]; The time has changed. Last time=[58]; This time=[60]; The time has changed. PL/SQL procedure successfully completed.
The results are as we predicted. In every loop iteration, we follow a capture-sleep-capture pattern, so the times clearly differ by the sleep interval. Some points to note about the code used for this example are as follows.
- Lines 3-6: we use a global variable of a user-defined record type to store the captured seconds;
- Lines 9-14: we use a function to capture the time and return it. This is a function with side-effects because it modifies our state (i.e. the global variable);
- Line 19: we take a local copy of our global variable as a starting point. We use this local variable in our control-break logic (i.e. to compare the current value with the previous value to see if it has changed);
- Line 25: at the beginning of each loop iteration, we sleep for two seconds. This guarantees that the current and previous time values are always different (see the output for evidence);
- Line 29: in a seemingly innocent DBMS_OUTPUT call, we include a call to our function_with_side_effects. This sets our global variable to a new time of approximately two seconds greater than its previous value;
- Lines 32-37: we compare the current and previous time values (i.e. our control-break logic). As stated, these will be different on every loop iteration;
- Lines 46-47: we initialise the global variable and call the internal procedure.
The example is written as an anonymous block but uses forward declarations to simulate the structure of a package body. In general, the code is a contrived combination of inconsistent techniques and logic, but most developers can probably find equivalent examples in their legacy systems! For example, note the variety of ways in which the global variable is referenced or assigned and the repetitive calls to EXTRACT(SECOND FROM SYSTIMESTAMP). Of course, the point of this example is to show how "sloppy-programming" can hide the developer's intentions from the PL/SQL optimising compiler in such a way that it eventually makes a wrong decision. We will see this below as we repeat the PL/SQL block at optimisation level 2.
10g> ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL = 2;
Session altered.
10g> DECLARE 2 3 TYPE record_type IS RECORD 4 ( attribute PLS_INTEGER ); 5 6 g_global record_type; 7 8 -------------------------------------------------- 9 FUNCTION function_with_side_effects 10 RETURN PLS_INTEGER IS 11 BEGIN 12 g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP); 13 RETURN EXTRACT(SECOND FROM SYSTIMESTAMP); 14 END function_with_side_effects; 15 16 -------------------------------------------------- 17 PROCEDURE optimisable_procedure IS 18 19 v_last_time PLS_INTEGER := g_global.attribute; 20 21 BEGIN 22 23 FOR i IN 1 .. 5 LOOP 24 25 DBMS_LOCK.SLEEP(2); 26 27 DBMS_OUTPUT.PUT( 28 'Last time=[' || v_last_time || ']; ' || 29 'This time=[' || function_with_side_effects() || ']' 30 ); 31 32 IF v_last_time != g_global.attribute THEN 33 DBMS_OUTPUT.PUT_LINE('; The time has changed.'); 34 v_last_time := g_global.attribute; 35 ELSE 36 DBMS_OUTPUT.PUT_LINE('; The time has not changed.'); 37 END IF; 38 39 END LOOP; 40 41 END optimisable_procedure; 42 43 -------------------------------------------------- 44 45 BEGIN 46 g_global.attribute := EXTRACT(SECOND FROM SYSTIMESTAMP); 47 optimisable_procedure(); 48 END; 49 /
Last time=[47]; This time=[47]; The time has not changed. Last time=[49]; This time=[49]; The time has not changed. Last time=[51]; This time=[51]; The time has not changed. Last time=[53]; This time=[53]; The time has not changed. Last time=[55]; This time=[55]; The time has not changed. PL/SQL procedure successfully completed.
This time the results are clearly incorrect. Despite our intended sleep between the capture of the previous and current time, the results show that there is no difference between the two values. Our control-break logic is broken! On first glance at the results it appears as though the sleep call on line 25 has been factored out. We can compare PL/SQL Profiler results as we did for the first example and the results are as follows.
RUN_COMMENT LINE# L0_OCCUR L2_OCCUR -------------------- ---------- ---------- ---------- Both levels 12 5 5 Both levels 13 5 5 Both levels 14 5 5 Both levels 19 1 1 Both levels 23 6 6 Both levels 25 5 5 Both levels 27 5 5 Both levels 32 5 5 Level 0 only 33 5 Level 0 only 34 5 Level 2 only 36 5 Both levels 41 1 1 Both levels 46 1 1 Both levels 47 1 1 Both levels 48 1 1 15 rows selected.
According to the PL/SQL Profiler report, the only differences are in the control-break logic (due to the second run taking a different decision path). Obviously, the profiler data doesn't show the interpreted representation of our code, and in this case offers us no clues.
Incidentally, there are any number of ways that this bug can be avoided with the above example. For example, we can change the global record variable to a simple scalar variable, or we can change all direct assignments to/from the global variable to use a single consistent method. We can even move the initialisation of the global variable into the OPTIMISABLE procedure. In other words, do it properly!
Of course, this optimisation bug has serious implications for upgrade testing (although we would have to be extremely unlucky to hit it). It will probably be very difficult to spot incorrect behaviour due to optimisation bugs and might be the last place that developers will look (especially those who are unaware of this 10g feature). As ever, careful regression testing will be critical in any upgrade to 10g and lowering the default optimisation level is always an option. However, for those systems that are unaffected by this bug (probably most), the original premise of this article remains: that performance gains can be quite significant.
further reading
For more background to the re-development of the PL/SQL compiler, see these Oracle white papers. For Jonathan Lewis' article on cursor-for-loop optimisation, see DBAZine.
acknowledgements
Thanks to Adrian Christie for bringing the optimisation bugs to my attention and providing good test cases. I've adapted these for this article.
source code
The source code for the examples in this article can be downloaded from here.
Adrian Billington, November 2004 (updated March 2008)
Back to Top