pl/sql functions and cbo costing

Oracle developers have been creating PL/SQL functions ever since the first database release that supported them (circa Oracle version 7). Ever since we created our first stored (packaged) function, we have had to suffer a continual trade-off between the encapsulation of business rules and optimal performance. We have usually paid a performance penalty for good programming practices in one or more of the following ways:

Oracle has attempted to address these performance issues in several ways (especially in 11g with features such as the Cross-Session PL/SQL Function Result Cache and subprogram inlining). The last issue in the list has, however, been addressed for many releases of Oracle and we will see how in this article.

This article was inspired by a presentation given by Joze Senegacnik at the 2008 UKOUG Conference. In his presentation "Cost Based Optimizer Fallacies Caused By Erroneous Application Design", Joze described a number of sub-optimal application design decisions that affect the accuracy and effectiveness of the CBO. One of the techniques he described was the use of PL/SQL functions in query predicates, how this can affect the CBO's attempts to optimise a query and how it could be overcome using the under-used ASSOCIATE STATISTICS feature of Oracle. This feature is so important that I decided to write an article on it for oracle-developer.net. While this article expands on some of the content of Joze's presentation (and also includes some new material), it is impossible not to overlap in certain areas and for this reason, Joze has reviewed this article prior to publication.

pl/sql functions, selectivity and cardinality

We will begin by looking at how PL/SQL functions are handled by the CBO. To do this, we will need a sample table and function. We will start with the table, as follows.

SQL> CREATE TABLE thousand_rows
  2  AS
  3     SELECT ROWNUM           AS n1
  4     ,      RPAD('x',50,'x') AS v1
  5     FROM   dual
  6     CONNECT BY ROWNUM <= 1000;

Table created.

SQL> BEGIN
  2     DBMS_STATS.GATHER_TABLE_STATS(user, 'THOUSAND_ROWS');
  3  END;
  4  /

PL/SQL procedure successfully completed.

We now have a small table with 1,000 rows (and its statistics). Using 1,000 rows will make it easy to understand numbers such as selectivity (the percentage of rows satisfied by a predicate) and cardinality (the number of rows returned after applying a predicate) once we start to query it.
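As a quick illustration of these two terms, consider an arbitrary predicate (this example is a sketch and not part of the main demonstration): a filter matching 100 of our 1,000 rows has a selectivity of 10% and a cardinality of 100.

```sql
-- Hypothetical example: n1 <= 100 matches 100 of the 1,000 rows,
-- so selectivity = 100/1000 = 10% and cardinality = 100.
SELECT COUNT(*)        AS matched_rows
,      COUNT(*) / 1000 AS selectivity
FROM   thousand_rows
WHERE  n1 <= 100;
```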

We will complete the setup with a very simple PL/SQL function, as follows.

SQL> CREATE FUNCTION some_function (
  2                  p_id IN INTEGER
  3                  ) RETURN INTEGER AS
  4  BEGIN
  5     RETURN 0;
  6  END some_function;
  7  /

Function created.

Our function takes a single parameter but returns a constant value. The function's return value is irrelevant at this stage because we only want to look at the CBO's estimates when we use the function in a query predicate.

We will now query our sample table, using our function in an equality predicate. We will explain the query using Autotrace and also trace the CBO's calculations using the 10053 event that many developers will be familiar with.

SQL> set autotrace traceonly explain

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

Session altered.

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  some_function(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |    10 |   540 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |    10 |   540 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SOME_FUNCTION"("N1")=0)

SQL> ALTER SESSION SET EVENTS '10053 trace name context off';

Session altered.
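As an aside, to read the 10053 output we first need to find the session's trace file. On 11g, a sketch such as the following reports its location (on earlier releases, look in the directory given by the user_dump_dest parameter instead):

```sql
-- 11g only: report the current session's trace file location.
SELECT value
FROM   v$diag_info
WHERE  name = 'Default Trace File';
```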

The execution plan shows us the default selectivity of 1% that the optimiser uses when faced with a predicate of the form FUNCTION(column) = literal and no statistics to work with. For our particular query, this leads to an estimated cardinality of just 10 rows (1,000 rows * 0.01 selectivity = 10 rows). The following excerpt from the 10053 trace file summarises the CBO's arithmetic for our simple query.


***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: THOUSAND_ROWS  Alias: THOUSAND_ROWS
    #Rows: 1000  #Blks:  12  AvgRowLen:  54.00
Access path analysis for THOUSAND_ROWS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for THOUSAND_ROWS[THOUSAND_ROWS] 
  No default cost defined for function SOME_FUNCTION
  No default selectivity defined for function SOME_FUNCTION
  Table: THOUSAND_ROWS  Alias: THOUSAND_ROWS
    Card: Original: 1000.000000  Rounded: 10  Computed: 10.00  Non Adjusted: 10.00
  Access Path: TableScan
    Cost:  5.15  Resp: 5.15  Degree: 0
      Cost_io: 5.00  Cost_cpu: 3285657
      Resp_io: 5.00  Resp_cpu: 3285657
  Best:: AccessPath: TableScan
         Cost: 5.15  Degree: 1  Resp: 5.15  Card: 10.00  Bytes: 0

***************************************

The summary line beginning "Card: Original..." shows the cardinality of 10 rows that results from the default selectivity the CBO used for our function-based predicate. More interesting in this output, however, are the lines beginning "No default...". These state that there are no default statistics (selectivity or cost) defined for our function. This gives us the first clue that we are able to supply statistics for our PL/SQL functions to the optimiser. This is, of course, the subject of this article, but before we define what is meant by these trace entries, we will see some of the issues that arise when we use PL/SQL function-based predicates without any statistics.

pl/sql functions and the cbo

Most developers will be aware that without decent statistics, the CBO can't make good decisions about the order in which it accesses, joins or filters data. We usually ensure that we have representative statistics on our tables and indexes. We might also set statistics for our external tables and use dynamic sampling for global temporary tables. But most of us would not think about setting statistics for PL/SQL functions (or even be aware that we can do this). This can lead to some sub-optimal execution plans and we can demonstrate this quite simply below.

First we will create two functions (QUICK_FUNCTION and SLOW_FUNCTION), as follows.

SQL> CREATE FUNCTION quick_function (
  2                  p_id IN INTEGER
  3                  ) RETURN INTEGER AS
  4  BEGIN
  5     RETURN MOD(p_id, 1000);
  6  END quick_function;
  7  /

Function created.

SQL> CREATE FUNCTION slow_function (
  2                  p_id IN INTEGER
  3                  ) RETURN INTEGER AS
  4  BEGIN
  5     DBMS_LOCK.SLEEP(0.005);
  6     RETURN MOD(p_id, 2);
  7  END slow_function;
  8  /

Function created.

The quick function is a simple wrapper for the built-in MOD function. The slow function has a small sleep embedded in it to slow it down but also wraps the MOD function. Note that QUICK_FUNCTION will return the value 0 once every 1,000 rows, whereas SLOW_FUNCTION will do this every 2 rows. We will now use these functions in equality predicates against our THOUSAND_ROWS table, using autotrace to output a theoretical plan.

SQL> set autotrace traceonly

SQL> set timing on

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  quick_function(n1) = 0
  4  AND    slow_function(n1) = 0;

1 row selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    54 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     1 |    54 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("QUICK_FUNCTION"("N1")=0 AND "SLOW_FUNCTION"("N1")=0)

This query completes in just 0.03 seconds. Note the order of the predicates in both the query and the Predicate Information section of the execution plan: they match. In fact, without any suitable statistics for our functions, the CBO will not re-order any of our function-based predicates; they will be executed in the order they appear in the WHERE clause. Fortunately for us, the above example applied the QUICK_FUNCTION predicate first (remember that this is by far the more selective of the two, returning 0 just once per 1,000 rows). The second filter predicate (using SLOW_FUNCTION) was only applied after the first predicate (using QUICK_FUNCTION) had filtered the resultset. This ordering means that Oracle only had to execute SLOW_FUNCTION once for the entire query, simply because the QUICK_FUNCTION predicate reduced the resultset to a single row first.

With this in mind, we will now re-order our predicates and execute the new query, as follows.

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  slow_function(n1) = 0
  4  AND    quick_function(n1) = 0;

1 row selected.

Elapsed: 00:00:15.64

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    54 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     1 |    54 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SLOW_FUNCTION"("N1")=0 AND "QUICK_FUNCTION"("N1")=0)

This time, our query executed in almost 16 seconds. This is simply because we coded the SLOW_FUNCTION predicate first and therefore the CBO followed its own rules and executed this first. This means that SLOW_FUNCTION was executed for every row in our THOUSAND_ROWS table as the filter predicate was applied. Of course, this example was deliberately biased to demonstrate the effects of predicate ordering. However, it highlights the fact that without any good statistical information on our PL/SQL functions, the CBO can't make any meaningful decisions on how to order table accesses or filters, as we can see above.

With this in mind, we will now examine the different ways in which we can influence the optimiser's decisions and the effects that these have on our execution plans.

default statistics and the extensible optimiser

An under-utilised SQL command is ASSOCIATE STATISTICS. This is combined with two little-known features of Oracle: default statistics and the Extensible Optimiser (part of Oracle's Data Cartridge feature). By using the ASSOCIATE STATISTICS SQL command (described later), both of these features enable us to supply statistical information (such as selectivity and cost) on our PL/SQL functions, packages and type methods to the CBO. In the following sections, we will look at both methods.

This article covers both the default statistics and Extensible Optimiser features, but only with respect to standalone PL/SQL functions. Together, these features cover much more than this: for example, packages, type methods, indextypes, domain indexes and user-defined statistics collection. For more information, see the Further Reading section at the end of this article.

default statistics

The first technique we can use to supply function statistics to the CBO is to set default selectivity and costs. We will look at each in turn below, starting with default selectivity.

default selectivity

We are able to set a fixed selectivity percentage for one or more of our functions. We saw evidence of this in our 10053 trace file earlier, when the optimiser looked for any defined defaults when costing our function-based predicate. To demonstrate how to set default selectivity and how it affects the CBO's decisions, we will return to our SLOW_FUNCTION and QUICK_FUNCTION examples.

Remember that QUICK_FUNCTION returned a value of 0 only once every 1,000 rows and that SLOW_FUNCTION returned 0 every 2 rows. Assuming we are querying with a predicate of the form SLOW/QUICK_FUNCTION(column) = 0, we can derive a default selectivity from this. In other words, QUICK_FUNCTION will have a selectivity of 0.1% (1/1000) and SLOW_FUNCTION will have a selectivity of 50% (1/2). We can set these defaults as follows:
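As a quick sanity check on this arithmetic, we can count the matches directly. This sketch uses the underlying MOD expressions rather than the functions themselves, to avoid 1,000 calls to the deliberately slow function:

```sql
-- QUICK_FUNCTION(n1) = 0 should match 1 row in 1,000 (0.1% selectivity);
-- SLOW_FUNCTION(n1) = 0 should match 500 rows in 1,000 (50% selectivity).
SELECT COUNT(CASE WHEN MOD(n1, 1000) = 0 THEN 1 END) AS quick_matches
,      COUNT(CASE WHEN MOD(n1, 2) = 0 THEN 1 END)    AS slow_matches
FROM   thousand_rows;
```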

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS quick_function DEFAULT SELECTIVITY 0.1;

Statistics associated.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS slow_function DEFAULT SELECTIVITY 50;

Statistics associated.
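These associations are recorded in the data dictionary (the relevant views are covered in more detail later in this article), but as a quick sketch, we can already see our default selectivities:

```sql
-- List our associations and their default selectivities,
-- using the USER_ASSOCIATIONS dictionary view.
SELECT object_name
,      object_type
,      def_selectivity
FROM   user_associations
WHERE  object_name IN ('QUICK_FUNCTION', 'SLOW_FUNCTION');
```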

As we can see, we associate statistics with our functions. Once we do this, the association is a known "object" and can be viewed in the data dictionary (described later). Using Autotrace, we will run some new queries to see the effects of the default selectivities. First, we'll flush any cached plans for our earlier examples, because associating default statistics with a function will not invalidate any existing dependent cursors. We'll do this with a simple comment "trick" as follows, but there are several other ways to flush cached cursors and their plans, with varying degrees of side-effects.

SQL> COMMENT ON TABLE thousand_rows IS '';

Comment created.

We are now ready to explain some queries. First we will use a single equality predicate for QUICK_FUNCTION, as follows.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  quick_function(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    54 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     1 |    54 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("QUICK_FUNCTION"("N1")=0)

We can see that Oracle has correctly estimated a single row for cardinality (1/1000), using our selectivity of 0.1% (remember that in the absence of statistics, the CBO's default for FUNCTION(column) = literal is 1%, which in this case would have resulted in a cardinality of 10). We will try the same query but with SLOW_FUNCTION in the predicate, as follows.

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  slow_function(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |   500 | 27000 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |   500 | 27000 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SLOW_FUNCTION"("N1")=0)

As we can see, the 50% selectivity has been used by the optimiser and we have an accurate cardinality of 500 rows. The CBO now has some information to work with!

Of course, the key test is to determine what happens when we combine predicates. We saw earlier that in the absence of statistics, the CBO evaluates function-based predicates according to the order in which they appear in the WHERE clause and that this can have an impact on our queries' performance. Now that we have default selectivity statistics, we can repeat our slower query as follows.

SQL> set autotrace traceonly

SQL> set timing on

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  slow_function(n1) = 0
  4  AND    quick_function(n1) = 0;

1 row selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    54 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     1 |    54 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("QUICK_FUNCTION"("N1")=0 AND "SLOW_FUNCTION"("N1")=0)

This time, the optimiser has re-ordered our predicates to lead with the QUICK_FUNCTION filter, based on the information we supplied on selectivity. This clearly demonstrates the importance of supplying suitable information to the CBO for the various objects we reference in our queries.

Note that to remove the statistics from one or more of our functions, we use the DISASSOCIATE STATISTICS command, as follows.

SQL> DISASSOCIATE STATISTICS FROM FUNCTIONS quick_function, slow_function;

Statistics disassociated.

default cost

In addition to default selectivity, we are also able to supply a default cost for a single execution of a function. The default comprises three costs: a CPU cost (the approximate number of machine instructions per execution), an I/O cost (the number of data blocks read per execution) and a network cost (the number of data blocks transmitted; this element is not currently used by the CBO).

By providing default costs, we should expect the CBO to behave in a similar way to that which we saw for default selectivity. Default costs will work either with or without default selectivity, but the more information we can provide, the more accurate the CBO can be.
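For reference, both defaults can be supplied in a single association. The values in this sketch are arbitrary (the cost triple is ordered CPU, I/O, network):

```sql
-- Hypothetical values: 50% selectivity plus a per-execution cost of
-- 500 CPU units, 2 I/Os and 0 network blocks.
ASSOCIATE STATISTICS WITH FUNCTIONS slow_function
   DEFAULT SELECTIVITY 50, DEFAULT COST (500, 2, 0);
```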

To demonstrate the effect of default costs in isolation, we will create two functions below: HIGH_CPU_IO and LOW_CPU_IO. As their respective names suggest, one function will use a lot of CPU and I/O during its execution and the other will use very little.

SQL> CREATE FUNCTION high_cpu_io (
  2                  p_id IN INTEGER
  3                  ) RETURN INTEGER AS
  4  BEGIN
  5     FOR r IN (SELECT n1, v1 FROM thousand_rows) LOOP
  6        r.v1 := TO_CHAR(MOD(p_id, DBMS_RANDOM.RANDOM));
  7     END LOOP;
  8     RETURN MOD(p_id, 10);
  9  END high_cpu_io;
 10  /

Function created.

SQL> CREATE FUNCTION low_cpu_io (
  2                  p_id IN INTEGER
  3                  ) RETURN INTEGER AS
  4  BEGIN
  5     RETURN p_id;
  6  END low_cpu_io;
  7  /

Function created.

To begin, we will set some arbitrary defaults for CPU and I/O for our functions as follows.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS high_cpu_io DEFAULT COST (10000, 1000, 0);

Statistics associated.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS low_cpu_io DEFAULT COST (100, 0, 0);

Statistics associated.

Note that the ordering of the costs in our association is CPU, I/O and network (we have used 0 for network, as it is not currently used by the CBO). We have used a default I/O value of 1,000 for the HIGH_CPU_IO function (in effect, we are associating a cost of 1 LIO per row of our test table). The LOW_CPU_IO function generates no I/O, hence we have used a default I/O cost of 0. In the following queries (explained with Autotrace), we will first use each function independently to see their respective costs and then combine them to see predicate re-ordering in operation.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  high_cpu_io(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |    10 |   540 |  1000K  (0)| 03:20:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |    10 |   540 |  1000K  (0)| 03:20:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("HIGH_CPU_IO"("N1")=0)

The cost of using HIGH_CPU_IO in a simple equality predicate is 1,000,000 (broadly speaking, our default I/O cost of 1,000 applied to each of the 1,000 rows scanned). Note that without statistics, the CBO would estimate this cost at a value of just 5. The cost of using LOW_CPU_IO is as follows.

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  low_cpu_io(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |    10 |   540 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |    10 |   540 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("LOW_CPU_IO"("N1")=0)

With a cost of 5, this is the same as the CBO's default (the output is not shown above, but without statistics, the cost of either function-based predicate is also estimated to be just 5). In the following query, we will use both functions but order the predicates with the "expensive" HIGH_CPU_IO predicate being referenced first.

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  high_cpu_io(n1) = 0
  4  AND    low_cpu_io(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    54 | 10005   (0)| 00:02:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     1 |    54 | 10005   (0)| 00:02:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("LOW_CPU_IO"("N1")=0 AND "HIGH_CPU_IO"("N1")=0)

As we saw with default selectivity, the CBO has re-ordered our predicates, according to the information we gave it on the costs of executing our functions.

setting meaningful defaults

In the previous examples, we used meaningless defaults for CPU and I/O costs (simply to demonstrate the effects of default costs). We can be far more accurate by tracing the execution of each of our functions, recording the I/O costs and converting the time into CPU costs. There are several ways to trace the execution statistics of a database operation (such as SQL Trace or Autotrace). We will use the oracle-developer.net Mystats utility. We will start by taking a snapshot of session statistics, as follows.

SQL> exec mystats_pkg.ms_start;

PL/SQL procedure successfully completed.

We will now run 100 executions of the HIGH_CPU_IO function, as follows.

SQL> DECLARE
  2     n INTEGER;
  3  BEGIN
  4     FOR i IN 1 .. 100 LOOP
  5        n := high_cpu_io(i);
  6     END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

With the Mystats utility, we can output statistics for a given set of counters. In this case, we will use consistent gets and db block gets as good indicators of the I/O performed by the function, as follows.

SQL> BEGIN
  2     mystats_pkg.ms_stop(
  3        mystats_pkg.statname_ntt('consistent gets','db block gets'));
  4  END;
  5  /
------------------------------------------------------------------------------
Session statistics report : 24-FEB-2009 07:59:49
Snapshot interval         : .69 seconds


Statistic Name                                                           Value
------------------------------------------------------------  ----------------
STAT..db block gets                                                          0
STAT..consistent gets                                                    2,106

PL/SQL procedure successfully completed.

This report tells us that 100 executions of the HIGH_CPU_IO function perform over 2,100 consistent gets (an average of roughly 21 per execution). Note that we could flush the buffer cache prior to measuring our function's resource usage, but in this example it is unlikely to make much difference. We will repeat the operation for the LOW_CPU_IO function, as follows.

SQL> exec mystats_pkg.ms_start;

PL/SQL procedure successfully completed.

SQL> DECLARE
  2     n INTEGER;
  3  BEGIN
  4     FOR i IN 1 .. 100 LOOP
  5        n := low_cpu_io(i);
  6     END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> BEGIN
  2     mystats_pkg.ms_stop(
  3        mystats_pkg.statname_ntt('consistent gets','db block gets'));
  4  END;
  5  /
------------------------------------------------------------------------------
Session statistics report : 24-FEB-2009 07:59:49
Snapshot interval         : .01 seconds


Statistic Name                                                           Value
------------------------------------------------------------  ----------------
STAT..db block gets                                                          0
STAT..consistent gets                                                        6

PL/SQL procedure successfully completed.

We now have a good approximation for the I/O costs of our two functions. To calculate the CPU costs, we use the DBMS_ODCI.ESTIMATE_CPU_UNITS function, as follows.

SQL> DECLARE
  2     n NUMBER;
  3  BEGIN
  4     n := 1000 * DBMS_ODCI.ESTIMATE_CPU_UNITS(&time);
  5     DBMS_OUTPUT.PUT_LINE(ROUND(n,0));
  6  END;
  7  /
Enter value for time: 0.0069
6747773

PL/SQL procedure successfully completed.

Note that we have supplied the timing for one execution of our HIGH_CPU_IO function (0.69 seconds / 100 executions = 0.0069) and multiplied the result by 1,000 (ESTIMATE_CPU_UNITS returns its estimate in thousands of CPU units). We will repeat the call for LOW_CPU_IO as follows.

SQL> /
Enter value for time: 0.0001
97794

PL/SQL procedure successfully completed.

We now have some good estimates for I/O and CPU costs, so we will disassociate our original defaults and associate the new values, as follows.

SQL> DISASSOCIATE STATISTICS FROM FUNCTIONS high_cpu_io, low_cpu_io;

Statistics disassociated.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS high_cpu_io DEFAULT COST (6747773, 21, 0);

Statistics associated.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS low_cpu_io DEFAULT COST (97794, 0, 0);

Statistics associated.

To test the new default costs, we will flush any previous cursors and repeat our query that leads with the "expensive" predicate.

SQL> COMMENT ON TABLE thousand_rows IS '';

Comment created.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  high_cpu_io(n1) = 0
  4  AND    low_cpu_io(n1) = 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |    54 |   222   (4)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     1 |    54 |   222   (4)| 00:00:03 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("LOW_CPU_IO"("N1")=0 AND "HIGH_CPU_IO"("N1")=0)

As before, the CBO has re-ordered the predicates to evaluate the "cheaper" function first.

default cost, selectivity and joins

So far, we have seen some simple examples of how the optimiser treats function-based predicates, both with and without default statistics (we also treated selectivity and cost independently). We are more likely to be writing complex queries that involve multiple tables, however, and from what we have seen so far, we can easily imagine the importance of function statistics. In the following examples, we will see the effect of default statistics on a query that joins three tables and includes a single function-based predicate. For this we require an example function, which we will create as follows.

SQL> CREATE FUNCTION promo_function(
  2                  p_promo_category IN VARCHAR2
  3                  ) RETURN VARCHAR2 IS
  4  BEGIN
  5     RETURN UPPER(p_promo_category);
  6  END promo_function;
  7  /

Function created.

This function simply returns the value of the parameter in upper-case. There is no database I/O or indeed anything intensive about this function. We will use it in a predicate for the following query (this is based on tables in the supplied SH schema).

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   sh.sales       s
  3  ,      sh.promotions  p
  4  ,      sh.times       t
  5  WHERE  s.promo_id = p.promo_id
  6  AND    s.time_id = t.time_id
  7  AND    t.time_id BETWEEN DATE '2000-01-01' AND DATE '2000-03-31'
  8  AND    promo_function(p.promo_category) = 'TV';

Execution Plan
----------------------------------------------------------
Plan hash value: 4019800577

-------------------------------------------------------------------------------- ...
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| ...
-------------------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT             |            | 51913 |    15M|    62   (5)| ...
|*  1 |  HASH JOIN                   |            | 51913 |    15M|    62   (5)| ...
|   2 |   TABLE ACCESS BY INDEX ROWID| TIMES      |   182 | 35854 |     8   (0)| ...
|*  3 |    INDEX RANGE SCAN          | TIMES_PK   |   182 |       |     2   (0)| ...
|*  4 |   HASH JOIN                  |            | 52142 |  6415K|    53   (4)| ...
|*  5 |    TABLE ACCESS FULL         | PROMOTIONS |     5 |   485 |    17   (0)| ...
|   6 |    PARTITION RANGE SINGLE    |            | 62197 |  1761K|    35   (3)| ...
|*  7 |     TABLE ACCESS FULL        | SALES      | 62197 |  1761K|    35   (3)| ...
-------------------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("S"."TIME_ID"="T"."TIME_ID")
   3 - access("T"."TIME_ID">=TO_DATE(' 2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "T"."TIME_ID"<=TO_DATE(' 2000-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("S"."PROMO_ID"="P"."PROMO_ID")
   5 - filter("PROMO_FUNCTION"("P"."PROMO_CATEGORY")='TV')
   7 - filter("S"."TIME_ID"<=TO_DATE(' 2000-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

This query joins SALES, PROMOTIONS and TIMES for a specified date-range and promotion category. Rather than supply a predicate based on the PROMOTIONS table itself, we have contrived a predicate that uses PROMO_FUNCTION instead (of the form FUNCTION(column) = literal). Without any function statistics, Explain Plan shows us that the TIMES table will probably be accessed first, with the PROMOTIONS table being joined directly to SALES.
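For a function-based predicate with no associated statistics, the CBO falls back on a fixed 1% selectivity, which is where the 5-row estimate for PROMOTIONS in the plan above comes from. A quick sanity check of that arithmetic (Python used purely for illustration; the 503-row count for PROMOTIONS is taken from the 10053 trace output shown later in this article):

```python
# Model the CBO's default 1% selectivity for a function-based predicate.
# PROMOTIONS has 503 rows ("Card: Original: 503" in the 10053 trace).
num_rows = 503
default_selectivity = 0.01  # 1%

estimated_cardinality = round(num_rows * default_selectivity)
print(estimated_cardinality)  # 5, matching the Rows estimate for PROMOTIONS
```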

Without statistics, the CBO will use a 1% selectivity for our function-based predicate. Knowing that we can provide function statistics directly to the optimiser, we will calculate a better value for selectivity, as follows.

SQL> SELECT promo_category
  2  ,      ROUND(RATIO_TO_REPORT(c) OVER () * 100, 1) AS selectivity
  3  FROM  (
  4         SELECT promo_category
  5         ,      COUNT(*) AS c
  6         FROM   sh.promotions
  7         GROUP  BY
  8                promo_category
  9        )
 10  ORDER  BY
 11         2;

PROMO_CATEGORY                 SELECTIVITY
------------------------------ -----------
NO PROMOTION                            .2
ad news                                 .4
radio                                    4
flyer                                    8
post                                  12.9
magazine                              12.9
internet                              16.9
newspaper                             21.9
TV                                    22.9

9 rows selected.

We can see that the TV category accounts for nearly 23% of all promotions (for the sake of simplicity, we've not considered how this applies to the SALES data), so we will now associate this value with PROMO_FUNCTION. We will also supply an arbitrary default cost (specified as CPU, I/O and network cost, in that order), based on the small amount of work that the function performs. We can associate both the default selectivity and the default costs in a single statement, as follows.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS promo_function
  2     DEFAULT COST (100000, 0, 0)
  3     DEFAULT SELECTIVITY 23;

Statistics associated.

To see the effect of these new statistics, we will flush the original cursor (a comment on the PROMOTIONS table is a lightweight piece of DDL that invalidates any dependent cursors) and repeat the query using Autotrace/Explain Plan, as follows.

SQL> COMMENT ON TABLE sh.promotions IS '';

Comment created.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   sh.sales       s
  3  ,      sh.promotions  p
  4  ,      sh.times       t
  5  WHERE  s.promo_id = p.promo_id
  6  AND    s.time_id = t.time_id
  7  AND    t.time_id BETWEEN DATE '2000-01-01' AND DATE '2000-03-31'
  8  AND    promo_function(p.promo_category) = 'TV';

Execution Plan
----------------------------------------------------------
Plan hash value: 2769270327

--------------------------------------------------------------------------------- ...
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| ...
--------------------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT              |            | 61759 |    19M|    64   (8)| ...
|*  1 |  HASH JOIN                    |            | 61759 |    19M|    64   (8)| ...
|*  2 |   TABLE ACCESS FULL           | PROMOTIONS |   116 | 11252 |    19  (11)| ...
|*  3 |   HASH JOIN                   |            | 61924 |    13M|    44   (5)| ...
|   4 |    TABLE ACCESS BY INDEX ROWID| TIMES      |   182 | 35854 |     8   (0)| ...
|*  5 |     INDEX RANGE SCAN          | TIMES_PK   |   182 |       |     2   (0)| ...
|   6 |    PARTITION RANGE SINGLE     |            | 62197 |  1761K|    35   (3)| ...
|*  7 |     TABLE ACCESS FULL         | SALES      | 62197 |  1761K|    35   (3)| ...
--------------------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("S"."PROMO_ID"="P"."PROMO_ID")
   2 - filter("PROMO_FUNCTION"("P"."PROMO_CATEGORY")='TV')
   3 - access("S"."TIME_ID"="T"."TIME_ID")
   5 - access("T"."TIME_ID">=TO_DATE(' 2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "T"."TIME_ID"<=TO_DATE(' 2000-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - filter("S"."TIME_ID"<=TO_DATE(' 2000-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Given the selectivity and cost statistics for PROMO_FUNCTION, the CBO has changed the join order of our three-table query. This time, the PROMOTIONS table is accessed first, while the TIMES and SALES tables are joined together. As join orders and access types are critical to query performance, this clearly shows the importance of function statistics in scenarios such as this. Of course, the default selectivity we supplied above is only accurate for the 'TV' predicate, so it might not have the desired effect for other predicates. We therefore require greater flexibility in setting function statistics and in the next section we will see how the Extensible Optimiser provides it.
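The new 116-row estimate for PROMOTIONS follows directly from our default selectivity. A quick check of the arithmetic (Python for illustration only):

```python
# Model the CBO's use of our DEFAULT SELECTIVITY of 23 (per cent)
# against the 503 rows in PROMOTIONS (per the 10053 trace).
num_rows = 503
default_selectivity = 23 / 100

print(round(num_rows * default_selectivity))  # 116, as in the plan above
```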

user-defined statistics and the extensible optimiser

By now we have seen several examples (and the benefits) of using default statistics to provide more information to the CBO. One issue with default statistics, however, is that they are fixed and cannot respond to the changing selectivities and costs that arise with different invocations of their associated functions. This is where the second method of associating statistics comes into play: namely, the Extensible Optimiser (part of Oracle's Data Cartridge feature).

Using a set of well-defined interfaces (prescribed by Oracle), the Extensible Optimiser enables us to design our own cost and selectivity calculations and then associate these with our PL/SQL functions, packages or type methods. The calculations are stored in methods of an object type (known as an interface type) and are invoked by the CBO during query optimisation. In other words, the CBO will execute the methods of the interface type to get the costs and/or selectivity of a new function-based predicate.

using the extensible optimiser

To use the Extensible Optimiser for one or more of our functions (or packages/type methods), we need to do the following: first, create an interface type whose methods implement our own cost and selectivity algorithms according to Oracle's prescribed ODCI signatures; and second, associate this type with our function(s) using the ASSOCIATE STATISTICS statement.

In our final example for default statistics above, we created a function named PROMO_FUNCTION, so we will use this. Remember that this accepted a promotion category and simply returned it in uppercase. Using the 'TV' category, we fixed a default selectivity of 23% based on our analysis of the data in the PROMOTIONS table. We will now replace the default statistics with an interface type to more accurately calculate costs and selectivities for different promotion categories.

creating an interface type

As stated, we already have PROMO_FUNCTION, so we will create a statistics type to associate it with. We will begin by creating the interface object type specification, as follows.

SQL> CREATE TYPE promo_stats_ot AS OBJECT (
  2
  3     dummy_attribute NUMBER,
  4
  5     STATIC FUNCTION ODCIGetInterfaces (
  6                     p_interfaces OUT SYS.ODCIObjectList
  7                     ) RETURN NUMBER,
  8
  9     STATIC FUNCTION ODCIStatsSelectivity (
 10                     p_pred_info      IN  SYS.ODCIPredInfo,
 11                     p_selectivity    OUT NUMBER,
 12                     p_args           IN  SYS.ODCIArgDescList,
 13                     p_start          IN  VARCHAR2,
 14                     p_stop           IN  VARCHAR2,
 15                     p_promo_category IN  VARCHAR2,
 16                     p_env            IN  SYS.ODCIEnv
 17                     ) RETURN NUMBER,
 18
 19     STATIC FUNCTION ODCIStatsFunctionCost (
 20                     p_func_info      IN  SYS.ODCIFuncInfo,
 21                     p_cost           OUT SYS.ODCICost,
 22                     p_args           IN  SYS.ODCIArgDescList,
 23                     p_promo_category IN  VARCHAR2,
 24                     p_env            IN  SYS.ODCIEnv
 25                     ) RETURN NUMBER
 26  );
 27  /

Type created.

Some important points to note about this are as follows. First, an object type must have at least one attribute, which is the only reason for the dummy_attribute placeholder. Second, the ODCIGetInterfaces method is mandatory for all interface types. Finally, the signatures of ODCIStatsSelectivity and ODCIStatsFunctionCost are prescribed by Oracle, except that the parameters of the function being costed (in our case, p_promo_category) are spliced in before the trailing p_env parameter.

calculating cost and selectivity in the interface type body

The interface type body is where we code our algorithms for PROMO_FUNCTION's selectivity and cost. We will display each method separately and explain the logic in each as we see it.

SQL> CREATE TYPE BODY promo_stats_ot AS
  2
  3     STATIC FUNCTION ODCIGetInterfaces (
  4                     p_interfaces OUT SYS.ODCIObjectList
  5                     ) RETURN NUMBER IS
  6     BEGIN
  7        p_interfaces := SYS.ODCIObjectList(
  8                           SYS.ODCIObject ('SYS', 'ODCISTATS2')
  9                           );
 10        RETURN ODCIConst.success;
 11     END ODCIGetInterfaces;

As stated earlier, the ODCIGetInterfaces method is mandatory and so is its implementation, as shown. We'll not describe it in any detail other than to note that the 'ODCISTATS2' value is only used in post-8i databases.

The remaining methods are where we include our own algorithms. Our selectivity method is as follows.

 13     STATIC FUNCTION ODCIStatsSelectivity (
 14                     p_pred_info        IN  SYS.ODCIPredInfo,
 15                     p_selectivity      OUT NUMBER,
 16                     p_args             IN  SYS.ODCIArgDescList,
 17                     p_start            IN  VARCHAR2,
 18                     p_stop             IN  VARCHAR2,
 19                     p_promo_category   IN  VARCHAR2,
 20                     p_env              IN  SYS.ODCIEnv
 21                     ) RETURN NUMBER IS
 22     BEGIN
 23
 24        /* Calculate selectivity of predicate... */
 25        SELECT (COUNT(CASE
 26                         WHEN UPPER(promo_category) = p_start
 27                         THEN 0
 28                      END) / COUNT(*)) * 100 AS selectivity
 29        INTO   p_selectivity
 30        FROM   sh.promotions;
 31
 32        RETURN ODCIConst.success;
 33     END ODCIStatsSelectivity;

This is a simple calculation of selectivity, based on a lookup of the underlying PROMOTIONS data. We calculate selectivity for a given promotion category and pass it out in the p_selectivity parameter. Of course, if we had a histogram on the PROMO_CATEGORY column, we would probably be better off using that in our selectivity method instead. The selectivities would have already been calculated by Oracle and it would be a simple case of looking them up in the relevant histogram view.

Note the use of the p_start parameter in the CASE expression on line 26. One of the most interesting features of the Extensible Optimiser is that the interface methods can access some of the underlying query predicate values. For example, if we have a query predicate such as WHERE promo_function(promo_category) = 'TV', the "start" and "stop" parameters will both contain 'TV'. In a range-based predicate, the "start" and "stop" parameters in the selectivity method will hold the lower and upper bounds of the given range.
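To make the shape of the selectivity calculation concrete, here is the same percentage-of-matching-rows logic sketched in Python (purely illustrative; the sample list is hypothetical stand-in data for SH.PROMOTIONS.PROMO_CATEGORY):

```python
def promo_selectivity(categories, match_value):
    """Percentage of rows whose upper-cased category equals match_value,
    mirroring the CASE-based query in ODCIStatsSelectivity."""
    matches = sum(1 for c in categories if c.upper() == match_value)
    return matches / len(categories) * 100

# Hypothetical sample data standing in for the PROMO_CATEGORY column.
sample = ['TV', 'TV', 'radio', 'internet', 'post', 'TV', 'flyer', 'TV']
print(promo_selectivity(sample, 'TV'))  # 50.0
```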

The function cost method is slightly more complicated, as follows.

 35     STATIC FUNCTION ODCIStatsFunctionCost (
 36                     p_func_info      IN  SYS.ODCIFuncInfo,
 37                     p_cost           OUT SYS.ODCICost,
 38                     p_args           IN  SYS.ODCIArgDescList,
 39                     p_promo_category IN  VARCHAR2,
 40                     p_env            IN  SYS.ODCIEnv
 41                     ) RETURN NUMBER IS
 42
 43        aa_io   DBMS_SQL.NUMBER_TABLE;
 44        aa_ela  DBMS_SQL.NUMBER_TABLE;
 45        v_dummy VARCHAR2(100);
 46
 47        FUNCTION snap_io RETURN NUMBER IS
 48           v_io NUMBER;
 49        BEGIN
 50           SELECT SUM(ss.value) INTO v_io
 51           FROM   v$sesstat ss
 52           ,      v$statname sn
 53           WHERE  ss.statistic# = sn.statistic#
 54           AND    sn.name IN ('db block gets','consistent gets');
 55           RETURN v_io;
 56        END snap_io;
 57
 58     BEGIN
 59        p_cost := SYS.ODCICost(NULL, NULL, NULL, NULL);
 60
 61        /* Snap a sample execution of the function... */
 62        aa_io(1) := snap_io;
 63        aa_ela(1) := DBMS_UTILITY.GET_TIME;
 64        v_dummy := promo_function('tv');
 65        aa_ela(2) := DBMS_UTILITY.GET_TIME;
 66        aa_io(2) := snap_io;
 67
 68        /* Calculate costs from snaps... */
 69        p_cost.CPUCost := 1000 * DBMS_ODCI.ESTIMATE_CPU_UNITS(
 70                                    (aa_ela(2) - aa_ela(1)) / 100);
 71        p_cost.IOCost := aa_io(2) - aa_io(1);
 72        p_cost.NetworkCost := 0;
 73
 74        RETURN ODCIConst.success;
 75     END ODCIStatsFunctionCost;
 76
 77  END;
 78  /

Type body created.

We can use any algorithms we like for function cost calculation, but in this case we are executing PROMO_FUNCTION once (line 64) and capturing two pieces of information: the elapsed time of the call, using DBMS_UTILITY.GET_TIME (which reports centiseconds); and the session logical I/O either side of the call, by summing the 'db block gets' and 'consistent gets' statistics in V$SESSTAT.

That's all we need to do to create the interface type. As most of the implementation is well-defined by Oracle, we only need to fill in the gaps by adding our target function's (or functions') parameters and our algorithms. We will now use this type below.
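The costing pattern above is language-agnostic: snapshot a resource counter, execute the function once, snapshot again and cost the delta. A minimal sketch of the same pattern in Python (purely illustrative; sample_execution_cost and its use of perf_counter are my own stand-ins for DBMS_UTILITY.GET_TIME and the V$SESSTAT snaps):

```python
import time

def sample_execution_cost(fn, *args):
    # Snap the clock either side of a single sample execution, mirroring
    # the snap-before/snap-after pattern in ODCIStatsFunctionCost (which
    # also snaps session logical I/O from V$SESSTAT in the same way).
    t0 = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - t0
    # The type body converts the elapsed time to CPU units with
    # DBMS_ODCI.ESTIMATE_CPU_UNITS; here we simply return elapsed seconds.
    return elapsed

cost = sample_execution_cost(str.upper, 'tv')
print(cost >= 0)
```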

associating the interface type

Our final task is to associate the statistics type with our function. For this we use the ASSOCIATE STATISTICS SQL statement as before, but first we must disassociate the default statistics we already have for PROMO_FUNCTION, as follows.

SQL> DISASSOCIATE STATISTICS FROM FUNCTIONS promo_function;

Statistics disassociated.

We will now associate our PROMO_STATS_OT interface type with PROMO_FUNCTION, as follows.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS promo_function USING promo_stats_ot;

Statistics associated.

Note that the syntax is slightly different from that of our default statistics examples (i.e. USING instead of DEFAULT).

testing the interface type

To demonstrate the dynamic nature of the interface type method, we will explain some queries with different function predicates and view their execution plans using Autotrace. We will also trace the CBO with the 10053 event. Each query will be optimised by the CBO, which means that the interface type methods will be executed for each variation of the predicate.

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

Session altered.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   sh.promotions
  3  WHERE  promo_function(promo_category) = 'TV';

Execution Plan
----------------------------------------------------------
Plan hash value: 4106015420

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |   115 | 11155 |  1526   (0)| 00:00:19 |
|*  1 |  TABLE ACCESS FULL| PROMOTIONS |   115 | 11155 |  1526   (0)| 00:00:19 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_FUNCTION"("PROMO_CATEGORY")='TV')

We can see that the CBO has the correct cardinality for our function-based predicate (this is exactly the number of rows for the 'TV' category). We will explain a query with a new predicate, as follows.

SQL> SELECT *
  2  FROM   sh.promotions
  3  WHERE  promo_function(promo_category) = 'RADIO';

Execution Plan
----------------------------------------------------------
Plan hash value: 4106015420

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |    20 |  1940 |    17   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PROMOTIONS |    20 |  1940 |    17   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_FUNCTION"("PROMO_CATEGORY")='RADIO')

This time, the CBO (or rather, our interface type) has correctly calculated a cardinality of 20 rows (and a lower cost associated with fewer rows). This clearly shows the benefit of using the Extensible Optimiser to respond to different data patterns and predicates. If we examine the 10053 trace file, we can see the actual calls that the CBO made to our interface type methods during the optimisation phase. We can also see the values of our predicate literals and the final cost and selectivity calculations passed out by our methods. The following excerpt is from the optimisation of our first query, searching on the literal 'TV'.


SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for PROMOTIONS[PROMOTIONS] 
  Calling user-defined function cost function...
    predicate: "SCOTT"."PROMO_FUNCTION"("PROMOTIONS"."PROMO_CATEGORY")
  declare 
     cost sys.ODCICost := sys.ODCICost(NULL, NULL, NULL, NULL); 
     arg0 VARCHAR2(1) := null;
 
    begin 
      :1 := "SCOTT"."PROMO_STATS_OT".ODCIStatsFunctionCost(
                     sys.ODCIFuncInfo('SCOTT', 
                            'PROMO_FUNCTION', 
                            NULL, 
                            1), 
                     cost, 
                     sys.ODCIARGDESCLIST(sys.ODCIARGDESC(2, 'PROMOTIONS', 'SH', '"PROMO_CATEGORY"', NULL, NULL, NULL)) 
                     , arg0, 
                     sys.ODCIENV(:5,:6,:7,:8)); 
      if cost.CPUCost IS NULL then 
        :2 := -1.0; 
      else 
        :2 := cost.CPUCost; 
      end if; 
      if cost.IOCost IS NULL then 
        :3 := -1.0; 
      else 
        :3 := cost.IOCost; 
      end if; 
      if cost.NetworkCost IS NULL then 
        :4 := -1.0; 
      else 
        :4 := cost.NetworkCost; 
      end if; 
      exception 
        when others then 
          raise; 
    end;
ODCIEnv Bind :5 Value 0
ODCIEnv Bind :6 Value 0
ODCIEnv Bind :7 Value 0
ODCIEnv Bind :8 Value 5

*** 2009-03-05 18:08:50.875
  SCOTT.PROMO_STATS_OT.ODCIStatsFunctionCost returned:
    CPUCost      : 0
    IOCost       : 3
    NetworkCost  : 0
  Calling user-defined selectivity function...
    predicate: "SCOTT"."PROMO_FUNCTION"("PROMOTIONS"."PROMO_CATEGORY")='TV'
  declare 
     sel number; 
     arg0 VARCHAR2(1) := null;
 
    begin 
      :1 := "SCOTT"."PROMO_STATS_OT".ODCIStatsSelectivity(
                     sys.ODCIPREDINFO('SCOTT', 
                            'PROMO_FUNCTION', 
                            NULL, 
                            29), 
                     sel, 
                     sys.ODCIARGDESCLIST(sys.ODCIARGDESC(3, NULL, NULL, NULL, NULL, NULL, NULL), sys.ODCIARGDESC(3, NULL, NULL, NULL, NULL, NULL, NULL), sys.ODCIARGDESC(2, 'PROMOTIONS', 'SH', '"PROMO_CATEGORY"', NULL, NULL, NULL)), 
                     :3, 
                     :4 
                     , arg0, 
                     sys.ODCIENV(:5,:6,:7,:8)); 
      if sel IS NULL then 
        :2 := -1.0; 
      else 
        :2 := sel; 
      end if; 
      exception 
        when others then 
          raise; 
    end;
Bind :3 Value 'TV'
Bind :4 Value 'TV'
ODCIEnv Bind :5 Value 0
ODCIEnv Bind :6 Value 0
ODCIEnv Bind :7 Value 0
ODCIEnv Bind :8 Value 5
  SCOTT.PROMO_STATS_OT.ODCIStatsSelectivity returned selectivity: 22.86282306%
  Table: PROMOTIONS  Alias: PROMOTIONS
    Card: Original: 503.000000  Rounded: 115  Computed: 115.00  Non Adjusted: 115.00
  Access Path: TableScan
    Cost:  1526.03  Resp: 1526.03  Degree: 0
      Cost_io: 1526.00  Cost_cpu: 560562
      Resp_io: 1526.00  Resp_cpu: 560562
  Best:: AccessPath: TableScan
         Cost: 1526.03  Degree: 1  Resp: 1526.03  Card: 115.00  Bytes: 0

This trace information clearly shows the CBO's workings with our interface type. Interestingly, it also shows the binding of the 'TV' literal to the p_start and p_stop parameters for the ODCIStatsSelectivity method. This is a clear demonstration of how query predicate information is passed through the CBO into the interface type for us to use in our algorithms.
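The cardinality arithmetic in the trace is also easy to verify: multiplying the original 503 rows by the selectivity our method returned yields the rounded 115-row estimate (Python for illustration only):

```python
# Reproduce the trace's cardinality line:
#   Card: Original: 503 ... Rounded: 115
num_rows = 503
selectivity = 22.86282306 / 100  # as returned by ODCIStatsSelectivity

print(round(num_rows * selectivity))  # 115
```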

If we use bind variable predicates instead of literals, the values are still available to the interface type, as long as bind variable peeking occurs. We will demonstrate this below, beginning with our bind variable, as follows.

SQL> var bv_category VARCHAR2(30);

SQL> exec :bv_category := 'INTERNET';

PL/SQL procedure successfully completed.

Rather than use Autotrace as with our previous examples, we will switch to executing the query and viewing its cached plan. This is because EXPLAIN PLAN doesn't support bind variable peeking, which is required if the interface type is to know the actual value of the bound predicate.

SQL> SELECT *
  2  FROM   sh.promotions
  3  WHERE  promo_function(promo_category) = :bv_category;

<<...output snipped...>>

85 rows selected.

SQL> SELECT *
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID  5fcuf16f11vdn, child number 0
-------------------------------------
SELECT * FROM   sh.promotions WHERE  promo_function(promo_category) =
:bv_category

Plan hash value: 4106015420

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |    17 (100)|          |
|*  1 |  TABLE ACCESS FULL| PROMOTIONS |    85 |  8245 |    17   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROMO_FUNCTION"("PROMO_CATEGORY")=:BV_CATEGORY)


19 rows selected.

Again the CBO has the correct cardinality to work with. The trace file shows the following bind variable information:


*******************************************
Peeked values of the binds in SQL statement
*******************************************
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=01 mxl=32(30) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=0eb679a8  bln=32  avl=08  flg=05
  value="INTERNET"

As long as bind variable peeking is performed, therefore, the interface method continues to work. If bind variable peeking is not performed, as is the case with EXPLAIN PLAN, the estimated cardinality will default to 1, based on a selectivity of 0.00000000%, as reported in the 10053 trace file:


  SCOTT.PROMO_STATS_OT.ODCIStatsSelectivity returned selectivity: 0.00000000%
  Table: PROMOTIONS  Alias: PROMOTIONS
    Card: Original: 503.000000  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00

Of course, bind variable peeking is largely a one-time operation performed during query optimisation, so the statistics calculations will be performed using the first value only and fixed for all further executions of the query. It could be argued that this is less satisfactory than using sensible default statistics for queries with predicates of the form function(column) = :bind_variable.

other considerations

associations

Throughout this article, we have been associating and disassociating default/user-defined statistics with our functions. Details of associations are stored in the data dictionary, under the XXX_ASSOCIATIONS views, where XXX can be USER, ALL or DBA. For example, in Oracle 11.1.0.7, the USER_ASSOCIATIONS view is as follows.

SQL> desc user_associations
 Name                                   Null?    Type
 -------------------------------------- -------- -------------
 OBJECT_OWNER                           NOT NULL VARCHAR2(30)
 OBJECT_NAME                            NOT NULL VARCHAR2(30)
 COLUMN_NAME                                     VARCHAR2(30)
 OBJECT_TYPE                                     VARCHAR2(9)
 STATSTYPE_SCHEMA                                VARCHAR2(30)
 STATSTYPE_NAME                                  VARCHAR2(30)
 DEF_SELECTIVITY                                 NUMBER
 DEF_CPU_COST                                    NUMBER
 DEF_IO_COST                                     NUMBER
 DEF_NET_COST                                    NUMBER
 INTERFACE_VERSION                               NUMBER
 MAINTENANCE_TYPE                                VARCHAR2(14)

We can view details of the associations for the objects we created in this article as follows.

SQL> SELECT object_name
  2  ,      statstype_name
  3  ,      def_selectivity
  4  ,      def_cpu_cost
  5  ,      def_io_cost
  6  ,      interface_version
  7  FROM   user_associations;

OBJECT_NAME      STATSTYPE_NAME  DEF_SELECTIVITY DEF_CPU_COST DEF_IO_COST INTERFACE_VERSION
---------------- --------------- --------------- ------------ ----------- -----------------
PROMO_FUNCTION   PROMO_STATS_OT                                                           2
HIGH_CPU_IO                                           6747773          21                 0
LOW_CPU_IO                                              97794                             0
SOME_FUNCTION                                      4294967295  4294967295                 0

4 rows selected.

Note that we no longer have any default selectivity statistics (simply due to the sequence we ran our examples in) and we have a single usage of the Extensible Optimiser (identified either by the existence of a statistics type or an interface version value of 2). As stated before, all of our examples have been with standalone functions, but packages and type methods (among others) can also have default or user-defined statistics.

effects of the function result cache

Many developers will be familiar with 11g's new function result cache feature (see this oracle-developer.net article for more details). There isn't much to say about this feature because it makes no difference to the way that the optimiser uses PL/SQL function statistics. In other words, whether a function is cached or not is irrelevant to the CBO when it orders function-based predicates in queries. We can demonstrate this quite easily, by creating two functions (one with result caching) and using them in a query, as follows.

SQL> CREATE FUNCTION cached_function(
  2                  p_input IN INTEGER
  3                  ) RETURN INTEGER RESULT_CACHE IS
  4  BEGIN
  5     RETURN p_input + 100;
  6  END cached_function;
  7  /

Function created.

SQL> CREATE FUNCTION uncached_function(
  2                  p_input IN INTEGER
  3                  ) RETURN INTEGER IS
  4  BEGIN
  5     RETURN p_input + 100;
  6  END uncached_function;
  7  /

Function created.

The only difference between these functions is that CACHED_FUNCTION uses the PL/SQL function result cache to store results based on the input parameter. Using these functions, we will repeat our simple query to test the ordering of function-based predicates below. Note that the UNCACHED_FUNCTION predicate (theoretically the most "expensive" to execute) is coded first.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  uncached_function(n1) > 900
  4  AND    cached_function(n1) > 900;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     3 |   162 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |     3 |   162 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("UNCACHED_FUNCTION"("N1")>900 AND
              "CACHED_FUNCTION"("N1")>900)

The fact that CACHED_FUNCTION uses the result cache is irrelevant: the predicates are evaluated in the order they are coded. This remains the case even when we associate statistics, as follows.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS cached_function, uncached_function
  2     DEFAULT SELECTIVITY 20
  3     DEFAULT COST (100000, 100000, 0);

Statistics associated.

SQL> COMMENT ON TABLE thousand_rows IS '';

Comment created.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   thousand_rows
  3  WHERE  uncached_function(n1) > 900
  4  AND    cached_function(n1) > 900;

Execution Plan
----------------------------------------------------------
Plan hash value: 4012467810

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |    40 |  2160 |   120M  (1)|400:00:01 |
|*  1 |  TABLE ACCESS FULL| THOUSAND_ROWS |    40 |  2160 |   120M  (1)|400:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("UNCACHED_FUNCTION"("N1")>900 AND
              "CACHED_FUNCTION"("N1")>900)

Something looks a little strange with those cost and time estimates but essentially we can see that there is no change in predicate order related to the result cache.

summary

In this article, we have seen the importance of providing meaningful statistics to the CBO on the costs and selectivities of our PL/SQL functions. Using either default statistics or the Extensible Optimiser, we can supply accurate information to improve the quality of the CBO's optimisation decisions, particularly for queries with function-based predicates. Whatever method we use, however, the statistics we generate can help the CBO to decide predicate and join orders, although complex query transformations are out of scope.

acknowledgements

Thanks to Joze Senegacnik for reviewing this article and providing some useful feedback and suggestions.

further reading

For more information on the ASSOCIATE STATISTICS command, see the SQL Reference. The Data Cartridge Developer's Guide has more information on the Extensible Optimizer and available programming interfaces. All utilities used in the examples are available from the Utilities section of oracle-developer.net.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, June 2009
