setting cardinality for pipelined and table functions

Table and pipelined functions are often used as simple rowsources (for example, SELECT * FROM TABLE(dbms_xplan.display)). In these cases, cardinality is not particularly useful as there is only one possible execution plan. However, there might be times when we need to join a pipelined or table function to a set of other rowsources (tables, views, etc.). At this point, having cardinality statistics for the function can be critical in enabling the CBO to derive an efficient execution plan. This article focuses on exactly that, demonstrating several methods for setting cardinality statistics for pipelined and table functions.

It is assumed that readers are comfortable with pipelined and table functions. If not, this oracle-developer.net article is recommended background reading.

methods for setting cardinality

We will cover the following methods for setting table/pipelined function cardinality in this article:

 1. the CARDINALITY hint (undocumented);
 2. the OPT_ESTIMATE hint (undocumented);
 3. the DYNAMIC_SAMPLING hint (Oracle 11.1.0.7 onwards);
 4. the Extensible Optimiser (Oracle 10g Release 1 onwards).

Note that the first two methods use undocumented hints. These hints are used internally by Oracle but they haven't been included in the public documentation. For this reason, they should not be used in production code. They are included in this article for full coverage of this subject and not as recommended solutions.

The examples in this article were run on an 11.1.0.7 database; the individual sections below note the Oracle versions that support each method.

sample pipelined function

The majority of the examples in this article will be based on a join between the DEPARTMENTS and EMPLOYEES data in the supplied HR schema. The EMPLOYEES data will be generated by a pipelined function as an alternative rowsource to use in the queries. We will begin by creating the object and collection types to define our pipelined function's record structure and return type, as follows.

SQL> CREATE TYPE employees_ot AS OBJECT
  2  ( employee_id    NUMBER(6)
  3  , first_name     VARCHAR2(20)
  4  , last_name      VARCHAR2(25)
  5  , email          VARCHAR2(25)
  6  , phone_number   VARCHAR2(20)
  7  , hire_date      DATE
  8  , job_id         VARCHAR2(10)
  9  , salary         NUMBER(8,2)
 10  , commission_pct NUMBER(2,2)
 11  , manager_id     NUMBER(6)
 12  , department_id  NUMBER(4)
 13  );
 14  /

Type created.

SQL> CREATE TYPE employees_ntt AS TABLE OF employees_ot;
  2  /

Type created.

We will now create a pipelined function, as follows.

SQL> CREATE FUNCTION employees_piped
  2     RETURN employees_ntt PIPELINED AS
  3  BEGIN
  4     FOR r IN (SELECT * FROM employees) LOOP
  5        FOR i IN 1 .. 200 LOOP
  6           PIPE ROW (employees_ot(
  7                        r.employee_id, r.first_name, r.last_name,
  8                        r.email, r.phone_number, r.hire_date,
  9                        r.job_id, r.salary, r.commission_pct,
 10                        r.manager_id, r.department_id));
 11        END LOOP;
 12     END LOOP;
 13     RETURN;
 14  END employees_piped;
 15  /

Function created.

Note that this function pipes over 20,000 records by inflating the EMPLOYEES dataset by a factor of 200 (this volume of data will show the impact of using correct cardinality statistics more clearly). The duplicated data and the absence of complex transformations (which are usually the reason for using a pipelined function in the first place) are irrelevant in this case, as we simply need a representative rowsource for our examples.

default cardinality

We will begin by investigating how the CBO treats pipelined functions when optimising SQL. First we will query our function to see how many rows it actually generates, as follows.

SQL> SELECT COUNT(*)
  2  FROM   TABLE(employees_piped);

  COUNT(*)
----------
     21400

1 row selected.

Our function pipes 21,400 employees records. Using Autotrace to generate an Explain Plan, we will re-query the function to see how many rows the CBO thinks it returns.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   TABLE(employees_piped);

Execution Plan
----------------------------------------------------------
Plan hash value: 825990498

--------------------------------------------------------------------- ...
| Id  | Operation                         | Name            | Rows  | ...
--------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                  |                 |  8168 | ...
|   1 |  COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
--------------------------------------------------------------------- ...

We can see that the CBO uses a cardinality of 8,168 rows for our pipelined function. This is a heuristic value (rather than an estimate) that the optimiser uses for all table/pipelined functions. The default appears to depend on the default database block size (this example is from a database with an 8KB block size); on databases with larger default block sizes, the CBO will use a larger heuristic value for the cardinality.

Given that the optimiser uses a default (and usually incorrect) value for cardinality, it follows that this could lead to sub-optimal execution plans. We will test this assertion by joining our EMPLOYEES_PIPED function to the DEPARTMENTS table, as follows.

SQL> SELECT *
  2  FROM   departments              d
  3  ,      TABLE(employees_piped)   e
  4  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 178575256

----------------------------------------------------------------------- ...
| Id  | Operation                           | Name            | Rows  | ...
----------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                    |                 |  8168 | ...
|   1 |  MERGE JOIN                         |                 |  8168 | ...
|   2 |   TABLE ACCESS BY INDEX ROWID       | DEPARTMENTS     |    27 | ...
|   3 |    INDEX FULL SCAN                  | DEPT_ID_PK      |    27 | ...
|*  4 |   SORT JOIN                         |                 |  8168 | ...
|   5 |    COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
----------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
       filter("D"."DEPARTMENT_ID"=VALUE(KOKBF$))

With a cardinality of 8,168 for the pipelined function, the optimiser has chosen a sort-merge join. It is unlikely that this is the most efficient method for joining these two rowsources, so we will now demonstrate the effect of setting correct cardinality statistics for our pipelined function, using each of the four available methods in turn.

method 1: cardinality hint (undocumented)

The CARDINALITY hint is simple to use (although quite limited) yet for some reason remains undocumented. As its name suggests, this hint enables us to supply the cardinality for a rowsource in a query. Despite the fact that we shouldn't use this hint in production code, we will see the effect it has on our query below. We will supply the exact rowcount of our pipelined function in this case, though in practice we would probably only ever use an estimated value.

SQL> SELECT /*+ CARDINALITY(e, 21400) */ *
  2  FROM   departments              d
  3  ,      TABLE(employees_piped)   e
  4  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570

---------------------------------------------------------------------- ...
| Id  | Operation                          | Name            | Rows  | ...
---------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                   |                 | 21400 | ...
|*  1 |  HASH JOIN                         |                 | 21400 | ...
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS     |    27 | ...
|   3 |   COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
---------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))

Using the hint, we have supplied the exact cardinality of our pipelined function to the CBO. With a more suitable cardinality, the optimiser has chosen an alternative access path (a hash join in this case). This clearly shows the importance of supplying representative statistics, rather than relying on defaults.

method 2: opt_estimate hint (undocumented)

The second method uses the OPT_ESTIMATE hint. This hint, although undocumented, is used by Oracle's SQL Profile feature to correct the cardinalities used by the optimiser. The OPT_ESTIMATE hint is used in SQL Profiles to scale the CBO's estimates up or down according to the correction factors determined during a SQL Tuning Task (during such a task, Oracle compares the actual rowsource cardinalities for a query with the estimates it used to derive the original plan; a SQL Profile stores the resulting corrections).

The OPT_ESTIMATE hint can be used to re-factor the cardinalities for tables (this includes views and table functions), indexes and joins. We will use it to correct our pipelined function cardinality to its exact rowcount, as follows.

SQL> SELECT /*+ OPT_ESTIMATE(table, e, scale_rows=2.62) */
  2         *
  3  FROM   departments              d
  4  ,      TABLE(employees_piped)   e
  5  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570

---------------------------------------------------------------------- ...
| Id  | Operation                          | Name            | Rows  | ...
---------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                   |                 | 21400 | ...
|*  1 |  HASH JOIN                         |                 | 21400 | ...
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS     |    27 | ...
|   3 |   COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
---------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))

With this hint, we have told the CBO to scale its cardinality estimates for our pipelined function by a factor of 2.62 (this is 21400/8168 to 2 decimal places but is probably more precise than we needed to be). Note that we defined this rowsource as a "table" and referenced it by its alias in the query. As we can see, the result is that Oracle uses a hash join, as it did when we used the CARDINALITY hint above. This is to be expected of course, because we have supplied the same information to the optimiser, albeit by a different method.

method 3: dynamic_sampling hint

The dynamic sampling feature has been available since 9i Release 2, but it only supports pipelined and table functions from 11.1.0.7 onwards (at sampling levels 2 to 10). Note that the CBO will only sample a pipelined or table function when dynamic sampling is requested with the DYNAMIC_SAMPLING hint; the OPTIMIZER_DYNAMIC_SAMPLING parameter setting alone does not trigger it. With this in mind, we will use the hint in our standard query, as follows.

SQL> set autotrace traceonly explain

SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 5) */
  2         *
  3  FROM   departments              d
  4  ,      TABLE(employees_piped)   e
  5  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570

---------------------------------------------------------------------- ...
| Id  | Operation                          | Name            | Rows  | ...
---------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                   |                 | 21400 | ...
|*  1 |  HASH JOIN                         |                 | 21400 | ...
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS     |    27 | ...
|   3 |   COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
---------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))

Note
-----
   - dynamic sampling used for this statement

The notes section of the DBMS_XPLAN output tells us that dynamic sampling was used for our query. We used a sampling level of 5 and interestingly the CBO has the exact cardinality for our pipelined function (and hence it switches to a hash join again). If we repeat the query with a dynamic sampling level of just 2 and trace the CBO using the 10053 event, we can see quite clearly how the CBO can be so accurate at lower sampling levels.

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever';

Session altered.

SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 2) */
  2         *
  3  FROM   departments              d
  4  ,      TABLE(employees_piped)   e
  5  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570

---------------------------------------------------------------------- ...
| Id  | Operation                          | Name            | Rows  | ...
---------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                   |                 | 21400 | ...
|*  1 |  HASH JOIN                         |                 | 21400 | ...
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS     |    27 | ...
|   3 |   COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
---------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))

Note
-----
   - dynamic sampling used for this statement

SQL> set autotrace off

SQL> ALTER SESSION SET EVENTS '10053 trace name context off';

Session altered.

Even at level 2, the CBO has the exact cardinality. The reason why is in the trace file, as the following excerpt demonstrates.


***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for KOKBF$[KOKBF$] 

*** 2009-06-11 18:15:51.109
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 2).
** Dynamic sampling updated table stats.: blocks=33

<<...snip...>>

*** 2009-06-11 18:15:51.140
** Executed dynamic sampling query:
    level : 2
    sample pct. : 100.000000
    actual sample size : 21400
    filtered sample card. : 21400
    orig. card. : 8168
    block cnt. table stat. : 33
    block cnt. for sampling: 33
    max. sample block cnt. : 64
    sample block cnt. : 33
    min. sel. est. : -1.00000000
** Using dynamic sampling card. : 21400
** Dynamic sampling updated table card.
  Table: KOKBF$  Alias: KOKBF$
    Card: Original: 21400.000000  Rounded: 21400  Computed: 21400.00  Non Adjusted: 21400.00
    
***************************************

As highlighted above, dynamic sampling for pipelined functions at level 2 or above means that a 100% sample of the rowsource is taken, regardless of the sampling level specified in the hint. For pipelined functions with small resultsets, this 100% sample might be trivial compared with the effect that the wrong cardinality could have on the overall execution plan.

For pipelined functions that process large volumes of data, however, dynamic sampling can be more costly, because the function is executed an additional time during parsing. That said, the workload (and therefore the time to completion) is not simply doubled just because the function is queried twice. We can use SQL Trace to demonstrate this by measuring the additional workload that dynamic sampling generates. In the following example, we will query our pipelined function with SQL Trace set and a dynamic sampling level of 2.

SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

Session altered.

SQL> set autotrace traceonly

SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 2) */ *
  2  FROM   TABLE(employees_piped) e;

21400 rows selected.

<<...snip...>>

SQL> ALTER SESSION SET EVENTS '10046 trace name context off';

Session altered.

The resulting trace file (formatted with TKProf) shows two SQL statements for our query. First, we can see the query itself, as follows.


SELECT /*+ DYNAMIC_SAMPLING(e, 2) */ *
FROM   TABLE(employees_piped) e

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       44      0.15       0.15          0          0          0       21400
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       46      0.15       0.15          0          0          0       21400

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 82  

Rows     Row Source Operation
-------  ---------------------------------------------------
  21400  COLLECTION ITERATOR PICKLER FETCH EMPLOYEES_PIPED (cr=8 pr=0 pw=0 time=0 us)

We can see that this query completed extremely quickly (because Autotrace was set to traceonly, the resultset wasn't spooled to screen, and the SQL*Plus arraysize was set to 500). The tracefile also contains a second query of our pipelined function, as follows.


SQL ID: 82w2n8jb84umg
Plan Hash: 3623837808
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE 
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') 
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), 
  NVL(SUM(C2),:"SYS_B_1") 
FROM
 (SELECT /*+ NO_PARALLEL("KOKBF$") FULL("KOKBF$") NO_PARALLEL_INDEX("KOKBF$") 
  */ :"SYS_B_2" AS C1, :"SYS_B_3" AS C2 FROM TABLE("HR"."EMPLOYEES_PIPED"()) 
  "KOKBF$") SAMPLESUB


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.03       0.03          0          0          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.03       0.03          0          0          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 82     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=8 pr=0 pw=0 time=0 us)
  21400   COLLECTION ITERATOR PICKLER FETCH EMPLOYEES_PIPED (cr=8 pr=0 pw=0 time=0 us)

This recursive query was executed by Oracle because we used the DYNAMIC_SAMPLING hint. We can see that it took less time and fewer resources than our actual query, yet it accessed the same number of rows. The workload was reduced because Oracle was simply trying to get the pipelined function's rowcount. This means that it was able to use an aggregate query without projecting any of the function's return attributes. Instead, it used bind variable expressions. The raw trace file tells us that the values of the bind variables were 0 for the default NVL arguments in the outer query block and 1 for the C1 and C2 expressions in the in-line view. By using an in-line view, SUM and NVL, Oracle executed a more complicated version of a COUNT(*) from the function to determine its rowcount.

method 4: extensible optimiser

So far in this article, we have seen three hints that can be used to ensure the CBO has accurate cardinalities for our table or pipelined functions. However, two of these hints are unsupported and the third is potentially costly. Since 10g Release 1, there has been a fully supported and lightweight alternative for supplying cardinality to the optimiser: the Extensible Optimiser.

The Extensible Optimiser is implemented by an Oracle Data Cartridge (which is essentially an object type, known as an interface type, that contains one or more well-defined and structured methods). This feature enables us to design our own cardinality calculations (as a prescribed method in the interface type) and then associate them with our table or pipelined functions. The type's cardinality method is invoked by the CBO during query optimisation to determine the rowcount for the pipelined or table function. We will see examples of this below.

enabling the extensible optimiser

The Extensible Optimiser has a range of well-defined methods for calculating various statistics for functions, but the one we are interested in is the ODCIStatsTableFunction method. To use the Extensible Optimiser with a table or pipelined function, we require three components:

 1. the table or pipelined function itself, with any parameters that the cardinality calculation will use;
 2. an interface object type implementing the ODCIGetInterfaces and ODCIStatsTableFunction methods;
 3. an association between the function and the interface type, created with ASSOCIATE STATISTICS.

We will create each of these components below.

(1) pipelined function

First, we'll modify our EMPLOYEES_PIPED pipelined function to include a P_NUM_ROWS parameter, as follows.

SQL> CREATE OR REPLACE FUNCTION employees_piped(
  2                             p_num_rows IN NUMBER
  3                             ) RETURN employees_ntt PIPELINED AS
  4  BEGIN
  5     FOR r IN (SELECT * FROM employees) LOOP
  6        FOR i IN 1 .. 200 LOOP
  7           PIPE ROW (employees_ot(
  8                        r.employee_id, r.first_name, r.last_name,
  9                        r.email, r.phone_number, r.hire_date,
 10                        r.job_id, r.salary, r.commission_pct,
 11                        r.manager_id, r.department_id));
 12        END LOOP;
 13     END LOOP;
 14     RETURN;
 15  END employees_piped;
 16  /

Function created.

Note that the P_NUM_ROWS parameter doesn't appear in the function body at all. Instead, it will be used by the CBO as we will see shortly.
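The parameter has no effect on the rows piped, so a quick sanity check returns the same rowcount whatever value we pass (a sketch; the literal 999 is arbitrary):

```sql
-- P_NUM_ROWS is ignored by the function body, so any value
-- still returns the full 21,400-row resultset.
SELECT COUNT(*)
FROM   TABLE(employees_piped(999));
```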

(2) interface type

We will now create an interface object type, beginning with the specification, as follows.

SQL> CREATE OR REPLACE TYPE pipelined_stats_ot AS OBJECT (
  2
  3     dummy_attribute NUMBER,
  4
  5     STATIC FUNCTION ODCIGetInterfaces (
  6                     p_interfaces OUT SYS.ODCIObjectList
  7                     ) RETURN NUMBER,
  8
  9     STATIC FUNCTION ODCIStatsTableFunction (
 10                     p_function IN  SYS.ODCIFuncInfo,
 11                     p_stats    OUT SYS.ODCITabFuncStats,
 12                     p_args     IN  SYS.ODCIArgDescList,
 13                     p_num_rows IN  NUMBER
 14                     ) RETURN NUMBER
 15
 16  );
 17  /

Type created.

Some important points to note about this are as follows:

 - the type has a single "dummy" attribute because object types must have at least one attribute, even though we will not use it;
 - the ODCIGetInterfaces method is mandatory for any interface type;
 - the trailing parameter(s) of the ODCIStatsTableFunction method must mirror the parameter list of the function that the type will be associated with (in our case, P_NUM_ROWS).

The interface type body is where we code our cardinality calculation, as follows.

SQL> CREATE TYPE BODY pipelined_stats_ot AS
  2
  3     STATIC FUNCTION ODCIGetInterfaces (
  4                     p_interfaces OUT SYS.ODCIObjectList
  5                     ) RETURN NUMBER IS
  6     BEGIN
  7        p_interfaces := SYS.ODCIObjectList(
  8                           SYS.ODCIObject ('SYS', 'ODCISTATS2')
  9                           );
 10        RETURN ODCIConst.success;
 11     END ODCIGetInterfaces;
 12
 13     STATIC FUNCTION ODCIStatsTableFunction (
 14                     p_function IN  SYS.ODCIFuncInfo,
 15                     p_stats    OUT SYS.ODCITabFuncStats,
 16                     p_args     IN  SYS.ODCIArgDescList,
 17                     p_num_rows IN  NUMBER
 18                     ) RETURN NUMBER IS
 19     BEGIN
 20        p_stats := SYS.ODCITabFuncStats(p_num_rows);
 21        RETURN ODCIConst.success;
 22     END ODCIStatsTableFunction;
 23
 24  END;
 25  /

Type body created.

As stated earlier, the ODCIGetInterfaces method is mandatory and so is its implementation, as shown. The ODCIStatsTableFunction is the method where we can be creative, although in fact our implementation is very simple. Remember that we included a P_NUM_ROWS parameter in our pipelined function. We didn't use it in the function itself. Instead, we have simply taken this parameter and passed it straight through to the CBO via the interface type, as highlighted above (on line 20).

(3) association

The interface type is the bridge between the table or pipelined function and the CBO. The ODCIStatsTableFunction method simply picks up the parameter we pass to our pipelined function, optionally uses it to calculate a cardinality value and then passes it on to the CBO. For Oracle to be able to do this, however, we require a third and final component; that is, the association between the pipelined function and the interface type. We do this as follows.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS employees_piped USING pipelined_stats_ot;

Statistics associated.

With this command, our pipelined function and interface type are now directly linked. Incidentally, our type could also be associated with other functions, assuming their parameter lists match the trailing parameters of our ODCIStatsTableFunction method (i.e. a single NUMBER parameter).
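For example, assuming a hypothetical second pipelined function named DEPARTMENTS_PIPED that also takes a single NUMBER parameter, one statement can associate both functions with the same interface type:

```sql
-- DEPARTMENTS_PIPED is hypothetical; both functions share the
-- single-NUMBER-parameter signature that PIPELINED_STATS_OT expects.
ASSOCIATE STATISTICS WITH FUNCTIONS
   employees_piped, departments_piped
   USING pipelined_stats_ot;
```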

testing the extensible optimiser

We have now completed our setup for the Extensible Optimiser. To test it, we will repeat our sample query but without any hints, as follows.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   departments                   d
  3  ,      TABLE(employees_piped(21400)) e
  4  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570

---------------------------------------------------------------------- ...
| Id  | Operation                          | Name            | Rows  | ...
---------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                   |                 | 21400 | ...
|*  1 |  HASH JOIN                         |                 | 21400 | ...
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS     |    27 | ...
|   3 |   COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED |       | ...
---------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))

As we can see, the CBO has picked up the cardinality value that we passed to our pipelined function and used it to optimise our SQL. If we trace the optimisation of our query with a 10053 event, we can see further evidence that our interface type is being used by the CBO, as follows.


Access path analysis for KOKBF$
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for KOKBF$[KOKBF$] 
  Calling user-defined function card function...
Bind :3 Value 21400
  HR.PIPELINED_STATS_OT.ODCIStatsTableFunction returned:
    num_rows      : 21400
  Table: KOKBF$  Alias: KOKBF$
    Card: Original: 21400.000000  Rounded: 21400  Computed: 21400.00  Non Adjusted: 21400.00
  Access Path: TableScan
    Cost:  29.29  Resp: 29.29  Degree: 0
      Cost_io: 29.00  Cost_cpu: 6481984
      Resp_io: 29.00  Resp_cpu: 6481984
  Best:: AccessPath: TableScan
         Cost: 29.29  Degree: 1  Resp: 29.29  Card: 21400.00  Bytes: 0

***************************************

This trace file is from an 11.1.0.7 instance. We can see that the optimiser executes the ODCIStatsTableFunction method in our interface type and receives the correct cardinality in return. Note that in 10g (and possibly 11.1.0.6), the 10053 trace file includes the full PL/SQL block that the optimiser uses to execute our ODCIStatsTableFunction method.

benefits of the extensible optimiser method

Overall, we can see that the Extensible Optimiser is a useful, accurate and, importantly, supported method for supplying cardinality to the CBO. Its main benefit over the DYNAMIC_SAMPLING hint (the only other supported method at the time of writing) is that it doesn't execute the pipelined function itself, just the ODCIStatsTableFunction method.

In our example, we have simply exploited the Extensible Optimiser feature to create our own alternative to the CARDINALITY hint. Apart from being a supported method, another benefit over the CARDINALITY hint is that the value of P_NUM_ROWS can be passed as a variable rather than hard-coded as above (a literal value must be known in advance when used in the CARDINALITY hint).
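One possible pattern (a sketch; here :n is a SQL*Plus bind variable, but the value could equally come from application code):

```sql
VARIABLE n NUMBER

EXEC :n := 21400

SELECT *
FROM   departments                d
,      TABLE(employees_piped(:n)) e  -- cardinality supplied at runtime
WHERE  d.department_id = e.department_id;
```

This assumes the optimiser can see the bind value at parse time (bind peeking); with a literal, as in our earlier examples, the value is always available.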

an alternative implementation

For an alternative implementation of our Extensible Optimiser method, we could remove the P_NUM_ROWS parameter and instead use a lookup table to store representative cardinalities for all of our table or pipelined functions. A single interface type could be associated with all functions, with the ODCIStatsTableFunction method looking up the cardinality based on the executing function's name (which is also known to the interface type). With this technique, we avoid hard-coding cardinalities and can modify them in the lookup table over time as needed.
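A minimal sketch of this approach follows. The FUNCTION_CARDINALITIES table is hypothetical, and the functions being associated are assumed to be parameterless (hence no trailing parameters in the method signature); the executing function's name is available in the P_FUNCTION parameter's ObjectName attribute.

```sql
-- Hypothetical lookup table of representative function cardinalities.
CREATE TABLE function_cardinalities
( function_name  VARCHAR2(30) PRIMARY KEY
, num_rows       NUMBER NOT NULL
);

INSERT INTO function_cardinalities VALUES ('EMPLOYEES_PIPED', 21400);

-- Sketch of the revised ODCIStatsTableFunction method body.
STATIC FUNCTION ODCIStatsTableFunction (
                p_function IN  SYS.ODCIFuncInfo,
                p_stats    OUT SYS.ODCITabFuncStats,
                p_args     IN  SYS.ODCIArgDescList
                ) RETURN NUMBER IS
   v_num_rows function_cardinalities.num_rows%TYPE;
BEGIN
   -- Look up the cardinality for the function being optimised.
   SELECT num_rows INTO v_num_rows
   FROM   function_cardinalities
   WHERE  function_name = p_function.ObjectName;
   p_stats := SYS.ODCITabFuncStats(v_num_rows);
   RETURN ODCIConst.success;
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      RETURN ODCIConst.error;  -- no entry: fall back to the CBO default
END ODCIStatsTableFunction;
```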

the extensible optimiser, table functions and variable in-lists

So far, our examples have all been based on the EMPLOYEES_PIPED pipelined function. We will complete this article with an example of a table function. Table functions are commonly used as a mechanism to bind variable in-lists passed as collections into SQL queries (for an example, see this article). Using the Extensible Optimiser, we can devise a generic way to determine the cardinalities of all table functions used in variable in-list queries.

First, we will create a simple collection type to support any variable in-list of string values.

SQL> CREATE OR REPLACE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
  2  /

Type created.

Second, we will create a small function that will receive and return a collection of our generic VARCHAR2_NTT type. This function does nothing with the collection itself; it is merely a wrapper over it.

SQL> CREATE FUNCTION collection_wrapper(
  2                  p_collection IN varchar2_ntt
  3                  ) RETURN varchar2_ntt IS
  4  BEGIN
  5     RETURN p_collection;
  6  END collection_wrapper;
  7  /

Function created.

Third, we will create an interface type specification to be associated with our simple COLLECTION_WRAPPER function, as follows.

SQL> CREATE TYPE collection_wrapper_ot AS OBJECT (
  2
  3     dummy_attribute NUMBER,
  4
  5     STATIC FUNCTION ODCIGetInterfaces (
  6                     p_interfaces OUT SYS.ODCIObjectList
  7                     ) RETURN NUMBER,
  8
  9     STATIC FUNCTION ODCIStatsTableFunction (
 10                     p_function   IN  SYS.ODCIFuncInfo,
 11                     p_stats      OUT SYS.ODCITabFuncStats,
 12                     p_args       IN  SYS.ODCIArgDescList,
 13                     p_collection IN varchar2_ntt
 14                     ) RETURN NUMBER
 15
 16  );
 17  /

Type created.

This is very similar to our previous example so doesn't need to be explained in any great detail. Note, however, that our function has a P_COLLECTION parameter, which needs to be replicated in our ODCIStatsTableFunction method signature. We can now add our interface type body, as follows.

SQL> CREATE TYPE BODY collection_wrapper_ot AS
  2
  3     STATIC FUNCTION ODCIGetInterfaces (
  4                     p_interfaces OUT SYS.ODCIObjectList
  5                     ) RETURN NUMBER IS
  6     BEGIN
  7        p_interfaces := SYS.ODCIObjectList(
  8                           SYS.ODCIObject ('SYS', 'ODCISTATS2')
  9                           );
 10        RETURN ODCIConst.success;
 11     END ODCIGetInterfaces;
 12
 13     STATIC FUNCTION ODCIStatsTableFunction (
 14                     p_function   IN  SYS.ODCIFuncInfo,
 15                     p_stats      OUT SYS.ODCITabFuncStats,
 16                     p_args       IN  SYS.ODCIArgDescList,
 17                     p_collection IN  varchar2_ntt
 18                     ) RETURN NUMBER IS
 19     BEGIN
 20        p_stats := SYS.ODCITabFuncStats(p_collection.COUNT);
 21        RETURN ODCIConst.success;
 22     END ODCIStatsTableFunction;
 23
 24  END;
 25  /

Type body created.

Our implementation is very similar to our previous example. This time, however, we have a collection parameter rather than a scalar number, so to supply the CBO with the correct cardinality, we simply count the collection's elements (line 20).

Fourth and finally, we must associate the function with the interface type, as follows.

SQL> ASSOCIATE STATISTICS WITH FUNCTIONS collection_wrapper USING collection_wrapper_ot;

Statistics associated.

Before we test the Extensible Optimiser with a variable in-list query, we'll use Autotrace to see how it works with a simple table function select. First, we will query a hard-coded collection of three elements without our COLLECTION_WRAPPER function, as follows.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   TABLE(varchar2_ntt('A','B','C'));

Execution Plan
----------------------------------------------------------
Plan hash value: 1748000095

--------------------------------------------------------------
| Id  | Operation                             | Name | Rows  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT                      |      |  8168 |
|   1 |  COLLECTION ITERATOR CONSTRUCTOR FETCH|      |       |
--------------------------------------------------------------

Unsurprisingly, the CBO has to resort to the default cardinality of 8,168 rows. To counter this, we can wrap our collection in a call to the wrapper function and enable the CBO to get the correct cardinality, as follows.


SQL> SELECT *
  2  FROM   TABLE(
  3            collection_wrapper(
  4               varchar2_ntt('A','B','C')));

Execution Plan
----------------------------------------------------------
Plan hash value: 4261576954

------------------------------------------------------------------------
| Id  | Operation                         | Name               | Rows  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                    |     3 |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| COLLECTION_WRAPPER |       |
------------------------------------------------------------------------

This time the optimiser has the correct cardinality due to our interface type. The generic COLLECTION_WRAPPER function and COLLECTION_WRAPPER_OT interface type combine to provide statistics to the CBO when using small collections such as this.

As stated earlier, table functions are typically used to support variable in-lists, so we will see an example of how this wrapper method can assist in this scenario. We will filter the EMPLOYEES table by a variable in-list of names. The in-list is represented by a collection and although it is hard-coded for simplicity below, we would usually expect it to be passed as a parameter/bind variable.

First, we will execute the query without the COLLECTION_WRAPPER function, as follows.

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   employees
  3  WHERE  last_name IN (SELECT column_value
  4                       FROM   TABLE(
  5                                 varchar2_ntt('Grant','King')));

Execution Plan
----------------------------------------------------------
Plan hash value: 2363723373

-------------------------------------------------------------------- ...
| Id  | Operation                              | Name      | Rows  | ...
-------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                       |           |     1 | ...
|*  1 |  HASH JOIN SEMI                        |           |     1 | ...
|   2 |   TABLE ACCESS FULL                    | EMPLOYEES |   107 | ...
|   3 |   COLLECTION ITERATOR CONSTRUCTOR FETCH|           |       | ...
-------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("LAST_NAME"=VALUE(KOKBF$))

Although we only want to query two names from the EMPLOYEES table, Oracle has chosen a hash semi-join. This is because the optimiser has used the 8,168 heuristic cardinality for our collection. To see the effect of the Extensible Optimiser in this case, we will repeat the query but with the COLLECTION_WRAPPER function, as follows.

SQL> SELECT *
  2  FROM   employees
  3  WHERE  last_name IN (SELECT column_value
  4                       FROM   TABLE(
  5                                 collection_wrapper(
  6                                    varchar2_ntt('Grant','King'))));

Execution Plan
----------------------------------------------------------
Plan hash value: 1407606243

--------------------------------------------------------------------------- ...
| Id  | Operation                            | Name               | Rows  | ...
--------------------------------------------------------------------------- ...
|   0 | SELECT STATEMENT                     |                    |     2 | ...
|   1 |  NESTED LOOPS                        |                    |       | ...
|   2 |   NESTED LOOPS                       |                    |     2 | ...
|   3 |    SORT UNIQUE                       |                    |       | ...
|   4 |     COLLECTION ITERATOR PICKLER FETCH| COLLECTION_WRAPPER |       | ...
|*  5 |    INDEX RANGE SCAN                  | EMP_NAME_IX        |     1 | ...
|   6 |   TABLE ACCESS BY INDEX ROWID        | EMPLOYEES          |     1 | ...
--------------------------------------------------------------------------- ...

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("LAST_NAME"=VALUE(KOKBF$))

This time, the optimiser is aware that there are only two elements in the variable in-list collection and has opted for a nested loops join accordingly. This also demonstrates that the COLLECTION_WRAPPER and COLLECTION_WRAPPER_OT objects can be re-used for all queries that include small collections (as in-lists or otherwise).
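As noted earlier, the in-list collection would usually be supplied as a parameter or bind variable rather than hard-coded. The following sketch shows the pattern from an anonymous PL/SQL block (the V_NAMES variable and the COUNT(*) query are illustrative only; in practice the collection would be populated from user input or an application parameter):

SQL> DECLARE
  2     v_names varchar2_ntt := varchar2_ntt('Grant','King');
  3     v_count PLS_INTEGER;
  4  BEGIN
  5     SELECT COUNT(*)
  6     INTO   v_count
  7     FROM   employees
  8     WHERE  last_name IN (SELECT column_value
  9                          FROM   TABLE(
 10                                    collection_wrapper(v_names)));
 11     DBMS_OUTPUT.PUT_LINE('Matched ' || v_count || ' employee(s)');
 12  END;
 13  /

Because COLLECTION_WRAPPER is associated with the COLLECTION_WRAPPER_OT statistics type, the CBO can derive the collection's cardinality in the same way as in the literal examples above.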

summary

In this article, we have seen four methods for supplying table and pipelined function cardinalities to the optimiser. Two of these methods rely on undocumented hints (as of 11g Release 1), so their use in production code is discouraged. Of the two supported methods, the DYNAMIC_SAMPLING hint is a new feature of 11.1.0.7 and has some limitations and performance implications. The Extensible Optimiser is currently the most flexible method and is usable in all versions of 10g onwards. Using it, we have devised a good alternative to the CARDINALITY hint for pipelined functions and also created a generic wrapper for the small collections typically used in variable in-list queries.

further reading

For more information on the Extensible Optimiser and how it can be used to generate a range of statistics for functions, read pl/sql functions and cbo costing on this website and the further reading references provided therein. For discussions of variable in-lists as a technique, see this article by William Robertson and binding in-lists in 10g on oracle-developer.net. For more information on dynamic sampling, read the Performance Tuning Guide in the online Oracle documentation.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, June 2009
