subquery factoring in oracle 9i

Subquery factoring, also known as the WITH clause, provides a convenient and flexible way for us to define subqueries and in-line views in Oracle 9i. The primary purpose of subquery factoring is to reduce repeated table accesses by generating temporary datasets during query execution. However, even if we do not take advantage of this internal query optimisation, we can use subquery factoring to structure our complex SQL statements in a more logical and understandable format. In this article we will look at how we can benefit from this new syntax and related optimisation.

syntax overview

The syntax for subquery factoring is as follows.

WITH subquery_name AS (
        SELECT ...
        )
,    another_subquery_name AS (
        SELECT ...
        )
SELECT ...
FROM   subquery_name          sq1
,      another_subquery_name  sq2
WHERE  sq1... = sq2...

As stated in the introduction to this article, subquery factoring is invoked via the new WITH clause. The WITH keyword is the first in the syntax diagram for SELECT, as we can see in the syntax example above. This example defines two subqueries and then joins them together in the "main body" of the SELECT statement. This is the semantic equivalent of the following pseudo-select.

SELECT ...
FROM  (SELECT ...) inline_view_name
,     (SELECT ...) another_inline_view_name
WHERE ...

The following syntax example is possibly more common in everyday use. We define two named subqueries, but the second builds on the first.

WITH subquery_name AS (
        SELECT ...
        FROM   table
        )
,    another_subquery_name AS (
        SELECT ...
        FROM   subquery_name
        )
SELECT ...
FROM   another_subquery_name
WHERE  ...

This form of subquery factoring is the semantic equivalent of the following pseudo-SQL.

SELECT ...
FROM  (SELECT ...
       FROM  (
              SELECT ...))

This is where subquery factoring is at its most useful. Using subqueries in this way helps to break down complex SQL statements and structure them more "procedurally" without having to rely on multiple levels of nested in-line views. This is particularly important when using analytic functions which often need at one least level of nesting. Oracle, meanwhile, manages the optimisation of such "procedural SQL" statements through its query merging techniques, such that what we define in our SQL is not necessarily what Oracle runs under the covers.

a simple optimisation example

We have seen subquery factoring introduced above as a convenient SQL-writing option (and its use as such cannot be understated). Its primary objective, however, is as a performance optimisation technique. Many of the subqueries we write using the WITH clause will be treated exactly as an-line view by Oracle. Sometimes, however, Oracle will go a stage further and "materialise" a temporary resultset from the subquery and use this in later parts of the SQL statement. It does this by creating an "on-the-fly" temporary table for use in the execution of that SQL statement only (subsequent uses of the same cursor will each generate a new temporary table).

In the following example, we will see this optimisation in operation. First, we will create a SALES table, using the ALL_OBJECTS view to generate 1 million rows of dummy sales data.

SQL> CREATE TABLE sales
  2  NOLOGGING
  3  AS
  4     SELECT a1.owner       AS region
  5     ,      a1.object_type AS product
  6     ,      a1.object_id   AS order_amt
  7     FROM   all_objects a1
  8     ,      all_objects a2
  9     WHERE  ROWNUM <= 1000000;

Table created.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'sales');

PL/SQL procedure successfully completed.

Next we will write a query that will tell us which sales regions accounted for more than one-third of all sales revenue. We will begin with a standard method (involving two accesses of SALES) and measure resource and time costs using Autotrace and sqlplus timing.

SQL> set autotrace on
SQL> set timing on

SQL> SELECT region
  2  ,      total_sales
  3  FROM  (
  4         SELECT region
  5         ,      NVL(SUM(order_amt),0) AS total_sales
  6         FROM   sales
  7         GROUP  BY
  8                region
  9        ) ilv
 10  WHERE total_sales > (SELECT SUM(order_amt)/3 AS one_third_sales
 11                       FROM   sales);

REGION                         TOTAL_SALES
------------------------------ -----------
PUBLIC                          6842060686
SYS                             6008003545

2 rows selected.

Elapsed: 00:00:03.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2433 Card=2 Bytes=22)
   1    0   FILTER
   2    1     SORT (GROUP BY) (Cost=2433 Card=2 Bytes=22)
   3    2       TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000 Bytes=11000000)
   4    1     SORT (AGGREGATE)
   5    4       TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000 Bytes=5000000)


Statistics
----------------------------------------------------------
        255  recursive calls
          0  db block gets
       6670  consistent gets
       5883  physical reads
          0  redo size
        517  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          2  rows processed

This query took just 3 seconds to complete and used moderate levels of resources (including the hard-parse). However, we needed to access SALES twice in full to achieve this result.

Subquery factoring can reduce the SALES table access to a single scan by materialising the results of the first scan and use these to resolve the second subquery. We can see this below.

SQL> WITH regional_sales AS (
  2     SELECT region
  3     ,      NVL(SUM(order_amt),0) AS total_sales
  4     FROM   sales
  5     GROUP  BY
  6            region
  7     )
  8  SELECT region
  9  ,      total_sales
 10  FROM   regional_sales
 11  WHERE  total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales
 12                        FROM   regional_sales);

REGION                         TOTAL_SALES
------------------------------ -----------
PUBLIC                          6842060686
SYS                             6008003545

2 rows selected.

Elapsed: 00:00:02.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=28 Bytes=840)
   1    2     RECURSIVE EXECUTION OF 'SYS_LE_2_0'
   2    0   TEMP TABLE TRANSFORMATION
   3    2     VIEW (Cost=2 Card=28 Bytes=840)
   4    3       TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660C_16AFD3' (Cost=2 Card=28 Bytes=308)
   5    3       SORT (AGGREGATE)
   6    5         VIEW (Cost=2 Card=28 Bytes=364)
   7    6           TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660C_16AFD3' (Cost=2 Card=28 Bytes=308)

Statistics
----------------------------------------------------------
       1002  recursive calls
         10  db block gets
       3649  consistent gets
       2944  physical reads
       1344  redo size
        517  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         15  sorts (memory)
          0  sorts (disk)
          2  rows processed

This version of the query was much quicker and included hard-parse costs in addition to the cost of setting up a temporary table. Subsequent executions of both of the example SQL statements reduced each by 1 second, making the subquery version twice as fast. We can see from the statistics output that the amount of work done by Oracle to answer the query in this way is significantly less, in particular the physical I/O.

Looking at the execution plan, we can see that Oracle has introduced a new TEMP TABLE TRANSFORMATION step (Step 2) where it creates and loads a global temporary table from the first scan of SALES. It then uses this temporary dataset to determine the cut-off sales value needed to answer the overall question of which regions account for more than one-third of sales revenue (Step 7).

Note that the Autotrace plan makes no mention of the SALES table. There is clearly some missing information in our plan. We can use the DBMS_XPLAN package for a second opinion as follows (note that DBMS_XPLAN is an Oracle 9.2 feature).

SQL> EXPLAIN PLAN SET statement_id = 'SQ_FCT'
  2  FOR
  3     WITH regional_sales AS (
  4        SELECT region
  5        ,      NVL(SUM(order_amt),0) AS total_sales
  6        FROM   sales
  7        GROUP  BY
  8               region
  9        )
 10     SELECT region
 11     ,      total_sales
 12     FROM   regional_sales
 13     WHERE  total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales
 14                           FROM   regional_sales);

Explained.

SQL> SELECT plan_table_output
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','SQ_FCT'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------
| Id  | Operation                  |  Name                      | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |    28 |   840 |     2 |
|   2 |  TEMP TABLE TRANSFORMATION |                            |       |       |       |
|   1 |   RECURSIVE EXECUTION      | SYS_LE_2_0                 |       |       |       |
|   0 |    INSERT STATEMENT        |                            |    28 |   308 |  2433 |
|   1 |     LOAD AS SELECT         |                            |       |       |       |
|   2 |      SORT GROUP BY         |                            |    28 |   308 |  2433 |
|   3 |       TABLE ACCESS FULL    | SALES                      |  1000K|    10M|   320 |
|*  3 |   VIEW                     |                            |    28 |   840 |     2 |
|   4 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6623_16AFD3  |    28 |   308 |     2 |
|   5 |    SORT AGGREGATE          |                            |     1 |    13 |       |
|   6 |     VIEW                   |                            |    28 |   364 |     2 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6623_16AFD3  |    28 |   308 |     2 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("REGIONAL_SALES"."TOTAL_SALES"> (SELECT /*+ */
              SUM("REGIONAL_SALES"."TOTAL_SALES")/3 FROM  (SELECT /*+ CACHE_TEMP_TABLE("T1") */
              "T1"."C0" "REGION","T1"."C1" "TOTAL_SALES" FROM "SYS"."SYS_TEMP_0FD9D6623_16AFD3" "T1")
              "REGIONAL_SALES"))

Note: cpu costing is off

27 rows selected.

With this plan we see the full picture. The SALES table is scanned in full during the direct-path load of the temporary table. DBMS_XPLAN shows us the recursive plan that Autotrace omitted.

As noted earlier, Oracle will not always materialise subqueries in this way. Sometimes it will either merge the subquery into the main query or treat it as a simple in-line view. In either of these cases, the WITH clause behaves no differently to the more familiar in-line view. On large datasets, however, the savings from this optimisation can be quite significant as we have seen. On tiny datasets, the time involved in the temporary table setup can take longer than the original query itself so is not a particularly useful mechanism.

multiple subqueries

As described in the syntax overview earlier in this article, subqueries can reference preceding subqueries. This can be useful for very complex queries that need to build up results in stages (in fact, this makes SQL even more powerful, flexible and descriptive than ever).

We can see an example of this approach to building complex SQL statements in multiple stages below. This example continues with the original regional sales query but breaks down the regional totals by products. Note that for this particular requirement we should ideally be using analytic functions, but it highlights the subquery factoring principle quite well.

SQL> WITH regional_sales AS (
  2          SELECT region
  3          ,      NVL(SUM(order_amt),0) AS total_sales
  4          FROM   sales
  5          GROUP  BY
  6                 region
  7          )
  8  ,    top_regions AS (
  9          SELECT region
 10          FROM   regional_sales
 11          WHERE  total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales
 12                                FROM   regional_sales)
 13          )
 14  SELECT region
 15  ,      product
 16  ,      COUNT(*)       AS product_units
 17  ,      SUM(order_amt) AS product_sales
 18  FROM   sales
 19  WHERE  region IN (SELECT region FROM top_regions)
 20  GROUP  BY
 21         region
 22  ,      product;

REGION     PRODUCT            PRODUCT_UNITS PRODUCT_SALES
---------- ------------------ ------------- -------------
SYS        TYPE                       15575     409075100
SYS        VIEW                       74060     392488355
SYS        INDEX                      11095      24907715
SYS        TABLE                      12285      46731930
SYS        CONTEXT                      105       2375240
SYS        LIBRARY                       70        284305
SYS        PACKAGE                     5880      63982695
SYS        SYNONYM                      210        823095
SYS        FUNCTION                    1435      11199755
SYS        OPERATOR                      70       2189705
SYS        DIRECTORY                    210       6622035
SYS        PROCEDURE                    490       3671185
SYS        JAVA CLASS                315262    4905172276
SYS        JAVA RESOURCE               6194     138006359
SYS        CONSUMER GROUP                70        269150
SYS        EVALUATION CONTEXT            35        204645
PUBLIC     SYNONYM                   427040    6842060686

17 rows selected.

Note how the top_regions subquery builds on the regional_sales subquery to generate the overall picture of our top-performing regions. This subquery itself is then used in a visit back to SALES to generate the product-level summary.

subquery factoring and dml

Subquery factoring is part of the SELECT statement itself, which means it can be used anywhere a SELECT is used. For example, the following statement formats are all valid.

INSERT INTO table_name
WITH subquery_name AS (
        SELECT ...
        FROM   ...
        )
SELECT ...
FROM   subquery_name;

UPDATE table_name
SET    column_name = ( WITH subquery_name AS (
                               SELECT ...
                               FROM   ...
                               )
                       SELECT ...
                       FROM   subquery_name );

DELETE
FROM   table_name
WHERE  column_name IN ( WITH subquery_name AS (
                                SELECT ...
                                FROM   ...
                                )
                        SELECT ...
                        FROM   subquery_name );

INSERT INTO table_name
VALUES ( (WITH subquery_name AS (
                  SELECT ...
                  FROM   ...
                  )
          SELECT ...
          FROM   subquery_name) );

forcing materialisation

We have seen both forms of subquery factoring in this article so far; these being with and without materialisation. As stated, the latter is an optimisation technique that can provide good performance gains with large datasets. Sometimes we might wish to force a subquery to be materialised on the occasions when the CBO chooses not to. We can do this via the MATERIALIZE hint. Note that this hint is currently undocumented so there is always a risk that it will be either deprecated or will simply stop working.

In the following example, we will include a subquery that is unlikely to be materialised by Oracle and then attempt to force the temporary table execution using the MATERIALIZE hint. We will begin with a version of the query without hints.

SQL> set autotrace traceonly explain

SQL> WITH inline_view AS (
  2     SELECT region
  3     ,      SUM(order_amt) AS total_orders
  4     FROM   sales
  5     GROUP  BY
  6            region
  7     )
  8  SELECT *
  9  FROM   inline_view;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2433 Card=28 Bytes=308)
   1    0   SORT (GROUP BY) (Cost=2433 Card=28 Bytes=308)
   2    1     TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000 Bytes=11000000)

It is not surprising that Oracle has chosen not to materialise this subquery. It is only used once so creating and loading a temporary table would be a waste of resources. However, we can force Oracle into taking this path by using the MATERIALIZE hint as follows.

SQL> WITH inline_view AS (
  2     SELECT /*+ materialize */
  3            region
  4     ,      SUM(order_amt) AS total_orders
  5     FROM   sales
  6     GROUP  BY
  7            region
  8     )
  9  SELECT *
 10  FROM   inline_view;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=28 Bytes=840)
   1    2     RECURSIVE EXECUTION OF 'SYS_LE_2_0'
   2    0   TEMP TABLE TRANSFORMATION
   3    2     VIEW (Cost=2 Card=28 Bytes=840)
   4    3       TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660A_16AFD3' (Cost=2 Card=28 Bytes=308)

This time we have the temporary table plan that we saw in previous examples (remember that Autotrace omits the recursive part of the plan that describes the direct-path load of the temporary table from the SALES table).

restrictions

Finally, there are a small number of restrictions with subquery factoring. These are outlined below.

unreferenced subqueries

Any subqueries we define must be referenced at least once otherwise Oracle will raise an exception as follows.

SQL> WITH unused_subquery AS (
  2     SELECT 'X' AS column_alias
  3     FROM   dual
  4     )
  5  SELECT SYSDATE
  6  FROM   dual;

   FROM   dual
          *
ERROR at line 3:
ORA-32035: unreferenced query name defined in WITH clause

subqueries in views

Subqueries can be included in views but there is a small "gotcha". Some DDL-generators wrap the view's SQL in parentheses and this raises an exception when subqueries are used, as follows.

SQL> CREATE VIEW view_with_subquery
  2  AS
  3     (
  4      WITH subquery_name AS (
  5         SELECT SYSDATE AS date_column
  6         FROM   dual
  7         )
  8      SELECT date_column
  9      FROM   subquery_name
 10     );

   )
   *
ERROR at line 10:
ORA-32034: unsupported use of WITH clause

The correct way to code this is to remove the outer parentheses from the SQL, as follows.

SQL> CREATE OR REPLACE VIEW view_with_subquery
  2  AS
  3     WITH subquery_name AS (
  4        SELECT SYSDATE AS date_column
  5        FROM   dual
  6        )
  7     SELECT date_column
  8     FROM   subquery_name;

View created.

nesting subqueries

Subqueries can reference preceding subqueries (as we saw earlier), but they cannot be nested inside each other. The following example attempts to nest a subquery.

SQL> WITH outer_subquery AS (
  2          WITH nested_subquery AS (
  3                  SELECT SYSDATE AS date_column
  4                  FROM   dual
  5                  )
  6          )
  7  SELECT date_column
  8  FROM   outer_subquery;

        WITH nested_subquery AS (
        *
ERROR at line 2:
ORA-32034: unsupported use of WITH clause

subqueries and object types

This is a slightly more esoteric restriction, but if we use any object columns in our subqueries, Oracle will not materialise the dataset, even if we try to force it. In the following example, we create a small object type and try to materialise a dataset that includes it.

SQL> CREATE TYPE object_type AS OBJECT
  2  ( attr1 VARCHAR2(30)
  3  , attr2 VARCHAR2(30)
  4  );
  5  /

Type created.

SQL> set autotrace traceonly explain

SQL> WITH subquery_with_object AS (
  2     SELECT /*+ materialize */
  3            object_type(region,product)
  4     FROM   sales
  5     )
  6  SELECT COUNT(*)
  7  FROM   subquery_with_object;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=320 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'SALES' (Cost=320 Card=1000000)

We can verify that the subquery can be materialised without the object type as follows.

SQL> WITH subquery_without_object AS (
  2     SELECT /*+ materialize */
  3            region
  4     ,      product
  5     FROM   sales
  6     )
  7  SELECT COUNT(*)
  8  FROM   subquery_without_object;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1)
   1    2     RECURSIVE EXECUTION OF 'SYS_LE_2_0'
   2    0   TEMP TABLE TRANSFORMATION
   3    2     SORT (AGGREGATE)
   4    3       VIEW (Cost=2 Card=1000000)
   5    4         TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D660F_16AFD3' (Cost=2 Card=1000000 Bytes=15000000)

For more information read the subquery factoring clause in the SQL Reference.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, December 2005 (updated June 2007)

oracle-developer.net