row-generation without i/o

This article demonstrates various methods for generating dummy records for use in SQL queries. Data generation has many uses in SQL (for example, to fill sparse data, to pivot data or to generate test data). There are several techniques for generating dummy data, but the particular focus of this article is on I/O-free (or I/O-reduced) generation.

We shall be looking at the following methods:

DUAL..CONNECT BY;
pipelined functions; and
MODEL.

As we are only interested in "low-cost" methods, we will not be looking at queries against ALL_OBJECTS (or similar) to generate data. Note also that this article summarises popular methods and, where applicable, the sources are credited.

a note on method

To compare the various methods, we will generate 10,000 dummy rows from each SQL statement. All examples are executed in sqlplus with autotrace (traceonly statistics) on a 10.1 database. They are executed twice to compare initial resource costs with the subsequent I/O-free running costs. The "wall-clock" timings are derived from the TIMER package available on this site.

dual..connect by

Several methods are available that use DUAL as the source of data generation. The most efficient of these is to run a constrained CONNECT BY ROWNUM query, as follows.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT ROWNUM AS rn
  2  FROM   dual
  3  CONNECT BY ROWNUM < 10000;

10000 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
     174310  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('DUAL');

[DUAL] 0.20 seconds

PL/SQL procedure successfully completed.

Interestingly, this method is "get-free" from the very first execution. We can see from the autotrace output that there are no physical or logical I/Os with this method. There is a caveat to this, however. This example has been run on a 10g database, which means it is subject to FAST DUAL optimisation. The second time we run this example, we see the recursive calls drop as there is no hard parse and the time reduces slightly as a result.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT ROWNUM AS rn
  2  FROM   dual
  3  CONNECT BY ROWNUM < 10000;

10000 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
     174310  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('DUAL');

[DUAL] 0.18 seconds

PL/SQL procedure successfully completed.

To emulate 9i behaviour, we can re-run the example but ensure that Oracle accesses DUAL itself (we can do this by selecting the DUMMY column, rather than an expression. This time, a small I/O cost is incurred.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT dummy
  2  FROM   dual
  3  CONNECT BY ROWNUM < 10000;

10000 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
     172448  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('DUAL WITH DUMMY');

[DUAL WITH DUMMY] 0.29 seconds

PL/SQL procedure successfully completed.

On the second execution, we remove the cost of the hard-parse, but not the I/O.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT dummy
  2  FROM   dual
  3  CONNECT BY ROWNUM < 10000;

10000 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
     172448  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('DUAL WITH DUMMY');

[DUAL WITH DUMMY] 0.18 seconds

PL/SQL procedure successfully completed.

Regardless of how many times we run this SQL, or even the number of records we generate, it costs us three logical I/Os (which has little to no impact on overall runtimes). On 9i (test database version 9.2.0.6) the cost is 7 logical I/Os and 5 memory sorts.

pipelined functions

Ever since Oracle 8.0, it has been possible to select from a function that returns a collection. This is known as a "table function". As a means to manipulate and return large datasets, however, table functions are non-scalable and PGA memory-intensive. In 9.0.1, Oracle improved on this dramatically with pipelined functions. Like table functions, pipelined functions also return collections of data, but in streams. Small "packets" of data are handed back to the consumer as they are produced, meaning that the PGA footprint is small and generally non-invasive.

Given this, we can create a simple pipelined function to generate n rows of dummy data. First, we require a collection type as follows.

SQL> CREATE OR REPLACE TYPE number_ntt AS TABLE OF NUMBER;
  2  /

Type created.

Next we create a pipelined function to generate our data.

SQL> CREATE FUNCTION row_generator (
  2                  rows_in IN PLS_INTEGER
  3                  ) RETURN number_ntt PIPELINED IS
  4  BEGIN
  5     FOR i IN 1 .. rows_in LOOP
  6        PIPE ROW (i);
  7     END LOOP;
  8     RETURN;
  9  END;
 10  /

Function created.

As with the DUAL examples above, we will use this pipelined function to generate 10,000 rows of data twice. The results of the first execution are as follows.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT *
  2  FROM   TABLE(row_generator(10000));

10000 rows selected.


Statistics
----------------------------------------------------------
         27  recursive calls
          0  db block gets
         67  consistent gets
          0  physical reads
          0  redo size
     174316  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('PIPELINED FUNCTION');

[PIPELINED FUNCTION] 0.18 seconds

PL/SQL procedure successfully completed.

The first execution of this SQL has higher resource costs than the DUAL method, but is significantly quicker. If we run it a second time, we can see the I/O and recursive SQL costs reduce to nothing. This in turn has a slight benefit for our runtime.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT *
  2  FROM   TABLE(row_generator(10000));

10000 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
     174316  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('PIPELINED FUNCTION');

[PIPELINED FUNCTION] 0.16 seconds

PL/SQL procedure successfully completed.

model

The final technique to look at is the MODEL clause. This is a new feature of 10g and gives the SQL language spreadsheet-like capabilities. It is a difficult technique to learn and most developers (me included) look at it with some trepidation! Generating dummy data is not exactly stretching MODEL's capabilities, but is relatively simple as follows.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT *
  2  FROM   dual
  3  MODEL
  4     DIMENSION BY (0 dim)
  5     MEASURES (0 rnum)
  6     RULES ITERATE (10000) (
  7        rnum[ITERATION_NUMBER] = ITERATION_NUMBER
  8        );

10000 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
     214168  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('MODEL');

[MODEL] 0.26 seconds

PL/SQL procedure successfully completed.

We can see that MODEL generates no I/O, even on the first execution. In the example SQL, we are not accessing any segment data, hence this is a CPU-only exercise. It is, however, slower than the other techniques discussed above. On the second execution we see the following results.

SQL> exec timer.snap;

PL/SQL procedure successfully completed.

SQL> SELECT *
  2  FROM   dual
  3  MODEL
  4     DIMENSION BY (0 dim)
  5     MEASURES (0 rnum)
  6     RULES ITERATE (10000) (
  7        rnum[ITERATION_NUMBER] = ITERATION_NUMBER
  8        );

10000 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
     214168  bytes sent via SQL*Net to client
       7990  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

SQL> exec timer.show('MODEL');

[MODEL] 0.26 seconds

PL/SQL procedure successfully completed.

As expected, subsequent runtimes are comparable. The only change in resource cost is the single recursive SQL cost during parsing.

acknowledgements

Credit for a variation of the MODEL example goes to an Ask Tom reader named Ant, as detailed in the Nov/Dec 2004 Oracle Magazine. I can't remember where I first saw the DUAL..CONNECT BY technique, but it is commonly used in online articles and forum postings. It no doubt has some lineage back to Tom Kyte!

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, November 2004

oracle-developer.net