tuning pl/sql file i/o

Unloading Oracle data to flat-files is still very common. There are numerous "unloader" utilities on the web for this purpose and there are also many related topics in the Oracle forums. Methods for writing data to flat-files vary, but strangely Oracle has never provided a tool to do this for us. The tools that are provided by Oracle (export, Data Pump, writeable external tables) write data quickly, but in a proprietary format, so for true ASCII flat-files, we have to resort to our own homegrown utilities.

There are several ways to generate flat-files from data stored in Oracle. We can use Perl, C, C++, Pro*C, Java and other languages that interface with Oracle but it is far more common for us to use SQL or PL/SQL to perform this operation. For example, sqlplus can spool data to flat-files very quickly, but it is a command utility and not part of the database. This makes it generally unsuited to being part of a robust database application.

UTL_FILE, on the other hand, is a built-in package around which we can very easily base our unloader utilities. The main issue with UTL_FILE, however, is that it is a relatively slow tool for unloading data, but in this article we will demonstrate some methods for speeding this up. We will see that with simple techniques we can achieve significant performance gains for our data unloads.

setup

We are going to test several versions of a standard data-unload routine. We will run these on an 11g Release 1 database, but all examples are compatible with 10g and most of them with 9i. The data to be dumped to flat-file will be sourced from a single table of 1 million rows, which we will create as follows.

SQL> CREATE TABLE source_data
  2  ( x, y, z
  3  , CONSTRAINT source_data_pk
  4       PRIMARY KEY (x,y,z)
  5  )
  6  ORGANIZATION INDEX
  7  AS
  8     SELECT ROWNUM           AS x
  9     ,      RPAD('x',50,'x') AS y
 10     ,      RPAD('y',50,'y') AS z
 11     FROM   dual
 12     CONNECT BY ROWNUM <= 1000000;

Table created.

We have created an IOT (index-organized table) to enable us to fully cache the data and eliminate any physical I/O. Each test we run will therefore incur roughly the same I/O costs. Using Autotrace, we will run a couple of full scans of this data until it is all in the buffer cache, as follows.

SQL> set autotrace traceonly statistics

SQL> SELECT * FROM source_data;

1000000 rows selected.


Statistics
----------------------------------------------------------
         88  recursive calls
          0  db block gets
      17779  consistent gets
      15647  physical reads
          0  redo size
  108077685  bytes sent via SQL*Net to client
      16231  bytes received via SQL*Net from client
       2001  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

SQL> SELECT * FROM source_data;

1000000 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      17668  consistent gets
          0  physical reads
          0  redo size
  108077883  bytes sent via SQL*Net to client
      16231  bytes received via SQL*Net from client
       2001  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

We have reduced the physical I/O as intended, so to complete our setup we will create an Oracle directory below. This is where we will write our flat-files.

SQL> CREATE DIRECTORY dump_dir AS '/u01/app/oracle/dir';

Directory created.

baseline utl_file performance

Before we can tune our code, we need to know our baseline performance. In this case, it will be the time it takes to unload 1 million rows to a flat-file using UTL_FILE.PUT_LINE calls, using the following PL/SQL block.

SQL> DECLARE
  2
  3     v_file  UTL_FILE.FILE_TYPE;
  4     v_name  VARCHAR2(128) := 'utl_file_untuned.txt';
  5     v_lines PLS_INTEGER := 0;
  6
  7  BEGIN
  8
  9     v_file := UTL_FILE.FOPEN('DUMP_DIR',v_name,'W',32767);
 10
 11     FOR r IN (SELECT x || ',' || y || ',' || z AS csv
 12               FROM   source_data)
 13     LOOP
 14        UTL_FILE.PUT_LINE(v_file, r.csv);
 15        v_lines := v_lines + 1;
 16     END LOOP;
 17
 18     UTL_FILE.FCLOSE(v_file);
 19
 20     DBMS_OUTPUT.PUT_LINE('File='||v_name||'; Lines='||v_lines);
 21
 22  END;
 23  /

File=utl_file_untuned.txt; Lines=1000000

PL/SQL procedure successfully completed.

Elapsed: 00:01:06.53

Our baseline performance is approximately 66 seconds to write the data to file. Note the following:

Line 11: each record is pre-formatted with a delimiter in the SQL cursor for convenience. A dump utility would usually accept the delimiter as a parameter and concatenate a record piecemeal (especially if using dynamic SQL);
Lines 11-13: we are using an implicit cursor-for-loop. From Oracle 10g Release 1, this will be optimised by the PL/SQL compiler into bulk collects with an array-size of 100 rows. All examples in this article will use either implicit or explicit bulk fetches of the same size. Note that readers on 9i databases will need to convert this and other implicit cursor examples to use explicit array fetches with BULK COLLECT to ensure that all examples are comparable.

tuning (1): buffering to reduce utl_file i/o

Our baseline code has two repetitive operations. First there is an incremented counter which is used to instrument the example. This has a negligible impact overall. More importantly there are 1 million UTL_FILE I/O operations and these have a far more significant impact. Tracing the baseline example with the PL/SQL Profiler and the new PL/SQL Hierarchical Profiler clearly shows that 50% of the elapsed time is accounted for by our 1 million UTL_FILE.PUT_LINE calls.

Our first technique for tuning this, therefore, will be to reduce the number of I/O operations by buffering the output data. This is quite a simple method. We will use a local variable to buffer up to 32K of data before writing it to file, as follows.

SQL> DECLARE
  2
  3     v_file    UTL_FILE.FILE_TYPE;
  4     v_buffer  VARCHAR2(32767);
  5     v_name    VARCHAR2(128) := 'utl_file_buffered.txt';
  6     v_lines   PLS_INTEGER := 0;
  7     c_eol     CONSTANT VARCHAR2(1) := CHR(10);
  8     c_eollen  CONSTANT PLS_INTEGER := LENGTH(c_eol);
  9     c_maxline CONSTANT PLS_INTEGER := 32767;
 10
 11  BEGIN
 12
 13     v_file := UTL_FILE.FOPEN('DUMP_DIR',v_name,'W',32767);
 14
 15     FOR r IN (SELECT x || ',' || y || ',' || z AS csv
 16               FROM   source_data)
 17     LOOP
 18
 19        IF LENGTH(v_buffer) + c_eollen + LENGTH(r.csv) <= c_maxline THEN
 20           v_buffer := v_buffer || c_eol || r.csv;
 21        ELSE
 22           IF v_buffer IS NOT NULL THEN
 23              UTL_FILE.PUT_LINE(v_file, v_buffer);
 24           END IF;
 25           v_buffer := r.csv;
 26        END IF;
 27
 28        v_lines := v_lines + 1;
 29
 30     END LOOP;
 31
 32     UTL_FILE.PUT_LINE(v_file, v_buffer);
 33     UTL_FILE.FCLOSE(v_file);
 34
 35     DBMS_OUTPUT.PUT_LINE('File='||v_name||'; Lines='||v_lines);
 36
 37  END;
 38  /

File=utl_file_buffered.txt; Lines=1000000

PL/SQL procedure successfully completed.

Elapsed: 00:00:30.06

The algorithm (highlighted above) is reasonably self-explanatory, but we will describe it anyway:

Lines 19-20: UTL_FILE has a maximum write-size of 32,767 bytes. If there is enough room in the buffer for a new record to be added (including newline), then we simply append the new data together with a newline;
Lines 21-26: if we can't fit the new record into the buffer, or the buffer is null (in which case the buffer-length calculation will also evaluate to null), then we must either flush it or start it. The buffer will only be null on the very first entry to the loop, so we make it the very last test;
Lines 22-24: when the buffer is too full to accept any new data, we need to write it to file with UTL_FILE;
Line 25: regardless of whether the buffer has been written or hasn't been started it, we reset/start it with the current data record;
Line 32: once the loop is complete, we write the remaining buffered data to the flat-file.

This technique is simple yet extremely effective. We have reduced our elapsed time to less than half the baseline: in other words, this unloading routine is now twice as fast (for this dataset). With an average record size of 105 bytes, we can buffer approximately 310 records, which means we reduce our UTL_FILE calls to roughly 3,200.

Given the effectiveness of the buffering technique, we will continue to use it for the remaining examples in this article.

a note on utl_file buffering

It is possible to replicate this buffering algorithm using the UTL_FILE.PUT, UTL_FILE.NEW_LINE and UTL_FILE.FFLUSH procedures, but a test with our sample data took over 85 seconds, which is considerably slower than our baseline. For this reason, the example is omitted from this article, but is included in the accompanying download.

tuning (2): using temporary clobs

From Oracle 10g onwards, it is possible to write a CLOB to a file with a single call, using the DBMS_XSLPROCESSOR.CLOB2FILE procedure. In the following example, we will prepare a temporary CLOB with our data instead of writing it with UTL_FILE. When all source data has been added to the CLOB, we will write it to a flat-file in a single call.

SQL> DECLARE
  2
  3     v_file    CLOB;
  4     v_buffer  VARCHAR2(32767);
  5     v_name    VARCHAR2(128) := 'clob2file_buffered.txt';
  6     v_lines   PLS_INTEGER := 0;
  7     v_eol     VARCHAR2(2);
  8     v_eollen  PLS_INTEGER;
  9     c_maxline CONSTANT PLS_INTEGER := 32767;
 10
 11  BEGIN
 12
 13     v_eol := CASE
 14                 WHEN DBMS_UTILITY.PORT_STRING LIKE 'IBMPC%'
 15                 THEN CHR(13)||CHR(10)
 16                 ELSE CHR(10)
 17              END;
 18     v_eollen := LENGTH(v_eol);
 19
 20     DBMS_LOB.CREATETEMPORARY(v_file, TRUE);
 21
 22     FOR r IN (SELECT x || ',' || y || ',' || z AS csv
 23               FROM   source_data)
 24     LOOP
 25
 26        IF LENGTH(v_buffer) + v_eollen + LENGTH(r.csv) <= c_maxline THEN
 27           v_buffer := v_buffer || v_eol || r.csv;
 28        ELSE
 29           IF v_buffer IS NOT NULL THEN
 30              DBMS_LOB.WRITEAPPEND(
 31                 v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol
 32                 );
 33           END IF;
 34           v_buffer := r.csv;
 35        END IF;
 36
 37        v_lines := v_lines + 1;
 38
 39     END LOOP;
 40
 41     IF LENGTH(v_buffer) > 0 THEN
 42        DBMS_LOB.WRITEAPPEND(
 43           v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol
 44           );
 45     END IF;
 46
 47     DBMS_XSLPROCESSOR.CLOB2FILE(v_file, 'DUMP_DIR', v_name);
 48     DBMS_LOB.FREETEMPORARY(v_file);
 49
 50     DBMS_OUTPUT.PUT_LINE('File='||v_name||'; Lines='||v_lines);
 51
 52  END;
 53  /

File=clob2file_buffered.txt; Lines=1000000

PL/SQL procedure successfully completed.

Elapsed: 00:00:28.65

The CLOB-specific code is highlighted above and is self-explanatory (perhaps with the exception of the end-of-line character assignment on lines 13-17 which is different for Windows. UTL_FILE manages the port-specific end-of-line conversions for us, but with CLOBs we must manage this ourselves).

Of particular interest is the DBMS_XSLPROCESSOR call on line 47, which is our only write operation to the destination flat-file. We can see overall that this technique is similar in performance to our buffered UTL_FILE mechanism (the CLOB method was slightly quicker). We therefore have an alternative method for writing data, but there will be additional costs associated with using CLOBs (for example, temporary tablespace and buffer cache). If the volume of data to be dumped is high, then this method might put too much stress on our temporary tablespace and cause problems for other users (large sort operations, hash joins, global temporary tables etc). Care should therefore be taken when using this method.

tuning (3): parallel execution with pipelined functions

Our final technique for quicker data-dumping from PL/SQL will be to split the workload and have the data written in separate streams. We can do this quite simply by using multiple sessions, with each reading and dumping different ranges of source data (Tom Kyte calls this "DIY parallelism"). A simpler (and possibly more elegant) method is to use Oracle's parallel query (PQ) mechanism and create a parallel pipelined function to split the source data and dump it using multiple PQ slaves.

For this test, we will create a parallel pipelined function that writes the source data to flat-file and returns a single summary record per session. This record will include details of the file and the session that wrote it. It is assumed that readers are familiar with the concept of parallel pipelined functions (some background reading is available if required).

required types

We will begin by creating an object type to define the structure of our pipelined function's return data, as follows.

SQL> CREATE TYPE dump_ot AS OBJECT
  2  ( file_name  VARCHAR2(128)
  3  , no_records NUMBER
  4  , session_id NUMBER
  5  );
  6  /

Type created.

Pipelined functions return collections of data, so we will create a nested table type based on our new object.

SQL> CREATE TYPE dump_ntt AS TABLE OF dump_ot;
  2  /

Type created.

We are now ready to create our parallel pipelined function. We will create and test two versions: one with UTL_FILE and one with a CLOB, because there has been little to distinguish them in the timed comparisons so far.

tuning (3a): parallel utl_file solution

We will begin by creating a parallel pipelined function that uses UTL_FILE to write data, as follows.

SQL> CREATE FUNCTION parallel_dump (
  2                  p_source    IN SYS_REFCURSOR,
  3                  p_filename  IN VARCHAR2,
  4                  p_directory IN VARCHAR2
  5                  ) RETURN dump_ntt
  6                    PIPELINED
  7                    PARALLEL_ENABLE (PARTITION p_source BY ANY) AS
  8
  9     TYPE row_ntt IS TABLE OF VARCHAR2(32767);
 10     v_rows    row_ntt;
 11     v_file    UTL_FILE.FILE_TYPE;
 12     v_buffer  VARCHAR2(32767);
 13     v_sid     NUMBER;
 14     v_name    VARCHAR2(128);
 15     v_lines   PLS_INTEGER := 0;
 16     c_eol     CONSTANT VARCHAR2(1) := CHR(10);
 17     c_eollen  CONSTANT PLS_INTEGER := LENGTH(c_eol);
 18     c_maxline CONSTANT PLS_INTEGER := 32767;
 19
 20  BEGIN
 21
 22     SELECT sid INTO v_sid FROM v$mystat WHERE ROWNUM = 1;
 23     v_name := p_filename || '_' || TO_CHAR(v_sid) || '.txt';
 24     v_file := UTL_FILE.FOPEN(p_directory, v_name, 'w', 32767);
 25
 26     LOOP
 27       FETCH p_source BULK COLLECT INTO v_rows LIMIT 100;
 28
 29        FOR i IN 1 .. v_rows.COUNT LOOP
 30
 31           IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline THEN
 32              v_buffer := v_buffer || c_eol || v_rows(i);
 33           ELSE
 34              IF v_buffer IS NOT NULL THEN
 35                 UTL_FILE.PUT_LINE(v_file, v_buffer);
 36              END IF;
 37              v_buffer := v_rows(i);
 38           END IF;
 39
 40        END LOOP;
 41
 42        v_lines := v_lines + v_rows.COUNT;
 43
 44        EXIT WHEN p_source%NOTFOUND;
 45     END LOOP;
 46     CLOSE p_source;
 47
 48     UTL_FILE.PUT_LINE(v_file, v_buffer);
 49     UTL_FILE.FCLOSE(v_file);
 50
 51     PIPE ROW (dump_ot(v_name, v_lines, v_sid));
 52     RETURN;
 53
 54  END parallel_dump;
 55  /

Function created.

Before we run a timed test using this function, note the following:

Line 7: we parallel-enable the function and state that Oracle can divide the source data amongst the PQ slaves any way it wishes;
Lines 22-23: each PQ slave will write to its own file, so we append the SID to the filename to make each one unique;
Lines 26-27: the source cursor will be passed as a refcursor parameter, so we must explicitly code a bulk fetch of 100 records to emulate the other examples in this article (i.e. those that benefit from 10g PL/SQL optimisation);
Line 51: on completion of the flat-file, we return a single record summarising the file details.

We will execute this function, as follows.

SQL> SELECT *
  2  FROM   TABLE(
  3            parallel_dump(
  4               CURSOR(SELECT /*+ PARALLEL(s,4) */
  5                             x ||','||
  6                             y ||','||
  7                             z AS csv
  8                      FROM   source_data s),
  9               'utl_file_parallel_pipelined',
 10               'DUMP_DIR'
 11               )) nt;

FILE_NAME                                          NO_RECORDS SESSION_ID
-------------------------------------------------- ---------- ----------
utl_file_parallel_pipelined_136.txt                    190758        136
utl_file_parallel_pipelined_135.txt                    192640        135
utl_file_parallel_pipelined_117.txt                    288960        117
utl_file_parallel_pipelined_121.txt                    327642        121

4 rows selected.

Elapsed: 00:00:10.26

This is impressive! We have reduced the elapsed time of our write operation from a baseline of 66 seconds to just 10 seconds. Of course, we now have four files instead of one, but we can easily append these files together with a script of some description (Perl, shell etc). Preferably, we would leave them as separate files and simply read them as though they were a single file by using an external table with an appropriate LOCATION setting (i.e. one that includes all four files).

tuning (3b): parallel temporary clob solution

To complete our tuning tests, we will re-create the parallel pipelined function to use the CLOB method (as noted earlier, this requires at least 10g).

SQL> CREATE OR REPLACE FUNCTION parallel_dump (
  2                             p_source    IN SYS_REFCURSOR,
  3                             p_filename  IN VARCHAR2,
  4                             p_directory IN VARCHAR2
  5                             ) RETURN dump_ntt
  6                               PIPELINED
  7                               PARALLEL_ENABLE (PARTITION p_source BY ANY) AS
  8
  9     TYPE row_ntt IS TABLE OF VARCHAR2(32767);
 10     v_rows    row_ntt;
 11     v_file    CLOB;
 12     v_buffer  VARCHAR2(32767);
 13     v_sid     NUMBER;
 14     v_name    VARCHAR2(128);
 15     v_lines   PLS_INTEGER := 0;
 16     v_eol     VARCHAR2(2);
 17     v_eollen  PLS_INTEGER;
 18     c_maxline CONSTANT PLS_INTEGER := 32767;
 19
 20  BEGIN
 21
 22     v_eol := CASE
 23                 WHEN DBMS_UTILITY.PORT_STRING LIKE 'IBMPC%'
 24                 THEN CHR(13)||CHR(10)
 25                 ELSE CHR(10)
 26              END;
 27     v_eollen := LENGTH(v_eol);
 28
 29     SELECT sid INTO v_sid FROM v$mystat WHERE ROWNUM = 1;
 30     v_name := p_filename || '_' || TO_CHAR(v_sid) || '.txt';
 31
 32     DBMS_LOB.CREATETEMPORARY(v_file, TRUE);
 33
 34     LOOP
 35       FETCH p_source BULK COLLECT INTO v_rows LIMIT 100;
 36
 37        FOR i IN 1 .. v_rows.COUNT LOOP
 38
 39           IF LENGTH(v_buffer) + v_eollen + LENGTH(v_rows(i)) <= c_maxline THEN
 40              v_buffer := v_buffer || v_eol || v_rows(i);
 41           ELSE
 42              IF v_buffer IS NOT NULL THEN
 43                 DBMS_LOB.WRITEAPPEND(
 44                    v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol
 45                    );
 46              END IF;
 47              v_buffer := v_rows(i);
 48           END IF;
 49
 50        END LOOP;
 51
 52        v_lines := v_lines + v_rows.COUNT;
 53
 54        EXIT WHEN p_source%NOTFOUND;
 55     END LOOP;
 56     CLOSE p_source;
 57
 58     IF LENGTH(v_buffer) > 0 THEN
 59        DBMS_LOB.WRITEAPPEND(
 60           v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol
 61           );
 62     END IF;
 63
 64     DBMS_XSLPROCESSOR.CLOB2FILE(v_file, p_directory, v_name);
 65     DBMS_LOB.FREETEMPORARY(v_file);
 66
 67     PIPE ROW (dump_ot(v_name, v_lines, v_sid));
 68     RETURN;
 69
 70  END parallel_dump;
 71  /

Function created.

We have already described the CLOB and parallel pipelined function techniques, so we will proceed with our timing test, as follows.

SQL> SELECT *
  2  FROM   TABLE(
  3            parallel_dump(
  4               CURSOR(SELECT /*+ PARALLEL(s,4) */
  5                             x ||','||
  6                             y ||','||
  7                             z AS csv
  8                      FROM   source_data s),
  9               'clob2file_parallel_pipelined',
 10               'DUMP_DIR'
 11               )) nt;

FILE_NAME                                          NO_RECORDS SESSION_ID
-------------------------------------------------- ---------- ----------
clob2file_parallel_pipelined_200.txt                   248504        200
clob2file_parallel_pipelined_196.txt                   232768        196
clob2file_parallel_pipelined_198.txt                   248192        198
clob2file_parallel_pipelined_192.txt                   270536        192

4 rows selected.

Elapsed: 00:00:27.84

The CLOB technique does not scale at all. It takes approximately the same time in serial or parallel mode, meaning that we should probably avoid this technique and use UTL_FILE instead.

file validation

We will finish our testing by examining the flat-files to ensure our techniques are comparable. For brevity, we will simply look at file sizes and byte-counts as follows.

/u01/app/oracle/dir> ls -rtl *.txt
-rw-r--r--   1 oracle   dba      108888896 Feb  8 18:22 utl_file_untuned.txt
-rw-r--r--   1 oracle   dba      108888896 Feb  8 18:25 utl_file_buffered.txt
-rw-r--r--   1 oracle   dba      108888896 Feb  8 18:30 clob2file_buffered.txt
-rw-r--r--   1 oracle   dba      20769834 Feb  8 18:33 utl_file_parallel_pipelined_136.txt
-rw-r--r--   1 oracle   dba      20978496 Feb  8 18:33 utl_file_parallel_pipelined_135.txt
-rw-r--r--   1 oracle   dba      31477376 Feb  8 18:33 utl_file_parallel_pipelined_117.txt
-rw-r--r--   1 oracle   dba      35663190 Feb  8 18:33 utl_file_parallel_pipelined_121.txt
-rw-r--r--   1 oracle   dba      27036864 Feb  8 18:52 clob2file_parallel_pipelined_198.txt
-rw-r--r--   1 oracle   dba      25355584 Feb  8 18:52 clob2file_parallel_pipelined_196.txt
-rw-r--r--   1 oracle   dba      27049851 Feb  8 18:52 clob2file_parallel_pipelined_200.txt
-rw-r--r--   1 oracle   dba      29446597 Feb  8 18:52 clob2file_parallel_pipelined_192.txt

/u01/app/oracle/dir> wc -c utl_file_parallel*
 20769834 utl_file_parallel_pipelined_136.txt
 20978496 utl_file_parallel_pipelined_135.txt
 31477376 utl_file_parallel_pipelined_117.txt
 35663190 utl_file_parallel_pipelined_121.txt
 108888896 total

/u01/app/oracle/dir> wc -c clob2file_parallel*
 29446597 clob2file_parallel_pipelined_192.txt
 25355584 clob2file_parallel_pipelined_196.txt
 27036864 clob2file_parallel_pipelined_198.txt
 27049851 clob2file_parallel_pipelined_200.txt
 108888896 total

conclusions

We can summarise our findings as follows:

we can achieve good performance gains by using a buffer to reduce our I/O operations;
using a CLOB reduces our file I/O to a single operation and in serial mode can perform as well as UTL_FILE;
the buffering APIs in UTL_FILE (PUT, FFLUSH, NEW_LINE) do not perform as well as our own PL/SQL variable buffering;
our time savings from buffering are proportional to the time spent in file-writing only. If most of the routine's time is spent in data fetching (i.e. an intensive source cursor), the overall impact of UTL_FILE tuning will be reduced;
dividing the workload by using parallel pipelined functions can provide significant performance increases (six times faster with our sample data in this article). This can also reduce the overall time it takes to access all of the source data;
care should be taken over the resources required by the CLOB technique, particularly in PQ scenarios where it becomes unscalable.

For sites that write a lot of flat-file data from PL/SQL, one or more of these techniques can be adopted to improve the general performance of these routines.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, February 2008

oracle-developer.net