tuning pl/sql file i/o
Unloading Oracle data to flat-files is still very common. There are numerous "unloader" utilities on the web for this purpose and there are also many related topics in the Oracle forums. Methods for writing data to flat-files vary, but strangely Oracle has never provided a tool to do this for us. The tools that are provided by Oracle (export, Data Pump, writeable external tables) write data quickly, but in a proprietary format, so for true ASCII flat-files, we have to resort to our own homegrown utilities.
There are several ways to generate flat-files from data stored in Oracle. We can use Perl, C, C++, Pro*C, Java and other languages that interface with Oracle but it is far more common for us to use SQL or PL/SQL to perform this operation. For example, sqlplus can spool data to flat-files very quickly, but it is a command utility and not part of the database. This makes it generally unsuited to being part of a robust database application.
UTL_FILE, on the other hand, is a built-in package around which we can very easily base our unloader utilities. The main issue with UTL_FILE, however, is that it is a relatively slow tool for unloading data, but in this article we will demonstrate some methods for speeding this up. We will see that with simple techniques we can achieve significant performance gains for our data unloads.
We are going to test several versions of a standard data-unload routine. We will run these on an 11g Release 1 database, but all examples are compatible with 10g and most of them with 9i. The data to be dumped to flat-file will be sourced from a single table of 1 million rows, which we will create as follows.
SQL> CREATE TABLE source_data 2 ( x, y, z 3 , CONSTRAINT source_data_pk 4 PRIMARY KEY (x,y,z) 5 ) 6 ORGANIZATION INDEX 7 AS 8 SELECT ROWNUM AS x 9 , RPAD('x',50,'x') AS y 10 , RPAD('y',50,'y') AS z 11 FROM dual 12 CONNECT BY ROWNUM <= 1000000;
We have created an IOT (index-organized table) to enable us to fully cache the data and eliminate any physical I/O. Each test we run will therefore incur roughly the same I/O costs. Using Autotrace, we will run a couple of full scans of this data until it is all in the buffer cache, as follows.
SQL> set autotrace traceonly statistics SQL> SELECT * FROM source_data;
1000000 rows selected. Statistics ---------------------------------------------------------- 88 recursive calls 0 db block gets 17779 consistent gets 15647 physical reads 0 redo size 108077685 bytes sent via SQL*Net to client 16231 bytes received via SQL*Net from client 2001 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000000 rows processed
SQL> SELECT * FROM source_data;
1000000 rows selected. Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 17668 consistent gets 0 physical reads 0 redo size 108077883 bytes sent via SQL*Net to client 16231 bytes received via SQL*Net from client 2001 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000000 rows processed
We have reduced the physical I/O as intended, so to complete our setup we will create an Oracle directory below. This is where we will write our flat-files.
SQL> CREATE DIRECTORY dump_dir AS '/u01/app/oracle/dir';
baseline utl_file performance
Before we can tune our code, we need to know our baseline performance. In this case, it will be the time it takes to unload 1 million rows to a flat-file using UTL_FILE.PUT_LINE calls, using the following PL/SQL block.
SQL> DECLARE 2 3 v_file UTL_FILE.FILE_TYPE; 4 v_name VARCHAR2(128) := 'utl_file_untuned.txt'; 5 v_lines PLS_INTEGER := 0; 6 7 BEGIN 8 9 v_file := UTL_FILE.FOPEN('DUMP_DIR',v_name,'W',32767); 10 11 FOR r IN (SELECT x || ',' || y || ',' || z AS csv 12 FROM source_data) 13 LOOP 14 UTL_FILE.PUT_LINE(v_file, r.csv); 15 v_lines := v_lines + 1; 16 END LOOP; 17 18 UTL_FILE.FCLOSE(v_file); 19 20 DBMS_OUTPUT.PUT_LINE('File='||v_name||'; Lines='||v_lines); 21 22 END; 23 /
File=utl_file_untuned.txt; Lines=1000000 PL/SQL procedure successfully completed. Elapsed: 00:01:06.53
Our baseline performance is approximately 66 seconds to write the data to file. Note the following:
- Line 11: each record is pre-formatted with a delimiter in the SQL cursor for convenience. A dump utility would usually accept the delimiter as a parameter and concatenate a record piecemeal (especially if using dynamic SQL);
- Lines 11-13: we are using an implicit cursor-for-loop. From Oracle 10g Release 1, this will be optimised by the PL/SQL compiler into bulk collects with an array-size of 100 rows. All examples in this article will use either implicit or explicit bulk fetches of the same size. Note that readers on 9i databases will need to convert this and other implicit cursor examples to use explicit array fetches with BULK COLLECT to ensure that all examples are comparable.
tuning (1): buffering to reduce utl_file i/o
Our baseline code has two repetitive operations. First there is an incremented counter which is used to instrument the example. This has a negligible impact overall. More importantly there are 1 million UTL_FILE I/O operations and these have a far more significant impact. Tracing the baseline example with the PL/SQL Profiler and the new PL/SQL Hierarchical Profiler clearly shows that 50% of the elapsed time is accounted for by our 1 million UTL_FILE.PUT_LINE calls.
Our first technique for tuning this, therefore, will be to reduce the number of I/O operations by buffering the output data. This is quite a simple method. We will use a local variable to buffer up to 32K of data before writing it to file, as follows.
SQL> DECLARE 2 3 v_file UTL_FILE.FILE_TYPE; 4 v_buffer VARCHAR2(32767); 5 v_name VARCHAR2(128) := 'utl_file_buffered.txt'; 6 v_lines PLS_INTEGER := 0; 7 c_eol CONSTANT VARCHAR2(1) := CHR(10); 8 c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol); 9 c_maxline CONSTANT PLS_INTEGER := 32767; 10 11 BEGIN 12 13 v_file := UTL_FILE.FOPEN('DUMP_DIR',v_name,'W',32767); 14 15 FOR r IN (SELECT x || ',' || y || ',' || z AS csv 16 FROM source_data) 17 LOOP 18 19 IF LENGTH(v_buffer) + c_eollen + LENGTH(r.csv) <= c_maxline THEN 20 v_buffer := v_buffer || c_eol || r.csv; 21 ELSE 22 IF v_buffer IS NOT NULL THEN 23 UTL_FILE.PUT_LINE(v_file, v_buffer); 24 END IF; 25 v_buffer := r.csv; 26 END IF; 27 28 v_lines := v_lines + 1; 29 30 END LOOP; 31 32 UTL_FILE.PUT_LINE(v_file, v_buffer); 33 UTL_FILE.FCLOSE(v_file); 34 35 DBMS_OUTPUT.PUT_LINE('File='||v_name||'; Lines='||v_lines); 36 37 END; 38 /
File=utl_file_buffered.txt; Lines=1000000 PL/SQL procedure successfully completed. Elapsed: 00:00:30.06
The algorithm (highlighted above) is reasonably self-explanatory, but we will describe it anyway:
- Lines 19-20: UTL_FILE has a maximum write-size of 32,767 bytes. If there is enough room in the buffer for a new record to be added (including newline), then we simply append the new data together with a newline;
- Lines 21-26: if we can't fit the new record into the buffer, or the buffer is null (in which case the buffer-length calculation will also evaluate to null), then we must either flush it or start it. The buffer will only be null on the very first entry to the loop, so we make it the very last test;
- Lines 22-24: when the buffer is too full to accept any new data, we need to write it to file with UTL_FILE;
- Line 25: regardless of whether the buffer has been written or hasn't been started it, we reset/start it with the current data record;
- Line 32: once the loop is complete, we write the remaining buffered data to the flat-file.
This technique is simple yet extremely effective. We have reduced our elapsed time to less than half the baseline: in other words, this unloading routine is now twice as fast (for this dataset). With an average record size of 105 bytes, we can buffer approximately 310 records, which means we reduce our UTL_FILE calls to roughly 3,200.
Given the effectiveness of the buffering technique, we will continue to use it for the remaining examples in this article.
a note on utl_file buffering
It is possible to replicate this buffering algorithm using the UTL_FILE.PUT, UTL_FILE.NEW_LINE and UTL_FILE.FFLUSH procedures, but a test with our sample data took over 85 seconds, which is considerably slower than our baseline. For this reason, the example is omitted from this article, but is included in the accompanying download.
tuning (2): using temporary clobs
From Oracle 10g onwards, it is possible to write a CLOB to a file with a single call, using the DBMS_XSLPROCESSOR.CLOB2FILE procedure. In the following example, we will prepare a temporary CLOB with our data instead of writing it with UTL_FILE. When all source data has been added to the CLOB, we will write it to a flat-file in a single call.
SQL> DECLARE 2 3 v_file CLOB; 4 v_buffer VARCHAR2(32767); 5 v_name VARCHAR2(128) := 'clob2file_buffered.txt'; 6 v_lines PLS_INTEGER := 0; 7 v_eol VARCHAR2(2); 8 v_eollen PLS_INTEGER; 9 c_maxline CONSTANT PLS_INTEGER := 32767; 10 11 BEGIN 12 13 v_eol := CASE 14 WHEN DBMS_UTILITY.PORT_STRING LIKE 'IBMPC%' 15 THEN CHR(13)||CHR(10) 16 ELSE CHR(10) 17 END; 18 v_eollen := LENGTH(v_eol); 19 20 DBMS_LOB.CREATETEMPORARY(v_file, TRUE); 21 22 FOR r IN (SELECT x || ',' || y || ',' || z AS csv 23 FROM source_data) 24 LOOP 25 26 IF LENGTH(v_buffer) + v_eollen + LENGTH(r.csv) <= c_maxline THEN 27 v_buffer := v_buffer || v_eol || r.csv; 28 ELSE 29 IF v_buffer IS NOT NULL THEN 30 DBMS_LOB.WRITEAPPEND( 31 v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol 32 ); 33 END IF; 34 v_buffer := r.csv; 35 END IF; 36 37 v_lines := v_lines + 1; 38 39 END LOOP; 40 41 IF LENGTH(v_buffer) > 0 THEN 42 DBMS_LOB.WRITEAPPEND( 43 v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol 44 ); 45 END IF; 46 47 DBMS_XSLPROCESSOR.CLOB2FILE(v_file, 'DUMP_DIR', v_name); 48 DBMS_LOB.FREETEMPORARY(v_file); 49 50 DBMS_OUTPUT.PUT_LINE('File='||v_name||'; Lines='||v_lines); 51 52 END; 53 /
File=clob2file_buffered.txt; Lines=1000000 PL/SQL procedure successfully completed. Elapsed: 00:00:28.65
The CLOB-specific code is highlighted above and is self-explanatory (perhaps with the exception of the end-of-line character assignment on lines 13-17 which is different for Windows. UTL_FILE manages the port-specific end-of-line conversions for us, but with CLOBs we must manage this ourselves).
Of particular interest is the DBMS_XSLPROCESSOR call on line 47, which is our only write operation to the destination flat-file. We can see overall that this technique is similar in performance to our buffered UTL_FILE mechanism (the CLOB method was slightly quicker). We therefore have an alternative method for writing data, but there will be additional costs associated with using CLOBs (for example, temporary tablespace and buffer cache). If the volume of data to be dumped is high, then this method might put too much stress on our temporary tablespace and cause problems for other users (large sort operations, hash joins, global temporary tables etc). Care should therefore be taken when using this method.
tuning (3): parallel execution with pipelined functions
Our final technique for quicker data-dumping from PL/SQL will be to split the workload and have the data written in separate streams. We can do this quite simply by using multiple sessions, with each reading and dumping different ranges of source data (Tom Kyte calls this "DIY parallelism"). A simpler (and possibly more elegant) method is to use Oracle's parallel query (PQ) mechanism and create a parallel pipelined function to split the source data and dump it using multiple PQ slaves.
For this test, we will create a parallel pipelined function that writes the source data to flat-file and returns a single summary record per session. This record will include details of the file and the session that wrote it. It is assumed that readers are familiar with the concept of parallel pipelined functions (some background reading is available if required).
We will begin by creating an object type to define the structure of our pipelined function's return data, as follows.
SQL> CREATE TYPE dump_ot AS OBJECT 2 ( file_name VARCHAR2(128) 3 , no_records NUMBER 4 , session_id NUMBER 5 ); 6 /
Pipelined functions return collections of data, so we will create a nested table type based on our new object.
SQL> CREATE TYPE dump_ntt AS TABLE OF dump_ot; 2 /
We are now ready to create our parallel pipelined function. We will create and test two versions: one with UTL_FILE and one with a CLOB, because there has been little to distinguish them in the timed comparisons so far.
tuning (3a): parallel utl_file solution
We will begin by creating a parallel pipelined function that uses UTL_FILE to write data, as follows.
SQL> CREATE FUNCTION parallel_dump ( 2 p_source IN SYS_REFCURSOR, 3 p_filename IN VARCHAR2, 4 p_directory IN VARCHAR2 5 ) RETURN dump_ntt 6 PIPELINED 7 PARALLEL_ENABLE (PARTITION p_source BY ANY) AS 8 9 TYPE row_ntt IS TABLE OF VARCHAR2(32767); 10 v_rows row_ntt; 11 v_file UTL_FILE.FILE_TYPE; 12 v_buffer VARCHAR2(32767); 13 v_sid NUMBER; 14 v_name VARCHAR2(128); 15 v_lines PLS_INTEGER := 0; 16 c_eol CONSTANT VARCHAR2(1) := CHR(10); 17 c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol); 18 c_maxline CONSTANT PLS_INTEGER := 32767; 19 20 BEGIN 21 22 SELECT sid INTO v_sid FROM v$mystat WHERE ROWNUM = 1; 23 v_name := p_filename || '_' || TO_CHAR(v_sid) || '.txt'; 24 v_file := UTL_FILE.FOPEN(p_directory, v_name, 'w', 32767); 25 26 LOOP 27 FETCH p_source BULK COLLECT INTO v_rows LIMIT 100; 28 29 FOR i IN 1 .. v_rows.COUNT LOOP 30 31 IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline THEN 32 v_buffer := v_buffer || c_eol || v_rows(i); 33 ELSE 34 IF v_buffer IS NOT NULL THEN 35 UTL_FILE.PUT_LINE(v_file, v_buffer); 36 END IF; 37 v_buffer := v_rows(i); 38 END IF; 39 40 END LOOP; 41 42 v_lines := v_lines + v_rows.COUNT; 43 44 EXIT WHEN p_source%NOTFOUND; 45 END LOOP; 46 CLOSE p_source; 47 48 UTL_FILE.PUT_LINE(v_file, v_buffer); 49 UTL_FILE.FCLOSE(v_file); 50 51 PIPE ROW (dump_ot(v_name, v_lines, v_sid)); 52 RETURN; 53 54 END parallel_dump; 55 /
Before we run a timed test using this function, note the following:
- Line 7: we parallel-enable the function and state that Oracle can divide the source data amongst the PQ slaves any way it wishes;
- Lines 22-23: each PQ slave will write to its own file, so we append the SID to the filename to make each one unique;
- Lines 26-27: the source cursor will be passed as a refcursor parameter, so we must explicitly code a bulk fetch of 100 records to emulate the other examples in this article (i.e. those that benefit from 10g PL/SQL optimisation);
- Line 51: on completion of the flat-file, we return a single record summarising the file details.
We will execute this function, as follows.
SQL> SELECT * 2 FROM TABLE( 3 parallel_dump( 4 CURSOR(SELECT /*+ PARALLEL(s,4) */ 5 x ||','|| 6 y ||','|| 7 z AS csv 8 FROM source_data s), 9 'utl_file_parallel_pipelined', 10 'DUMP_DIR' 11 )) nt;
FILE_NAME NO_RECORDS SESSION_ID -------------------------------------------------- ---------- ---------- utl_file_parallel_pipelined_136.txt 190758 136 utl_file_parallel_pipelined_135.txt 192640 135 utl_file_parallel_pipelined_117.txt 288960 117 utl_file_parallel_pipelined_121.txt 327642 121 4 rows selected. Elapsed: 00:00:10.26
This is impressive! We have reduced the elapsed time of our write operation from a baseline of 66 seconds to just 10 seconds. Of course, we now have four files instead of one, but we can easily append these files together with a script of some description (Perl, shell etc). Preferably, we would leave them as separate files and simply read them as though they were a single file by using an external table with an appropriate LOCATION setting (i.e. one that includes all four files).
tuning (3b): parallel temporary clob solution
To complete our tuning tests, we will re-create the parallel pipelined function to use the CLOB method (as noted earlier, this requires at least 10g).
SQL> CREATE OR REPLACE FUNCTION parallel_dump ( 2 p_source IN SYS_REFCURSOR, 3 p_filename IN VARCHAR2, 4 p_directory IN VARCHAR2 5 ) RETURN dump_ntt 6 PIPELINED 7 PARALLEL_ENABLE (PARTITION p_source BY ANY) AS 8 9 TYPE row_ntt IS TABLE OF VARCHAR2(32767); 10 v_rows row_ntt; 11 v_file CLOB; 12 v_buffer VARCHAR2(32767); 13 v_sid NUMBER; 14 v_name VARCHAR2(128); 15 v_lines PLS_INTEGER := 0; 16 v_eol VARCHAR2(2); 17 v_eollen PLS_INTEGER; 18 c_maxline CONSTANT PLS_INTEGER := 32767; 19 20 BEGIN 21 22 v_eol := CASE 23 WHEN DBMS_UTILITY.PORT_STRING LIKE 'IBMPC%' 24 THEN CHR(13)||CHR(10) 25 ELSE CHR(10) 26 END; 27 v_eollen := LENGTH(v_eol); 28 29 SELECT sid INTO v_sid FROM v$mystat WHERE ROWNUM = 1; 30 v_name := p_filename || '_' || TO_CHAR(v_sid) || '.txt'; 31 32 DBMS_LOB.CREATETEMPORARY(v_file, TRUE); 33 34 LOOP 35 FETCH p_source BULK COLLECT INTO v_rows LIMIT 100; 36 37 FOR i IN 1 .. v_rows.COUNT LOOP 38 39 IF LENGTH(v_buffer) + v_eollen + LENGTH(v_rows(i)) <= c_maxline THEN 40 v_buffer := v_buffer || v_eol || v_rows(i); 41 ELSE 42 IF v_buffer IS NOT NULL THEN 43 DBMS_LOB.WRITEAPPEND( 44 v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol 45 ); 46 END IF; 47 v_buffer := v_rows(i); 48 END IF; 49 50 END LOOP; 51 52 v_lines := v_lines + v_rows.COUNT; 53 54 EXIT WHEN p_source%NOTFOUND; 55 END LOOP; 56 CLOSE p_source; 57 58 IF LENGTH(v_buffer) > 0 THEN 59 DBMS_LOB.WRITEAPPEND( 60 v_file, LENGTH(v_buffer) + v_eollen, v_buffer || v_eol 61 ); 62 END IF; 63 64 DBMS_XSLPROCESSOR.CLOB2FILE(v_file, p_directory, v_name); 65 DBMS_LOB.FREETEMPORARY(v_file); 66 67 PIPE ROW (dump_ot(v_name, v_lines, v_sid)); 68 RETURN; 69 70 END parallel_dump; 71 /
We have already described the CLOB and parallel pipelined function techniques, so we will proceed with our timing test, as follows.
SQL> SELECT * 2 FROM TABLE( 3 parallel_dump( 4 CURSOR(SELECT /*+ PARALLEL(s,4) */ 5 x ||','|| 6 y ||','|| 7 z AS csv 8 FROM source_data s), 9 'clob2file_parallel_pipelined', 10 'DUMP_DIR' 11 )) nt;
FILE_NAME NO_RECORDS SESSION_ID -------------------------------------------------- ---------- ---------- clob2file_parallel_pipelined_200.txt 248504 200 clob2file_parallel_pipelined_196.txt 232768 196 clob2file_parallel_pipelined_198.txt 248192 198 clob2file_parallel_pipelined_192.txt 270536 192 4 rows selected. Elapsed: 00:00:27.84
The CLOB technique does not scale at all. It takes approximately the same time in serial or parallel mode, meaning that we should probably avoid this technique and use UTL_FILE instead.
We will finish our testing by examining the flat-files to ensure our techniques are comparable. For brevity, we will simply look at file sizes and byte-counts as follows.
/u01/app/oracle/dir> ls -rtl *.txt -rw-r--r-- 1 oracle dba 108888896 Feb 8 18:22 utl_file_untuned.txt -rw-r--r-- 1 oracle dba 108888896 Feb 8 18:25 utl_file_buffered.txt -rw-r--r-- 1 oracle dba 108888896 Feb 8 18:30 clob2file_buffered.txt -rw-r--r-- 1 oracle dba 20769834 Feb 8 18:33 utl_file_parallel_pipelined_136.txt -rw-r--r-- 1 oracle dba 20978496 Feb 8 18:33 utl_file_parallel_pipelined_135.txt -rw-r--r-- 1 oracle dba 31477376 Feb 8 18:33 utl_file_parallel_pipelined_117.txt -rw-r--r-- 1 oracle dba 35663190 Feb 8 18:33 utl_file_parallel_pipelined_121.txt -rw-r--r-- 1 oracle dba 27036864 Feb 8 18:52 clob2file_parallel_pipelined_198.txt -rw-r--r-- 1 oracle dba 25355584 Feb 8 18:52 clob2file_parallel_pipelined_196.txt -rw-r--r-- 1 oracle dba 27049851 Feb 8 18:52 clob2file_parallel_pipelined_200.txt -rw-r--r-- 1 oracle dba 29446597 Feb 8 18:52 clob2file_parallel_pipelined_192.txt /u01/app/oracle/dir> wc -c utl_file_parallel* 20769834 utl_file_parallel_pipelined_136.txt 20978496 utl_file_parallel_pipelined_135.txt 31477376 utl_file_parallel_pipelined_117.txt 35663190 utl_file_parallel_pipelined_121.txt 108888896 total /u01/app/oracle/dir> wc -c clob2file_parallel* 29446597 clob2file_parallel_pipelined_192.txt 25355584 clob2file_parallel_pipelined_196.txt 27036864 clob2file_parallel_pipelined_198.txt 27049851 clob2file_parallel_pipelined_200.txt 108888896 total
We can summarise our findings as follows:
- we can achieve good performance gains by using a buffer to reduce our I/O operations;
- using a CLOB reduces our file I/O to a single operation and in serial mode can perform as well as UTL_FILE;
- the buffering APIs in UTL_FILE (PUT, FFLUSH, NEW_LINE) do not perform as well as our own PL/SQL variable buffering;
- our time savings from buffering are proportional to the time spent in file-writing only. If most of the routine's time is spent in data fetching (i.e. an intensive source cursor), the overall impact of UTL_FILE tuning will be reduced;
- dividing the workload by using parallel pipelined functions can provide significant performance increases (six times faster with our sample data in this article). This can also reduce the overall time it takes to access all of the source data;
- care should be taken over the resources required by the CLOB technique, particularly in PQ scenarios where it becomes unscalable.
For sites that write a lot of flat-file data from PL/SQL, one or more of these techniques can be adopted to improve the general performance of these routines.
For more information on the UTL_FILE, DBMS_LOB and DBMS_XSLPROCESSOR packages, see the PL/SQL Packages and Types Reference. For a quicker method of dumping data (and where Oracle's proprietary Data Pump format is aceptable), read this article on writeable external tables. Tom Kyte also has a range of utilities for unloading data, including a fast Pro*C program. A Data Dump utility is also available on oracle-developer.net. This currently buffers single records with UTL_FILE.PUT (due to the generic nature of the utility), but will be converted in the near future to include the PL/SQL buffering that we've seen in this article. Finally, William Robertson has a prototype of a parallel launcher utility which uses pipelined functions as a multi-threading mechanism for executing PL/SQL packages in parallel.
The source code for the examples in this article can be downloaded from here.
Adrian Billington, February 2008Back to Top