external table enhancements in 11g

This article introduces a small number of new features for external tables in Oracle 11g. Read-only external tables were introduced in Oracle 9i and are now commonplace in most database applications that need to "import" flat-file data. Writeable external tables were introduced in Oracle 10g (as part of the Data Pump family of utilities) and they enable us to extract data from the database while at the same time creating an external table to read the data back again. Both types of external table have been enhanced in 11g, as we will see below.

This article covers the following new features:

- compressed external tables
- encrypted external tables
- the external table preprocessor

background reading

Developers who are unfamiliar with external tables can read about the evolution of this feature in the following oracle-developer.net articles:

compressed external tables

In Oracle 10g, writeable external tables (i.e. those using the ORACLE_DATAPUMP access driver) unload data to a proprietary file format that includes file and projection metadata as well as the data itself (uncompressed). Given the potential for large dump files and the need to transfer this data across networks, Oracle 11g enables us to compress the dataset as it is written to file. This new feature greatly reduces the amount of data being transmitted, as we will see below.

To see the effect of compression with writeable external tables, we will need to compare it with an uncompressed external dataset. First, however, we need an Oracle directory to write the external table data to. We create this as follows.

SQL> CREATE DIRECTORY xt_dir AS 'D:\Oracle\dir';

Directory created.

We will now create a simple external table by unloading all of the data in the ALL_OBJECTS view, as follows.

SQL> CREATE TABLE all_objects_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'all_objects_xt.dmp' )
  7  )
  8  AS
  9     SELECT *
 10     FROM   all_objects
 11     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

The default file option for this table is no compression. To compress the dump file, we need to explicitly enable compression using a new 11g access parameter, as follows.

SQL> CREATE TABLE all_objects_compressed_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS (COMPRESSION ENABLED)
  7     LOCATION ( 'all_objects_compressed_xt.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects
 12     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

We have now unloaded the same data twice, both with and without compression. We can see the effect of compression in the file sizes below.

SQL> host dir /od D:\Oracle\dir\ALL*.DMP

 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

01/09/2009  18:21         7,213,056 ALL_OBJECTS_XT.DMP
01/09/2009  18:21           843,776 ALL_OBJECTS_COMPRESSED_XT.DMP
               2 File(s)      8,056,832 bytes
               0 Dir(s)   9,129,951,232 bytes free

The compressed file is roughly 12% of the size of the uncompressed version: a considerable reduction, and one which will improve transmission times for data passed around networked systems (whilst also saving on storage).

performance impact: writing compressed data

By compressing data, we expect our CPU consumption to increase (as data is compressed) but the amount of file I/O to decrease (as fewer bytes of data are being written to file). We might also expect the I/O savings to outweigh the CPU costs. To test these assumptions, we will compare the relative performance of the compressed external tables below. First, we will use a variation of Tom Kyte's RUNSTATS utility to compare the overall resource consumption of compressed and uncompressed external tables. Second, we will compare the CPU time taken by each approach. In both cases, we will generate a file of 10 times the volume of ALL_OBJECTS, beginning with the uncompressed version as follows (note that the ALL_OBJECTS view data is cached in advance).

SQL> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

SQL> CREATE TABLE all_objects_big_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'all_objects_big_xt.dmp' )
  7  )
  8  AS
  9     SELECT *
 10     FROM   all_objects
 11     ,     (SELECT ROWNUM AS r FROM dual CONNECT BY ROWNUM <= 10)
 12     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

We will now create the compressed version of this table below.

SQL> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

SQL> CREATE TABLE all_objects_big_compressed_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS (COMPRESSION ENABLED)
  7     LOCATION ( 'all_objects_big_compressed_xt.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects
 12     ,     (SELECT ROWNUM AS r FROM dual CONNECT BY ROWNUM <= 10)
 13     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

In our comparison report below, we'll just compare timings, latches and the output of two statistics measuring CPU usage and physical bytes being written.

SQL> BEGIN
  2     runstats_pkg.rs_stop(
  3        runstats_pkg.statname_ntt(
  4           'physical write total bytes',
  5           'CPU used by this session'));
  6  END;
  7  /

================================================================================
Runstats report : 01-SEP-2009 18:23:10
================================================================================


--------------------------------------------------------------------------------
1. Summary timings
--------------------------------------------------------------------------------
Run1 ran in 622 hsecs
Run2 ran in 722 hsecs
Run1 was 13.9% quicker than Run2


--------------------------------------------------------------------------------
2. Statistics report
--------------------------------------------------------------------------------


Type  Name                                        Run1         Run2         Diff
----- ----------------------------------- ------------ ------------ ------------
STAT  CPU used by this session                     519          711          192
STAT  physical write total bytes            74,465,280    3,055,616  -71,409,664


--------------------------------------------------------------------------------
3. Latching report
--------------------------------------------------------------------------------
Run1 used 207,223 latches
Run2 used 207,162 latches
Run2 used 0% fewer latches than Run1


================================================================================
End of report
================================================================================

PL/SQL procedure successfully completed.

As expected, the compressed external table uses slightly more CPU but writes significantly fewer bytes of data. Despite this, the timings (and latches) are comparable (in fact, the uncompressed external table was slightly faster). This is a surprising result given the reduced I/O that the compressed external table has to perform. We can confirm our results with an external file listing, as follows.

SQL> host dir /od D:\Oracle\dir\ALL*BIG*.DMP
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

01/09/2009  18:23        74,457,088 ALL_OBJECTS_BIG_XT.DMP
01/09/2009  18:23         3,047,424 ALL_OBJECTS_BIG_COMPRESSED_XT.DMP
               2 File(s)     77,504,512 bytes
               0 Dir(s)   9,052,446,720 bytes free

The file sizes match the statistics we saw in Oracle. This time we achieved a much better compression ratio: a 96% reduction. This is a remarkable result and it is related to the nature of the source data. In our example, we created 10 copies of each row in ALL_OBJECTS, and this ordered rowsource naturally favours a high degree of compression. If, by comparison, we order the same source resultset randomly (using DBMS_RANDOM), we achieve less compression (approximately 75% in tests).
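
The effect of row ordering on compression is easy to reproduce with any stream compressor. The sketch below uses Python's standard gzip module as a stand-in for the Data Pump driver (so the absolute figures are only illustrative): it compresses the same synthetic rowset twice, once with the 10 copies of each row adjacent and once shuffled.

```python
import gzip
import random

# Build 5,000 distinct "rows", each duplicated 10 times -- first with the
# copies adjacent (as the CONNECT BY multiplier produces them), then shuffled.
rows = [f"{n:05d},OWNER_{n % 40},OBJECT_NAME_{n}\n" for n in range(5000)]

ordered = "".join(r * 10 for r in rows).encode()
shuffled_rows = [r for r in rows for _ in range(10)]
random.Random(0).shuffle(shuffled_rows)
shuffled = "".join(shuffled_rows).encode()

ordered_size = len(gzip.compress(ordered))
shuffled_size = len(gzip.compress(shuffled))
print(f"ordered  : {ordered_size:,} bytes compressed")
print(f"shuffled : {shuffled_size:,} bytes compressed")
```

The adjacent copies sit well inside the compressor's back-reference window, so the ordered stream compresses noticeably better than the shuffled one.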

To conclude, although our compressed external table didn't extract any quicker than the uncompressed version, it generates significantly less data to store or transmit to other systems.

reading from compressed external files

When we create a writeable external table, we end up with two things. First, a physical file is created on our filesystem. Second, we are left with an external table that can be used to read the external file back again (should we wish to do so). Interestingly, we can use an uncompressed external table to read back a compressed file (assuming the column format is the same). Remember that in our original example we created a simple uncompressed external file of ALL_OBJECTS. We can query the external file's record count as follows.

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     68789

1 row selected.

We will change the location of this external table to use the compressed file (i.e. the one that we created with a compressed version of the external table), as follows.

SQL> ALTER TABLE all_objects_xt LOCATION ('all_objects_compressed_xt.dmp');

Table altered.

We will repeat our COUNT(*) query, as follows.

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     68789

1 row selected.

Despite the fact that our external table was used to create an uncompressed file, we can read the compressed file via the Data Pump driver (the Data Pump files contain sufficient metadata to enable this seamless switch between compressed and uncompressed files). Similarly, we can use the compressed external table to read the uncompressed file, as follows.

SQL> ALTER TABLE all_objects_compressed_xt LOCATION ('all_objects_xt.dmp');

Table altered.

SQL> SELECT COUNT(*) FROM all_objects_compressed_xt;

  COUNT(*)
----------
     68789

1 row selected.

Note that we will use our uncompressed external table in later examples, so we will reset it before continuing below.

SQL> ALTER TABLE all_objects_xt LOCATION ('all_objects_xt.dmp');

Table altered.

encrypted external tables

In addition to compressed external tables, the Data Pump driver also supports encryption. This uses Oracle's Transparent Data Encryption (TDE), a separately licensed feature available from 10g Release 2 onwards. Continuing with our ALL_OBJECTS example, we will encrypt our base data as follows.

SQL> CREATE TABLE all_objects_encrypted_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS (ENCRYPTION ENABLED)
  7     LOCATION ( 'all_objects_encrypted_xt.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects
 12     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

To enable Transparent Data Encryption, an encryption key and wallet must first be created. The wallet is password-protected and needs to be opened every time the instance is started. See the Advanced Security Administrator's Guide referenced at the end of this article for details.
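
For reference, the one-time wallet setup looks broadly like the following sketch (the wallet directory and password are illustrative; consult the Advanced Security Administrator's Guide for the authoritative steps on your platform and version).

```
-- sqlnet.ora: tell Oracle where the wallet lives
ENCRYPTION_WALLET_LOCATION =
  (SOURCE = (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = D:\Oracle\wallet)))

-- As a suitably privileged user: create the master key (first time only;
-- this also creates and opens the wallet)
ALTER SYSTEM SET ENCRYPTION KEY IDENTIFIED BY "wallet_password";

-- After each instance restart: re-open the wallet
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "wallet_password";
```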

Similar to compression, the encryption property is enabled in the external table's access parameters, as shown above. We can confirm that our file was created with a file listing, as follows.

SQL> host dir /od D:\Oracle\dir\all*.dmp
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

01/09/2009  18:21         7,213,056 ALL_OBJECTS_XT.DMP
01/09/2009  18:21           843,776 ALL_OBJECTS_COMPRESSED_XT.DMP
01/09/2009  18:23        74,457,088 ALL_OBJECTS_BIG_XT.DMP
01/09/2009  18:23         3,047,424 ALL_OBJECTS_BIG_COMPRESSED_XT.DMP
01/09/2009  18:26         7,213,056 ALL_OBJECTS_ENCRYPTED_XT.DMP
               5 File(s)     92,774,400 bytes
               0 Dir(s)   9,045,241,856 bytes free

The encrypted file is the same size as the original unencrypted version (to the exact byte). We will look inside the file shortly, but first we'll repeat the performance comparison we ran for compressed external tables.

performance impact: writing encrypted data

As with compressed external tables, we might expect an encrypted external table to be more CPU-intensive than its unencrypted alternative. This time, however, we should expect the data volumes to be the same, meaning that the performance impact of encryption should only be CPU-related (although we discovered that compressed external tables made little time-saving from reduced I/O). We will validate our assumption with a similar RUNSTATS test, beginning with a large unencrypted file of ten copies of ALL_OBJECTS below.

SQL> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

SQL> CREATE TABLE all_obj_big_unencr_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'all_objects_big_unencrypted_xt.dmp' )
  7  )
  8  AS
  9     SELECT *
 10     FROM   all_objects
 11     ,     (SELECT ROWNUM AS r FROM dual CONNECT BY ROWNUM <= 10)
 12     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

We will now produce an encrypted copy of the data and output the comparison report, as follows.

SQL> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

SQL> CREATE TABLE all_obj_big_encr_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS (ENCRYPTION ENABLED)
  7     LOCATION ( 'all_objects_big_encrypted_xt.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects
 12     ,     (SELECT ROWNUM AS r FROM dual CONNECT BY ROWNUM <= 10)
 13     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

SQL> BEGIN
  2     runstats_pkg.rs_stop(
  3        runstats_pkg.statname_ntt(
  4           'physical write total bytes',
  5           'CPU used by this session'));
  6  END;
  7  /
================================================================================
Runstats report : 01-SEP-2009 18:30:15
================================================================================


--------------------------------------------------------------------------------
1. Summary timings
--------------------------------------------------------------------------------
Run1 ran in 621 hsecs
Run2 ran in 757 hsecs
Run1 was 18% quicker than Run2


--------------------------------------------------------------------------------
2. Statistics report
--------------------------------------------------------------------------------


Type  Name                                        Run1         Run2         Diff
----- ----------------------------------- ------------ ------------ ------------
STAT  physical write total bytes            74,469,376   74,469,376            0
STAT  CPU used by this session                     545          668          123


--------------------------------------------------------------------------------
3. Latching report
--------------------------------------------------------------------------------
Run1 used 205,274 latches
Run2 used 205,610 latches
Run1 used .2% fewer latches than Run2


================================================================================
End of report
================================================================================

PL/SQL procedure successfully completed.

The profile for encrypted external tables is broadly the same as that for compressed tables: an increased CPU cost with no overall time saving (and in this case no reduction in I/O either, as the encrypted file is the same size as the unencrypted version). We can confirm our file sizes with a directory listing, as follows.

SQL> host dir /od D:\Oracle\dir\ALL*BIG*ENCRYPTED*.DMP
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

01/09/2009  18:30        74,461,184 ALL_OBJECTS_BIG_UNENCRYPTED_XT.DMP
01/09/2009  18:30        74,461,184 ALL_OBJECTS_BIG_ENCRYPTED_XT.DMP
               2 File(s)    148,922,368 bytes
               0 Dir(s)   8,896,315,392 bytes free

reading encrypted files

As we saw earlier, when we create an external file, we can use the table to read the data back into Oracle. Using our original example, we will read a small number of records from the encrypted file, as follows.

SQL> SELECT owner, object_name, object_type
  2  FROM   all_objects_encrypted_xt
  3  WHERE  ROWNUM <= 5;

OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ -------------------
SYS                            ICOL$                          TABLE
SYS                            I_USER1                        INDEX
SYS                            CON$                           TABLE
SYS                            UNDO$                          TABLE
SYS                            C_COBJ#                        CLUSTER

5 rows selected.

Naturally, the decryption is seamless to us. We can alternatively use an unencrypted external table to read the encrypted data (if the column structure is the same, of course). In the following example, we will set the location of our vanilla external table to read the encrypted file.

SQL> ALTER TABLE all_objects_xt LOCATION ('all_objects_encrypted_xt.dmp');

Table altered.

SQL> SELECT owner, object_name, object_type
  2  FROM   all_objects_xt
  3  WHERE  ROWNUM <= 5;

OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ -------------------
SYS                            ICOL$                          TABLE
SYS                            I_USER1                        INDEX
SYS                            CON$                           TABLE
SYS                            UNDO$                          TABLE
SYS                            C_COBJ#                        CLUSTER

5 rows selected.

As we can see, the metadata in the external dump file enables Oracle to decrypt the data even when it is read through an external table that doesn't have encryption enabled.

confirming data encryption

External table dump files are in a proprietary format, but it is still possible to verify that encryption is taking place. To demonstrate this, we will create a single-row external file containing some recognisable strings, as follows.

SQL> CREATE TABLE one_row_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'one_row_xt.dmp' )
  7  )
  8  AS
  9     SELECT CAST('This is some data'      AS VARCHAR2(20)) AS some_data
 10     ,      CAST('This is some more data' AS VARCHAR2(30)) AS some_more_data
 11     FROM   dual;

Table created.

We will also create an encrypted version of the same table below.

SQL> CREATE TABLE one_row_encrypted_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS (ENCRYPTION ENABLED)
  7     LOCATION ( 'one_row_encrypted_xt.dmp' )
  8  )
  9  AS
 10     SELECT CAST('This is some data'      AS VARCHAR2(20)) AS some_data
 11     ,      CAST('This is some more data' AS VARCHAR2(30)) AS some_more_data
 12     FROM   dual;

Table created.

As usual, we will confirm that our files were created with a directory listing, as follows.

SQL> host dir /od D:\Oracle\dir\one*.dmp
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

02/09/2009  18:14            12,288 ONE_ROW_XT.DMP
02/09/2009  18:14            12,288 ONE_ROW_ENCRYPTED_XT.DMP
               2 File(s)         24,576 bytes
               0 Dir(s)   8,896,016,384 bytes free

To prove that encryption is happening, we will look at the contents of each of these files. Remember that they are in a proprietary format that includes some binary data, so we will need to read them in RAW format. To enable us to do this, we will create a "reader" external table that reads the Data Pump files as fixed RAW records.

SQL> CREATE TABLE flat_xt
  2  ( line RAW(64) )
  3  ORGANIZATION EXTERNAL
  4  (
  5     TYPE ORACLE_LOADER
  6     DEFAULT DIRECTORY xt_dir
  7     ACCESS PARAMETERS
  8     (
  9        RECORDS FIXED 64
 10        READSIZE 64
 11        FIELDS TERMINATED BY EOF
 12        ( line RAW(64) )
 13     )
 14     LOCATION ('')
 15  );

Table created.

Note that we are using a fixed readsize of 64 bytes. This divides into our external file size (12,288 bytes) with no remainder, which prevents the external table error raised when the last record is incomplete. For convenience (and display purposes), our record size is fixed at the same 64 bytes. To format the RAW data as VARCHAR2 and to extract only the lines of interest from our external files, we will create a view over the reader external table, as follows.

SQL> CREATE VIEW flat_vxt
  2  AS
  3     SELECT line
  4     FROM  (
  5            SELECT REGEXP_REPLACE(
  6                      UTL_RAW.CAST_TO_VARCHAR2(line),
  7                      '[^[:print:]]'
  8                      ) AS line
  9            FROM   flat_xt
 10           )
 11  WHERE line IS NOT NULL;

View created.
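
Conceptually, the reader table and view together do no more than the following Python sketch (an illustrative stand-in, not the driver itself): read the file in fixed 64-byte records, strip the non-printable characters, and discard any lines with nothing left.

```python
import string

# Rough stand-in for the POSIX [[:print:]] class used in the view
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")

def readable_lines(path, recsize=64):
    """Mimic FLAT_XT plus FLAT_VXT: read fixed-size records, strip
    non-printable characters and discard empty results."""
    lines = []
    with open(path, "rb") as f:
        while (rec := f.read(recsize)):
            line = "".join(ch for ch in rec.decode("latin-1") if ch in PRINTABLE)
            if line:
                lines.append(line)
    return lines
```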

We are ready to read the contents of our external files. We will begin with the unencrypted file below.

SQL> ALTER TABLE flat_xt LOCATION ('one_row_xt.dmp');

Table altered.

SQL> SELECT * FROM flat_vxt;

LINE
---------------------------------------------------------------
jZg+-ͥP(>@m=R;(w+BA+
P"SCOTT"."U"
IBMPC/WIN_NT-8.1.0
WE8MSWIN1252

11.01.00.00.00
001:001:000001:000001
$$-H
<?xml version="1.0"?><ROWSET> <ROW>  <STRMTABLE_T>   <VERS
_MAJOR>1</VERS_MAJOR>   <VERS_MINOR>0 </VERS_MINOR>   <VERS_DP
API>3</VERS_DPAPI>   <ENDIANNESS>0</ENDIANNESS>   <CHARSET>WE8
MSWIN1252</CHARSET>   <NCHARSET>AL16UTF16</NCHARSET>   <DBTIME
ZONE>+00:00</DBTIMEZONE>   <OWNER_NAME>SCOTT</OWNER_NAME>   <N
AME>ONE_ROW_XT</NAME>   <COL_LIST>     <COL_LIST_ITEM>      <
COL_NUM>1</COL_NUM>      <NAME>SOME_DATA</NAME>      <TYPE_NUM
>1</TYPE_NUM>      <LENGTH>20</LENGTH>      <PRECISION_NUM>0</
PRECISION_NUM>      <SCALE>0</SCALE>      <CHARSETID>178</CHAR
SETID>      <CHARSETFORM>1</CHARSETFORM>      <CHARLENGTH>20</
CHARLENGTH>     </COL_LIST_ITEM>     <COL_LIST_ITEM>      <CO
L_NUM>2</COL_NUM>      <NAME>SOME_MORE_DATA</NAME>      <TYPE_
NUM>1</TYPE_NUM>      <LENGTH>30</LENGTH>      <PRECISION_NUM>
0</PRECISION_NUM>      <SCALE>0</SCALE>      <CHARSETID>178</C
HARSETID>      <CHARSETFORM>1</CHARSETFORM>      <CHARLENGTH>3
0</CHARLENGTH>     </COL_LIST_ITEM>   </COL_LIST>  </STRMTABL
E_T> </ROW></ROWSET><--This is some dataThis is some more
 data

26 rows selected.

We can clearly see that our source table data is present in the external file. We can also see the table metadata. We can repeat this for the encrypted file, as follows.

SQL> ALTER TABLE flat_xt LOCATION ('one_row_encrypted_xt.dmp');

Table altered.

SQL> SELECT * FROM flat_vxt;

LINE
-------------------------------------------------------
g+w!_FZiQBA+n
h"SCOTT"."U"
IBMPC/WIN_NT-8.1.0
WE8MSWIN1252

11.01.00.00.00
001:001:000001:000001Lh9=
SO+k|`+n`-z-vt@Y-Oy+a5DS
(+6uzO~!mM1"p2m>j+_z-+5{`nB70EM
L~2+Y:Ϋƨ+[N_J~4%\+dvt@Y-Oy+a5DS
-h9PX<\5e^n_1mZ+.+v-+y#2!u
$\pq+-]{Wza.-P;_mHy(~M/?j-dH+N
=L̯H+1E+GMdwc%;+5+1#D+NQjͭt+
=f+=ks=_6|"f-բ+-A/d_WR8V\2E-
IQGj.樨Ƚc9vxNff&+-3+#^ҦA3+eM-NLfT@
%Iv+ttxns
$$!-(X!
-63+WR9#kpx
 >o4lZm_NHm¦OȽ#ֶjl)N[D{a}-.rUYOzp.
k~ 4Cc+w+X+\za>+uڦ${Iįn4?b̻A0W-EQ+
+`-*fo+ѲK;Z:1;q-b++UmXz,UD7U++Կ+=F_
<;y$zSe]`+?-|-̮+|n+F~T+:. !Ks#H
뷦TZ*
(ƥ2ib;Wx+~vפ_p2@F6+ġAc1udr+jd-
!T9Ƽd-zE/#4E9j-+ֽʯ-ݾ}V;;NR+ygE֧
F<=˦T;?x6_-+qJ|W9+RdklTjo~)
7t7ta"GV(XOB=sU=6;4_,=lM-MU
+Ohk+Mh2IcOIT"`MR/4ek٦qp[Co桦
.e{l}.

36 rows selected.

This time we can see that both our metadata and data are completely illegible, so we can safely assume that the encryption is happening!

combining compression and encryption

Our examples so far have concentrated on either compression or encryption, but not both. It is, however, possible to combine the two features simply by enabling both in the access parameters clause, as follows.

SQL> CREATE TABLE all_objects_both_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS (ENCRYPTION ENABLED COMPRESSION ENABLED)
  7     LOCATION ( 'all_objects_both_xt.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects
 12     WHERE  object_name NOT LIKE 'ALL_OBJECTS%XT';

Table created.

We will check the resulting file size with a directory listing below.

SQL> host dir /od D:\Oracle\dir\all*xt.dmp
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

01/09/2009  18:21         7,213,056 ALL_OBJECTS_XT.DMP
01/09/2009  18:21           843,776 ALL_OBJECTS_COMPRESSED_XT.DMP
01/09/2009  18:23        74,457,088 ALL_OBJECTS_BIG_XT.DMP
01/09/2009  18:23         3,047,424 ALL_OBJECTS_BIG_COMPRESSED_XT.DMP
01/09/2009  18:26         7,213,056 ALL_OBJECTS_ENCRYPTED_XT.DMP
01/09/2009  18:30        74,461,184 ALL_OBJECTS_BIG_UNENCRYPTED_XT.DMP
01/09/2009  18:30        74,461,184 ALL_OBJECTS_BIG_ENCRYPTED_XT.DMP
02/09/2009  18:16           843,776 ALL_OBJECTS_BOTH_XT.DMP
               8 File(s)    242,540,544 bytes
               0 Dir(s)   8,669,671,424 bytes free

Note the size of the ALL_OBJECTS_BOTH_XT.DMP file. It is exactly the same as our original compressed (but unencrypted) file of ALL_OBJECTS data. As we might expect by now, we can use any of our external tables with matching columns to read this data back, as follows.

SQL> ALTER TABLE all_objects_xt LOCATION ('all_objects_both_xt.dmp');

Table altered.

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     68795

1 row selected.

external table preprocessor

External tables are generally superior to SQL*Loader for the majority of standard requirements, but there are certain scenarios where SQL*Loader is still invaluable, one being its ability to load data directly from compressed files.

It is this capability that we are interested in. Loading data directly from compressed files is quite a common technique in batch systems (particularly on UNIX). In such systems, compressed files are uncompressed to named pipes and SQL*Loader loads directly from the pipes, rather than from files. This technique can save considerable time and means we never have to uncompress an entire file to disk.
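
On a UNIX-like system, the named-pipe technique looks broadly like the following sketch (here cat stands in for the sqlldr invocation, which would read the pipe exactly as if it were an ordinary flat file; all paths are illustrative).

```shell
set -e
dir=$(mktemp -d)

# A small compressed "feed" file stands in for the real extract
printf '1,SYS.ICOL$\n2,SYS.I_USER1\n' > "$dir/feed.dat"
gzip "$dir/feed.dat"

# Decompress into a named pipe in the background...
mkfifo "$dir/load_pipe"
gzip -dc "$dir/feed.dat.gz" > "$dir/load_pipe" &

# ...while the loader reads from the pipe; in a real batch system this
# would be: sqlldr userid=... control=feed.ctl data="$dir/load_pipe"
result=$(cat "$dir/load_pipe")
wait
echo "$result"
rm -rf "$dir"
```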

It has always been technically possible to achieve something similar with external tables, although in a much less robust way (which is probably why it has never been published as a technique). However, in both the 11.1.0.7 patchset and the 11.2 release, Oracle has added an external table preprocessor to the ORACLE_LOADER driver that enables us to read directly from compressed files (the preprocessor is invoked as we query the external table). In fact, the preprocessor is intended to support a range of pre-loading operations that convert an input rowsource into the expected format for the external table, but it is compressed files that are the obvious target for this feature, so we will concentrate on this below.

reading a compressed flat-file

To demonstrate the external table preprocessor, we will generate a small flat-file of 1,000 records, compress it and read it using an external table. We will generate the flat-file by using the Data Dump utility from oracle-developer.net as follows.

SQL> BEGIN
  2     data_dump(
  3        query_in     => q'[SELECT ROWNUM AS r
  4                           ,      owner || '.' || object_name AS object_name
  5                           FROM   all_objects
  6                           WHERE  ROWNUM <= 1000]',
  7        file_in      => 'small_flat_file.txt',
  8        directory_in => 'XT_DIR'
  9        );
 10  END;
 11  /

PL/SQL procedure successfully completed.

We will verify that our file was created and then compress it using gzip, as follows.

SQL> host dir /od D:\Oracle\dir\small_flat_file.txt
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

02/09/2009  18:17            23,889 small_flat_file.txt
               1 File(s)         23,889 bytes
               0 Dir(s)   8,895,135,744 bytes free

SQL> host gzip -1 -9 D:\Oracle\dir\small_flat_file.txt

SQL> host dir /od D:\Oracle\dir\small_flat_file.*
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of D:\Oracle\dir

02/09/2009  18:17             6,801 small_flat_file.txt.gz
               1 File(s)          6,801 bytes
               0 Dir(s)   8,895,152,128 bytes free

In addition to the standard read-write Oracle directory that we need for our external table, we also need an additional executable directory object for the preprocessor. This directory defines the location of the executables used by the preprocessor (we will be using gzip below). As far as Oracle is concerned, an executable directory is one that has EXECUTE privileges granted on it (this is an 11g feature specifically to support the preprocessor).

We will use a SYSDBA account to create an executable directory and grant the EXECUTE privilege to user SCOTT below.

SQL> conn / as sysdba
Connected.

SQL> CREATE DIRECTORY bin_dir AS 'C:\Progra~1\GNUWin32\bin';

Directory created.

SQL> GRANT EXECUTE ON DIRECTORY bin_dir TO scott;

Grant succeeded.

The ability to invoke executables from an external table (via an executable Oracle directory) has security implications and should be tightly controlled. It is recommended that a directory containing only the required preprocessor executables is created for this purpose.

Moving on, we will now create an external table with a preprocessor to uncompress our small compressed file, as follows.

SQL> conn scott/tiger
Connected.

SQL> CREATE TABLE compressed_xt
  2  (  rec_no      INTEGER
  3  ,  object_name VARCHAR2(255)
  4  )
  5  ORGANIZATION EXTERNAL
  6  (
  7    TYPE ORACLE_LOADER
  8    DEFAULT DIRECTORY xt_dir
  9    ACCESS PARAMETERS
 10    (
 11       RECORDS DELIMITED by NEWLINE
 12       PREPROCESSOR bin_dir: 'gzip.exe' OPTIONS '-dc'
 13       FIELDS TERMINATED BY ','
 14    )
 15    LOCATION ('small_flat_file.txt.gz')
 16  )
 17  REJECT LIMIT UNLIMITED;

Table created.

Note the preprocessor syntax. It comprises the executable directory (the directory specification), the executable itself and options as required (the file specification). In our case we have used a call to gzip with the -dc options, but the preprocessor can invoke any executable that generates a rowsource to stdout (including shell/batch scripts which we will describe below).

The OPTIONS clause demonstrated above only works in Oracle 11g Release 1 (11.1.0.7); it has been removed altogether in 11g Release 2 (11.2.0.x). The workaround is simple, however: we can put the full preprocessor command in an executable shell script instead. See "using preprocessor scripts" below for details of how to convert this example to a batch/shell script.

We will now query a small number of rows from our external table, reading directly from the compressed file, as follows.

SQL> SELECT *
  2  FROM   compressed_xt
  3  WHERE  rec_no <= 10;

    REC_NO OBJECT_NAME
---------- ------------------------------
         1 SYS.ICOL$
         2 SYS.I_USER1
         3 SYS.CON$
         4 SYS.UNDO$
         5 SYS.C_COBJ#
         6 SYS.I_OBJ#
         7 SYS.PROXY_ROLE_DATA$
         8 SYS.I_IND1
         9 SYS.I_CDEF2
        10 SYS.I_OBJ5

10 rows selected.

Some care should be taken with this feature, however. These examples were run on a Windows XP laptop with a dual-core Intel processor, 2GB RAM and an 11.1.0.7 database. On this system, the preprocessor works well with small zipped files (remember that the above example used just 1,000 records). However, for larger compressed files, the feature is not so stable (at least on this Windows system). With a larger dataset of approximately 700,000 records (only 2MB compressed), the query hangs on the external table read wait event and only returns its data when the separate gzip process is killed. A range of other symptoms, such as alert log entries, ORA-07445 errors, session coredumps and having to bounce the instance to enable any more connections, suggest that this feature is not quite suited to this laptop. Of course, readers might have different experiences with different setups.

using preprocessor scripts

In addition to executables such as gzip, the preprocessor can be used with any shell or batch script that generates a rowsource to stdout (of course the rowsource must be in the record format defined for the external table). The most obvious use for this feature is to enable us to query a list of files in a directory from within Oracle and this is the subject of a separate oracle-developer.net article.

To provide a small demonstration for the purposes of this current article, however, we will wrap our gzip example in a batch script and query the compressed file again (and as noted earlier, this method will be necessary for Oracle 11.2 as the OPTIONS preprocessor clause is no longer supported after 11.1). To do this, we will create a batch script (this is a Windows system) named unzip_file.bat containing the following two lines.


@echo off
gzip -dc %1

The first line suppresses command echoing, which is essential here because anything echoed by the script would be read as part of the data stream. The script itself must be located in the executable Oracle directory (in our case BIN_DIR). Interestingly, the directory and filename given in the external table's DEFAULT DIRECTORY and LOCATION clauses respectively are passed to the script as the %1 argument, as we can see above ($1 in a shell script).
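
For completeness, the shell-script equivalent of unzip_file.bat might look like the sketch below (the script name and paths are illustrative; note that a production preprocessor script should reference executables by their full paths, as the preprocessor does not run with a normal login environment). We can exercise the script outside Oracle by passing it a compressed file as its first argument, just as the ORACLE_LOADER driver would.

```shell
set -e
dir=$(mktemp -d)

# unzip_file.sh: the shell equivalent of unzip_file.bat. Oracle passes the
# full path of the LOCATION file as $1 and reads stdout as the rowsource.
cat > "$dir/unzip_file.sh" <<'EOF'
#!/bin/sh
gzip -dc "$1"
EOF
chmod +x "$dir/unzip_file.sh"

# Simulate the driver's call with a small compressed file
printf '1,SYS.ICOL$\n' > "$dir/data.txt"
gzip "$dir/data.txt"
out=$("$dir/unzip_file.sh" "$dir/data.txt.gz")
echo "$out"
rm -rf "$dir"
```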

We will now re-create our COMPRESSED_XT external table to use the preprocessor batch script instead of the gzip executable, as follows.

SQL> DROP TABLE compressed_xt;

Table dropped.

SQL> CREATE TABLE compressed_xt
  2  (  rec_no      INTEGER
  3  ,  object_name VARCHAR2(255)
  4  )
  5  ORGANIZATION EXTERNAL
  6  (
  7    TYPE ORACLE_LOADER
  8    DEFAULT DIRECTORY xt_dir
  9    ACCESS PARAMETERS
 10    (
 11       RECORDS DELIMITED by NEWLINE
 12       PREPROCESSOR bin_dir: 'unzip_file.bat'
 13       FIELDS TERMINATED BY ','
 14    )
 15    LOCATION ('small_flat_file.txt.gz')
 16  )
 17  REJECT LIMIT UNLIMITED;

Table created.

To complete this example, we will query a small number of records from the compressed file (preprocessed by our batch script), as follows.

SQL> SELECT *
  2  FROM   compressed_xt
  3  WHERE  rec_no <= 10;

    REC_NO OBJECT_NAME
---------- ------------------------------
         1 SYS.ICOL$
         2 SYS.I_USER1
         3 SYS.CON$
         4 SYS.UNDO$
         5 SYS.C_COBJ#
         6 SYS.I_OBJ#
         7 SYS.PROXY_ROLE_DATA$
         8 SYS.I_IND1
         9 SYS.I_CDEF2
        10 SYS.I_OBJ5

10 rows selected.

As described above, Oracle supplies the path and filename of the compressed file in the %1 argument to the batch script. This enables the gzip command to unzip the file to stdout and therefore be used as an input data stream for the ORACLE_LOADER driver (and hence the external table).

further reading

For more information on all new 11g features for external tables, read the ORACLE_LOADER Access Driver documentation in the online Utilities guide. Information on Transparent Data Encryption can be found in the Advanced Security Administrator's Guide. For more examples of preprocessor batch scripts, read listing files with the external table preprocessor in 11g, which describes how the feature can be used for directory listings from SQL. The variation on Tom Kyte's RUNSTATS utility used in this article can be downloaded from here.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, September 2009 (updated June 2014)
