unloading data using external tables in 10g

This is an introduction to a new feature of external tables in 10g, namely the ability to unload data from the database using SQL. Introduced in 9i, external tables provided a new means of reading flat file data without first loading it into a staging table. They offered additional benefits over traditional SQL*Loader methods in that loading data became as simple as an INSERT..SELECT, with all the SQL processing and manipulation options that such a statement implies. It also meant that more of the ETL application could be self-contained in PL/SQL packages, as call-outs to SQL*Loader or batch scripts were no longer necessary to load data.

Oracle 10g takes external tables a stage further by allowing one to be created as a CTAS (Create Table As Select) operation, which provides a one-time unload of data. When I first saw the headlines about "writing" to an external table, I must admit to being excited by the possibilities of a native data unloader (no more sqlplus spools, UTL_FILE writes, Pro*C unloaders and so on). Alas, this is not quite the case. Oracle has in fact supplied a new access driver, known as "ORACLE_DATAPUMP", which "unloads" an external table to a proprietary format usable by the Data Pump set of utilities (a significant new 10g feature that replaces imp and exp). Generating an ASCII or CSV file from external tables is therefore not possible, but this new feature is exciting nonetheless.

a simple demonstration

We'll start with a simple demonstration of how to dump the data from the ALL_OBJECTS view to a file. I am assuming that readers are familiar with external table concepts and DIRECTORY objects.

SQL> CREATE DIRECTORY xt_dir AS 'c:\temp\ora';

Directory created.
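
If the unload is to be performed by a user other than the owner of the directory, read and write privileges on the directory will also be needed. A minimal sketch, assuming a hypothetical user SCOTT:

SQL> GRANT READ, WRITE ON DIRECTORY xt_dir TO scott;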

Now that we have a directory to dump the data to, we can create an "unloading" external table.

SQL> CREATE TABLE all_objects_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'all_objects_xt.dmp' )
  7  )
  8  AS
  9     SELECT *
 10     FROM   all_objects;

Table created.

We can verify that the file was created with a simple directory listing.

SQL> host dir c:\temp\ora\ALL*
 Volume in drive C is DRIVE-C
 Volume Serial Number is 6845-F732

 Directory of c:\temp\ora

11/05/2005  20:16         4,648,960 ALL_OBJECTS_XT.DMP
11/05/2005  20:16               312 ALL_OBJECTS_XT_2024_3452.log
               2 File(s)      4,649,272 bytes
               0 Dir(s)   4,308,662,272 bytes free

It is probably worth mentioning at this point that these examples were created on a Windows XP PC, so the file names are case-insensitive. This is apparent from the above output, where the file name quoted in the LOCATION access parameter is lower-case and the subsequent directory listing shows an upper-case file name. On UNIX, more care will have to be taken over the case used to name and reference the file.

Also note the log file. It records information such as the time the file was opened and any error messages raised during the creation or read-back of the external table, and it can be suppressed with the NOLOGFILE access parameter.
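
For completeness, the following is a sketch of how the log file can be suppressed with NOLOGFILE (the table and file names here are purely illustrative).

SQL> CREATE TABLE all_objects_nolog_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS ( NOLOGFILE )
  7     LOCATION ( 'all_objects_nolog.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects;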

read-back

Note the mention of read-back above. An interesting property of these tables is that they can continue to be used to read the dumped data back in. Because the table was created as a database object, it persists, and as long as an underlying file of the same format exists, it can continue to be queried, as follows.

SQL> desc all_objects_xt
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------------
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(30)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     47374

As long as the underlying file is in Data Pump format and matches the structure of the external table, the table can continue to be used to read the data (the underlying files do not have to be the actual files created when the external table was first built).

For the following example, I dumped ALL_OBJECTS on my Solaris database (version 10.1.0.3) to a file named ALL.DMP and transferred it to my Windows XP server (Oracle 10.1.0.2) using DBMS_FILE_TRANSFER (renaming the local ALL_OBJECTS_XT.DMP file to ALL.DMP works just as well). After re-pointing the table at the new file name with the following, the SELECT still works.

SQL> ALTER TABLE all_objects_xt LOCATION ( 'ALL.DMP' );

Table altered.

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     47374

file protection

Oracle protects the dump files we create by preventing any further attempts to write to them. For example, if we try to re-create the ALL_OBJECTS_XT table but forget to remove the original file (remember this is now named 'ALL.DMP'), we get an error.

SQL> DROP TABLE all_objects_xt;

Table dropped.

SQL> CREATE TABLE all_objects_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'ALL.DMP' )
  7  )
  8  AS
  9     SELECT *
 10     FROM   all_objects;
CREATE TABLE all_objects_xt
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11012: file ALL.DMP in c:\temp\ora already exists
ORA-06512: at "SYS.ORACLE_DATAPUMP", line 19
ORA-06512: at line 1

This means, however, that any attempts by the more UNIX-indoctrinated amongst us to compress the output file through a named pipe will fail, because the "FIFO" will already exist before the CREATE TABLE operation.

Note also that, as mentioned earlier, the write is a one-time-only operation. Data is unloaded only at the time of table creation, so statements such as the following will fail.

SQL> INSERT INTO all_objects_xt
  2  SELECT *
  3  FROM   all_objects;
INSERT INTO all_objects_xt
            *
ERROR at line 1:
ORA-30657: operation not supported on external organized table

parallelism for performance

One of the key performance benefits of the original external table implementation is that it makes parallel processing much simpler than the SQL*Loader alternative. The same is also true of the new unloading external tables. Data can be written to multiple files in parallel to add a significant performance boost to unloading processes.

SQL> CREATE TABLE all_objects_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'allobj1.dmp', 'allobj2.dmp', 'allobj3.dmp', 'allobj4.dmp' )
  7  )
  8  PARALLEL 4
  9  AS
 10     SELECT *
 11     FROM   all_objects;

Table created.

SQL> host dir c:\temp\ora\allobj*
 Volume in drive C is DRIVE-C
 Volume Serial Number is 6845-F732

 Directory of c:\temp\ora

11/05/2005  20:22         1,171,456 ALLOBJ1.DMP
11/05/2005  20:22         1,175,552 ALLOBJ2.DMP
11/05/2005  20:22         1,175,552 ALLOBJ3.DMP
11/05/2005  20:22         1,171,456 ALLOBJ4.DMP
               4 File(s)      4,694,016 bytes
               0 Dir(s)   4,302,034,944 bytes free

Oracle has now created four similarly sized files, and when we read the data back, it reads from each of the files to return the complete data set.

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     47374

Interestingly, each of these files is self-contained, which means that it is possible to read any one or more of them, as the following demonstrates.

SQL> ALTER TABLE all_objects_xt LOCATION ( 'ALLOBJ1.DMP' );

Table altered.

SQL> SELECT COUNT(*) FROM all_objects_xt;

  COUNT(*)
----------
     11843

Incidentally, we must create one file per degree of parallelism. If we request a degree of parallelism greater than the number of files in the LOCATION clause, then Oracle will reduce the parallelism accordingly. If we request any available degree of parallelism (we can use the PARALLEL keyword without specifying a degree) but only include one file, then Oracle will not invoke parallel query at all. Using the V$PQ_SESSTAT dynamic view, we can verify this quite simply.

SQL> host del c:\temp\ora\all*

SQL> DROP TABLE all_objects_xt;

Table dropped.

SQL> CREATE TABLE all_objects_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'allobj.dmp' )
  7  )
  8  PARALLEL
  9  AS
 10     SELECT *
 11     FROM   all_objects;

Table created.

SQL> SELECT last_query
  2  FROM   v$pq_sesstat
  3  WHERE  statistic = 'DDL Parallelized';

LAST_QUERY
----------
         0

Note that the same restriction does not apply when reading the data back. In most cases, the ORACLE_DATAPUMP access driver will be able to read the data from a single file in parallel.
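
The following sketch shows how this might be verified: raise the table's degree of parallelism, query the single-file table and then check V$PQ_SESSTAT.

SQL> ALTER TABLE all_objects_xt PARALLEL 4;

SQL> SELECT COUNT(*) FROM all_objects_xt;

SQL> SELECT last_query
  2  FROM   v$pq_sesstat
  3  WHERE  statistic = 'Queries Parallelized';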

The following shows how Oracle will use parallel processing to unload data when we provide more than a single file.

SQL> host del c:\temp\ora\all*

SQL> DROP TABLE all_objects_xt;

Table dropped.

SQL> CREATE TABLE all_objects_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     LOCATION ( 'allobj1.dmp', 'allobj2.dmp' )
  7  )
  8  PARALLEL
  9  AS
 10     SELECT *
 11     FROM   all_objects;

Table created.

SQL> SELECT last_query
  2  FROM   v$pq_sesstat
  3  WHERE  statistic = 'DDL Parallelized';

LAST_QUERY
----------
         1

direct path read/write for performance

In addition to the performance benefits of writing data in parallel, Oracle also treats external data unloads as unlogged direct path operations (which means they won't interfere with your buffer cache). For the following demonstration, I have taken a heap copy of ALL_OBJECTS to simplify the explain plan output (ALL_OBJECTS is a view with a reasonably complicated execution plan - taking a copy ensures we get a simple, single-table access plan).
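
The heap copy itself is a straightforward CTAS, along these lines:

SQL> CREATE TABLE my_all_objects
  2  AS
  3     SELECT *
  4     FROM   all_objects;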

SQL> EXPLAIN PLAN FOR
  2     CREATE TABLE my_all_objects_xt
  3     ORGANIZATION EXTERNAL
  4     (
  5        TYPE ORACLE_DATAPUMP
  6        DEFAULT DIRECTORY xt_dir
  7        LOCATION ( 'whatever.dmp' )
  8     )
  9     AS
 10        SELECT *
 11        FROM   my_all_objects;

Explained.

SQL> SELECT *
  2  FROM   TABLE( DBMS_XPLAN.DISPLAY );

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 3302953930

-----------------------------------------------------------------------------------------
| Id  | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |                | 48693 |  6086K|   344   (1)| 00:00:05 |
|   1 |  LOAD AS SELECT        |                |       |       |            |          |
|   2 |   TABLE ACCESS FULL    | MY_ALL_OBJECTS | 48693 |  6086K|   149   (1)| 00:00:02 |
-----------------------------------------------------------------------------------------

Note

   - dynamic sampling used for this statement

13 rows selected.

shared data across oracle versions

It is possible to share data between different versions of Oracle using the ORACLE_DATAPUMP driver. Oracle includes an optional VERSION clause to use in the ACCESS PARAMETERS when unloading data that is to be read in a database of a different version. Valid values are "COMPATIBLE", "LATEST" or an actual version string. It is difficult to test this at present with my Solaris 10.1.0.3 and XP 10.1.0.2 databases as the dump files produced by both are interchangeable without any reference to VERSION. In fact, I am currently able to share data between the two databases despite including complete garbage in the VERSION clause. Perhaps we will need a major release (e.g. 10gR2 in Summer 2005) to test backwards compatibility.
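
For reference, the VERSION clause is supplied in the ACCESS PARAMETERS at unload time, along the lines of the following sketch (the table and file names are illustrative; check the Utilities manual for the full syntax).

SQL> CREATE TABLE all_objects_v10_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY xt_dir
  6     ACCESS PARAMETERS ( VERSION 'COMPATIBLE' )
  7     LOCATION ( 'all_objects_v10.dmp' )
  8  )
  9  AS
 10     SELECT *
 11     FROM   all_objects;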

uses for writeable external tables

All of which is very interesting and exciting, but given that the unload feature of external tables is proprietary (i.e. the Data Pump format), why would we use it?

Ultimately, Oracle has provided writeable external tables as a means of backing up data or sharing it between schemas or databases, regardless of platform or version (thanks to Data Pump). At its simplest, it is a very useful mechanism for backing up data in a development environment (for example, to save a "before" image of data for regression testing). It is also feasible to use external tables for archiving data.

For example, it is quite common for data warehouses to have "rolling" histories, whereby historical data is purged after a certain period. The simplest and most efficient method for this is to use partitioning and "age out" the partitions that contain data beyond the threshold. Partition exchange has always been the preferred mechanism for removing the data from the table, but how that data is archived has not always been so simple. With the requirement being to get the data onto a storage device for archiving, options have often been restricted to Oracle's export utility or even a "hand-rolled" data unloader using UTL_FILE, for example. The former can be fast but requires a switch to a shell environment, whereas the latter can be painfully slow across large data volumes. It is not uncommon for people to rely on spooling sqlplus scripts, an approach that offers very little application "visibility" and can make some very well-respected application developers shudder at the thought!

To my mind, writeable external tables in 10g excel at this task. Not only are they an excellent mechanism for fast data unloading, they are also fully controllable from within a PL/SQL application. An added benefit is that Oracle creates the table that can be used to read the data back should the users ever require it. With the addition of a sensible naming convention, the archiving requirement of a rolling-history application becomes faster and considerably easier. Swap out the partition, unload the data for archiving and leave the external table behind, both to read the data back should it ever be required and as a persistent reminder of what was archived. It is even possible to skip the partition exchange step and simply create the external table as a SELECT from the oldest partition before it is removed from the history table, as sketched below.
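
The pattern might look something like the following, in which the SALES table, its partition names and the ARCH_DIR directory are all hypothetical.

SQL> CREATE TABLE sales_2004q1_xt
  2  ORGANIZATION EXTERNAL
  3  (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY arch_dir
  6     LOCATION ( 'sales_2004q1.dmp' )
  7  )
  8  AS
  9     SELECT *
 10     FROM   sales PARTITION (sales_2004q1);

SQL> ALTER TABLE sales DROP PARTITION sales_2004q1;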

It is also feasible that writeable external tables could become a mechanism for sharing data, especially in environments that do not use database links or where the processing requirement is for features that are not supported over links (such as direct path, parallel query and bulk PL/SQL operations). The Data Pump driver makes it possible to share data across database versions and operating systems, even those of different endianness. To continue with the implications for data archiving described above, what if the data were to be archived to another database rather than a storage device? We now have significantly more control over our archiving requirement, all from within the same PL/SQL application. Using features such as DBMS_FILE_TRANSFER (10g) and DBMS_AQ, we can now share data across databases and have the receiving database act on the availability of the new data.
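
For example, a dump file could be pushed to the archive database with DBMS_FILE_TRANSFER.PUT_FILE, as in the following sketch (which assumes a database link named ARCHIVE_DB and an XT_DIR directory object on both databases).

SQL> BEGIN
  2     DBMS_FILE_TRANSFER.PUT_FILE(
  3        source_directory_object      => 'XT_DIR',
  4        source_file_name             => 'sales_2004q1.dmp',
  5        destination_directory_object => 'XT_DIR',
  6        destination_file_name        => 'sales_2004q1.dmp',
  7        destination_database         => 'archive_db'
  8     );
  9  END;
 10  /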

metadata

In addition to the XXX_TABLES family of views, Oracle provides two further view families specific to external tables. These are XXX_EXTERNAL_TABLES and XXX_EXTERNAL_LOCATIONS. The former provides us with information such as the external table type and access parameters, while the latter view family provides information on the underlying flat files and their locations. The XXX_TABLES view family is still of importance, of course, with information critical to the CBO (statistics for external tables can be computed using DBMS_STATS).
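
For example, the following queries against the USER-level views report the access driver, default directory and underlying file locations for our external tables, and DBMS_STATS gathers statistics in the usual way.

SQL> SELECT table_name, type_name, default_directory_name
  2  FROM   user_external_tables;

SQL> SELECT table_name, directory_name, location
  2  FROM   user_external_locations;

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'ALL_OBJECTS_XT');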

conclusions

Writeable external tables are a useful new addition to Oracle 10g's set of utilities. They provide a fast and simple mechanism for backing up or sharing critical and non-critical data in a variety of applications. Hopefully this article outlines some of the interesting new opportunities and benefits, and stimulates developers into thinking of alternative approaches to explore when building database applications.

further reading

More information on external tables can be found in the Utilities book in the online Oracle documentation. I always start with the documentation search engine and locate the correct set of manuals from there.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, May 2005
