emulating string-to-table functionality using sql

This short article demonstrates a number of methods for emulating "string-to-table" functionality in SQL only (i.e. without a PL/SQL function). I've put this together following my response to a forum posting on DBA Support. The forum question asked how to split a column of single, delimited strings into its constituent elements, but in SQL only. Each element must be represented in its own row, such that 'A,B,C' generates three records.

what's included?

In addition to my solution posted to DBA Support, this article also includes follow-up examples by William Robertson, Adrian Christie and Mohit Agarwal. In total, five methods are included:

setup

First we will create a table with some test data and a collection type, starting with the table (with just a couple of rows for demonstration purposes).

SQL> CREATE TABLE t (str VARCHAR2(30));

Table created.

SQL> INSERT INTO t VALUES ( 'X,Y,Z' );

1 row created.

SQL> INSERT INTO t VALUES ( 'XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG' );

1 row created.

Next we create a collection type for use in the SQL-only implementation.

SQL> CREATE OR REPLACE TYPE number_ntt AS TABLE OF NUMBER;
  2  /

Type created.

sql solution

As noted, the following SQL-only solution works for all versions of 9i and onwards. It transforms each delimited string into multiple rows. Note that in this example, the string-delimiter is hard-coded to a comma. Some further comments on the SQL follow the example.

SQL> WITH ilv AS (
  2      SELECT str || ','                                   AS str
  3      ,     (LENGTH(str) - LENGTH(REPLACE(str, ','))) + 1 AS no_of_elements
  4      FROM   t
  5      )
  6  SELECT RTRIM(str, ',')                              AS original_string
  7  ,      SUBSTR(str, start_pos, (next_pos-start_pos)) AS single_element
  8  ,      element_no
  9  FROM  (
 10         SELECT ilv.str
 11         ,      nt.column_value AS element_no
 12         ,      INSTR(
 13                   ilv.str,
 14                   ',',
 15                   DECODE(nt.column_value, 1, 0, 1),
 16                   DECODE(nt.column_value, 1, 1, nt.column_value-1)) + 1 AS start_pos
 17         ,      INSTR(
 18                   ilv.str,
 19                   ',',
 20                   1,
 21                   DECODE(nt.column_value, 1, 1, nt.column_value)) AS next_pos
 22         FROM   ilv
 23         ,      TABLE(
 24                   CAST(
 25                      MULTISET(
 26                         SELECT ROWNUM FROM dual CONNECT BY ROWNUM < ilv.no_of_elements
 27                         ) AS number_ntt )) nt
 28        );

ORIGINAL_STRING                 SINGLE_ELEMENT  ELEMENT_NO
------------------------------- --------------- ----------
X,Y,Z                           X                        1
X,Y,Z                           Y                        2
X,Y,Z                           Z                        3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  XXX                      1
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  Y                        2
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  ZZ                       3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  AAAAA                    4
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  B                        5
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  CCC                      6
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  D                        7
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  E                        8
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  F                        9
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  GGG                     10

13 rows selected.

Note that the above SQL performs the following steps:

model solution

Adrian Christie's method uses the MODEL clause to generate the rows to "pivot" each string. MODEL is supported for all versions of 10g onwards. We'll look at the SQL first and then some points of note.

SQL> col element_count noprint

SQL> WITH ilv AS (
  2     SELECT str                                          AS orig_str
  3     ,      ',' || str || ','                            AS mod_str
  4     ,      1                                            AS start_pos
  5     ,      LENGTH(str)                                  AS end_pos
  6     ,     (LENGTH(str) - LENGTH(REPLACE(str, ','))) + 1 AS element_count
  7     ,      0                                            AS element_no
  8     ,      ROWNUM                                       AS rn
  9     FROM   t
 10     )
 11  SELECT orig_str AS original_string
 12  ,      SUBSTR(mod_str, start_pos, end_pos-start_pos) AS single_element
 13  ,      element_no
 14  ,      element_count
 15  FROM  (
 16         SELECT *
 17         FROM   ilv
 18         MODEL
 19             PARTITION BY (rn, orig_str, mod_str)
 20             DIMENSION BY (element_no)
 21             MEASURES     (start_pos, end_pos, element_count)
 22             RULES
 23                 ITERATE (2000)
 24                 UNTIL (ITERATION_NUMBER+1 = element_count[0])
 25                 (
 26                   start_pos[ITERATION_NUMBER+1] 
 27                       = INSTR(CV(mod_str), ',', 1, CV(element_no)) + 1,
 28                   end_pos[ITERATION_NUMBER+1] 
 29                       = INSTR(CV(mod_str), ',', 1, CV(element_no) + 1)
 30                 )
 31        )
 32  WHERE  element_no != 0
 33  ORDER  BY
 34         mod_str
 35  ,      element_no;

ORIGINAL_STRING                SINGLE_ELEMENT  ELEMENT_NO
------------------------------ --------------- ----------
X,Y,Z                          X                        1
X,Y,Z                          Y                        2
X,Y,Z                          Z                        3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG XXX                      1
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG Y                        2
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG ZZ                       3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG AAAAA                    4
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG B                        5
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG CCC                      6
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG D                        7
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG E                        8
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG F                        9
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG GGG                     10

13 rows selected.

Some points to note are as follows.

xml solution

William Robertson's method uses the XML features of Oracle 9.2 upwards. Each string is transformed into a collection of XML elements by replacing the commas with tags and casting the strings into XMLTypes. Once in an XMLType, the elements can be extracted using XPath expressions. Unfortunately, this method will be slow at anything other than low volumes.

SQL> SELECT str                                        AS original_string
  2  ,      EXTRACTVALUE( xt.column_value, 'element' ) AS single_element
  3  FROM   t
  4  ,      TABLE(
  5            XMLSEQUENCE(
  6               EXTRACT(
  7                  XMLTYPE(
  8                     '' ||
  9                     REPLACE( t.str, ',', '' ) ||
 10                     ''
 11                     ),
 12                  '/coll/element'
 13                  )
 14               )
 15            ) xt;

ORIGINAL_STRING                SINGLE_ELEMENT
------------------------------ ---------------
X,Y,Z                          X
X,Y,Z                          Y
X,Y,Z                          Z
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG XXX
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG Y
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG ZZ
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG AAAAA
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG B
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG CCC
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG D
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG E
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG F
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG GGG

13 rows selected.

hierarchical solution

Mohit Agarwal's solution utilises a hierarchical query (CONNECT BY) combined with regular expressions to strip the tokens from each string. This is a much simpler version to read and uses a single in-line view, as follows.

SQL> SELECT str
  2  ,      REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) AS single_element
  3  ,      LEVEL                                 AS element_no
  4  FROM  (
  5         SELECT ROWNUM AS id
  6         ,      str
  7         FROM   t
  8        )
  9  CONNECT BY INSTR(str, ',', 1, LEVEL-1) > 0
 10         AND id = PRIOR id
 11         AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL;

STR                            SINGLE_ELEMENT                 ELEMENT_NO
------------------------------ ------------------------------ ----------
X,Y,Z                          X                                       1
X,Y,Z                          Y                                       2
X,Y,Z                          Z                                       3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG XXX                                     1
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG Y                                       2
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG ZZ                                      3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG AAAAA                                   4
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG B                                       5
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG CCC                                     6
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG D                                       7
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG E                                       8
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG F                                       9
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG GGG                                    10

13 rows selected.

Note that the in-line view is required to alias ROWNUM (as it cannot be used directly in the PRIOR clause on line 10).

sql solution for 8i

We can easily modify the original SQL version to be 8i-compatible. We'll see the SQL first then examine the key differences from the 9i+ version.

SQL> SELECT RTRIM(str, ',')                              AS original_string
  2  ,      SUBSTR(str, start_pos, (next_pos-start_pos)) AS single_element
  3  ,      element_no
  4  FROM  (
  5         SELECT ilv.str
  6         ,      nt.column_value AS element_no
  7         ,      INSTR(
  8                   ilv.str,
  9                   ',',
 10                   DECODE(nt.column_value, 1, 0, 1),
 11                   DECODE(nt.column_value, 1, 1, nt.column_value-1)) + 1 AS start_pos
 12         ,      INSTR(
 13                   ilv.str,
 14                   ',',
 15                   1,
 16                   DECODE(nt.column_value, 1, 1, nt.column_value)) AS next_pos
 17         FROM  (
 18                SELECT str || ',' AS str
 19                ,     (LENGTH(str) - LENGTH(REPLACE(str, ','))) + 1 AS no_of_elements
 20                FROM   t
 21               ) ilv
 22         ,      TABLE(
 23                   CAST(
 24                      MULTISET(
 25                         SELECT ROWNUM FROM all_objects WHERE ROWNUM <= ilv.no_of_elements
 26                         ) AS number_ntt )) nt
 27        );

ORIGINAL_STRING                 SINGLE_ELEMENT  ELEMENT_NO
------------------------------- --------------- ----------
X,Y,Z                           X                        1
X,Y,Z                           Y                        2
X,Y,Z                           Z                        3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  XXX                      1
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  Y                        2
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  ZZ                       3
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  AAAAA                    4
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  B                        5
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  CCC                      6
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  D                        7
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  E                        8
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  F                        9
XXX,Y,ZZ,AAAAA,B,CCC,D,E,F,GGG  GGG                     10

13 rows selected.

The main differences are as follows.

summary timings

To test the performance of each of these methods, the T table was scaled up to 100,000 rows of 4 elements. We can see the average "wall-clock" timings of each method below.

Method Timing (s)
9i+ SQL 11
8i+ SQL (running on 8i) 14
10g+ MODEL 27
10g+ hierarchical SQL 32
8i+ SQL (running on 10g) 114
9.2+ XML cancelled after 600

further reading

The original thread that prompted this article can be found here.

acknowledgements

Thanks to Adrian Christie, William Robertson and Mohit Agarwal for contributing to this article.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, July 2005 (updated August 2007)

Back to Top