the collect function in 10g

Oracle 10g has introduced an extremely useful new group function, COLLECT. This function enables us to aggregate data into a collection, retaining multiple records of data within a single row (like a nested table). One of the main benefits of this function is that it makes "string aggregation" (one of the web's most-requested Oracle technique) very simple. This article will introduce the COLLECT function and then demonstrate how it can be used to aggregate multiple records into a single value (a technique known as "string aggregation").

an overview of the collect function

We'll start by demonstrating the COLLECT function. We'll run a simple query against the ubiquitous EMP table to collect the names of all employees by department.

SQL> SELECT deptno
  2  ,      COLLECT(ename) AS emps
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPS
---------- ------------------------------------------------------------------------------------
        10 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('CLARK', 'KING')
        20 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD')
        30 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES')
        40 SYSTPXeCjDqbWSqWrshgYrRPR4Q==('MILLER')

4 rows selected.

Something looks a little unusual here, but ignoring the strange identifier for a moment, we can see that the COLLECT function has aggregated the employee names per department as requested.

system-generated types

Moving on to the strange identifier in the example output, we can see that Oracle has created a collection type to support the COLLECT function. The behaviour is different between 10g releases 1 and 2, so we'll investigate each separately.

10g release 1

For our example EMP query above (executed in a 10.1 database), Oracle has created a supporting type named "SYSTPXeCjDqbWSqWrshgYrRPR4Q==". We can find this in the dictionary as follows.

SQL> SELECT owner
  2  ,      typecode
  3  FROM   all_types
  4  WHERE  type_name = 'SYSTPXeCjDqbWSqWrshgYrRPR4Q==';

OWNER                          TYPECODE
------------------------------ ------------------------------
SYS                            COLLECTION

1 row selected.

It appears as though Oracle generates a supporting collection type every time it hard-parses a SQL statement that uses the COLLECT function. In Oracle 10.1, the type is created in the SYS schema. If we try to use this new type as follows, we'll find that we cannot.

SQL> SELECT *
  2  FROM   TABLE(
  3            sys."SYSTPXeCjDqbWSqWrshgYrRPR4Q=="('A','B','C') );
          sys."SYSTPXeCjDqbWSqWrshgYrRPR4Q=="('A','B','C') )
          *
ERROR at line 3:
ORA-00904: "SYS"."SYSTPXeCjDqbWSqWrshgYrRPR4Q==": invalid identifier

In addition to the burden this must place on the parsing process, we might also be concerned about the number of system-generated types that might start appearing in our database (for reasons unknown, some DBAs and developers are worried about this sort of thing). If we flush the shared pool or even bounce the database, the type persists, even though it no longer supports a cached SQL statement. Oracle Support's official line on this (which used to be "bounce the database to remove the type") is that SMON cleans up unused types "after a period". Recent experience suggests that this period can be anything up to 24 hours after the bounce, so in an online database with a large shared pool, these types could stick around for some considerable period. Whether we choose to worry about this or not is another matter entirely!

10g release 2

In Oracle 10g Release 2, the type is created in the schema that parses the SQL statement, as we can see in the following example (we must first repeat the original EMP query to generate a type).

SQL> SELECT deptno
  2  ,      COLLECT(ename) AS emps
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPS
---------- ------------------------------------------------------------------------------------
        10 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('CLARK', 'KING')
        20 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD')
        30 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES')
        40 SYSTPo3itZvoiRAyeH+f5LKv6+Q==('MILLER')

4 rows selected.

SQL> SELECT owner
  2  ,      typecode
  3  FROM   all_types
  4  WHERE  type_name = 'SYSTPo3itZvoiRAyeH+f5LKv6+Q==';

OWNER                          TYPECODE
------------------------------ ------------------------------
SCOTT                          COLLECTION

1 row selected.

The fact that the parsing schema owns the collection type in Oracle 10.2 means that we can use these types if we wish, as follows.

SQL> SELECT *
  2  FROM   TABLE(
  3            "SYSTPo3itZvoiRAyeH+f5LKv6+Q=="('A','B','C') );

COLUMN_VAL
----------
A
B
C

3 rows selected.

Furthermore, removing this system-generated type is much more simple in 10.2 than in 10.1. Firstly, because we own the type, we can simply drop it as follows (assuming that the SQL statement that generated it is no longer required).

SQL> DROP TYPE "SYSTPo3itZvoiRAyeH+f5LKv6+Q==";

Type dropped.

Bear in mind, however, that this type is supporting a SQL cursor. Therefore, if we do decide to drop the system-generated type as above, the underlying SQL cursor will be removed from the shared pool. Therefore, a re-run of the original SQL statement will need to be hard-parsed and a new type will be created accordingly.

Alternatively, a bounce of the database will drop the type immediately (rather than at some point during the next 24 hours, which is the 10.1 behaviour). Despite this, we are probably best to leave the SQL cursor to age out of the shared pool naturally, leaving SMON to clean up at a later stage.

using our own collection types

It is possible to use our own collection types with COLLECT. The CAST function (available at least as far back as Oracle 8.0 and possibly further) can be used to turn the results of the COLLECT into a type of our choosing. Note that this doesn't stop Oracle creating system-generated types to support the SQL statement, but it does make the results easier to work with.

In the following example, we'll create a standard VARCHAR2 collection type and CAST the results of our collected employee names.

SQL> CREATE OR REPLACE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
  2  /

Type created.
SQL> SELECT deptno
  2  ,      CAST(COLLECT(ename) AS varchar2_ntt) AS emps
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPS
---------- ---------------------------------------------------------------------
        10 VARCHAR2_NTT('CLARK', 'KING')
        20 VARCHAR2_NTT('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD')
        30 VARCHAR2_NTT('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES')
        40 VARCHAR2_NTT('MILLER')

4 rows selected.

Note that if you are casting collections of numbers, Oracle can be particularly fussy about precisions and scales, as the following example demonstrates. We'll create a general collection of number and then attempt to cast a collection of employee salaries (the EMP.SAL column is defined as NUMBER(7,2)).

SQL> CREATE OR REPLACE TYPE number_ntt AS TABLE OF NUMBER;
  2  /

Type created.
SQL> SELECT deptno
  2  ,      CAST(COLLECT(sal) AS number_ntt) AS sals
  3  FROM   emp
  4  GROUP  BY
  5         deptno;
,      CAST(COLLECT(sal) AS number_ntt) AS sals
            *
ERROR at line 2:
ORA-22814: attribute or element value is larger than specified in type

This is rather a confusing problem, as the unconstrained NUMBER type should easily incorporate a NUMBER(7,2). To wrap this up, however, there are two simple solutions. We can either make the collected column fit the type or the type fit the column, as shown below. First we'll make the column fit the type.

SQL> SELECT deptno
  2  ,      CAST(COLLECT(CAST(sal AS NUMBER)) AS number_ntt) AS sals
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO SALS
---------- -------------------------------------------------------------------------
        10 NUMBER_NTT(2450, 5000)
        20 NUMBER_NTT(800, 2975, 3000, 1100, 3000)
        30 NUMBER_NTT(1600, 1250, 1250, 2850, 1500, 950)
        40 NUMBER_NTT(1300)

4 rows selected.

Secondly we'll make the type fit the column.

SQL> CREATE TYPE number_7_2_ntt AS TABLE OF NUMBER(7,2);
  2  /

Type created.

SQL> SELECT deptno
  2  ,      CAST(COLLECT(sal) AS number_7_2_ntt) AS sals
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO SALS
---------- -------------------------------------------------------------------------
        10 NUMBER_7_2_NTT(2450, 5000)
        20 NUMBER_7_2_NTT(800, 2975, 3000, 1100, 3000)
        30 NUMBER_7_2_NTT(1600, 1250, 1250, 2850, 1500, 950)
        40 NUMBER_7_2_NTT(1300)

4 rows selected.

string aggregation using collect

We'll now see a practical demonstration of what the COLLECT function can be used for. One of the FAQs of Oracle developer forums is how to aggregate multiple strings into a single value. From releases of Oracle 8.0 onwards, there have been numerous methods for doing this. The most well-known method undoubtedly utilises Tom Kyte's "STRAGG" user-defined aggregate function. The STRAGG function is popular because it is extremely easy to use and faster than any pre-9i method. The COLLECT function, when combined with a function to turn the elements of a collection into a string, is faster still. We'll look at this below.

We'll continue with our standard VARCHAR2_NTT collection type, but we'll also require a "collection-to-string" function as follows.

SQL> CREATE FUNCTION to_string (
  2                  nt_in        IN varchar2_ntt,
  3                  delimiter_in IN VARCHAR2 DEFAULT ','
  4                  ) RETURN VARCHAR2 IS
  5
  6     v_idx PLS_INTEGER;
  7     v_str VARCHAR2(32767);
  8     v_dlm VARCHAR2(10);
  9
 10  BEGIN
 11
 12     v_idx := nt_in.FIRST;
 13     WHILE v_idx IS NOT NULL LOOP
 14        v_str := v_str || v_dlm || nt_in(v_idx);
 15        v_dlm := delimiter_in;
 16        v_idx := nt_in.NEXT(v_idx);
 17     END LOOP;
 18
 19     RETURN v_str;
 20
 21  END to_string;
 22  /

Function created.

Now we are ready to demonstrate string aggregation using the COLLECT function. We'll again collect the employee names per department, but this time we will display them in a comma-delimited string.

SQL> SELECT deptno
  2  ,      TO_STRING(CAST(COLLECT(ename) AS varchar2_ntt)) AS emps
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPS
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Now we can compare the performance of this method to the STRAGG implementation. We'll start by building a larger dataset to work with. We'll create a table with four sets of DBA_OBJECTS data.

SQL> CREATE TABLE t
  2  AS
  3     SELECT MOD(ROWNUM,100)          AS id
  4     ,      CAST('A' AS VARCHAR2(1)) AS val
  5     FROM   dba_objects
  6     ,      TABLE(varchar2_ntt('A','B','C','D'));

Table created.

SQL> SELECT COUNT(*) FROM t;

  COUNT(*)
----------
    193900
    
1 row selected.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'T');

PL/SQL procedure successfully completed.

Now we have almost 200K rows to work with, we'll aggregate the VAL column into a delimited string, using STRAGG. We'll use a TIMER package for wall-clock timings and autotrace for statistics.

SQL> set autotrace traceonly statistics

SQL> exec timer.snap();

PL/SQL procedure successfully completed.

SQL> SELECT id
  2  ,      STRAGG(val) AS vals
  3  FROM   t
  4  GROUP  BY
  5         id;

100 rows selected.


Statistics
----------------------------------------------------------
        221  recursive calls
          9  db block gets
        551  consistent gets
        395  physical reads
          0  redo size
       5213  bytes sent via SQL*Net to client
        388  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          3  sorts (memory)
          1  sorts (disk)
        100  rows processed

SQL> exec timer.show('STRAGG');
[STRAGG] 7.20 seconds

PL/SQL procedure successfully completed.

Now we have a rough timing for STRAGG, we can move onto the COLLECT function. The syntax is not quite as simple as STRAGG, as we've seen, but the time-savings are significant.

SQL> exec timer.snap();

PL/SQL procedure successfully completed.

SQL> SELECT id
  2  ,      TO_STRING(CAST(COLLECT(val) AS varchar2_ntt)) AS vals
  3  FROM   t
  4  GROUP  BY
  5         id;

100 rows selected.


Statistics
----------------------------------------------------------
       4441  recursive calls
        111  db block gets
       2519  consistent gets
        109  physical reads
      22040  redo size
       5213  bytes sent via SQL*Net to client
        388  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
        104  sorts (memory)
          0  sorts (disk)
        100  rows processed

SQL> exec timer.show('COLLECT');
[COLLECT] 1.21 seconds

PL/SQL procedure successfully completed.

We can see that this is significantly faster. Yet we might also notice that many of the statistics are showing considerably more work being performed by Oracle in support of COLLECT. Most interestingly the number of recursive calls and sorts are much higher than for the STRAGG method. How can the COLLECT function be faster? The answer is not displayed by autotrace; it is context-switching. In the STRAGG implementation, there's a context-switch for every value being aggregated (in our example, roughly 193,000). Yet in the COLLECT example, we are only context-switching 100 times (once for every call to TO_STRING). As we know from the 8i days when BULK COLLECT was making headway, context-switching penalties can be high and we can see this once again.

further reading

For a good summary of common string-aggregation techniques, see this article by Tim Hall and this article by William Robertson.

For a copy of STRAGG, see this thread on Ask Tom. This thread also contains a CONCAT_ALL function by James Padfield which is essentially a re-factored STRAGG but allowing slightly more flexibility with delimiters. For a copy of the TIMER function used in the examples in this article, see the Utilities page on this site.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, June 2004 (updated July 2008)

Back to Top