listagg function in 11g release 2

This article describes the new LISTAGG function in Oracle 11g Release 2. LISTAGG is a built-in function that enables us to perform string aggregation natively. String aggregation is a popular technique, and there are several methods available on the web, so we will compare their performance to the new LISTAGG function later in this article.

sample data

For reference, we will use the following sample data for our examples.


    DEPTNO ENAME      HIREDATE
---------- ---------- ----------
        10 CLARK      09/06/1981
        10 KING       17/11/1981
        10 MILLER     23/01/1982
        20 ADAMS      12/01/1983
        20 FORD       03/12/1981
        20 JONES      02/04/1981
        20 SCOTT      09/12/1982
        20 SMITH      17/12/1980
        30 ALLEN      20/02/1981
        30 BLAKE      01/05/1981
        30 JAMES      03/12/1981
        30 MARTIN     28/09/1981
        30 TURNER     08/09/1981
        30 WARD       22/02/1981

what is string aggregation?

String aggregation is simply the grouping and concatenation of multiple rows of data into a single row per group. For example, consider the following resultset:


   DEPTNO ENAME
--------- ----------
       10 CLARK
       10 KING
       10 MILLER
       20 ADAMS
       20 FORD
       20 JONES

With string aggregation, this resultset would be grouped (by DEPTNO) as follows:


   DEPTNO AGGREGATED_ENAMES
--------- -------------------------
       10 CLARK,KING,MILLER
       20 ADAMS,FORD,JONES

We can see that the employee names have simply been grouped and concatenated into a single column (values are delimited by comma) per group. As stated, there are several techniques available to perform this aggregation (references are provided at the end of this article), but the new LISTAGG function makes it much easier, as we will see below.

listagg syntax overview

The LISTAGG function has the following syntax structure:


LISTAGG( [,]) WITHIN GROUP (ORDER BY ) [OVER (PARTITION BY )]

LISTAGG is an aggregate function that can optionally be used as an analytic (i.e. the optional OVER() clause). The following elements are mandatory:

We will now see some examples of the function below.

listagg as an aggregate function

We will begin with a simple example that aggregates the employee names for each department in the EMP table, using a comma as delimiter.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPLOYEES
---------- ------------------------------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

Note that we chose to order the employees within each aggregation by the employee name. It should be noted that ordering the elements of a string aggregation is not a trivial task in some of the alternative techniques to LISTAGG.

In the following example, we will aggregate the employee names but order them by their respective hire dates.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, ',') WITHIN GROUP (ORDER BY hiredate) AS employees
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPLOYEES
---------- ------------------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,FORD,SCOTT,ADAMS
        30 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES

3 rows selected.

We can see that the order of employee names within each group differs from the first example.

listagg as an analytic function

As with many aggregate functions, LISTAGG can be converted to an analytic function by adding the OVER() clause. The following example demonstrates the analytic equivalent of our previous example (for each department, aggregate employee names in hire date order).

SQL> SELECT deptno
  2  ,      ename
  3  ,      hiredate
  4  ,      LISTAGG(ename, ',')
  5            WITHIN GROUP (ORDER BY hiredate)
  6            OVER (PARTITION BY deptno) AS employees
  7  FROM   emp;

    DEPTNO ENAME      HIREDATE    EMPLOYEES
---------- ---------- ----------- -------------------------------------
        10 CLARK      09/06/1981  CLARK,KING,MILLER
        10 KING       17/11/1981  CLARK,KING,MILLER
        10 MILLER     23/01/1982  CLARK,KING,MILLER
        20 SMITH      17/12/1980  SMITH,JONES,FORD,SCOTT,ADAMS
        20 JONES      02/04/1981  SMITH,JONES,FORD,SCOTT,ADAMS
        20 FORD       03/12/1981  SMITH,JONES,FORD,SCOTT,ADAMS
        20 SCOTT      19/04/1987  SMITH,JONES,FORD,SCOTT,ADAMS
        20 ADAMS      23/05/1987  SMITH,JONES,FORD,SCOTT,ADAMS
        30 ALLEN      20/02/1981  ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES
        30 WARD       22/02/1981  ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES
        30 BLAKE      01/05/1981  ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES
        30 TURNER     08/09/1981  ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES
        30 MARTIN     28/09/1981  ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES
        30 JAMES      03/12/1981  ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES

14 rows selected.

Remember that when use an analytic function, we do not lose any rows from our resultset: rather we see the results of the aggregation on every source row (depending on the analytic function and the ordering/windowing clauses). We can clearly see this above. The string aggregation for each department is available on every corresponding row.

more on ordering

As stated earlier, the ORDER BY clause is mandatory, as following example demonstrates.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, ',') WITHIN GROUP () AS employees
  3  FROM   emp
  4  GROUP  BY
  5         deptno;
,      LISTAGG(ename, ',') WITHIN GROUP () AS employees
                                         *
ERROR at line 2:
ORA-30491: missing ORDER BY clause

If the order of the aggregated elements is irrelevant to us, we can use a constant expression such as NULL, as follows.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, ',') WITHIN GROUP (ORDER BY NULL) AS employees
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPLOYEES
---------- ------------------------------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

In this example, the elements have been aggregated alphabetically, despite the NULL ordering clause. This appears to be the default behaviour when using a constant ORDER BY expression such as the above.

delimiters

We can use a range of constants or expressions as a delimiter for our aggregated strings. In fact, the delimiter is optional and can be excluded altogether, as the following example demonstrates.

SQL> SELECT deptno
  2  ,      LISTAGG(ename) WITHIN GROUP (ORDER BY ename) AS employees
  3  FROM   emp
  4  GROUP  BY
  5         deptno;

    DEPTNO EMPLOYEES
---------- ------------------------------------------------------------
        10 CLARKKINGMILLER
        20 ADAMSFORDJONESSCOTTSMITH
        30 ALLENBLAKEJAMESMARTINTURNERWARD

3 rows selected.

One restriction is that delimiters have to be either constants (such as a literal) or based on a deterministic expression that includes a column or expression used in the grouping. For example, we cannot use ROWNUM as a delimiter, as we see below.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, '(' || ROWNUM || ')')
  3            WITHIN GROUP (ORDER BY hiredate) AS employees
  4  FROM   emp
  5  GROUP  BY
  6         deptno;
,      LISTAGG(ename, '(' || ROWNUM || ')')
                             *
ERROR at line 2:
ORA-30497: Argument should be a constant or a function of expressions in GROUP BY.

The error message is clear: ROWNUM is neither a constant nor an expression that involves our grouping column (i.e. DEPTNO). If we try to use our grouping column, we are limited to the type of expression we can use, as we can see below.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, '(' || MAX(deptno) || ')')
  3            WITHIN GROUP (ORDER BY hiredate) AS employees
  4  FROM   emp
  5  GROUP  BY
  6         deptno;
,      LISTAGG(ename, '(' || MAX(deptno) || ')')
                                 *
ERROR at line 2:
ORA-30496: Argument should be a constant.

This time, Oracle recognises that we tried to use our grouping column, but we did not use a valid expression (hence we receive another error message; albeit different to the previous example). The following is an example of a deterministic expression on our grouping column that Oracle will accept.

SQL> SELECT deptno
  2  ,      LISTAGG(ename, '(' || CHR(deptno+55) || '); ')
  3            WITHIN GROUP (ORDER BY hiredate) AS employees
  4  FROM   emp
  5  GROUP  BY
  6         deptno;

    DEPTNO EMPLOYEES
---------- ------------------------------------------------------------
        10 CLARK(A); KING(A); MILLER
        20 SMITH(K); JONES(K); FORD(K); SCOTT(K); ADAMS
        30 ALLEN(U); WARD(U); BLAKE(U); TURNER(U); MARTIN(U); JAMES

3 rows selected.

This is a contrived example, but we've simply converted the DEPTNO into an ASCII character to use as a delimiter. This is a deterministic expression on the value of the grouping column and Oracle allows it.

other restrictions

The results of LISTAGG are constrained to the maximum size of VARCHAR2 in SQL (i.e. 4000) as we can see below.

SQL> SELECT LISTAGG(object_name) WITHIN GROUP (ORDER BY NULL)
  2  FROM   all_objects;
FROM   all_objects
       *
ERROR at line 2:
ORA-01489: result of string concatenation is too long

There is no CLOB or larger VARCHAR2 equivalent, so for larger strings we would need to use an alternative means of gathering the elements (such as a collection or a user-defined PL/SQL function).

performance considerations

We will compare the performance of LISTAGG to some of the alternatives that pre-date it. The methods we will compare are as follows:

The main difference with LISTAGG is that it is a built-in function, so we should expect its performance to be at the very least comparable to its alternatives.

setup

For the performance comparisons, we will use a larger source dataset of 1 million rows spread evenly between 2000 groups, as follows.

SQL> CREATE TABLE t
  2  AS
  3     SELECT ROWNUM                     AS id
  4     ,      MOD(ROWNUM,2000)           AS grp
  5     ,      DBMS_RANDOM.STRING('u',5)  AS val
  6     ,      DBMS_RANDOM.STRING('u',30) AS pad
  7     FROM   dual
  8     CONNECT BY ROWNUM <= 1000000;

Table created.

SQL> SELECT COUNT(*) FROM t;

  COUNT(*)
----------
   1000000

1 row selected.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'T');

PL/SQL procedure successfully completed.

We will compare LISTAGG to the alternatives using both Autotrace and the wall-clock. Note that the sample data is cached prior to the comparisons. We will begin by preparing our environment as follows.

SQL> set autotrace traceonly statistics
SQL> set timing on
SQL> set arrays 500

listagg

Our first test will be with LISTAGG. We will aggregate and order the value string within each of our 2000 groups, as follows.

SQL> SELECT grp
  2  ,      LISTAGG(val, ',') WITHIN GROUP (ORDER BY val) AS vals
  3  FROM   t
  4  GROUP  BY
  5         grp;

2000 rows selected.

Elapsed: 00:00:05.85

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       7092  consistent gets
          0  physical reads
          0  redo size
    6039067  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       2000  rows processed

This executes in just under 6 seconds on the test database, with no physical I/O and all sorting in memory.

stragg/wm_concat

Next we will use the best known technique for string aggregation: Tom Kyte's user-defined aggregate function known as STRAGG. In 10g, Oracle implemented a similar function in the WMSYS schema, so we will assume this to be a direct equivalent of STRAGG and use it in our test below. Note that one of the details of the STRAGG method is that it doesn't enable the strings to be ordered.

SQL> SELECT grp
  2  ,      WMSYS.WM_CONCAT(val) AS vals --<-- WM_CONCAT ~= STRAGG
  3  FROM   t
  4  GROUP  BY
  5         grp;

2000 rows selected.

Elapsed: 00:00:19.45

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       7206  consistent gets
          0  physical reads
          0  redo size
    6039067  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       2000  rows processed

We can see that this is over 3 times slower than LISTAGG (even without element ordering). The user-defined aggregate function carries with it the general overhead of PL/SQL functions (i.e. context-switching) and we can see the impact of this in the results above.

collect (without ordering)

When 10g was released, I devised a quicker alternative to STRAGG using the COLLECT function combined with a "collection-to-string" PL/SQL function (described in this oracle-developer.net article). The 10g version of COLLECT had no ordering facility so we'll start with this variant, as follows. Note that the TO_STRING source code is available in the referenced article.

SQL> SELECT grp
  2  ,      TO_STRING(
  3            CAST(COLLECT(val) AS varchar2_ntt)
  4            ) AS vals
  5  FROM   t
  6  GROUP  BY
  7         grp;

2000 rows selected.

Elapsed: 00:00:02.90

Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
       7197  consistent gets
          0  physical reads
          0  redo size
    6039067  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       2000  rows processed

Without sorting the elements, the original COLLECT/TO_STRING method is twice as fast as LISTAGG. However, this isn't a fair comparison because LISTAGG always incurs the cost of ordering the elements. If the order of the elements is irrelevant, however, the original COLLECT technique will be the quickest.

collect (with ordering)

For a fair comparison between COLLECT and LISTAGG, we must include the sorting of the elements in the COLLECT call (an 11g Release 1 new feature described in this oracle-developer.net article). The following is equivalent to our LISTAGG example.

SQL> SELECT grp
  2  ,      TO_STRING(
  3            CAST(COLLECT(val ORDER BY val) AS varchar2_ntt)
  4            ) AS vals
  5  FROM   t
  6  GROUP  BY
  7         grp;

2000 rows selected.

Elapsed: 00:00:07.08

Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
       7197  consistent gets
          0  physical reads
          0  redo size
    6039067  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       2000  rows processed

This time, when we include the ordering of the elements, the COLLECT method is actually slower than the new LISTAGG function.

model

Our final performance comparison is with an example of string aggregation that uses the MODEL clause. The following example was sourced from Rob van Wijk's About Oracle blog and has been modified to fit our sample data.

SQL> SELECT grp
  2  ,      vals
  3  FROM  (
  4         SELECT grp
  5         ,      RTRIM(vals, ',') AS vals
  6         ,      rn
  7         FROM   t
  8         MODEL
  9            PARTITION BY (grp)
 10            DIMENSION BY (ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val) AS rn)
 11            MEASURES (CAST(val AS VARCHAR2(4000)) AS vals)
 12            RULES
 13            (  vals[ANY] ORDER BY rn DESC = vals[CV()] || ',' || vals[CV()+1]
 14            )
 15        )
 16  WHERE  rn = 1
 17  ORDER  BY
 18         grp;

2000 rows selected.

Elapsed: 00:03:28.15

Statistics
----------------------------------------------------------
       3991  recursive calls
          0  db block gets
       7092  consistent gets
     494791  physical reads
          0  redo size
    6039067  bytes sent via SQL*Net to client
        553  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
        130  sorts (memory)
          0  sorts (disk)
       2000  rows processed

This ran for over 3 minutes! We can see from the statistics that it has incurred a significant number of physical reads, recursive calls and in-memory sorting. In fact, this poor timing is largely due to an enormous number of direct path reads/writes to/from the temporary tablespace during query execution (although the disk sorts are not reported by the Autotrace statistics).

The execution plan for the MODEL string aggregation is as follows.


--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc|
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |       |       |       |
|   1 |  SORT ORDER BY        |      |  1000K|  1934M|  1953M|
|*  2 |   VIEW                |      |  1000K|  1934M|       |
|   3 |    SQL MODEL ORDERED  |      |  1000K|  9765K|       |
|   4 |     WINDOW SORT       |      |  1000K|  9765K|    19M|
|   5 |      TABLE ACCESS FULL| T    |  1000K|  9765K|       |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("RN"=1)

According to a SQL monitor report (using DBMS_SQLTUNE.REPORT_SQL_MONITOR), the ordering of data within the SQL MODEL ORDERED operation at step 3 accounted for almost 4Gb of temporary space! An alternative example from Gary Myers' Sydney Oracle Lab blog displays similar performance characteristics.

performance summary

To summarise, therefore, the new LISTAGG function is the fastest technique for string aggregation and has the additional benefit of being a simple built-in function. If the ordering of elements is irrelevant, a technique that uses COLLECT without sorting is quicker still, but for ordered string aggregation, LISTAGG is unbeatable. Note that there is an issue with the MODEL examples used in the performance comparisons, but this might be isolated to either the version of Oracle used for the examples (11.2.0.1 Windows 64-bit) or the test database itself.

further reading

For more information on LISTAGG, see the online SQL Language Reference. For a good summary of common string-aggregation techniques, see this article by Tim Hall and this article by William Robertson. As referenced earlier in this article, oracle-developer.net articles on the COLLECT function can be found here and here. For more information on using MODEL for string aggregation, see blog entries by Gary Myers and Rob van Wijk.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, June 2010

Back to Top