user-defined aggregate functions in oracle 9i

In this article, we will take a brief look at user-defined aggregate functions, a new feature of Oracle 9i Release 1 (9.0). This feature enables us to create our own aggregation rules and define them as a function for use in aggregate or analytic SQL statements in the same way as built-ins such as MAX, SUM or AVG.

We will begin the article with a brief overview of these functions and then see a "real-life" example of an aggregate that I wrote when SQL alone could not fulfil the business requirements.

overview of user-defined aggregate functions

User-defined aggregate functions are possible due to Oracle's Data Cartridge model which takes advantage of object types and other extensibility features. According to the online documentation a data-cartridge is "the mechanism for extending the capabilities of the Oracle server". What this actually means is that we can create a set of data rules inside one or more object types and "plug them into" the server to use as indexing schemes, aggregate functions or even CBO extensions. Oracle provides the development framework for us to do this and it is actually a lot more simple than it sounds (as we shall see).

With the development framework for data cartridges mapped out for us by Oracle, user-defined aggregate functions are actually very simple to create. We need two components to implement aggregate functions as follows.

an object type specification and body; and
a PL/SQL function.

There are actually other ways of creating user-defined aggregate functions (for example using Java or OCI directly), but we will concentrate purely on "standard" object types and PL/SQL.

As stated, there is a development framework for data cartridges (think of this as a template) provided by Oracle. This provides the structure of the object type and its methods (down to the detail of how we name them) and also defines how we create our PL/SQL function. For example, our object type will contain the following:

attribute(s) for holding state information. These can be of any existing built-in or user-defined datatype;
a mandatory ODCIAggregateInitialize static method to reset the state attributes at the start of an aggregation;
a mandatory ODCIAggregateIterate member method to apply each input value to the running aggregate value;
a mandatory ODCIAggregateTerminate member method to return the final result of the aggregate; and
an optional ODCIAggregateMerge member method, used to combine the results of more than one stream of aggregation (for example, with parallel query) before returning a result.

The PL/SQL function itself is very simple. It simply declares itself as an aggregate function and defines the object type that implements its rules.

The best way to visualise a user-defined aggregate function is through an example. In the following section we will see an aggregate function used to support the rules of a data-summary routine. This function is named "BREAKSUM".

implementing breaksum

The BREAKSUM function responds to a particular business problem that cannot be solved with SQL and analytic functions. The requirement is simply to keep a running total of account transactions but to reset the value to 0 whenever the addition of a negative transaction value makes the overall running total negative.

sample data

We will start by creating a sample table and data.

SQL> CREATE TABLE accounts ( account NUMBER, cycle NUMBER, val NUMBER(5,2) );

Table created.

SQL> INSERT ALL
  2     INTO accounts VALUES ( 999, 1, 0.11 )
  3     INTO accounts VALUES ( 999, 2, 0.18 )
  4     INTO accounts VALUES ( 999, 3, 0.27 )
  5     INTO accounts VALUES ( 999, 4, 0.35 )
  6     INTO accounts VALUES ( 999, 5, 0.52 )
  7     INTO accounts VALUES ( 999, 6, 0.61 )
  8     INTO accounts VALUES ( 999, 7, -1.51 )
  9     INTO accounts VALUES ( 999, 8, 0.63 )
 10     INTO accounts VALUES ( 999, 9, 92.00 )
 11     INTO accounts VALUES ( 999, 10, 88.00 )
 12     INTO accounts VALUES ( 999, 11, -400 )
 13     INTO accounts VALUES ( 999, 12, 0.8 )
 14  SELECT NULL
 15  FROM   dual;

12 rows created.

To help us to visualise the problem that the aggregate function will solve, we can query the data and keep a running total of the VAL column as follows.

SQL> SELECT account
  2  ,      cycle
  3  ,      val
  4  ,      SUM(val) OVER (ORDER BY cycle) AS running_total
  5  FROM   accounts;

   ACCOUNT      CYCLE        VAL RUNNING_TOTAL
---------- ---------- ---------- -------------
       999          1        .11           .11
       999          2        .18           .29
       999          3        .27           .56
       999          4        .35           .91
       999          5        .52          1.43
       999          6        .61          2.04
       999          7      -1.51           .53
       999          8        .63          1.16
       999          9         92         93.16
       999         10         88        181.16
       999         11       -400       -218.84
       999         12         .8       -218.04

12 rows selected.

Note the running total result at cycle 11. Our requirement specifies that the running total must reset to 0 at this point. It seems quite a simple requirement, but is actually very difficult to implement in a SQL statement that will perform well at high volume.

breaksum type specification

As stated earlier, to create an aggregate function we need an object type and a PL/SQL function. We can begin with the TYP_OBJ_BREAKSUM object type specification as follows. Note that we will code all four available ODCIAggregate methods in our supporting type (we will wish to enable the function for parallel query later on; hence we include the optional ODCIAggregateMerge function). We require only one attribute to maintain the running total for our ACCOUNTS.VAL column.

SQL> CREATE TYPE typ_obj_breaksum AS OBJECT
  2  (
  3    sum    NUMBER,
  4
  5    STATIC FUNCTION ODCIAggregateInitialize (
  6                    sctx IN OUT typ_obj_breaksum
  7                    ) RETURN NUMBER,
  8
  9    MEMBER FUNCTION ODCIAggregateIterate (
 10                    self  IN OUT typ_obj_breaksum,
 11                    value IN     NUMBER
 12                    ) RETURN NUMBER,
 13
 14    MEMBER FUNCTION ODCIAggregateTerminate (
 15                    self   IN  typ_obj_breaksum,
 16                    retval OUT NUMBER,
 17                    flags  IN  NUMBER
 18                    ) RETURN NUMBER,
 19
 20    MEMBER FUNCTION ODCIAggregateMerge (
 21                    self IN OUT typ_obj_breaksum,
 22                    ctx2 IN     typ_obj_breaksum
 23                    ) RETURN NUMBER
 24  );
 25  /

Type created.

The formats for the ODCIAggregate methods are fixed by the data cartridge framework that Oracle provides. We do not need to call any of these methods directly in any code we write, but we do need to set the datatypes to allow our object type to be passed between methods. Each method returns a success constant of type NUMBER.

We do, however, have to determine the attributes to include in the object type. Remember that these are state attributes used to store our intermediate program data for the duration of the aggregate call. The BREAKSUM function is simply a variation on a SUM aggregate so only requires a single numeric state attribute to hold a running total (this is the attribute named "sum" in our type specification).

breaksum type body

The type body is where we implement our aggregation rules for each of the methods. To make the implementation of BREAKSUM easier to follow, we will step through the code one function at a time, beginning with the initialisation method as follows.

SQL> CREATE TYPE BODY typ_obj_breaksum IS
  2
  3     STATIC FUNCTION ODCIAggregateInitialize (
  4                     sctx IN OUT typ_obj_breaksum
  5                     ) RETURN NUMBER IS
  6     BEGIN
  7       sctx := typ_obj_breaksum(0);
  8       RETURN ODCIConst.Success;
  9     END;
 10

As we are summing, initialisation is very simple; we start at 0. We do this by initialising an instance of TYP_OBJ_BREAKSUM as above. Note that the function returns a constant to indicate success to the calling context.

The ODCIAggregateIterate function that follows is where we code the majority of our logic. This is the method that instructs Oracle on how to aggregate the individual elements that are passed to it. In the case of BREAKSUM, each element is a number that must be added to keep a running sum. The rules for this sum dictate that if the running sum drops below 0, then it must be reset to 0. We can see this logic below.

 11     MEMBER FUNCTION ODCIAggregateIterate (
 12                     self  IN OUT typ_obj_breaksum,
 13                     value IN     NUMBER
 14                     ) RETURN NUMBER IS
 15     BEGIN
 16       self.sum := CASE
 17                      WHEN value >= 0
 18                      OR  (value < 0 AND self.sum + value > 0)
 19                      THEN self.sum + value
 20                      ELSE 0
 21                   END;
 22       RETURN ODCIConst.Success;
 23     END;
 24

The iterate method quite simply states that if the incoming value is positive or if the running sum added to a negative value remains positive, then continue with the addition. Otherwise, reset the running total to 0. This will be invoked when the incoming value is a negative number with a greater absolute value than the running total so far.

Next we can see the terminate method. This method is used to piece together the state attributes (in our case, just the "sum" attribute) and return a final value for the aggregation. Our logic is simple as we just need to return the sum value itself.

 25     MEMBER FUNCTION ODCIAggregateTerminate (
 26                     self   IN  typ_obj_breaksum,
 27                     retval OUT NUMBER,
 28                     flags  IN  NUMBER
 29                     ) RETURN NUMBER IS
 30     BEGIN
 31       retval := self.sum;
 32       RETURN ODCIConst.Success;
 33     END;
 34

This completes the mandatory methods for a user-defined aggregate function. We can optionally include an ODCIAggregateMerge method if we are likely to have more than one stream of aggregation (such as in parallel query). As its name suggests, the merge method is where we aggregate the results of two separate streams of processing. The merge logic for BREAKSUM is as follows.

 35     MEMBER FUNCTION ODCIAggregateMerge (
 36                     self IN OUT typ_obj_breaksum,
 37                     ctx2 IN     typ_obj_breaksum
 38                     ) RETURN NUMBER IS
 39     BEGIN
 40       self.sum := CASE
 41                      WHEN self.sum + ctx2.sum > 0
 42                      THEN self.sum + ctx2.sum
 43                      ELSE 0
 44                   END;
 45       RETURN ODCIConst.Success;
 46     END;
 47
 48  END;
 49  /

Type body created.

For some aggregations, the merge method can be quite complex, but for a simple addition like BREAKSUM, we simply need to repeat our iteration logic as above. The main difference is that the two values to be aggregated are both running totals, rather than a running total and iteration value as with the ODCIAggregateIterate method.

This completes our implementation type for BREAKSUM. We can see that it is very simple to code, especially given the fact that Oracle requires us to use a specific format and development framework.

breaksum aggregate function

Now we have our aggregation rules implemented via the TYP_OBJ_BREAKSUM type, we must create the SQL interface to this type. We do this with the following function declaration.

SQL> CREATE FUNCTION breaksum (input NUMBER) RETURN NUMBER
  2     PARALLEL_ENABLE
  3     AGGREGATE USING typ_obj_breaksum;
  4  /

Function created.

We can immediately see that there is no logic in our aggregate function. We simply declare it to be implemented by the TYP_OBJ_BREAKSUM type. Note in particular the following:

aggregate functions can only accept one parameter. In the case of BREAKSUM, this is a NUMBER (as we are adding account transactions), but this can be an object type if more than a single input value from each record needs to contribute to the aggregate result;
we have declared the function to be parallel-query enabled (hence the need for the merge method in the TYP_OBJ_BREAKSUM object type); and
we declare the function an aggregate using the AGGREGATE keyword and supply the name of the implementing object type with the USING [type_name] syntax.

using breaksum

We can now see the results of using BREAKSUM. We can begin with a simple aggregate query; i.e. a GROUP BY on the account number. We only have one account (with twelve cycles) in our sample data, so we will simply output the overall result of the BREAKSUM as follows.

SQL> SELECT account
  2  ,      BREAKSUM (val) AS end_of_cycle_total
  3  FROM   accounts
  4  GROUP  BY
  5         account;

   ACCOUNT END_OF_CYCLE_TOTAL
---------- ------------------
       999                 .8

1 row selected.

Note that user-defined aggregate functions are also enabled for use as analytics. We can take advantage of this to show the intermediate workings of BREAKSUM (i.e. the running total for each cycle) as follows.

SQL> SELECT account
  2  ,      cycle
  3  ,      val
  4  ,      BREAKSUM (val) OVER
  5            (PARTITION BY account ORDER BY cycle) AS running
  6  FROM   accounts;

   ACCOUNT      CYCLE        VAL    RUNNING
---------- ---------- ---------- ----------
       999          1        .11        .11
       999          2        .18        .29
       999          3        .27        .56
       999          4        .35        .91
       999          5        .52       1.43
       999          6        .61       2.04
       999          7      -1.51        .53
       999          8        .63       1.16
       999          9         92      93.16
       999         10         88     181.16
       999         11       -400          0
       999         12         .8         .8

12 rows selected.

Compare this to our original running total before we created BREAKSUM. We saw earlier that the running total at cycle 11 was negative and that BREAKSUM was needed to reset this total to 0. We can see above that BREAKSUM has done exactly this.

We can test BREAKSUM against multiple accounts. In the following example we add ten more accounts to our sample data, each with twelve cycles of data. We then aggregate each account using BREAKSUM.

SQL> INSERT INTO accounts
  2  SELECT MOD(ROWNUM,10)     AS account
  3  ,      MOD(ROWNUM,12)+1   AS cycle
  4  ,      DBMS_RANDOM.VALUE  AS val
  5  FROM   all_objects
  6  WHERE  ROWNUM <= 120;

120 rows created.

SQL> SELECT account
  2  ,      BREAKSUM (val) AS end_of_cycle_total
  3  FROM   accounts
  4  GROUP  BY
  5         account;

   ACCOUNT END_OF_CYCLE_TOTAL
---------- ------------------
         0               5.25
         1                7.1
         2               6.63
         3               4.34
         4               6.82
         5               5.11
         6                7.5
         7               5.67
         8               6.41
         9               5.87
       999                 .8

11 rows selected.

Finally, we can test our merge method by invoking parallel query. In the following example, we run the previous SQL statement with parallel query enabled and then compare the results with the serial output from above.

SQL> SELECT /*+ PARALLEL(accounts 2) */
  2         account
  3  ,      BREAKSUM (val) AS end_of_cycle_total
  4  FROM   accounts
  5  GROUP  BY
  6         account;

   ACCOUNT END_OF_CYCLE_TOTAL
---------- ------------------
         0               5.25
         2               6.63
         7               5.67
         8               6.41
         9               5.87
         1                7.1
         3               4.34
         4               6.82
         5               5.11
         6                7.5
       999                 .8

11 rows selected.

For a final "sanity check", we can confirm that we invoked parallel query as follows.

SQL> SELECT *
  2  FROM   v$pq_sesstat
  3  WHERE  statistic = 'Queries Parallelized';

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    1             1

1 row selected.

source code

The source code for the examples in this article can be downloaded from here.

Adrian Billington, September 2003

oracle-developer.net