user-defined aggregate functions in oracle 9i
I'm pleased to announce that I've been selected as a finalist in the inaugural Oracle Database Developer Choice Awards. To quote, these awards "celebrate and recognize technical expertise and contributions in the Oracle Database community".
I've been selected for the PL/SQL category, primarily for the articles and utilities that I share here on oracle-developer.net. I'm very pleased to be recognised by the Oracle Community in this way and if you have found oracle-developer.net to be a helpful and/or interesting resource, then I'd appreciate your vote (my voting page is here).
More information on the Awards, including an explanatory video, the rules, other categories and finalists can all be found on the main page.
Thanks for your support!
In this article, we will take a brief look at user-defined aggregate functions, a new feature of Oracle 9i Release 1 (9.0). This feature enables us to create our own aggregation rules and define them as a function for use in aggregate or analytic SQL statements in the same way as built-ins such as MAX, SUM or AVG.
We will begin the article with a brief overview of these functions and then see a "real-life" example of an aggregate that I wrote when SQL alone could not fulfil the business requirements.
overview of user-defined aggregate functions
User-defined aggregate functions are possible due to Oracle's Data Cartridge model which takes advantage of object types and other extensibility features. According to the online documentation a data-cartridge is "the mechanism for extending the capabilities of the Oracle server". What this actually means is that we can create a set of data rules inside one or more object types and "plug them into" the server to use as indexing schemes, aggregate functions or even CBO extensions. Oracle provides the development framework for us to do this and it is actually a lot more simple than it sounds (as we shall see).
With the development framework for data cartridges mapped out for us by Oracle, user-defined aggregate functions are actually very simple to create. We need two components to implement aggregate functions as follows.
- an object type specification and body; and
- a PL/SQL function.
There are actually other ways of creating user-defined aggregate functions (for example using Java or OCI directly), but we will concentrate purely on "standard" object types and PL/SQL.
As stated, there is a development framework for data cartridges (think of this as a template) provided by Oracle. This provides the structure of the object type and its methods (down to the detail of how we name them) and also defines how we create our PL/SQL function. For example, our object type will contain the following:
- attribute(s) for holding state information. These can be of any existing built-in or user-defined datatype;
- a mandatory ODCIAggregateInitialize static method to reset the state attributes at the start of an aggregation;
- a mandatory ODCIAggregateIterate member method to apply each input value to the running aggregate value;
- a mandatory ODCIAggregateTerminate member method to return the final result of the aggregate; and
- an optional ODCIAggregateMerge member method, used to combine the results of more than one stream of aggregation (for example, with parallel query) before returning a result.
The PL/SQL function itself is very simple. It simply declares itself as an aggregate function and defines the object type that implements its rules.
The best way to visualise a user-defined aggregate function is through an example. In the following section we will see an aggregate function used to support the rules of a data-summary routine. This function is named "BREAKSUM".
The BREAKSUM function responds to a particular business problem that cannot be solved with SQL and analytic functions. The requirement is simply to keep a running total of account transactions but to reset the value to 0 whenever the addition of a negative transaction value makes the overall running total negative.
We will start by creating a sample table and data.
SQL> CREATE TABLE accounts ( account NUMBER, cycle NUMBER, val NUMBER(5,2) );
SQL> INSERT ALL 2 INTO accounts VALUES ( 999, 1, 0.11 ) 3 INTO accounts VALUES ( 999, 2, 0.18 ) 4 INTO accounts VALUES ( 999, 3, 0.27 ) 5 INTO accounts VALUES ( 999, 4, 0.35 ) 6 INTO accounts VALUES ( 999, 5, 0.52 ) 7 INTO accounts VALUES ( 999, 6, 0.61 ) 8 INTO accounts VALUES ( 999, 7, -1.51 ) 9 INTO accounts VALUES ( 999, 8, 0.63 ) 10 INTO accounts VALUES ( 999, 9, 92.00 ) 11 INTO accounts VALUES ( 999, 10, 88.00 ) 12 INTO accounts VALUES ( 999, 11, -400 ) 13 INTO accounts VALUES ( 999, 12, 0.8 ) 14 SELECT NULL 15 FROM dual;
12 rows created.
To help us to visualise the problem that the aggregate function will solve, we can query the data and keep a running total of the VAL column as follows.
SQL> SELECT account 2 , cycle 3 , val 4 , SUM(val) OVER (ORDER BY cycle) AS running_total 5 FROM accounts;
ACCOUNT CYCLE VAL RUNNING_TOTAL ---------- ---------- ---------- ------------- 999 1 .11 .11 999 2 .18 .29 999 3 .27 .56 999 4 .35 .91 999 5 .52 1.43 999 6 .61 2.04 999 7 -1.51 .53 999 8 .63 1.16 999 9 92 93.16 999 10 88 181.16 999 11 -400 -218.84 999 12 .8 -218.04 12 rows selected.
Note the running total result at cycle 11. Our requirement specifies that the running total must reset to 0 at this point. It seems quite a simple requirement, but is actually very difficult to implement in a SQL statement that will perform well at high volume.
breaksum type specification
As stated earlier, to create an aggregate function we need an object type and a PL/SQL function. We can begin with the TYP_OBJ_BREAKSUM object type specification as follows. Note that we will code all four available ODCIAggregate methods in our supporting type (we will wish to enable the function for parallel query later on; hence we include the optional ODCIAggregateMerge function). We require only one attribute to maintain the running total for our ACCOUNTS.VAL column.
SQL> CREATE TYPE typ_obj_breaksum AS OBJECT 2 ( 3 sum NUMBER, 4 5 STATIC FUNCTION ODCIAggregateInitialize ( 6 sctx IN OUT typ_obj_breaksum 7 ) RETURN NUMBER, 8 9 MEMBER FUNCTION ODCIAggregateIterate ( 10 self IN OUT typ_obj_breaksum, 11 value IN NUMBER 12 ) RETURN NUMBER, 13 14 MEMBER FUNCTION ODCIAggregateTerminate ( 15 self IN typ_obj_breaksum, 16 retval OUT NUMBER, 17 flags IN NUMBER 18 ) RETURN NUMBER, 19 20 MEMBER FUNCTION ODCIAggregateMerge ( 21 self IN OUT typ_obj_breaksum, 22 ctx2 IN typ_obj_breaksum 23 ) RETURN NUMBER 24 ); 25 /
The formats for the ODCIAggregate methods are fixed by the data cartridge framework that Oracle provides. We do not need to call any of these methods directly in any code we write, but we do need to set the datatypes to allow our object type to be passed between methods. Each method returns a success constant of type NUMBER.
We do, however, have to determine the attributes to include in the object type. Remember that these are state attributes used to store our intermediate program data for the duration of the aggregate call. The BREAKSUM function is simply a variation on a SUM aggregate so only requires a single numeric state attribute to hold a running total (this is the attribute named "sum" in our type specification).
breaksum type body
The type body is where we implement our aggregation rules for each of the methods. To make the implementation of BREAKSUM easier to follow, we will step through the code one function at a time, beginning with the initialisation method as follows.
SQL> CREATE TYPE BODY typ_obj_breaksum IS 2 3 STATIC FUNCTION ODCIAggregateInitialize ( 4 sctx IN OUT typ_obj_breaksum 5 ) RETURN NUMBER IS 6 BEGIN 7 sctx := typ_obj_breaksum(0); 8 RETURN ODCIConst.Success; 9 END; 10
As we are summing, initialisation is very simple; we start at 0. We do this by initialising an instance of TYP_OBJ_BREAKSUM as above. Note that the function returns a constant to indicate success to the calling context.
The ODCIAggregateIterate function that follows is where we code the majority of our logic. This is the method that instructs Oracle on how to aggregate the individual elements that are passed to it. In the case of BREAKSUM, each element is a number that must be added to keep a running sum. The rules for this sum dictate that if the running sum drops below 0, then it must be reset to 0. We can see this logic below.
11 MEMBER FUNCTION ODCIAggregateIterate ( 12 self IN OUT typ_obj_breaksum, 13 value IN NUMBER 14 ) RETURN NUMBER IS 15 BEGIN 16 self.sum := CASE 17 WHEN value >= 0 18 OR (value < 0 AND self.sum + value > 0) 19 THEN self.sum + value 20 ELSE 0 21 END; 22 RETURN ODCIConst.Success; 23 END; 24
The iterate method quite simply states that if the incoming value is positive or if the running sum added to a negative value remains positive, then continue with the addition. Otherwise, reset the running total to 0. This will be invoked when the incoming value is a negative number with a greater absolute value than the running total so far.
Next we can see the terminate method. This method is used to piece together the state attributes (in our case, just the "sum" attribute) and return a final value for the aggregation. Our logic is simple as we just need to return the sum value itself.
25 MEMBER FUNCTION ODCIAggregateTerminate ( 26 self IN typ_obj_breaksum, 27 retval OUT NUMBER, 28 flags IN NUMBER 29 ) RETURN NUMBER IS 30 BEGIN 31 retval := self.sum; 32 RETURN ODCIConst.Success; 33 END; 34
This completes the mandatory methods for a user-defined aggregate function. We can optionally include an ODCIAggregateMerge method if we are likely to have more than one stream of aggregation (such as in parallel query). As its name suggests, the merge method is where we aggregate the results of two separate streams of processing. The merge logic for BREAKSUM is as follows.
35 MEMBER FUNCTION ODCIAggregateMerge ( 36 self IN OUT typ_obj_breaksum, 37 ctx2 IN typ_obj_breaksum 38 ) RETURN NUMBER IS 39 BEGIN 40 self.sum := CASE 41 WHEN self.sum + ctx2.sum > 0 42 THEN self.sum + ctx2.sum 43 ELSE 0 44 END; 45 RETURN ODCIConst.Success; 46 END; 47 48 END; 49 /
Type body created.
For some aggregations, the merge method can be quite complex, but for a simple addition like BREAKSUM, we simply need to repeat our iteration logic as above. The main difference is that the two values to be aggregated are both running totals, rather than a running total and iteration value as with the ODCIAggregateIterate method.
This completes our implementation type for BREAKSUM. We can see that it is very simple to code, especially given the fact that Oracle requires us to use a specific format and development framework.
breaksum aggregate function
Now we have our aggregation rules implemented via the TYP_OBJ_BREAKSUM type, we must create the SQL interface to this type. We do this with the following function declaration.
SQL> CREATE FUNCTION breaksum (input NUMBER) RETURN NUMBER 2 PARALLEL_ENABLE 3 AGGREGATE USING typ_obj_breaksum; 4 /
We can immediately see that there is no logic in our aggregate function. We simply declare it to be implemented by the TYP_OBJ_BREAKSUM type. Note in particular the following:
- aggregate functions can only accept one parameter. In the case of BREAKSUM, this is a NUMBER (as we are adding account transactions), but this can be an object type if more than a single input value from each record needs to contribute to the aggregate result;
- we have declared the function to be parallel-query enabled (hence the need for the merge method in the TYP_OBJ_BREAKSUM object type); and
- we declare the function an aggregate using the AGGREGATE keyword and supply the name of the implementing object type with the USING [type_name] syntax.
We can now see the results of using BREAKSUM. We can begin with a simple aggregate query; i.e. a GROUP BY on the account number. We only have one account (with twelve cycles) in our sample data, so we will simply output the overall result of the BREAKSUM as follows.
SQL> SELECT account 2 , BREAKSUM (val) AS end_of_cycle_total 3 FROM accounts 4 GROUP BY 5 account;
ACCOUNT END_OF_CYCLE_TOTAL ---------- ------------------ 999 .8 1 row selected.
Note that user-defined aggregate functions are also enabled for use as analytics. We can take advantage of this to show the intermediate workings of BREAKSUM (i.e. the running total for each cycle) as follows.
SQL> SELECT account 2 , cycle 3 , val 4 , BREAKSUM (val) OVER 5 (PARTITION BY account ORDER BY cycle) AS running 6 FROM accounts;
ACCOUNT CYCLE VAL RUNNING ---------- ---------- ---------- ---------- 999 1 .11 .11 999 2 .18 .29 999 3 .27 .56 999 4 .35 .91 999 5 .52 1.43 999 6 .61 2.04 999 7 -1.51 .53 999 8 .63 1.16 999 9 92 93.16 999 10 88 181.16 999 11 -400 0 999 12 .8 .8 12 rows selected.
Compare this to our original running total before we created BREAKSUM. We saw earlier that the running total at cycle 11 was negative and that BREAKSUM was needed to reset this total to 0. We can see above that BREAKSUM has done exactly this.
We can test BREAKSUM against multiple accounts. In the following example we add ten more accounts to our sample data, each with twelve cycles of data. We then aggregate each account using BREAKSUM.
SQL> INSERT INTO accounts 2 SELECT MOD(ROWNUM,10) AS account 3 , MOD(ROWNUM,12)+1 AS cycle 4 , DBMS_RANDOM.VALUE AS val 5 FROM all_objects 6 WHERE ROWNUM <= 120; 120 rows created. SQL> SELECT account 2 , BREAKSUM (val) AS end_of_cycle_total 3 FROM accounts 4 GROUP BY 5 account;
ACCOUNT END_OF_CYCLE_TOTAL ---------- ------------------ 0 5.25 1 7.1 2 6.63 3 4.34 4 6.82 5 5.11 6 7.5 7 5.67 8 6.41 9 5.87 999 .8 11 rows selected.
Finally, we can test our merge method by invoking parallel query. In the following example, we run the previous SQL statement with parallel query enabled and then compare the results with the serial output from above.
SQL> SELECT /*+ PARALLEL(accounts 2) */ 2 account 3 , BREAKSUM (val) AS end_of_cycle_total 4 FROM accounts 5 GROUP BY 6 account;
ACCOUNT END_OF_CYCLE_TOTAL ---------- ------------------ 0 5.25 2 6.63 7 5.67 8 6.41 9 5.87 1 7.1 3 4.34 4 6.82 5 5.11 6 7.5 999 .8 11 rows selected.
For a final "sanity check", we can confirm that we invoked parallel query as follows.
SQL> SELECT * 2 FROM v$pq_sesstat 3 WHERE statistic = 'Queries Parallelized';
STATISTIC LAST_QUERY SESSION_TOTAL ------------------------------ ---------- ------------- Queries Parallelized 1 1 1 row selected.
For more information on user-defined aggregate functions and other possibilities for extending Oracle's functionality, read the Data Cartridge Guide, including this section. Possibly the best-known example of a user-defined aggregate on the web is Tom Kyte's STRAGG function for concatenating strings from multiple rows. This link also includes a CONCAT_ALL function by James Padfield which is essentially a re-factored STRAGG but with additional flexibility over delimiters.
The source code for the examples in this article can be downloaded from here.
Adrian Billington, September 2003Back to Top