The question also involves a bit of a
bin-fitting problem, where the goal is to group
rows until some threshold is met and then
start a new group. This is quite challenging to
do in “pure SQL,” so I cheated a little with my
approach, which uses a small bit of PL/SQL.
To implement the PL/SQL pipelined function, I first need to create some object types.
These types, shown in Listing 1, describe the
“table” the function will return.
Next, I need the procedural code that
will return the result I want. Basically, I’ll
be sending this function a result set via a
SYS_REFCURSOR type. This result set will be
an ordered selection of the rows I want to see,
and it will be ordered by STUDY_SITE so that
I can process the data in ascending order—
because the answer requires that. I’ll also
pass in the “threshold” of 65,000 instead of
hard-coding 65,000 in the PL/SQL, and I’ll be
a little more flexible and make it a parameter.
Listing 2 shows the code I implemented.
As you can see, I just read the rows from
the SYS_REFCURSOR, and because my
running total exceeds the threshold that was
passed in, I reset the counters and increment the group number. The PIPE ROW() call
returns the data I am making up. (Pipelined
functions are great for implementing a tiny bit
of procedural processing in relational queries.)
To use this function and see the required
output, I would run the query in Listing 3.
That PL/SQL pipelined function works, but
as I mentioned earlier, there were also some
SQL solutions. The first I’ll use is the new
recursive subquery factoring. This is a new
feature in Oracle Database 11 g Release 2, and
I first looked at this feature when it came out:
bit.ly/t YWtBG. (Look for “Recursive Subquery
Factoring” for an explanation of how it works
and what it does.) Listing 4 contains the
recursive subquery approach.
Let’s look at this piece by piece. First there
is a factored subquery (available since Oracle9i
Database)—the “WITH DATA1 AS” part of the
query. This will take my set of data and assign
a row number from 1 to N to each row in the
base table after sorting it by STUDY_SITE.
Then I have the second factored subquery—REC—which is, in fact, a recursive
factored subquery, in that its definition
references itself. Note the second part of the
UNION ALL—it queries itself.
Here’s what the REC recursive factored
subquery does. When the first part of the
UNION ALL is executed, the subquery finds
the first record from DATA1 and outputs that
record into the result set. It then feeds that
single row it found into the second part of
the query; in short, it joins the first record to
the second record. If the value of R. TOTAL
(the first row’s total) plus the second row’s
total—D. TOT_REC—exceeds the threshold,
then the running total will reset and become
just D. TOT_REC. If R. TOTAL+D. TOT_REC
does not exceed that threshold, the running
total will not reset and will continue to be
R. TOTAL+D. TOT_REC. The FLG column—the
grouping column—will either increment if
R. TOTAL+D. TOT_REC exceeds its threshold or
stay the same if it does not.
Then the subquery takes this second fabricated record and joins it to the third record
in DATA1, performing the same operations.
After that the subquery joins the third record
to the fourth and so on until there are no
records left to process. When you execute
that query in full, it returns the same result
as the pipelined function.
However, before you get too excited
about this approach, think about the work it
requires. You have to take the entire result
set and sort it to assign the row numbers
(which is OK, because all approaches will
pretty much include a sort). Then you have
to query this subresult to find the first row,
scan it again to find the second row, scan
it again to find the third, and so on. Suffice
it to say, this query does a ton of work over
and over again; using it is not the most efficient approach. If you are interested in the
numbers, visit bit.ly/vJiMVw to see a tkprof
report of this query, demonstrating how
much work it performs.
The last approach, using the SQL MODEL
clause, was supplied by Jichao Li from Tianjun,
China. Li supplied the query in Listing 5.
Li used a threshold of 20,000 to demonstrate a different grouping, but the net result
is the same. (Simply replace or bind in a different value wherever you see 20,000 in this
query to change it.)
I used procedural code in PL/SQL to
answer the question, and Li used procedural
code—to a degree—directly in SQL to answer
the question. (See bit.ly/rZtatb for docu-
mentation on the SQL MODEL clause.) This
SQL query takes the same basic approach
as the solutions above but uses different
syntax. It starts by ordering the data by
STUDY_SITE, assigns a row number to each
row, and then uses that row number as an
index on the result set, retrieving specific
rows and columns (which is sort of like
writing functions in a spreadsheet—only this
spreadsheet is a result set). And it does the
same sort of processing as in the PL/SQL and
recursive subquery factoring solutions.
NEXT STEPS
Tom Kyte is a database
evangelist in Oracle’s
Server Technologies
division and has worked
for Oracle since 1993.
ASK Tom
Tom Kyte answers your most difficult technology
questions. Highlights from that forum appear in
this column.
asktom.oracle.com
READ more Tom
Oracle Database Concepts 11g Release 2 ( 11. 2)
bit.ly/aonqPP
Expert Oracle Database Architecture: Oracle
Database 9i, 10g, and 11g Programming
Techniques and Solutions, Second Edition
amzn.to/ckGXaR
READ more about
effective full-table scans
bit.ly/srjb0c
external tables
bit.ly/vYSoDo
Oracle Exadata
oracle.com/exadata
MODEL clause
bit.ly/rZtatb