The method_opt parameter here is key
for this example, and I used it to gather histograms for all the columns in the table. The
resulting histograms will contain 12 rows of
information for each column—for each of
the distinct A–L values I loaded. Because I
have 12 distinct values in each column, this
will provide a perfect picture of the count of
rows for each value.
Now I can use explain plan on a simple
query to see how many rows the optimizer
believes would be returned, given a predicate on each of the columns, as shown in
Listing 1.
As you can see, the optimizer is able to
very accurately estimate the cardinality
of each predicate as long as the string is
32 bytes or less. So in steps 3, 5, 7, and
9 in Listing 1, the estimated cardinality
is very exact: the estimate matches the
actual count(*) values. However, when the
optimizer gets to step 11, the estimated
cardinality goes awry. All of a sudden, the
optimizer starts guessing “every row will
be returned,” regardless of what the string
ends in. This is because the histogram
contains only the first 32 bytes of string
data, and when the optimizer gets to steps
11 through 17, the strings all start with 32 or
more ‘x’s. In the histogram, they all appear
to have the same value.
This does not render histograms useless.
They are very useful on most data you tend to
search on: numbers, dates, and strings less
than 33 bytes in length. If you have longer
strings, you will typically be performing a
text search on them with Oracle Text and the
CONTAINS operator.
You should be aware of this histogram
implementation, however, because it could
prompt you to store some data attributes
differently. For example, suppose you have a
system that stores URLs in a table—an audit
trail, for example. If you look at a URL such
as the ones you see on asktom.oracle.com,
you’ll discover that the first 39 characters of
my URL are constant—
http://asktom.oracle.com/pls/asktom/f?p
—regardless of what page you end up on
in my site. Having a histogram on that data
would not be very useful.
Code Listing 1: Checking the optimizer’s use of histograms on “long” columns
SQL> set autotrace on explain
SQL> select 'len30', count(*) from t where len30 = rpad('x', 30,'x')||'A'
2 union all
3 select 'len30', count(*) from t where len30 = rpad('x', 30,'x')||'B'
4 union all
5 select 'len31', count(*) from t where len31 = rpad('x', 31,'x')||'A'
6 union all
7 select 'len31', count(*) from t where len31 = rpad('x', 31,'x')||'B'
8 union all
9 select 'len32', count(*) from t where len32 = rpad('x', 32,'x')||'A'
10 union all
11 select 'len32', count(*) from t where len32 = rpad('x', 32,'x')||'B'
12 union all
13 select 'len33', count(*) from t where len33 = rpad('x', 33,'x')||'A'
14 union all
15 select 'len33', count(*) from t where len33 = rpad('x', 33,'x')||'B'
16 /
'LEN3 COUNT(*)
——————— ———————————————
len30 24307
len30 4308
len31 24307
len31 4308
len32 24307
len32 4308
len33 24307
len33 4308
8 rows selected.
Execution Plan
————————————————————————————————————————————————————————————————————————————
Plan hash value: 2994643399
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————
| 0 | SELECT STATEMENT | | 8 | 268 | 3079 (88)| 00:00: 37 |
| 1|UNION-ALL | | | | | |
| 2 | SORT AGGREGATE | | 1 | 32 | | |
|* 3 | TABLE ACCESS FULL| T | 24307 | 759K| 385 ( 1)| 00:00:05 |
| 4 | SORT AGGREGATE | | 1 | 32 | | |
|* 5 | TABLE ACCESS FULL| T | 4308 | 134K| 385 ( 1)| 00:00:05 |
| 6 | SORT AGGREGATE | | 1 | 33 | | |
|* 7 | TABLE ACCESS FULL| T | 24307 | 783K| 385 ( 1)| 00:00:05 |
| 8 | SORT AGGREGATE | | 1 | 33 | | |
|* 9 | TABLE ACCESS FULL| T | 4308 | 138K| 385 ( 1)| 00:00:05 |
| 10 | SORT AGGREGATE | | 1 | 34 | | |
|* 11 | TABLE ACCESS FULL| T | 71696 | 2380K| 385 ( 1)| 00:00:05 |
| 12 | SORT AGGREGATE | | 1 | 34 | | |
|* 13 | TABLE ACCESS FULL| T | 71696 | 2380K| 385 ( 1)| 00:00:05 |
| 14 | SORT AGGREGATE | | 1 | 35 | | |
|* 15 | TABLE ACCESS FULL| T | 71696 | 2450K| 385 ( 1)| 00:00:05 |
| 16 | SORT AGGREGATE | | 1 | 35 | | |
|* 17 | TABLE ACCESS FULL| T | 71696 | 2450K| 385 ( 1)| 00:00:05 |
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
————————————————————————————————————————————————————————————————————————————