ASK TOM
THE RIGHT NUMBER OF PARTITIONS
I have a table with about one million records,
but the table has the potential to grow to three
million or more records. I am planning to partition this table, using hash partitioning on the
most queried column, but before I do that, I
have a few questions:
1. I have read somewhere that due to the
hashing algorithm Oracle Database uses, it is
better to use a power-of- 2 number of partitions. Is that correct?
2. Is there a recommendation on how many
partitions are optimal for a given number of
records? What would you take into consideration
when deciding on the number of partitions?
3. Do you have any other recommendations?
First, when you’re using hash partitioning, it is paramount to use a power
of 2—2, 4, 8, 16, 32, 64 . . . —and no
other numbers in between. If you do not,
the data will be skewed across partitions
with most of the data in the “middle”
partitions and little data in the “end”
partitions. Also, a hash key should have
lots and lots of distinct values, or else
you won’t achieve even distribution
of data. In Listing 1, I create a four-partition table and a five-partition table
using hash partitions. Note that the data
distribution is clearly skewed in the five-partition table.
As for the question about determining the right number of partitions for a
number of records, the determination
is not record-driven. A million records
might be stored in 10MB, 100MB, or
1TB, so determining the number and
size of partitions is about volume. You
have to ask yourself
O What is the biggest segment I want
to have?
O What am I trying to accomplish with
partitioning?
The answers to these questions will
drive your partition sizes and your partitioning scheme.
You need to first understand why you
are partitioning. Then, and only then,
apply partitioning in a manner that will
achieve what you want.
When you apply partitioning, make
sure you understand the ramifications
of doing so and consider each and every
index independently of every other
index. For example, suppose you have
codeLISTING 1: Four partitions (OK) versus five partitions (not OK)
SQL> CREATE TABLE t1 ( x )
2 PARTITION BY hash(x)
3 ( partition part1 ,
4 partition part2 ,
5 partition part3 ,
6 partition part4
7 )
8 as
9 select rownum from all_objects;
Table created.
SQL> CREATE TABLE t2 ( x )
2 PARTITION BY hash(x)
3 ( partition part1 ,
4 partition part2 ,
5 partition part3 ,
6 partition part4,
7 partition part5
8 )
9 as
10 select rownum from all_objects;
Table created.
SQL> select h,
2 count(*) cnt,
3 sum(count(*)) over () totcnt,
4 substr( rpad(‘*’,100,’*’), 1,
5 100*ratio_to_report(count(*)) over ()) hist
6 from
7 (select 1 h from t1 partition (part1)
8 union all select 2 h from t1 partition(part2)
9 union all select 3 h from t1 partition(part3)
10 union all select 4 h from t1 partition(part4)
11 ) group by h order by h
12 /
H CNT TOTCNT HIST
-------------- ----------------- ------------------ -------------------------------------------------------
1 12409 49877************************
2 12413 49877************************
3 12697 49877************************
4 12358 49877************************
SQL> select h,
2 count(*) cnt,
3 sum(count(*)) over () totcnt,
4 substr( rpad(‘*’,100,’*’), 1,
5 100*ratio_to_report(count(*)) over ()) hist
6 from
7 (select 1 h from t2 partition (part1)
8 union all select 2 h from t2 partition(part2)
9 union all select 3 h from t2 partition(part3)
10 union all select 4 h from t2 partition(part4)
11 union all select 5 h from t2 partition(part5)
12 ) group by h order by h
13 /
H CNT TOTCNT HIST
-------------- ----------------- ------------------ -------------------------------------------------------
1 6234 49883
2 12413 49883************************
3 12698 49883************************
4 12358 49883************************