Database Application Developer and DBA
On Caching and
Evangelizing SQL
One of the talks I gave recently at the Oracle Benelux User Group (OBUG)
conference in Belgium was regarding techniques—some SQL tricks, if you will—you
can use when writing SQL. One of the
techniques I discussed at length was scalar
subqueries and how they can be used to
minimize the number of times a PL/SQL
function is called from SQL. This is important because the overhead of going from
SQL to PL/SQL over and over again can be
quite expensive, and the scalar subquery
caching database feature can dramatically
decrease this overhead.
What exactly is a scalar subquery? It
is a subquery in a SQL statement that
returns exactly one column and zero rows
or one row. That single column can be a
complex object type, so it can consist of
many attributes, but the subquery returns a
single scalar value (or NULL if the subquery
returns zero records). A scalar subquery
can be used anywhere a literal could have
been used. For example, you can obviously
create the query
select deptno, dname,
'Hello world'
from dept;
You can also (since Oracle8i Database
Release 8. 1. 5, anyway) include a scalar
subquery:
select deptno, dname,
(select count(*)
from emp
where emp.deptno = dept.deptno)
from dept;
Note that the complete last query is
semantically equivalent to
select dept.deptno, dept.dname,
count(emp.empno)
from dept left outer join emp
on ( dept.deptno = emp.deptno );
Regarding these last two queries, if your
goal is to optimize your query for initial
response time, you may want to use the
former query, with the scalar subquery,
because Oracle Database would have to
get just the first row from DEPT, run the
scalar subquery (select count…) against
EMP to get the count, and then return it.
The process for returning that first row
would be very fast. The latter query, which
includes an outer join, would likely require
two full table scans, a hash outer join, and
an aggregation—and require all these processes to be completed before the first row
could be returned.
So, now that you know what a scalar
subquery is, you need to understand the
scalar subquery caching feature of Oracle
Database. In the above query with a scalar
subquery, the database would actually
extract the scalar subquery and rewrite it to
use bind variables. In effect, the database
would be executing
(select count(*) from emp
where emp.deptno = ?)
for each row in the DEPT table. Given that
DEPTNO is unique in DEPT, the database
would have to physically execute that
scalar subquery for each row in DEPT. But
what if you are not querying the DEPT table
but, rather, another table that includes
a DEPTNO column? Perhaps a PROJECTS
table with records that have PROJECT_
NAME and DEPTNO columns? Then a query
such as
select project_name, deptno,
(select count(*)
from emp
where emp.deptno =
projects.deptno)
from projects;
will have to execute the scalar subquery at
least as many times as there are unique
DEPTNO values in the PROJEC TS table.
Note that I said, “at least as many times”—
the database does not have to execute
the scalar subquery for every row in the
PROJEC TS table if it caches some of the
results and reuses them, which is what
happens with scalar subquery caching.
When you’re using a scalar subquery,
Oracle Database will set up a small in-memory hash table for the subquery and
its results each time it runs the query. So,
when you run the previous query, Oracle
Database sets up in memory a hash table
that looks like this:
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
….
….
Oracle Database will use this hash table
to remember the scalar subquery and the
inputs to it—just :DEPTNO in this case—
and the output from it. At the beginning of
every query execution, this cache is empty,
but suppose you run the query and the first
PROJECTS row you retrieve has a DEPTNO
value of 10. Oracle Database will assign
the number 10 to a hash value between 1
and 255 (the size of the hash table cache in
Oracle Database 10g and Oracle Database
11 g currently) and will look in that hash table