Database Application Developer and DBA
ASK TOM B Y TOM K Y TE
On Better Loading and
Fact-Checking
Our technologist enjoys the rule of external
tables and debunks index scan myths.
ORACLE DATABASE 11g
I’m working to improve the efficiency of an existing process that starts with staging an
input file (generated by a client program) for
the database. In some situations, the input
file can include 20,000,000 lines or more,
but with one-third or so of those lines being
of no use as input.
At present all the lines are transferred (as
rows in a staging table), with unneeded lines
pruned at the beginning of processing with
a DELETE. The DELETE obviously generates a
lot of redo and undo and takes some time.
Further, we’re transferring lines we don’t
even need to bother with in the first place.
I’m trying to modify the SQL*Loader
control file to discard these unneeded
records to save both the transfer time and
the pruning time. However, I’m running up
against SQL*Loader’s lack of an OR in its condition processing. I’ve tried structuring the
control file to perform multiple tests, so that
each successful test would result in a row’s
being sent to the database. My results show
that nearly all rows are being rejected.
My questions:
1. What’s wrong with my control file (see
below)?
2. Is there a better way to do this?
Here’s the staging table:
create table staging
( rectype char( 1) not null,
id number not null,
name varchar2( 64) not null,
val varchar2(256)
);
Here’s my sample input file, inputdata.csv:
4, 1,"rn",""
4, 2,"rn","nonnull"
4, 3,"rd",""
4, 4,"rd","nonnull"
4, 5,"ac","NO"
4, 6,"ac","YES"
4, 7,"ie","nonnull"
4, 8,"at","nonnull"
4, 9,"ms"," 1. 23"
4, 10,"mb"," 1. 56"
4, 11,"ms",".99"
Here’s my control file:
LOAD DATA REPLACE
INTO TABLE staging
WHEN NAME = 'rn'
AND VAL <> ''
FIELDS TERMINATED B Y ','
OP TIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE, ID, NAME, VAL )
INTO TABLE staging
WHEN NAME = 'rd'
AND VAL <> ''
FIELDS TERMINATED B Y ','
OP TIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( REC T YPE, ID, NAME, VAL )
… other INTO clauses ….
Well, I’ll point out two things here. First:
SQL*Loader documentation states that
SQL*Loader doesn’t “start over” with multiple INTO clauses. That is, the first INTO
clause will get the record and start processing
it, and the second INTO clause will get the
same record and pick up processing where
the first INTO clause stopped. The process
does not go back to column 1. In short, the
documentation ( bit.ly/fmauzu) states
The important point in this example is
that the second empno field is found
immediately after the first ename,
although it is in a separate INTO TABLE
clause. Field scanning does not start
over from the beginning of the record
for a new INTO TABLE clause. Instead,
scanning continues where it left off.
So a quick fix for you would be to use
POSITION( 1) in your control file for each of
the REC T YPE attributes after the first INTO
clause. That is, just change REC T YPE in the
control file to RECT YPE POSITION( 1). Once I
made that change, your data loaded right up.
The second thing I’d like to point out is
that as far as I’m concerned, SQL*Loader
is dead—because external tables rule. The
SQL*Loader control file syntax is very inflexible compared to simple SQL. Also, in your
example, you are loading a staging table,
which likely will be further processed and
placed into “real” tables. Therefore, you are
Taking a flat file and reading it
1.
Loading it into a table
2.
Reading that table
3.
Ultimately loading the table from Steps 2
4.
and 3 into other tables
With external tables you can simplify that
process to
Taking a flat file and reading it (as a table)
1.
Ultimately loading the “table” from
2.
Step 1 into other tables.
You can skip major pieces of your current
processing, including loading a table temporarily and rereading all the data. Let’s
take your control file and use it to create an
external table definition to see how easy this
would all be if you used an external table
instead of SQL*Loader. If I take your control
file and remove all but one of the INTO
LOAD DATA
REPLACE