customers_norm.dmp file produced
earlier. To expand the comparison, he
compressed the customers_norm.dmp
file with the regular gzip command:
gzip - 9 customers_norm.dmp
datapump_masking_pkg package. The
REMAP_DATA parameter value has the
format <TableName>.<ColumnName>:
<PackageName>.<FunctionName>. John
called the random_area_same_phone
function in the package:
The command created a compressed
file named customers_norm.dmp.gz.
John then compared the sizes of the
normal and different compressed files:
expdp sh/sh dumpfile=customers_remap
directory=tmp_dir tables=customers
remap_data=customers.cust_main_phone_
number:datapump_masking_pkg.random_
area_same_phone
Type of File
customers_norm.dmp
customers_comp.dmp
customers_norm.dmp.gz
Size
31.58MB
2.98MB
3.67MB
From John’s demonstration, everyone understood that Oracle Data Pump
compression reduced the original file to
almost the same size as the gzip compression (or even smaller, as shown
in John’s test). What’s more, the compression occurred when the dumpfile
was generated, eliminating a two-step
process and consequently saving valuable
storage. While importing, John used the
same impdp command he’d used earlier,
without any special parameters. The
Oracle Data Pump utility automatically
recognized the dumpfile as compressed,
uncompressed it inline, and imported it.
John pointed out that, by default,
Oracle Data Pump compresses metadata
while exporting. If that compression is
not needed, the COMPRESSION=NONE
setting disables metadata compression
during export.
In the datapump_masking_pkg
package, John included several functions
that use the DBMS_RANDOM package
to produce different patterns. The first
function—random_phone_same_area—
takes a phone number and keeps the area
code intact but randomizes the 7-digit
phone number. For instance, 123-456-
7890 becomes 123 followed by a random
string of 7 digits, for example, 895-5429.
The random_area_same_phone function does just the opposite—it keeps the
7-digit number but randomizes the area
code. The random_all function randomizes all 10 digits, so it produces a number
that resembles a real phone number but
has nothing in common with the original
value. Finally, the mask_all function does
not use DBMS_RANDOM and is very
different—it replaces all digits with X,
so, for example, 123-456-7890 becomes
XXX-XXX-XXXX.
In this case, the security team wanted
to randomize the area code and keep
the same phone number. John addressed
this by using the REMAP_DATA parameter in the expdp command to call the
The dumpfile generated by this
command does not have the actual data
from the cust_main_phone_number
column but, instead, the values supplied
by the random_area_same_phone function. Only that column will be modified
(or masked) in the dumpfile. All other
columns will be exported as is.
After importing the customers_
remap.dmp dumpfile to the target (in
this case, development) database, John
examined the value in one record:
SQL> select cust_main_phone_number
MASKING
Finally, John addressed the issue of PII
data, such as phone numbers, stored in
the database going to the dumpfile for
use in the development database. The
security department wanted the values
randomized instead of sent as is. To
demonstrate the data pump randomizing
solution, John first created a package,
datapump_masking_pkg, that produced
different patterns from the input data, as
shown in Listing 1. This simple package
includes several functions, each of which
accepts one parameter, transforms it as
directed by the corresponding code, and
returns the transformed value.
codeLISTING 1: Package for masking data
create or replace package datapump_masking_pkg
as
function random_phone_same_area (p_in varchar2) return varchar2;
function random_area_same_phone (p_in varchar2) return varchar2;
function random_all (p_in varchar2) return varchar2;
function mask_all (p_in varchar2) return varchar2;
end;
/
create or replace package body datapump_masking_pkg
as
function random_phone_same_area (p_in varchar2) return varchar2 is
begin
return
substr(p_in, 1, 4)||round( dbms_random.value (100,999))||’-’||
lpad(round( dbms_random.value ( 1,9999)), 4,’0’);
end;
function random_area_same_phone (p_in varchar2) return varchar2 is
begin
return round( dbms_random.value (100,999))||substr(p_in, 4, 12);
end;
function random_all (p_in varchar2) return varchar2 is
begin
return round( dbms_random.value (100,999))||’-’||
round( dbms_random.value (100,999))||’-’||
lpad(round( dbms_random.value ( 1,9999)), 4,’0’);
end;
function mask_all (p_in varchar2)
return varchar2 is
begin
return ‘XXX-XXX-XXXX’;
end;
end;
/