There are security implications to consider when using the
PREPROCESSOR clause. See Oracle Database Security Guide for more information.
If the file you want to load contains data records that are not in a format supported by the
ORACLE_LOADER access driver, then use the
PREPROCESSOR clause to specify a user-supplied preprocessor program that will execute for every data file. Note that the program specification must be enclosed in a shell script if it uses arguments (see the description of "file_spec").
The preprocessor program converts the data to a record format supported by the access driver and then writes the converted record data to standard output (stdout), which the access driver reads as input. The syntax of the
PREPROCESSOR clause is as follows:
Specifies the directory object containing the name of the preprocessor program to execute for every data file. The user accessing the external table must have the
EXECUTE privilege for the directory object that is used. If
directory_spec is omitted, then the default directory specified for the external table is used.
For security reasons, Oracle strongly recommends that a separate directory, not the default directory, be used to store preprocessor programs. Do not store any other files in the directory in which preprocessor programs are stored.
The preprocessor program must reside in a directory object, so that access to it can be controlled for security reasons. The OS system manager must create a directory corresponding to the directory object and must verify that OS-user ORACLE has access to that directory. DBAs must ensure that only approved users are allowed access to the directory object associated with the directory path. Although multiple database users can have access to a directory object, only those with the
EXECUTE privilege can run a preprocessor in that directory. No existing database user with read-write privileges to a directory object will be able to use the preprocessing feature. DBAs can prevent preprocessors from ever being used by never granting the
EXECUTE privilege to anyone for a directory object.
Oracle Database SQL Language Reference for information about granting the
The name of the preprocessor program. It is appended to the path name associated with the directory object that is being used (either the
directory_spec or the default directory for the external table). The
file_spec cannot contain an absolute or relative directory path.
If the preprocessor program requires any arguments (for example,
gunzip -c), then you must specify the program name and its arguments in an executable shell script (or on Windows operating systems, in a batch (.bat) file). Shell scripts and batch files have certain requirements, as discussed in the following sections.
It is important to verify that the correct version of the preprocessor program is in the operating system directory.
The following is an example of specifying the
PREPROCESSOR clause without using a shell or batch file:
SQL> CREATE TABLE xtab (recno varchar2(2000)) 2 ORGANIZATION EXTERNAL ( 3 TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY data_dir 5 ACCESS PARAMETERS ( 6 RECORDS DELIMITED BY NEWLINE 7 PREPROCESSOR execdir:'zcat' 8 FIELDS (recno char(2000))) 9 LOCATION ('foo.dat.gz')) 10 REJECT LIMIT UNLIMITED; Table created.
Using Shell Scripts With the PREPROCESSOR Clause on Linux Operating Systems
The shell script must reside in
The full path name must be specified for system commands such as
The preprocessor shell script must have EXECUTE permissions.
The data file listed in the external table
LOCATION clause should be referred to by
The following example shows how to specify a shell script on the
PREPROCESSOR clause when creating an external table.
SQL> CREATE TABLE xtab (recno varchar2(2000)) 2 ORGANIZATION EXTERNAL ( 3 TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY data_dir 5 ACCESS PARAMETERS ( 6 RECORDS DELIMITED BY NEWLINE 7 PREPROCESSOR execdir:'uncompress.sh' 8 FIELDS (recno char(2000))) 9 LOCATION ('foo.dat.gz')) 10 REJECT LIMIT UNLIMITED; Table created.
Using Batch Files With The PREPROCESSOR Clause on Windows Operating Systems
The batch file must reside in
The full path name must be specified for system commands such as
The preprocessor batch file must have EXECUTE permissions.
The first line of the batch file should contain
@echo off. The reason for this requirement is that when the batch file is run, the default is to display the commands being executed, which has the unintended side-effect of the echoed commands being treated as input to the external table access driver.
To represent the input from the location clause,
%1 should be used. (Note that this differs from Linux-style shell scripts where the location clause is referenced by
A full path should be specified to any executables in the batch file (
sed.exe in the following example). Note also that the MKS Toolkit may not exist on all Windows installations so commands such as
sed.exe may not be available.
The batch file used on Windows must have either a
.cmd extension. Failure to do so (e.g. trying to specify the preprocessor script as sed.sh) will result in the following error:
SQL> select * from foo ; select * from foo * ERROR at line 1: ORA-29913: error in executing ODCIEXTTABLEFETCH callout ORA-29400: data cartridge error KUP-04095: preprocessor command C:/Temp\sed.sh encountered error "CreateProcess Failure for Preprocessor: C:/Temp\sed.sh, errorcode: 193
The following is a simple example of using a batch file with the external table
PREPROCESSOR option on Windows. In this example a batch file uses the stream editor (sed.exe) utility to perform a simple transformation of the input data.
create table deptxt ( deptno char(2), dname char(14), loc char(13) ) organization external ( type ORACLE_LOADER default directory def_dir1 access parameters ( records delimited by newline badfile 'deptxt.bad' logfile 'deptxt.log' preprocessor exec_dir:'sed.bat' fields terminated by ',' missing field values are null ) location ('deptxt.dat') ) reject limit unlimited ; select * from deptxt ; Where deptxt.dat contains: 10,ACCOUNTING,NEW YORK 20,RESEARCH,DALLAS 30,SALES,CHICAGO 40,OPERATIONS,BOSTON
The preprocessor program
sed.bat has the following content:
@echo off c:/mksnt/mksnt/sed.exe -e 's/BOSTON/CALIFORNIA/' %1
PREPROCESSOR option passes the input data (
sed.bat. If you then select from the
deptxt table, the results show that the
LOC column in the last row, which used to be
BOSTON, is now
SQL> select * from deptxt ; DE DNAME LOC -- -------------- ------------- 1a ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATIONS CALIFORNIA 4 rows selected.
External tables treats each data file specified on the
LOCATION clause as a single granule. To make the best use of parallel processing with the
PREPROCESSOR clause, the data to be loaded should be split into multiple files (granules). This is because external tables limits the degree of parallelism to the number of data files present. For example, if you specify a degree of parallelism of 16, but have only 10 data files, then in effect the degree of parallelism is 10 because 10 slave processes will be busy and 6 will be idle. It is best to not have any idle slave processes. So if you do specify a degree of parallelism, then ideally it should be no larger than the number of data files so that all slave processes are kept busy.
Oracle Database VLDB and Partitioning Guide for more information about granules of parallelism