| Oracle9i Warehouse Builder User's Guide Release 2 (v9.0.2) Part Number A95949-01 |
|
This appendix describes how to retrieve and store data from XML documents in a target schema using the XML Toolkit. This toolkit can extract data from XML documents, which can reside in a variety of sources and formats, and store that data in Oracle8i/9i tables, CLOB database columns, Advanced Queues, and other Oracle8i/9i database objects.
An XML document conforms to the Extensible Markup Language (XML) specification. XML allows developers to design their own customized markup languages which can be optimized for delivery of documents on the World Wide Web and to support implementation of e-commerce and numerous other applications.
The XML Toolkit is a set of PL/SQL procedures, functions, and packages that you can use to retrieve and load data from XML documents into Oracle8i/9i database objects. For example, you can retrieve data from an XML document that resides in a file and load it into several tables that reside in a data warehouse.
You can retrieve data from XML documents that reside in the following:
You identify the source of the data to the XML Toolkit using an element name defined by the Toolkit. For example, you identify a file with the element name file. Subsequent sections and examples describe individual element names, their respective attributes (if any), and the required syntax.
The XML Toolkit can store data retrieved from XML documents in a variety of Oracle8i/9i database objects. The Toolkit can store data in the following objects:
You identify the target for the data using an element name defined by the Toolkit. For example, you identify a target table with the element name target.
The element name target has several attributes that provide control over runtime operations. For example, if the data extracted from the XML document does not match the target table, you can specify an XSL style sheet as an attribute that reformats the data accordingly. Subsequent sections and examples describe the individual element names, their respective attributes, and the required syntax.
During the extraction and load process, you can control when to commit the load as well as control the size of individual batches. These controls are specified using the element name runtimeConfig.
The XML Toolkit is implemented as a set of PL/SQL procedures and functions. To use the Toolkit, you must first create a Warehouse Builder transformation that invokes one of its procedures or functions. After the transformation has been created, a mapping can call the transformation using a pre-map or post-map trigger.
A typical scenario would be to create a transformation that extracts and loads data from an XML document into a staging table. The transformation could be generalized by referencing the XML document as a runtime parameter. After you create the transformation, you could then create a mapping that uses a pre-map trigger to call the transformation to load the staging table. The mapping could then transform the data in the staging table and load it into a target table. A post-map trigger could in turn truncate the staging table before the mapping terminates.
A transformation can invoke the XML Toolkit using one of the following entry points into the API:
Both of these calls extract and load data from XML documents into database targets. The function, however, returns the number of documents read during the operation. The control file, itself an XML document, specifies the source of the XML documents, the targets, and any runtime controls.
After the transformation has been defined, a mapping typically calls the transformation as a pre-map or post-map trigger.
The following example illustrates a script that can be used to implement an Warehouse Builder transformation which extracts data from an XML document stored in the file products.xml and loads it into the target table books.
|
|
begin |
|
|
|
|
|
|
|
|
|
|
'<OWBXMLRuntime>' |
|| |
|
|
|
|| |
|
|
|
|
'</OWBXMLRuntime>'); |
|
|
|
end; |
|
|
The control file itself is an XML document, and the element name OWBXMLRuntime defines the top-level, or root element, of this document. The remaining element names which define the document sources and targets are self-explanatory.
After you create and name this transformation, you can reference it by name in a mapping. The most common reference for the transformation would be in a pre-map or post-map trigger.
The nine example control files in this section are well-formed and valid control files that extract and load data from the following sources into a database target:
The first five examples extract data from XML documents that reside in files; the remaining cases extract data from database objects or a document addressed by a URL. The examples cover all the sources that are managed by the XML Toolkit.
The XML Toolkit can extract data from documents stored in a single file or in multiple files when they reside in the same directory. The Toolkit can also extract data from these sources and store it into multiple tables.
Often, the data in an XML document fails to match the target object in which case you can reformat the data to match the target by including an XSL style sheet. The second example shows you how to reference an XSL style sheet by specifying a value for the XSLFile attribute for the target. The use of style sheets to reformat the source data is probably the most common case as the data from XML documents rarely match the column names of target tables in a data warehouse.
Finally, you can improve load efficiency for large XML documents by splitting the documents into parts and performing a separate load operation for each part. This is accomplished by specifying a value for the splitElement attribute for the source file.
When the element names of an XML document exactly match the column names in a target table and the document resides in a file, you can easily create a control file to extract and load data from the document into the target table.
The XML document below (ORDERS) resides in a file named \ora817\examples\ex1.xml. This file name is relative to the Oracle8i/9i database instance.
<ORDERS>
<?xml version="1.0"?>
<ROW>
<ID>100</ID>
<ORDER_DATE>2000.12.20</ORDER_DATE>
<SHIPTO_NAME>Jeff Q. Vintner</SHIPTO_NAME>
<SHIPTO_STREET>500 Marine World Parkway</SHIPTO_STREET>
<SHIPTO_CITY>Redwood City</SHIPTO_CITY>
<SHIPTO_STATE>CA</SHIPTO_STATE>
<SHIPTO_ZIP>94065</SHIPTO_ZIP>
</ROW>
</ORDERS>
The column names in the target table (PURCHASE_ORDERS) described by the DDL below match the element names of the XML document described above.
create table Purchase_Orders (
id varchar2(10) not null,
order_date date not null,
shipto_name varchar2(60) not null,
shipto_street varchar2(80) not null,
shipto_city varchar2(30) not null,
shipto_state varchar2(2) not null,
shipto_zip varchar2(9))
The control file below directs the XML Toolkit to extract and load the data from the ORDERS document into the Purchase_Orders table.
'<OWBXMLRuntime>'||
'<XMLSource>'||
'<file>\ora817\examples\ex1.xml</file>'||
'</XMLSource>'||
'<targets>'||
'<target dateFormat="yyyy.MM.dd">Purchase_Orders</target>'||
'</targets>'||
'</OWBXMLRuntime>'
The dateFormat attribute defined for the target element is necessary so that the XML SQL Utility (XSU) can correctly store the text data in the table in the order_date column which has a data type of DATE. Refer to the Oracle8i/9i XML Reference for additional information.
From the XSL processor, you can store the stored in various modes. Each example is a control file. When a transformation uses the wb_xml_load procedure of the wb_xml_load_f function to call the XML Toolkit, it must include a complete control file.
The control file is an XML document that describes the sources, the targets, and any runtime controls. This section uses several examples that show you how to define control files for most situations.
If you create a transformation using this control file and call the transform from a pre-map trigger, the following actions occur:
This is a simple case where the document and the table match up exactly. The next example shows you handle the more common case inexact matches.
When the element names in the XML document fail to match the column names in the target column, you must include an XSL style sheet that reformats the data before it is loaded into the target table. This example shows you how to reformat the data by specifying a style sheet using the XSLFile attribute. The control file extracts and reformats the data before loading it into the PURCHASE_ORDERS table.
The XML document below (purchaseORDER) resides in the file \ora817\examples\ex2.xml. This file name is relative to the Oracle8i/9i database instance.
<purchaseOrder>
<id>103123-4</id>
<orderDate>2000-10-20</orderDate>
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item>
<partNum>872-AA</partNum>
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item>
<partNum>845-ED</partNum>
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
The column names in the target table (PURCHASE_ORDERS) are described "The Target Table".
The control file below directs the XML Toolkit to extract and reformat the data from the purchaseORDER document before loading it into the target table PURCHASE_ORDERS.
'<OWBXMLRuntime>'||
' <XMLSource>'||
' <file>\ora817\examples\ex2.xml</file>'||
' </XMLSource>'||
' <targets>'||
' <target XSLFile="\ora817\examples\ex2_1.xsl" dateFormat="yyyy-MM-dd">'||
' Purchase_Orders||'
' </target>'||
' </targets>'||
'</OWBXMLRuntime>'
The only addition to this control file is the XSLFile attribute for the element target.
The style sheet itself is a straightforward XML document that associates a parsed data item from the XML document with a column name in the target table. The following snippet from the style sheet shows how it associates a parsed data item with a column in the target:
<SHIPTO_NAME>
<xsl:value-of select="shipTo/name"/>
</SHIPTO_NAME>
<SHIPTO_STREET>
<xsl:value-of select="shipTo/street"/>
</SHIPTO_STREET>
<SHIPTO_CITY>
<xsl:value-of select="shipTo/city"/>
</SHIPTO_CITY>
<SHIPTO_STATE>
<xsl:value-of select="shipTo/state"/>
</SHIPTO_STATE>
This control statement could now be used in a transform to extract data from the XML document and load it into the target table.
If you create a transformation using this control file and call the transform from a pre-map trigger, the following actions occur:
The Transformation then returns control to the mapping.
This control statement extracts and reformats data from the purchaseORDER document and stores it into two target tables: PURCHASE_ORDERS and ITEMS. The tables have different column names, and so a style sheet must be specified for each target.
The XML script for the previous example describes the source document (purchaseORDER).
The column names for Purchase_Orders are described; the column names for items are not required to understand the control file script.
The control file below directs the XML Toolkit to extract and reformat the data from the purchaseORDER document before loading it into two target tables.
'<OWBXMLRuntime>'||
' <XMLSource>'||
' <file>\ora817\examples\ex2.xml</file>'||
' </XMLSource>'||
' <targets>'||
' <target XSLFile="\ora817\examples\ex2_1.xsl" dateFormat="yyyy-MM-dd">'||
' Purchase_Orders||'
' </target>'||
' <target XSLFile="\ora817\examples\ex2_2.xsl" dateFormat="yyyy-MM-dd">'||
' Items||'
' </target>'||
' </targets>'||
'</OWBXMLRuntime>'
This control file is like the previous example except that it specifies an additional target (ITEMS), and a style sheet (ex2_2.xsl) which associates parsed data items from the document with columns in the Items table.
When you extract data from a very large XML document, the memory requirements for the processing can impact load performance. In this case, you can often reduce the memory requirements and improve the efficiency of the operation by dividing the XML documents into multiple parts.
To divide an XML document into parts, specify a value for the splitElement attribute of the XMLSource element. The value is the name of an element within the source XML document, preferably one that delineates blocks of text.
A good example is an XML document that organizes books into categories using the element name Category. There are many categories, and each category defines numerous books. Thus, if you specify the splitElement attribute as in the example below, then the XML Toolkit divides the source document into as many parts as there are categories.
The control file below directs the XML Toolkit to extract the data from the BookAreUs document by dividing the document according to its catalogs, and then load each part into the target table books.
'<OWBXMLRuntime>'||
' <XMLSource splitElement="Category">'||
' <file>\ora817\examples\ex4.xml</file>'||
' </XMLSource>'||
' <targets>'||
' <target XSLFile="ora817\examples\ex4.xsl">books</target>'||
' </targets>'||
'</OWBXMLRuntime>'
How well this operation improves efficiency depends on the selection of the split element. If the split element defines only a few parts, then memory resources may not be reduced enough to improve performance.
The XML Toolkit can extract data from documents stored as a database object or at a location defined by a URL. The examples in this section illustrate control files for these cases.
This example control file loads the PURCHASE_ORDERS table with data extracted from an XML document that is stored in a table (clientOrders) as a CLOB column.
The clientOrders table contains detailed information about each order, including multimedia information. The order itself is stored in the xmlOrder column, a voice recording of the order is stored in the voiceOrder column, and a picture of the client who made the order is stored in the clientOrder column.
The XML document which is stored in the first row of the clientOrders table is exactly the same as the order described in the first example. The complete XML document that defines the order is described.
The control file below directs the XML Toolkit to extract data from the document that resides in the first row of the clientOrders table and load it into the Purchase_Orders table.
'<OWBXMLRuntime>'||
'<XMLSource>'||
'<CLOB whereClause="where id=''1''">'||
'<table>clientOrders</table>'||
'<CLOBColumn>xmlOrder</CLOBColumn>'||
'</CLOB>'||
'</XMLSource>'||
'<targets>'||
'<target dateFormat="yyyy.MM.dd">Purchase_Orders</target>'||
'</targets>'||
'</OWBXMLRuntime>'
A few comments are in order regarding this control file, especially regarding the values assigned to attributes.
CLOB whereClause="where id=''1''"
The ID column of the clientOrders table identifies each row.
This example control file loads the PURCHASE_ORDERS table with data extracted from XML documents that are placed on an Oracle8i/9i Advanced Queue (AQ).
An e-commerce application can use an Advanced Queue to automate the acceptance, processing, routing, and completion of orders. You can configure an AQ to remain active and wait for entries to be placed on the queue and also specify a maximum wait time before the queue deactivates itself.
The following control file loads the PURCHASE_ORDERS table with data extracted XML documents that are periodically placed on newOrders, an Object-Based AQ. The control file specifies that the queue wait for new entries and specifies a time-out value in seconds or forever. The CLOBColumn attribute indicated the column in the object type containing the XML to be loaded.
For additional information on Advanced Queues, refer to Oracle8i/9i Application Developer's Guide - Advanced Queuing.
The XML documents which are periodically placed on the newOrders queue is exactly the same as the order described in the first example. The complete XML document that defines the order is described.
The control file below directs the XML Toolkit to extract data from documents that are periodically place on the newOrdersAQ and store them into the PURCHASE_ORDERS table.
'<OWBXMLRuntime>'||
'<XMLSource>'||
'<AQ wait="WAIT" waitTime="WAIT_FOREVER">'||
'<AQName>newOrders</AQName>'||
'</AQ>'||
'</XMLSource>'||
'<targets>'||
'<target dateFormat="yyyy.MM.dd">Purchase_Orders</target>'||
'</targets>'||
'</OWBXMLRuntime>');
The waitTime attribute is configured so that a dequeue call wait until an entry is available on the queue before it is released.
In this example, a control file loads the PURCHASE_ORDERS table with data extracted from XML documents that are placed on an Oracle8i/9i Advanced Queue (AQ).
An e-commerce application can use an Advanced Queue to automate the acceptance, processing, routing, and completion of orders. You can configure an AQ to remain active and wait for entries to be placed on the queue and also specify a maximum wait time before the queue deactivates itself.
The following control file loads the PURCHASE_ORDERS table with data extracted XML documents that are periodically placed on newOrders, a RAW AQ. The control file specifies that the queue wait for new entries and specifies a time-out value in seconds or forever. The next example shows how to rewrite the control file for a queue that resides in an object based Advanced Queue.
For additional information on Advanced Queues, refer to Oracle8i/9i Application Developer's Guide - Advanced Queuing.
The XML documents which are periodically placed on the newOrdersAQ queue is exactly the same as the order described in the first example. The complete XML document that defines the order is described.
The control file below directs the XML Toolkit to extract data from documents that are periodically place on the newOrdersAQ Advanced Queue and store them into the Purchase_Orders table.
'<OWBXMLRuntime>'||
'<XMLSource>'||
'<AQ wait="WAIT" waitTime="20">'||
'<AQName>newOrdersAQ</AQName>'||
'<ObjectType>xmlMessages</ObjectType>'||
'<CLOBColumn>xmlEntry</CLOBColumn>'||
'</AQ>'||
'</XMLSource>'||
'<targets>'||
'<target dateFormat="yyyy.MM.dd">Purchase_Orders</target>'||
'</targets>'||
'</OWBXMLRuntime>');
The waitTime attribute specifies that a dequeue call made on an empty queue will wait a maximum of twenty seconds for a document to be placed on the queue.
This example control file loads the Books table with data extracted from a document produced by a URL.
'<OWBXMLRuntime>'||
'<XMLSource>'||
'<URL parameterTable="ptable">'||
' <![CDATA[http://oracle.com//xmlserv/library]]>'||
'</URL>'||
'</XMLSource>'||
'<targets>'||
'<target XSLFile="\ora817\examples\lib.xsl" ignoreCase="TRUE">Books'||
'</target>'||
'</targets>'||
'</OWBXMLRuntime>'
The targets element includes two attributes: one for a style sheet to match the source element names with the target column names, and another to ignore case. The parameter table is used when dynamically generating XML, the column names of the parameter table are used as parameter names, and the data as parameter values. Only VARCHAR2, VARCHAR, CHAR, and NUMBER columns are used to form dynamic parameters.
The control file for the XML Toolkit is an XML document, and the next two sections describe all of its elements. The first section informally describes each element that can be used to create a control file; the second section is the Document Type Definition (DTD) of the control file, which formally defines the control file structure.
The following description of the control file elements is divided into three parts: elements that describe the source for XML documents, elements that describe the target of the load operation, and elements that determine the runtime environment for the XML Toolkit.
This table describes all the elements used to define sources of XML documents and their respective attributes.
This table describes all the elements used to define the load targets and their respective attributes. You can define multiple targets for the data parsed from a set of XML documents.
This table describes elements that determine the runtime environment.
| Name | Description |
|---|---|
|
runtimeConfig |
Specifies the runtime configuration for the XML Toolkit. If TRUE, then issue commit at the end of the load. |
The Document Type Definition below describes all the possible elements which may occur in a control file for the XML Toolkit, their respective attributes, their ordering, and the conditions of their use.
<!ELEMENT OWBXMLRuntime (XMLSource, targets, runtimeConfig?)>
<!ELEMENT XMLSource ((file | directory | URL | CLOB | AQ ),DOMParserConfig?)>
<!ATTLIST XMLSource splitElement CDATA #IMPLIED>
<!ELEMENT file (#PCDATA)>
<!ELEMENT directory (#PCDATA)>
<!ATTLIST directory mask CDATA "*.xml">
<!ELEMENT CLOB (table, CLOBColumn)>
<!ELEMENT table (#PCDATA)>
<!ATTLIST CLOB whereClause CDATA #IMPLIED>
<!ELEMENT URL (#PCDATA)>
<!ATTLIST URL proxy CDATA #IMPLIED
proxyPort CDATA "80"
parameterTable CDATA #IMPLIED
lowerParameterNames (TRUE | FALSE) "TRUE">
<!ELEMENT AQ (AQName, (ObjectType, CLOBColumn)?)>
<!ELEMENT AQName (#PCDATA)>
<!ELEMENT ObjectType (#PCDATA)>
<!ELEMENT CLOBColumn (#PCDATA)>
<!ATTLIST AQ
consumerName CDATA #IMPLIED
dequeueMode (REMOVE | BROWSE | LOCKED) "REMOVE"
navigation (NEXT_MESSAGE | NEXT_TRANSACTION | FIST_MESSAGE
"NEXT_MESSAGE"
visibility (ON_COMMIT | IMMEDIATE) "IMMEDIATE"
wait (WAIT_FOREVER | WAIT_NONE | WAIT) "WAIT_FOREVER"
waitTime CDATA #IMPLIED
messageID CDATA #IMPLIED
correlation CDATA #IMPLIED>
<!ELEMENT targets (target+)>
<!ELEMENT target (#PCDATA)>
<!ATTLIST target
truncateFirst (TRUE | FALSE) "TRUE"
truncateBeforeEachLoad (TRUE | FALSE) "FALSE"
XSLFile CDATA #IMPLIED
XSLRefURL CDATA #IMPLIED
loadType (INSERT| UPDATE | DELETE) "INSERT"
ignoreCase (TRUE | FALSE) "TRUE"
commitBatch CDATA "0"
rowTag CDATA "ROW"
dateFormat CDATA "MM/dd/yyyy HH:mm:ss"
batchSize CDATA "17"
keyColumnList NMTOKENS #IMPLIED
updateColumnList NMTOKENS #IMPLIED>
<!ELEMENT runtimeConfig EMPTY>
<!ATTLIST
runtimeConfig
commitAfterLoad (TRUE | FALSE) "TRUE"
batchSize CDATA "10">
<!ELEMENT DOMParserConfig EMPTY>
<!ATTLIST DOMParserConfig
showWarnings (TRUE | FALSE) "FALSE"
retainCDATASection (TRUE | FALSE) "FALSE"
debugMode (TRUE | FALSE) "FALSE"
preserveWhitespace (TRUE | FALSE) "FALSE"
validationMode (TRUE | FALSE) "FALSE"
baseURL CDATA #IMPLIED>
The XML Toolkit employs several Oracle8i/9i XML SDK software components. For more information, refer to the following:
Web sites that contain information on the XML specification are:
http://www.w3.org/XML/
http://www.XML.com
|
|
![]() Copyright © 1996, 2002 Oracle Corporation. All Rights Reserved. |
|