iPlanet Market Maker 4.5 Catalog Import Guide



Chapter 2   Representing Catalog Information


This chapter describes the conventions that the Import utility uses to represent catalog ontologies in XML and to map between them.

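The following sections are contained in this chapter:

  • Representing Catalogs

  • XML Representation Issues

  • Defining the Ontology Mapping

  • Using the Human-Readable Difference Report Generator (HRDRG)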

Chapter 1, "Catalog Concepts," describes the concepts related to mapping categories from seller catalogs to the Public Master Catalog (PMC). You need to be familiar with these concepts before you prepare to import your catalog.

Each catalog ontology is a hierarchy of categories and sub-categories. The Import utility uses the extensible markup language (XML) to represent the ontologies of a seller catalog and the PMC.



Representing Catalogs



A seller catalog is represented in two files. The first file contains the unstructured catalog data, which is the standard iPlanet Market Maker catalog input. The second file provides the structural information to interpret the data in the first file. Table 2-1 describes these files.

Table 2-1    Seller Catalog Files

Seller Catalog File                    Format   Content
Character-separated fields (CSF)       Text     Contains the catalog data, in which the hierarchy is represented as a table. Each row is a line and each column is a field. Fields are separated by a special character, such as a tilde ("~").
Column structure specification (CSS)   XML      Contains the structural information that the Import utility uses to interpret the catalog data in the CSF file.

The result of interpreting the CSF and CSS files is a load file that the Import utility uses to create the catalog database for an iPlanet Market Maker marketplace. The seller catalog categories are represented by XML elements in the load file. The XML category names are derived from the seller catalog category names. The XML element hierarchy represents the seller catalog hierarchy.

The load file contains the most compact representation of the seller catalog categories. The default load file name is mm.xml.


Creating the Seller Catalog and PMC in XML

There are two steps to convert the seller catalog data to an XML file that the Import utility can load into the iPlanet Market Maker catalog database. The first step is to create an XML representation of the seller catalog using the CSF and CSS files. The Import utility creates this XML representation in a file called vendor.xml.

The second step is to apply ontology mapping and attribute normalization rules to the vendor.xml file. The result of this step is an mm.xml file. The Import utility uses UTF-8 encoding to process the vendor.xml and mm.xml files.

Figure 2-1 summarizes the process of creating the XML files.

Figure 2-1    Flow to Create the Seller Catalog and PMC in XML Format



Adding Column Structure to the CSF File

An example of a CSF file is shown in Figure 2-2. Note that this text file does not contain any structural information.

Figure 2-2    CSF File Example



Hard Drives~SCSI~1234452~case~15000.00~9.1 Gig HD~MassFastSCSI~add

Hard Drives~EIDE~1234472~case~15000.00~9.1 Gig HD~MassFastSCSI~add


The Import utility treats each row in the CSF file as an item. The columns define the categories and attributes. To define the column structure of the CSF file in Figure 2-2, you need to determine which columns represent the categories and attributes. Figure 2-3 shows an example of how you might apply column names to a CSF file.

Figure 2-3    CSF File Annotated with Structural Information

To represent this structure in XML format, it is useful to think of the column structure in terms of a table, such as the example shown in Table 2-2. Note that the "Key" column identifies the SKU attribute as the unique identifier for the items. You must define at least one attribute as a key in the CSS file, but you can use multiple columns to define a key.
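The following Python sketch illustrates how the column roles in Table 2-2 can be applied to one CSF row, including a key built from one or more key columns. It is not the Import utility's implementation; the column map, the function name parse_csf_line, and the returned dictionary shape are hypothetical and simply mirror Table 2-2.

```python
# Hypothetical column map mirroring Table 2-2 (columns numbered from 1, left to right).
CATEGORY_COLUMNS = [1, 2]                 # category, then sub-category
ATTRIBUTE_COLUMNS = {3: "SKU", 4: "UOM", 5: "Price", 6: "Description", 7: "Product Name"}
KEY_COLUMNS = [3]                         # at least one key column; several are allowed
ACTION_COLUMN = 8


def parse_csf_line(line, separator="~"):
    """Split one CSF row and sort its fields into categories, attributes, key, and action."""
    fields = line.rstrip("\n").split(separator)

    def col(number):
        return fields[number - 1]

    return {
        "categories": [col(n) for n in CATEGORY_COLUMNS],
        "attributes": {name: col(n) for n, name in ATTRIBUTE_COLUMNS.items()},
        # A tuple, so that a multi-column key is compared as one unit.
        "key": tuple(col(n) for n in KEY_COLUMNS),
        # Without an action column, the default action is "add".
        "action": col(ACTION_COLUMN) if len(fields) >= ACTION_COLUMN else "add",
    }


row = parse_csf_line("Hard Drives~SCSI~1234452~case~15000.00~9.1 Gig HD~MassFastSCSI~add")
print(row["categories"], row["key"], row["action"])
# prints: ['Hard Drives', 'SCSI'] ('1234452',) add
```

Using a tuple for the key means that adding a second key column only requires extending KEY_COLUMNS.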


Table 2-2    Table Format Representation of the CSF File

Column Name                 Column Number   Column Type    Key
Hard Drives                 1               Category
SCSI/EIDE                   2               Sub-category
SKU                         3               Attribute      X
Unit of Measurement (UOM)   4               Attribute
Price                       5               Attribute
Description                 6               Attribute
Product Name                7               Attribute
Action column               8

To implement a column structure for the CSF file such as the one shown in Table 2-2, you create a CSS file in XML format. The CSS file must conform to the ColumnStructureSpecification.dtd file in the <imm_install_dir>/catalogtools/dtd directory.

Figure 2-4 shows an example of a CSS file that describes the column structure of the CSF file shown in Figure 2-2. The columns are numbered from left to right. Each column specification defines a category or attribute, except for column 8 (see the following section).


Adding, Deleting, and Updating Items

Column 8 in Table 2-2 is an action column that defines the action to take when each item is loaded into the PMC. As a result of the "add" action, the PMC contains the item exactly as it is defined in the load file. You can also specify "delete" or "update" in the action column to delete an item or to update an existing item in the database.

You can specify only one action per row. If you do not specify an action column, the default action is to add items.

Suppose, for example, that you define an "add" action for item "x" in the load file, and item "x" already exists in the PMC with the same key as defined in the load file. As a result of the "add" action, the attributes of item "x" in the PMC are changed as necessary to match the attributes of item "x" in the load file. For more information about how the actions work, see Table 2-3.


Table 2-3    Action Results

Action: add
    If the PMC contains an item with the same key defined in the load file: The item attributes in the PMC are replaced by the item attributes in the load file.
    If the PMC does not contain an item with the same key: A new item is created in the PMC.
    Comments: As a result of the "add" action, the PMC contains an item with attribute names, numbers, and units identical to those in the load file. This means that attributes in the PMC can be added, deleted, or updated to match those in the load file.

Action: delete
    If the PMC contains an item with the same key defined in the load file: The item is deleted from the PMC.
    If the PMC does not contain an item with the same key: An error message is issued.

Action: update
    If the PMC contains an item with the same key defined in the load file: The item attributes in the PMC are updated to match the attributes in the load file.
    If the PMC does not contain an item with the same key: An error message is issued.
    Comments: Only the attribute values and units changed in the load file are updated in the PMC. Existing attributes in the PMC are not added or deleted.
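To make the rules in Table 2-3 concrete, here is a minimal Python model that treats the PMC as a dictionary keyed by item key. The function apply_action and the data shapes are hypothetical. Units are not modeled, and for "update" the sketch assumes that load-file attributes the PMC item does not already have are ignored, which is one reading of the rule that existing attributes are not added or deleted.

```python
VALID_ACTIONS = {"add", "delete", "update"}


def apply_action(pmc, key, attributes, action):
    """Apply one load-file item to a PMC modeled as {key: {attribute name: value}}.

    Returns None on success, or an error string where Table 2-3 lists an error message.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "add":
        # Whether or not the key exists, the PMC item ends up identical to the
        # load-file item, so attributes may be added, deleted, or updated.
        pmc[key] = dict(attributes)
        return None
    if key not in pmc:
        return f"error: no item with key {key!r} to {action}"
    if action == "delete":
        del pmc[key]
    else:
        # "update": change the values of attributes the PMC item already has.
        # Assumption: attributes missing from the PMC item are ignored.
        for name, value in attributes.items():
            if name in pmc[key]:
                pmc[key][name] = value
    return None
```

Returning an error string rather than raising keeps the sketch close to the table, where a missing key produces a message instead of stopping the load.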

Figure 2-4    CSS File Example

<?xml version="1.0" ?>
<!DOCTYPE column-specifications SYSTEM
    "/netscape/server4/iMM/catalogtools/dtd/ColumnStructureSpecification.dtd">
<column-specifications>
    <column-specification number="1" >
        <category level="1" />
    </column-specification>
    <column-specification number="2" >
        <category level="2" />
    </column-specification>
    <column-specification number="3" >
        <value key="yes"><name><fixed>SKU</fixed></name></value>
    </column-specification>
    <column-specification number="4" >
        <value><name><fixed>UOM</fixed></name></value>
    </column-specification>
    <column-specification number="5" >
        <value>
            <name><fixed>Price</fixed></name>
        </value>
    </column-specification>
    <column-specification number="6" >
        <value>
            <name><fixed>Description</fixed></name>
        </value>
    </column-specification>
    <column-specification number="7" >
        <value>
            <name><fixed>Product Name</fixed></name>
        </value>
    </column-specification>
    <column-specification number="8" >
        <action />
    </column-specification>
</column-specifications>
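A CSS file like Figure 2-4 can be read with Python's standard xml.etree.ElementTree module. The sketch below is illustrative only: read_css is a hypothetical helper, it recognizes only the elements that appear in Figure 2-4, and it does not validate against ColumnStructureSpecification.dtd.

```python
import xml.etree.ElementTree as ET


def read_css(xml_text):
    """Return {column number: role dict} for a CSS document shaped like Figure 2-4."""
    root = ET.fromstring(xml_text)
    columns = {}
    for spec in root.findall("column-specification"):
        number = int(spec.get("number"))
        category = spec.find("category")
        value = spec.find("value")
        if category is not None:
            columns[number] = {"role": "category", "level": int(category.get("level"))}
        elif value is not None:
            columns[number] = {
                "role": "attribute",
                "name": value.findtext("name/fixed"),  # None when the name is not fixed
                "key": value.get("key") == "yes",
            }
        elif spec.find("action") is not None:
            columns[number] = {"role": "action"}
    return columns
```

The key columns can then be collected with [n for n, c in columns.items() if c.get("key")], which also covers keys that span several columns.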


Figure 2-5 shows a graphical representation of the hierarchy structure implemented by the CSS file in Figure 2-4.

Figure 2-5    Seller Catalog Hierarchy Structure


Using Column References

The CSS file shown in Figure 2-4 specifies fixed columns, which means that the Import utility interprets the values in a column the same way for every row: every value in column 5, for example, is a "Price" attribute. But in some cases the attribute names or units in a column differ from row to row, such as when different items are priced in different currencies. In these cases, you use a column reference.

Suppose, for example, that you wanted to specify different currency types for the items shown in Figure 2-6.

Figure 2-6    CSF File Example with Different Currency Types



         1    2      3     4       5          6        7          8

Hard Drives~SCSI~1234452~case~15000.00~9.1 Gig HD~MassFastSCSI~USD

Hard Drives~EIDE~1234472~case~45000.00~9.1 Gig HD~MassFastSCSI~JPY


Assume that in Figure 2-6 column 5 contains the values of the "Price" attribute, and column 8 contains the units (currencies) for this same attribute. See Figure 2-7.

Figure 2-7    CSF File with Structural Information

Figure 2-8 shows the section of a CSS file that specifies that the prices in column 5 take their units (USD for U.S. dollars, JPY for Japanese yen) from column 8.

Figure 2-8    Column Reference Section of a CSS File

<column-specification number="5" >
  <value>
    <name><fixed>Price</fixed></name>
    <unit><column-ref column-number="8"/></unit>
  </value>
</column-specification>
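The sketch below shows one way to evaluate a specification like Figure 2-8 for a single CSF row: a fixed name or unit is used as-is, while a column-ref is looked up in the same row. The helper names resolve_part and resolve_attribute are hypothetical, and only the fixed and column-ref forms shown in this chapter are handled.

```python
import xml.etree.ElementTree as ET


def resolve_part(part, fields):
    """Return the text of a <name> or <unit> element: a fixed string, or the
    field named by a column-ref in the same row. Returns None if absent."""
    if part is None:
        return None
    fixed = part.find("fixed")
    if fixed is not None:
        return fixed.text
    ref = part.find("column-ref")
    if ref is not None:
        return fields[int(ref.get("column-number")) - 1]
    return None


def resolve_attribute(spec, line, separator="~"):
    """Evaluate one <column-specification> containing a <value> for one CSF row."""
    fields = line.split(separator)
    value = spec.find("value")
    return {
        "name": resolve_part(value.find("name"), fields),
        "value": fields[int(spec.get("number")) - 1],
        "unit": resolve_part(value.find("unit"), fields),
    }


# The fragment from Figure 2-8, written as well-formed XML.
SPEC = ET.fromstring("""
<column-specification number="5" >
  <value>
    <name><fixed>Price</fixed></name>
    <unit><column-ref column-number="8"/></unit>
  </value>
</column-specification>
""".strip())

print(resolve_attribute(SPEC, "Hard Drives~EIDE~1234472~case~45000.00~9.1 Gig HD~MassFastSCSI~JPY"))
# prints: {'name': 'Price', 'value': '45000.00', 'unit': 'JPY'}
```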



Creating the Seller Catalog

The seller catalog created by processing the CSF and CSS files is the vendor.xml file. This file maps categories to XML elements. If a category name contains spaces or special characters, those characters are replaced with an underscore ("_") character in the element name. See Figure 2-9.

Figure 2-9    Seller Catalog Represented in the vendor.xml File
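The renaming rule and the one-branch-per-item layout used in the vendor.xml examples in this chapter can be approximated in Python as follows. This is a sketch under stated assumptions, not the Import utility's code: to_tag_name keeps only a conservative ASCII subset of the XML name characters (the full rules are in the Name production of the XML 1.0 specification), and prefixing "_" to a name that starts with a digit, hyphen, or period is an assumption that this chapter does not describe. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def to_tag_name(category):
    """Approximate the category-to-element renaming: characters outside a
    conservative ASCII subset of XML name characters become "_"."""
    tag = re.sub(r"[^A-Za-z0-9_.-]", "_", category)
    # Assumption: XML names cannot start with a digit, "-", or ".", so prefix "_".
    if not re.match(r"[A-Za-z_]", tag):
        tag = "_" + tag
    return tag


def item_branch(categories, attributes, key_names, action="add"):
    """Build one top-level branch for a single item: nested category elements
    (tag from to_tag_name, original text in the name attribute), then the Item."""
    top = parent = None
    for category in categories:
        elem = ET.Element(to_tag_name(category), {"vortex-type": "category", "name": category})
        if parent is None:
            top = elem
        else:
            parent.append(elem)
        parent = elem
    item = ET.SubElement(parent, "Item", {"vortex-type": "item", "action": action})
    for name, value in attributes.items():
        ET.SubElement(item, "Attribute", {
            "vortex-type": "attribute", "value": value, "name": name,
            "key": "yes" if name in key_names else "no",
        })
    return top


root = ET.Element("root")
root.append(item_branch(["Hard Drives", "SCSI"], {"SKU": "1234452", "UOM": "case"}, {"SKU"}))
print(ET.tostring(root, encoding="unicode"))
```

Because the original text is kept in the name attribute, the renaming loses no information; it only affects the element tag, which is also why two different names can collide, as described in the next section.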




XML Representation Issues



In the process of representing a seller catalog in XML, the following two issues can arise.

  • Sibling categories with unique names in the seller catalog can be renamed to identical names in the vendor.xml file, because spaces and special characters in category names are replaced with underscores. This renaming causes what is known as a name clash.

  • When two or more categories in the vendor.xml file have identical names, the result is what is known as a path clash. This situation can occur because two categories in the CSF file have the same name, or as a consequence of renaming categories in the vendor.xml file.

To identify clashes, run the Import utility with the CHECK option. If your catalog has clashes, this option creates name_clashes.xml and path_clashes.xml report files that contain instructions for resolving the clashes. In the report files, the names with clash problems are identified with a <tag-name> tag.


Detecting Clashes

The following two types of clashes can occur.

  • Name clashes

    A name clash occurs when sibling categories are renamed to the same name. For the XML naming rules that determine which characters cause this renaming, see the following URL.

    http://www.w3.org/TR/1998/REC-xml-19980210.html#NT-Name

  • Path clashes

    A path clash occurs when categories have the same name but their "child" categories have different names.
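The two definitions above can be checked mechanically. The following Python sketch, which is not the implementation of the CHECK option, walks a vendor.xml tree, merges the repeated per-item branches by tag path, and reports (a) sibling categories with different original names that share one tag and (b) tags that appear at more than one path with different sets of child categories. The function name and return shapes are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_clashes(vendor_root):
    """Return (name_clashes, path_clashes) for a vendor.xml tree.

    name_clashes: {(parent path, tag): set of original names} where siblings
                  with different names share one tag.
    path_clashes: {tag: {tag path: set of child category tags}} where a tag
                  appears at several paths with different children.
    """
    siblings = defaultdict(lambda: defaultdict(set))
    children = defaultdict(lambda: defaultdict(set))

    def walk(elem, path):
        for child in elem:
            if child.get("vortex-type") != "category":
                continue
            child_path = f"{path}/{child.tag}"
            siblings[path][child.tag].add(child.get("name"))
            # Repeated per-item branches with the same tag path are merged here.
            children[child.tag][child_path].update(
                g.tag for g in child if g.get("vortex-type") == "category")
            walk(child, child_path)

    walk(vendor_root, "/" + vendor_root.tag)
    name_clashes = {(parent, tag): names
                    for parent, tags in siblings.items()
                    for tag, names in tags.items() if len(names) > 1}
    path_clashes = {tag: dict(paths) for tag, paths in children.items()
                    if len(paths) > 1
                    and len({frozenset(kids) for kids in paths.values()}) > 1}
    return name_clashes, path_clashes
```

Applied to the vendor.xml content of Figure 2-12, the sketch reports one name clash for the tag Y_ under /root/X; applied to Figure 2-17, it reports one path clash for the tag Y.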


Resolving Name Clashes

If there are name clashes in your hierarchy, you need to resolve them to create the PMC.

Figure 2-10 shows an example of a hierarchy with a name clash, where the names "Y*" and "Y?" create a name clash.

Figure 2-10    Name Clash


By default, the "Y*" and "Y?" categories are both renamed to the same element name, Y_, in the vendor.xml file. See Figure 2-11 and Figure 2-12.

Figure 2-11    Result of Name Clash


Figure 2-12    The vendor.xml File with a Name Clash

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <X vortex-type="category" name="X">
        <Y_ vortex-type="category" name="Y*">
            <Item vortex-type="item" action="add">
                <Attribute vortex-type="attribute" value="1" name="SKU" key="yes" />
            </Item>
        </Y_>
    </X>
    <X vortex-type="category" name="X">
        <Y_ vortex-type="category" name="Y?">
            <Item vortex-type="item" action="add">
                <Attribute vortex-type="attribute" value="2" name="SKU" key="yes" />
            </Item>
        </Y_>
    </X>
</root>


If name clashes exist, the Import utility creates a report file called name_clashes.xml. This file contains an XML description of the name clashes that you can use as a template to resolve them. Figure 2-13 shows an example of such a file, edited to perform the renaming shown in Figure 2-14. The <tag-name> elements define the new names.

Figure 2-13    Edited Name Clash File

<?xml version="1.0" encoding="UTF-8"?>
<names>
    <name>
        <category>Y*</category>
        <tag-name>Y_star</tag-name>
    </name>
    <name>
        <category>Y?</category>
        <tag-name>Y_q</tag-name>
    </name>
</names>
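An edited name_clashes.xml file such as Figure 2-13 can be read into a simple lookup table. The sketch below assumes, for illustration only, that renaming is keyed by the original category name (the name attribute), which is all that the file format in Figure 2-13 records. The helper names are hypothetical and do not describe how the Import utility applies the file.

```python
import xml.etree.ElementTree as ET


def read_name_renames(xml_text):
    """Return {original category name: new tag name} from an edited name_clashes.xml."""
    root = ET.fromstring(xml_text)
    return {n.findtext("category"): n.findtext("tag-name") for n in root.findall("name")}


def apply_name_renames(vendor_root, renames):
    """Retag every category element whose name attribute has an entry in renames."""
    for elem in vendor_root.iter():
        if elem.get("vortex-type") == "category" and elem.get("name") in renames:
            elem.tag = renames[elem.get("name")]
```

Applying the renames from Figure 2-13 to the tree in Figure 2-12 yields the element names shown in Figure 2-15.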



Figure 2-14    Name Clash Resolution


Figure 2-15 shows the vendor.xml file with resolved name clashes.

Figure 2-15    The vendor.xml File with a Resolved Name Clash

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <X vortex-type="category" name="X">
        <Y_star vortex-type="category" name="Y*">
            <Item vortex-type="item" action="add">
                <Attribute vortex-type="attribute" value="1" name="SKU" key="yes" />
            </Item>
        </Y_star>
    </X>
    <X vortex-type="category" name="X">
        <Y_q vortex-type="category" name="Y?">
            <Item vortex-type="item" action="add">
                <Attribute vortex-type="attribute" value="2" name="SKU" key="yes" />
            </Item>
        </Y_q>
    </X>
</root>



Resolving Path Clashes

If there are path clashes in your hierarchy, you need to resolve them to create an mm.xml file you can load into the catalog database. To find out if there are path clashes, always specify the CHECK option the first time you run the Import utility.

Figure 2-16 shows an example of a path clash.

Figure 2-16    Path Clash


In Figure 2-16, the two "parent" categories named "Y" have different sets of "child" categories. Because both categories are represented by the same <Y> element, XML validation rules treat the element Y as containing both of the "child" categories "A" and "B". As a result, the two "Y" categories are considered to be the same category, represented as <Y> in the vendor.xml file shown in Figure 2-17.

Figure 2-17    The vendor.xml File with a Path Clash

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <X vortex-type="category" name="X">
        <Y vortex-type="category" name="Y/">
            <A vortex-type="category" name="A">
                <Item vortex-type="item" action="add">
                    <Attribute vortex-type="attribute" value=" 1" name="SKU" key="yes" />
                </Item>
            </A>
        </Y>
    </X>
    <X vortex-type="category" name="X">
        <Z vortex-type="category" name="Z">
            <Y vortex-type="category" name="Y?">
                <B vortex-type="category" name="B">
                    <Item vortex-type="item" action="add">
                        <Attribute vortex-type="attribute" value="2" name="SKU" key="yes" />
                    </Item>
                </B>
            </Y>
        </Z>
    </X>
</root>


The category hierarchy in Figure 2-17 is defined too broadly. To apply strict XML validation, the "Y" categories need to be renamed so that they are unique. For example, see Figure 2-19.

If path clashes exist, the Import utility creates a report file called path_clashes.xml. This file contains an XML description of the path clashes that you can use as a template to resolve them. Figure 2-18 shows an example of such a file, edited to perform the renaming shown in Figure 2-19. The <tag-name> elements define the new names.

Figure 2-18    Edited Path Clash File

<?xml version="1.0" encoding="UTF-8"?>
<paths>
    <path-clash clashing-tag="Y">
        <path-ref>
            <path>/root/X/Y</path>
            <tag-name>Y_slash</tag-name>
        </path-ref>
        <path-ref>
            <path>/root/X/Z/Y</path>
            <tag-name>Y_star</tag-name>
        </path-ref>
    </path-clash>
</paths>
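Unlike name clashes, path clashes are resolved per location, so an edited path_clashes.xml file maps tag paths to new tag names. The following Python sketch reads such a file and renames the matching elements in a vendor.xml tree. The helper names are hypothetical, and the sketch assumes that each <path> is an exact tag path computed from the original element names, as in Figure 2-18.

```python
import xml.etree.ElementTree as ET


def read_path_renames(xml_text):
    """Return {tag path: new tag name} from an edited path_clashes.xml."""
    root = ET.fromstring(xml_text)
    return {ref.findtext("path"): ref.findtext("tag-name") for ref in root.iter("path-ref")}


def apply_path_renames(elem, renames, path=None):
    """Rename descendants whose tag path, computed from the original tags, is in renames."""
    if path is None:
        path = "/" + elem.tag
    for child in elem:
        child_path = f"{path}/{child.tag}"
        # Recurse before renaming so that deeper paths still use the original tags.
        apply_path_renames(child, renames, child_path)
        if child_path in renames:
            child.tag = renames[child_path]
```

Renaming after the recursive call is the key design choice: it keeps every path lookup consistent with the paths written in the report file.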



Figure 2-19    Path Clash Resolution


Figure 2-20 shows the vendor.xml file with resolved path clashes.

Figure 2-20    The vendor.xml file with Resolved Path Clashes

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <X vortex-type="category" name="X">
        <Y_slash vortex-type="category" name="Y/">
            <A vortex-type="category" name="A">
                <Item vortex-type="item" action="add">
                    <Attribute vortex-type="attribute" value=" 1" name="SKU" key="yes" />
                </Item>
            </A>
        </Y_slash>
    </X>
    <X vortex-type="category" name="X">
        <Z vortex-type="category" name="Z">
            <Y_star vortex-type="category" name="Y?">
                <B vortex-type="category" name="B">
                    <Item vortex-type="item" action="add">
                        <Attribute vortex-type="attribute" value="2" name="SKU" key="yes" />
                    </Item>
                </B>
            </Y_star>
        </Z>
    </X>
</root>



Defining the Ontology Mapping

The "Mapping Catalog Hierarchies" section in Chapter 1 describes the concept of mapping items from a seller catalog to the PMC. Ontologies often vary from seller to seller, but the PMC has only one ontology, which is likely to differ from that of any given seller catalog. A mapping mechanism is therefore necessary to make a seller catalog ontology conform to the PMC ontology. The instructions for this mapping are described in an ontology mapping description file named omd.xml.

When the ontology of the PMC is the same as that of a seller catalog, the Import utility can generate the omd.xml file automatically. The Import utility still needs to do the mapping in this case, because the load file is a compact version of the catalog input data, and the load program uses this compact version to optimize processing.

When the ontology of the PMC is different from that of a seller catalog, you must create the file omd.xml manually.


Changing the Ontology Mapping

The Import utility creates an ontology mapping file, omd.xml, that assumes the mm.xml and vendor.xml ontologies are identical. Figure 2-21 shows an example of a vendor.xml ontology.

Figure 2-21    A vendor.xml Hierarchy Example


Figure 2-22 shows the resulting omd.xml file that the Import utility creates. The categories defined in the omd.xml file are the categories that the Import utility creates in the mm.xml file. The <path> tags tell the Import utility where in the vendor.xml file to find the items to be contained in each category in the mm.xml file.

In Figure 2-22, for example, there is a category named "SCSI":

<SCSI vortex-type="category" name="SCSI">

In this example, the "SCSI" category in the vendor.xml file also appears in the mm.xml file. The <path> tag defines the path used to locate the items (in this case, only one item) in the vendor.xml file that belong to the "SCSI" category in the mm.xml file. "Hard_Drives" is the top-level category, directly under the <root> element.

<path>/root/Hard_Drives/SCSI/Item</path>

Figure 2-22    Resulting omd.xml Ontology Mapping File

<?xml version="1.0" encoding="UTF-8"?>
<root version="1.0">
    <Hard_Drives vortex-type="category" name="Hard Drives">
        <path>/root/Hard_Drives/Item</path>
        <SCSI vortex-type="category" name="SCSI">
            <path>/root/Hard_Drives/SCSI/Item</path>
        </SCSI>
        <EIDE vortex-type="category" name="EIDE">
            <path>/root/Hard_Drives/EIDE/Item</path>
        </EIDE>
    </Hard_Drives>
</root>
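The default, identity omd.xml in Figure 2-22 has a simple shape: one category element per distinct vendor.xml category path, each with a <path> that selects the Item elements directly under it. The following Python sketch reproduces that shape from a vendor.xml tree. identity_omd is a hypothetical name, and the Import utility may generate the file differently.

```python
import xml.etree.ElementTree as ET


def identity_omd(vendor_root):
    """Build an omd.xml tree that maps every vendor.xml category onto itself."""
    omd = ET.Element("root", {"version": "1.0"})

    def merge(src, dst, path):
        for child in src:
            if child.get("vortex-type") != "category":
                continue
            child_path = f"{path}/{child.tag}"
            # vendor.xml repeats a branch per item, so reuse an existing category.
            target = next((e for e in dst if e.tag == child.tag), None)
            if target is None:
                target = ET.SubElement(dst, child.tag,
                                       {"vortex-type": "category", "name": child.get("name")})
                ET.SubElement(target, "path").text = f"{child_path}/Item"
            merge(child, target, child_path)

    merge(vendor_root, omd, "/" + vendor_root.tag)
    return omd
```

Merging repeated branches is what turns the per-item layout of vendor.xml into a single category tree; for the Hard Drives example, the three generated <path> values match those in Figure 2-22.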


When the ontologies in the vendor.xml and mm.xml files are different, you need to edit the omd.xml file. Figure 2-23 illustrates such a case.

Figure 2-23    Different vendor.xml and mm.xml Hierarchies

In Figure 2-23, the items in vendor.xml are mapped to a different ontology in mm.xml in which there is only one category. This means that the "SCSI" and "EIDE" categories are not in the mm.xml file. The items under these two categories need to be mapped to the "Hard Drives" category. To do this mapping, you need to edit the omd.xml file.

Figure 2-24 shows the edited version of the omd.xml file needed to map the items in the vendor.xml file to the mm.xml file. Note that there is only a "Hard Drives" category. The <path> tags tell the Import utility to take the items under the "SCSI" and "EIDE" categories in the vendor.xml file and map them to the "Hard Drives" category in the mm.xml file.

<path>/root/Hard_Drives/SCSI/Item</path>

<path>/root/Hard_Drives/EIDE/Item</path>

Figure 2-24    PMC omd.xml Ontology Mapping File

<?xml version="1.0" encoding="UTF-8"?>
<root version="1.0">
    <Hard_Drives vortex-type="category" name="Hard Drives">
        <path>/root/Hard_Drives/SCSI/Item</path>
        <path>/root/Hard_Drives/EIDE/Item</path>
    </Hard_Drives>
</root>
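The effect of an omd.xml file can be sketched as follows: build each category listed in omd.xml, then copy in every vendor.xml item that its <path> elements select. The Python below is a hypothetical illustration that handles only simple /root/a/b/Item paths like those in this chapter (ElementTree supports only a limited XPath subset), and it omits the attribute normalization that is also applied when mm.xml is produced.

```python
import copy
import xml.etree.ElementTree as ET


def apply_omd(omd_root, vendor_root):
    """Build an mm.xml-style tree: omd.xml categories filled with the vendor.xml
    items that their <path> elements select."""
    mm = ET.Element("vortex-data-load", {"version": "1.0"})

    def build(src, dst):
        for child in src:
            if child.tag == "path":
                # ElementTree cannot search an absolute path from an Element, so drop
                # the leading "/root" step (and any trailing "/") and search relatively.
                steps = child.text.strip().strip("/").split("/")
                for item in vendor_root.findall("/".join(steps[1:])):
                    dst.append(copy.deepcopy(item))
            elif child.get("vortex-type") == "category":
                category = ET.SubElement(dst, child.tag,
                                         {"vortex-type": "category", "name": child.get("name")})
                build(child, category)

    build(omd_root, mm)
    return mm
```

With the omd.xml content of Figure 2-24, both items end up directly under a single Hard_Drives category, as in Figure 2-25.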


As a result of ontology mapping, the mm.xml file defines the hierarchy of category, item, and attribute elements. Figure 2-25 shows an example of an mm.xml file.

Figure 2-25    Sample mm.xml File

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vortex-data-load SYSTEM "file:////usr/netscape/server4/iMM/labs/mm.dtd">
<vortex-data-load version="1.0">
    <Hard_Drives name="Hard Drives" vortex-type="category">
        <Item vortex-type="item" action="add">
            <Attribute vortex-type="attribute" value="1234452" name="SKU" key="yes"/>
            <Attribute vortex-type="attribute" value="case" name="UOM" key="no"/>
            <Attribute vortex-type="attribute" value="15000.00" dataType="currency" name="Price" key="no"/>
            <Attribute vortex-type="attribute" value="9.1 Gig HD" name="Description" key="no"/>
            <Attribute vortex-type="attribute" value="MassFastSCSI" name="Name" key="no"/>
        </Item>
        <Item vortex-type="item" action="add">
            <Attribute vortex-type="attribute" value="1234472" name="SKU" key="yes"/>
            <Attribute vortex-type="attribute" value="case" name="UOM" key="no"/>
            <Attribute vortex-type="attribute" value="15000.00" dataType="currency" name="Price" key="no"/>
            <Attribute vortex-type="attribute" value="9.1 Gig HD" name="Description" key="no"/>
            <Attribute vortex-type="attribute" value="MassFastSCSI" name="Name" key="no"/>
        </Item>
    </Hard_Drives>
</vortex-data-load>



Using the Human-Readable Difference Report Generator (HRDRG)



The human-readable difference report generator (HRDRG) provides a set of utilities, which can be customized by iPlanet Professional Services, to build a system that generates HTML reports showing the minimum set of meaningful differences between two CSF files. Suppose, for example, that a seller changes an existing catalog and submits the changes to a marketmaker. The marketmaker might want to review the changes in relation to the existing catalog to find out which items have been added, replaced, updated, or deleted. The HRDRG scripts provide a framework to allow the changes to be displayed in an HTML browser. These scripts are in the following directory.

<imm_install_dir>/catalogtools/bin

The hrdrg script runs the full_diff utility and the CSFParser utility. See Figure 2-26. The full_diff utility produces an annotated CSF file that describes the changes to the existing CSF file. The CSFParser utility converts the annotated CSF file to XML format. For more information about these utilities, see the "Inputs to the HRDRG Flow" section.

Figure 2-26    HRDRG Flow


Inputs to the HRDRG Flow

There are two inputs to the HRDRG flow: the diff.csf file and the index file. The diff.csf file contains the differences between an existing CSF file and a new CSF file. There are several ways to create the diff.csf file. One way is to record the changes made to the existing .csf file in a separate diff.csf file. Another is to provide an existing index and a new index as input to the difference utility, which then creates the diff.csf file.

The index file is the indexed representation of the existing .csf file.

The full_diff utility reads the diff.csf file and the index file and creates a .csf file similar to the .csf file that the difference utility creates. Figure 2-27 shows the input to the HRDRG flow.

Figure 2-27    HRDRG Input

Unlike the .csf file that the difference utility creates, the .csf file output by the full_diff utility lists the items that have been added, replaced, updated, or deleted. For each item changed in the existing catalog, the full_diff utility adds a corresponding new line that notes the type of change. For example, see Figure 2-28.

Figure 2-28    A full_diff Utility .csf Example

Hard Drives~SCSI~1234472~case~15000.00~11 Gig HD~MassFastSCSI~update

Hard Drives~SCSI~1234472~case~15000.00~9.1 Gig HD~MassFastSCSI~updateold


Table 2-4 shows the types of changes that can appear in the .csf file generated by the full_diff utility.


Table 2-4    The full_diff Utility Keywords

Keyword            Description
add                Specifies that a new item is added to the existing .csf file.
delete             Specifies that an item is deleted from the existing .csf file.
deleteincorrect    Specifies that an item is marked for deletion but is not in the existing .csf file.
update             Specifies the new version of an item that updates an item in the existing .csf file.
updateold          Specifies the existing version of the item being updated in the existing .csf file.
updateincorrect    Specifies that an item is marked for update but is not in the existing .csf file.
replace            Specifies the new version of an item that replaces an item in the existing .csf file.
replaceold         Specifies the existing version of the item being replaced in the existing .csf file.
replaceincorrect   Specifies that an item is marked for replacement but is not in the existing .csf file.
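The keyword scheme in Table 2-4 can be illustrated with a small Python sketch. It is not the full_diff algorithm: it assumes, hypothetically, that each diff.csf row ends with a requested action (add, delete, update, or replace), that the index can be modeled as a dictionary from the key in column 3 to the existing row, and that the new line precedes the old one, as in Figure 2-28.

```python
ACTIONS = ("add", "delete", "update", "replace")


def annotate(diff_rows, index, key_column=3, separator="~"):
    """Annotate requested changes against an index of existing rows.

    diff_rows: CSF lines whose last field is the requested action.
    index:     {key: existing CSF line without an action field} (hypothetical shape).
    """
    out = []
    for line in diff_rows:
        fields = line.split(separator)
        action, key = fields[-1], fields[key_column - 1]
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if action == "add":
            out.append(line)
        elif key not in index:
            # The item to delete, update, or replace is not in the existing catalog.
            out.append(separator.join(fields[:-1] + [action + "incorrect"]))
        elif action == "delete":
            out.append(line)
        else:
            # update or replace: the new version first, then the existing version.
            out.append(line)
            out.append(index[key] + separator + action + "old")
    return out
```

For an "update" of item 1234472, this produces the same pair of lines as Figure 2-28.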


Output of the HRDRG Flow

From the annotated CSF file created by the full_diff utility, the hrdrg script creates a vendor.xml file in which each item carries its change keyword from Table 2-4 as the value of its action attribute. Figure 2-29 shows an example vendor.xml file.

Figure 2-29    A vendor.xml File with Changes Noted

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <X vortex-type="category" name="X">
        <Y vortex-type="category" name="Y/">
            <A vortex-type="category" name="A">
                <Item vortex-type="item" action="update">
                    <Attribute vortex-type="attribute" value=" 1" name="SKU" key="yes" />
                </Item>
            </A>
        </Y>
    </X>
    <X vortex-type="category" name="X">
        <Y vortex-type="category" name="Y/">
            <A vortex-type="category" name="A">
                <Item vortex-type="item" action="updateold">
                    <Attribute vortex-type="attribute" value=" 1" name="SKU" key="yes" />
                </Item>
            </A>
        </Y>
    </X>
</root>


After the hrdrg script creates the vendor.xml file, you can customize an XSL stylesheet to format and display the information in the vendor.xml file in an HTML browser. Examples of the stylesheets are in the following directory.

<imm_install_dir>/catalogtools/xsl
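The documented approach uses an XSL stylesheet, and the shipped examples are not reproduced here. Purely to illustrate the kind of summary such a stylesheet might produce, the following sketch uses plain Python instead of XSL to render one HTML table row per Item in a vendor.xml tree like Figure 2-29. The function name change_report and the output layout are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def change_report(vendor_root):
    """Render an HTML table with one row per Item: its action and its attributes."""
    rows = ["<tr><th>Action</th><th>Attributes</th></tr>"]
    for item in vendor_root.iter("Item"):
        attrs = ", ".join(f"{a.get('name')}={a.get('value')}" for a in item.iter("Attribute"))
        # Escape text so that catalog values cannot break the generated HTML.
        rows.append(f"<tr><td>{html.escape(item.get('action', ''))}</td>"
                    f"<td>{html.escape(attrs)}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


vendor = ET.fromstring(
    '<root><X><Item action="update"><Attribute name="SKU" value="1"/></Item></X>'
    '<X><Item action="updateold"><Attribute name="SKU" value="1"/></Item></X></root>')
print(change_report(vendor))
```

Because the update and updateold rows appear next to each other, a reviewer can compare the new and existing versions of an item directly, which is the purpose of the HRDRG report.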


Copyright © 2002 Sun Microsystems, Inc. All rights reserved.

Last Updated March 25, 2002