You can use the Table structure parsing text processing option to specify how tables should be parsed in HTML documents. Because this feature is intended to preserve relationships between data items in a row or column, use it for tables that contain data, as opposed to tables used for page layout. ATG Search requires that tables be identified by an id or class attribute in order to be parsed. Table parsing is specified using the form:
Id_or_class|type|start|rowList|colList|titleType,titleArg
Only the id/class and type are required; the other values are optional. Explanations of each value follow.
id or class—These attributes on the <table> tag identify the table to associate with this parsing information. Required.
type—Describes how the table is organized, and can be any of the values listed below (ObjectFeatureValue, RowFeatureList, etc.). Required.
ObjectFeatureValue—Consists of a left-hand column of items and a header row of features or attributes. The remaining cells specify the value for the feature of the corresponding item. Although the cells in a row are related, the system treats each cell (with its object and feature) as a separate text statement that can be retrieved on its own.
Item    Color   Shape
ItemA   Blue    Square
ItemB   Red     Circle
The ItemA row would resolve to the following:
ItemA Color Blue
ItemA Shape Square
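The resolution described above can be sketched in a few lines of Python. This is only an illustration of the logical interpretation, not ATG Search's implementation; the function name `resolve_object_feature_value` and the list-of-rows table representation are assumptions made for the example.

```python
def resolve_object_feature_value(table):
    """Return one "object feature value" statement per data cell.

    `table` is a list of rows: row 0 is the header row and column 0
    holds the object names. Hypothetical helper, for illustration only.
    """
    header, *rows = table
    features = header[1:]  # skip the corner cell ("Item")
    statements = []
    for row in rows:
        obj, values = row[0], row[1:]
        for feature, value in zip(features, values):
            statements.append(f"{obj} {feature} {value}")
    return statements


table = [
    ["Item", "Color", "Shape"],
    ["ItemA", "Blue", "Square"],
    ["ItemB", "Red", "Circle"],
]
statements = resolve_object_feature_value(table)
print(statements[:2])  # ['ItemA Color Blue', 'ItemA Shape Square']
```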
RowFeatureList—Consists of a header row of related features or attributes. The remaining cells specify the value for the feature. This type is very similar to ObjectFeatureValue, but because the cells in a row are related, the system treats each row (with its features) as a single text statement that can be retrieved separately.
        Color   Shape
ItemA   Blue    Square
ItemB   Red     Circle
The two rows would resolve to the following:
ItemA Color Blue Shape Square
ItemB Color Red Shape Circle
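To make the difference from ObjectFeatureValue concrete, here is a minimal Python sketch that produces one statement per row rather than one per cell. It is not ATG Search code; the function name `resolve_row_feature_list` and the empty corner cell in the sample table are assumptions for the example.

```python
def resolve_row_feature_list(table):
    """Return one statement per data row, interleaving features and values.

    `table` is a list of rows: row 0 is the header row and column 0
    holds the row labels. Hypothetical helper, for illustration only.
    """
    header, *rows = table
    features = header[1:]  # skip the (assumed empty) corner cell
    statements = []
    for row in rows:
        parts = [row[0]]
        for feature, value in zip(features, row[1:]):
            parts += [feature, value]
        statements.append(" ".join(parts))
    return statements


table = [
    ["", "Color", "Shape"],
    ["ItemA", "Blue", "Square"],
    ["ItemB", "Red", "Circle"],
]
for statement in resolve_row_feature_list(table):
    print(statement)
```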
RowFeature—Consists of a header row of unrelated features or items. The remaining columns specify information about their corresponding header cell. Similar to ObjectFeatureValue, except that the left-hand column provides no information across the row, and the cells show more than just a value (possibly even full sentences). Because the cells in a column are unrelated in this logical interpretation, the system treats each cell (with its feature) as a separate text statement that can stand on its own (that is, be retrieved separately).
ItemA        ItemB          ItemC
Is a cube    Is a pyramid   Is a sphere
Is green     Is blue        Is red
The ItemA column would resolve to the following:
ItemA Is a cube
ItemA Is green
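A short Python sketch of this interpretation pairs every cell with the header of its own column. As before, this only illustrates the described behavior; `resolve_row_feature` is a hypothetical name and the list-of-rows representation is an assumption.

```python
def resolve_row_feature(table):
    """Return one "header cell" statement per data cell, column by column.

    `table` is a list of rows: row 0 is the header row and every column
    is independent of the others. Hypothetical helper, illustration only.
    """
    header, *rows = table
    statements = []
    for col, feature in enumerate(header):
        for row in rows:
            statements.append(f"{feature} {row[col]}")
    return statements


table = [
    ["ItemA", "ItemB", "ItemC"],
    ["Is a cube", "Is a pyramid", "Is a sphere"],
    ["Is green", "Is blue", "Is red"],
]
statements = resolve_row_feature(table)
print(statements[:2])  # ['ItemA Is a cube', 'ItemA Is green']
```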
RowOnly—Consists of a header row of unrelated features or items. The remaining columns specify information about their corresponding header cell. Similar to the RowFeature structure, except that the cells show short related information, not full sentences. Because the cells in a row are related, the system treats each row as a separate text statement that can be retrieved separately.
ItemA is   Blue   Square
ItemB is   Red    Circle
The ItemA row would resolve to the following:
ItemA is Blue Square
ColumnFeatureList—Same as RowFeatureList, but rotated so that the header is the left-most column.
Item    ItemA    ItemB
Color   Blue     Red
Shape   Square   Circle
The ItemA column would resolve to the following:
ItemA Color Blue Shape Square
ColumnFeature—Same as RowFeature, but rotated so that the header is the left-most column.
ItemA   Is a cube      Is green
ItemB   Is a pyramid   Is blue
ItemC   Is a sphere    Is red
The ItemA row would resolve to the following:
ItemA Is a cube Is green
ColumnOnly—Same as RowOnly, but rotated so that the header is the left-most column.
ItemA is   ItemB is
Blue       Red
Square     Circle
The ItemA column would resolve to the following:
ItemA is Blue Square
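Because each Column type is described as the rotated form of the matching Row type, one way to reason about it is to transpose the table and then apply the Row interpretation. The Python sketch below does this for the ColumnFeatureList example; it is an illustration under that assumption, not a description of how ATG Search works internally, and both function names are hypothetical.

```python
def transpose(table):
    """Swap rows and columns of a rectangular list-of-rows table."""
    return [list(column) for column in zip(*table)]


def resolve_column_feature_list(table):
    """Resolve a ColumnFeatureList table by rotating it into the
    RowFeatureList layout and emitting one statement per rotated row.
    Hypothetical helper, for illustration only.
    """
    header, *rows = transpose(table)
    features = header[1:]  # skip the corner cell ("Item")
    statements = []
    for row in rows:
        parts = [row[0]]
        for feature, value in zip(features, row[1:]):
            parts += [feature, value]
        statements.append(" ".join(parts))
    return statements


table = [
    ["Item", "ItemA", "ItemB"],
    ["Color", "Blue", "Red"],
    ["Shape", "Square", "Circle"],
]
print(resolve_column_feature_list(table)[0])  # ItemA Color Blue Shape Square
```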
start—Either the starting row or starting column, depending on the type; the default is 0, referring to the topmost row or leftmost column. Optional.
rowList—Comma-delimited list of rows to traverse. If this value is included, any rows not specified are excluded from indexing. Optional.
colList—Comma-delimited list of columns to traverse. If this value is included, any columns not specified are excluded from indexing. Optional.
titleType and titleArg—If your table has a title you want to include in the logical structure of the table, specify explicit as the titleType, and the text of the title as the titleArg.
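To show how the fields of a specification line fit together, the following Python sketch splits a string of the form given above into its parts. It is not part of ATG Search. The `TableSpec` class, the `parse_table_spec` function, and the table id `specs` are hypothetical, and the sketch assumes that row and column entries are integers and that omitted optional values are either left empty between the | separators or dropped from the end.

```python
from dataclasses import dataclass
from typing import List, Optional

TABLE_TYPES = {
    "ObjectFeatureValue", "RowFeatureList", "RowFeature", "RowOnly",
    "ColumnFeatureList", "ColumnFeature", "ColumnOnly",
}


@dataclass
class TableSpec:
    id_or_class: str
    type: str
    start: int = 0                         # documented default
    row_list: Optional[List[int]] = None   # None: no row restriction
    col_list: Optional[List[int]] = None   # None: no column restriction
    title_type: Optional[str] = None
    title_arg: Optional[str] = None


def _int_list(text):
    """Parse "1,2,3" into [1, 2, 3]; an empty field becomes None."""
    return [int(item) for item in text.split(",")] if text else None


def parse_table_spec(spec):
    """Split Id_or_class|type|start|rowList|colList|titleType,titleArg."""
    fields = [part.strip() for part in spec.split("|", 5)]
    fields += [""] * (6 - len(fields))  # pad omitted trailing values
    id_or_class, table_type, start, rows, cols, title = fields
    if not id_or_class:
        raise ValueError("id or class is required")
    if table_type not in TABLE_TYPES:
        raise ValueError(f"unknown table type: {table_type!r}")
    # Split on the first comma only, so the title text may contain commas.
    title_type, _, title_arg = title.partition(",")
    return TableSpec(
        id_or_class=id_or_class,
        type=table_type,
        start=int(start) if start else 0,
        row_list=_int_list(rows),
        col_list=_int_list(cols),
        title_type=title_type or None,
        title_arg=title_arg or None,
    )


full = parse_table_spec(
    "specs|ObjectFeatureValue|0|1,2|0,1|explicit,Product specifications"
)
minimal = parse_table_spec("specs|RowOnly")
print(full)
print(minimal)
```

In the minimal case only the two required values are given, so start falls back to the documented default of 0 and no row or column restriction applies.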