Generally, you can determine which source document attributes can be mapped to portal properties, but the mapping might be less obvious for HTML documents.
HTML Metadata | Name of Attribute Returned by HTML Accessor | Default Mapping or Mapping Suggestion |
---|---|---|
<TITLE> Tag | Title | Title (default) |
<META> Tag | The attribute name is the NAME value. Example: <META NAME="creation_date" CONTENT="18-Jan-2004"> The attribute extracted from this example would be named “creation_date”. | Using the example, you could map creation_date to the Created property. |
Headline Tags | The attribute name is the name of the tag followed by an ordinal, one-based index in parentheses. The Accessor returns a value for each headline tag (<H1>, <H2>, <H3>, <H4>, <H5>, and <H6>) and each bold tag (<B>). Example: <H1>Value 1</H1> <H3>Value 2</H3> <H1>Value 3</H1> <B>Value 4</B> The HTML Accessor returns the following source document attribute-value pairs: <H1>(1) = Value 1, <H3>(1) = Value 2, <H1>(2) = Value 3, <B>(1) = Value 4 | If, on a particular news site, the second <H2> tag contains the name of the article and the third <B> tag contains the name of the author, you could map the portal property Title to <H2>(2) and the portal property Author to <B>(3). |
HTML Comments | It is common practice to store metadata in HTML comments using the following format: <!-- Writer: jm --> <!-- AP: md --> <!-- Copy editor: mr --> <!-- Web editor: ad --> In other words, the format is the HTML comment delimiter followed by the name, a colon, the value, and a close comment delimiter. The HTML Accessor parses data in this format and returns the following source document attribute-value pairs: Writer = jm, AP = md, Copy editor = mr, Web editor = ad | Using the example, you could map Writer to the portal property Author. |
Parent URL | Documents imported via a web content crawl return an attribute named Parent URL, whose value is the URL of the parent page that contains a link to the document. | URL (default) |
Anchors | The HTML Accessor provides special handling for internal anchors (<a name="target">) and URLs that reference them (http://server/page#target). | You might map anchors to portal attributes in the following ways: |
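
The naming rules in the table can be illustrated with a small sketch. This is not the HTML Accessor's implementation; it is a rough approximation built on Python's standard `html.parser` module, and the names `AttributeSketch`, `extract_attributes`, and `SAMPLE` are hypothetical. It assumes that headline and bold attribute names use the uppercase tag name, as in the mapping suggestions above, and it ignores Parent URL and anchor handling.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags whose text is reported as "<TAG>(n)" attributes (headline and bold tags).
INDEXED_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "b"}

# Comment convention from the table: <!-- Name: value -->
COMMENT_PAIR = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$", re.S)


class AttributeSketch(HTMLParser):
    """Hypothetical stand-in that mimics the attribute naming rules above."""

    def __init__(self):
        super().__init__()
        self.attributes = {}      # attribute name -> value
        self._counts = Counter()  # one-based ordinal counter per tag
        self._open = None         # (tag, attribute name) being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            # <META NAME="x" CONTENT="y"> yields the attribute x = y.
            self.attributes[attrs["name"]] = attrs.get("content") or ""
        elif tag == "title":
            self._open, self._text = (tag, "Title"), []
        elif tag in INDEXED_TAGS:
            # Simplification: nested indexed tags are not handled.
            self._counts[tag] += 1
            name = f"<{tag.upper()}>({self._counts[tag]})"
            self._open, self._text = (tag, name), []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and self._open[0] == tag:
            self.attributes[self._open[1]] = "".join(self._text).strip()
            self._open = None

    def handle_comment(self, data):
        match = COMMENT_PAIR.match(data)
        if match:
            self.attributes[match.group(1)] = match.group(2)


def extract_attributes(html):
    """Return a dict of attribute-value pairs for one HTML document."""
    parser = AttributeSketch()
    parser.feed(html)
    parser.close()
    return parser.attributes


SAMPLE = """<TITLE>Doc</TITLE>
<META NAME="creation_date" CONTENT="18-Jan-2004">
<H1>Value 1</H1> <H3>Value 2</H3> <H1>Value 3</H1> <B>Value 4</B>
<!-- Writer: jm --> <!-- Copy editor: mr -->"""

if __name__ == "__main__":
    for name, value in extract_attributes(SAMPLE).items():
        print(f"{name} = {value}")
```

Running the sketch against a representative page from a site is one way to see which attribute names, such as `<H2>(2)` or `Writer`, are candidates for mapping to portal properties like Title or Author.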