Oracle® Secure Enterprise Search Administrator's Guide
11g Release 2 (11.2.2)

Part Number E23427-01

Configuring Support for Image Metadata

By default, the Oracle SES crawler searches only text files. You can change this behavior by configuring an Image Document Service connector to search the metadata associated with image files. Image files can contain rich metadata that provides additional information about the image itself.

The Image Document Service connector integrates Oracle Multimedia (formerly Oracle interMedia) images with Oracle SES. This connector is separate from any specific data source.

The following table identifies the metadata formats (EXIF, IPTC, XMP, DICOM) that can be extracted from each supported image format (JPEG, TIFF, GIF, JPEG 2000, DICOM).


Metadata Format  JPEG  TIFF  GIF  JPEG2000  DICOM
EXIF             Yes   Yes   No   No        No
IPTC             Yes   Yes   No   No        No
XMP              Yes   Yes   Yes  Yes       No
DICOM            No    No    No   No        Yes
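
The support matrix above can also be expressed as a small lookup. The following Python sketch is for illustration only. The names SUPPORTED_METADATA and can_extract are hypothetical and are not part of Oracle SES.

```python
# Metadata formats that can be extracted from each image format,
# transcribed from the table above. These names are illustrative only.
SUPPORTED_METADATA = {
    "JPEG": {"EXIF", "IPTC", "XMP"},
    "TIFF": {"EXIF", "IPTC", "XMP"},
    "GIF": {"XMP"},
    "JPEG2000": {"XMP"},
    "DICOM": {"DICOM"},
}


def can_extract(image_format: str, metadata_format: str) -> bool:
    """Return True if the table lists metadata_format for image_format."""
    # Accept both "JPEG 2000" and "JPEG2000" spellings.
    key = image_format.upper().replace(" ", "")
    return metadata_format.upper() in SUPPORTED_METADATA.get(key, set())
```

For example, can_extract("GIF", "XMP") is true, whereas can_extract("GIF", "IPTC") is false.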

See Also:

Oracle Multimedia User's Guide and Oracle Multimedia Reference for more information about image metadata

Identifying the Search Attributes for Image Metadata

Image files can contain metadata in multiple formats, but not all of it is useful when performing searches. A configuration file in Oracle SES enables you to control the metadata that is searched and published to an Oracle SES Web application.

The default configuration file is named attr-config.xml and is located at ORACLE_HOME/search/lib/plugins/doc/ordim/config/. If you upgraded from a previous release, then the default configuration file remains ordesima-sample.xml.

You can modify the configuration file, but Oracle recommends that you create a copy of the default file before editing it. The configuration file must conform to the XML schema ORACLE_HOME/search/lib/plugins/doc/ordim/xsd/ordesima.xsd.

Oracle SES indexes and searches only those image metadata tags that are defined within the metadata element (between <metadata>...</metadata>) in the configuration file. By default, the configuration file contains a set of the most commonly searched metadata tags for each of the file formats. You can add other metatags to the file based on your specific requirements.

Image files can contain metadata in multiple formats. For example, an image can contain metadata in the EXIF, XMP, and IPTC formats. DICOM images are an exception: they contain only DICOM metadata. For the IPTC and EXIF formats, Oracle Multimedia defines its own image metadata schemas, and the metadata defined in the configuration file must conform to these schemas.

Because different metadata formats use different tags to refer to the same attribute, it is necessary to map metatags and the search attributes they define. Table 3-1 lists some commonly used metatags and how they are mapped in Oracle SES.

Table 3-1 Metatag Mapping

Oracle SES Attribute Name  Oracle SES Predefined Name  EXIF Metatag      IPTC Metatag   XMP Metatag
Author                     Author                      Artist            Author         photoshop:Creator
AuthorTitle                X                           X                 AuthorTitle    photoshop:AuthorsPosition
Description                Description                 ImageDescription  Caption        dc:Description
Title                      Title                       X                 ObjectName     dc:Title
DescriptionWriter          X                           X                 captionWriter  photoshop:CaptionWriter
Headline1                  Headline1                   X                 Headline       photoshop:Headline
Category                   X                           X                 Category       photoshop:Category
Scene                      X                           X                 X              Iptc4xmpCore:Scene
Publisher                  X                           X                 X              dc:Publisher
Source                     X                           X                 Source         photoshop:Source
Copyright                  X                           Copyright         Copyright      dc:rights
Keywords                   Keywords                    X                 Keyword        dc:subject
Provider                   X                           X                 Credit         photoshop:Credit
City                       X                           X                 City           photoshop:City
State                      X                           X                 provinceState  photoshop:State
Country                    X                           X                 Country        photoshop:Country
Location                   X                           X                 location       Iptc4xmpCore:Location
EquipmentMake              X                           Make              X              tiff:Make
EquipmentModel             X                           Model             X              tiff:Model

Oracle SES provides this mapping in the configuration file attr-config.xml. You can edit the file to add other metatags. Oracle recommends that you make a copy of the original configuration file before editing the settings. The configuration file defines the display name of a metatag and how it is mapped to the corresponding metadata in each of the supported formats.

This is done using the <searchAttribute> tag, as shown in the example below:

<searchAttribute>
 <displayName>Author</displayName>
 <metadata>
   <value format="iptc">byline/author</value>
   <value format="exif">TiffIfd/Artist</value>
   <value format="xmp">dc:creator</value>
   <value format="xmp">tiff:Artist</value>
 </metadata>
</searchAttribute>

For each search attribute, the value of <displayName> is an Oracle SES attribute name that is displayed in the Oracle SES web application when an Advanced Search - Attribute Selection is performed. If any of the listed attributes are detected during a crawl, then Oracle SES automatically publishes the attributes to the SES web application.

For the <value> element, the format attribute must take the value of a supported format, such as iptc, exif, xmp, or dicom.

The value defined within the element, for example, byline/author, is the XML path when the image format is IPTC, EXIF, or XMP. For DICOM, this value must be the standard tag number or value locator.

For IPTC and EXIF formats, the XML path must conform to the metadata schemas defined by Oracle Multimedia. These schemas are defined in the files ordexif.xsd and ordiptc.xsd located at ORACLE_HOME/search/lib/plugins/doc/ordim/xsd/.

You do not need to specify the root elements defined in these schemas (iptcMetadata, exifMetadata) in the configuration file. For example, you can specify byline/author as the xmlPath value of the author attribute in IPTC format. Oracle Multimedia does not define XML schemas for XMP metadata, so refer to the Adobe XMP specification for the xmlPath value.
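
To review which metadata paths a configuration file maps to each search attribute, you can read the file with any XML parser. The following Python sketch uses the standard xml.etree.ElementTree module. It is an illustration only: the config root element and the list_mappings function are hypothetical, and the sketch assumes that the elements carry no XML namespace. Check the actual file and ordesima.xsd before relying on it.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the example above, wrapped in a hypothetical
# <config> root element so that it parses as a standalone document.
SAMPLE = """
<config>
  <searchAttribute>
    <displayName>Author</displayName>
    <metadata>
      <value format="iptc">byline/author</value>
      <value format="exif">TiffIfd/Artist</value>
      <value format="xmp">dc:creator</value>
      <value format="xmp">tiff:Artist</value>
    </metadata>
  </searchAttribute>
</config>
"""


def list_mappings(xml_text: str) -> dict:
    """Map each displayName to a list of (format, path) pairs."""
    root = ET.fromstring(xml_text)
    mappings = {}
    for attr in root.iter("searchAttribute"):
        name = attr.findtext("displayName", default="").strip()
        mappings[name] = [
            (value.get("format"), (value.text or "").strip())
            for value in attr.findall("metadata/value")
        ]
    return mappings


if __name__ == "__main__":
    for name, pairs in list_mappings(SAMPLE).items():
        print(name, pairs)
```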

Within the <searchAttribute> tag, you can also specify an optional <dataType> tag if the attribute carries a date or numeric value. For example:

<searchAttribute>
     <displayName>AnDateAttribute</displayName>
     <dataType>date</dataType>
     <metadata>
        ...
     </metadata>
</searchAttribute>
   

The default data type is string, so you do not have to explicitly specify a string.
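
A tool that reads the configuration file can apply the same default. The following Python sketch is illustrative only. The data_type_of function is hypothetical, and the sample elements are built inline rather than read from a real configuration file.

```python
import xml.etree.ElementTree as ET


def data_type_of(search_attribute: ET.Element) -> str:
    """Return the declared dataType, or 'string' when it is omitted."""
    declared = search_attribute.findtext("dataType")
    if declared and declared.strip():
        return declared.strip()
    return "string"


# Two inline samples: one with an explicit dataType, one without.
with_type = ET.fromstring(
    "<searchAttribute><displayName>AnDateAttribute</displayName>"
    "<dataType>date</dataType><metadata/></searchAttribute>"
)
without_type = ET.fromstring(
    "<searchAttribute><displayName>Author</displayName>"
    "<metadata/></searchAttribute>"
)
```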

Supporting XMP Metadata

Oracle SES supports both standard and custom XMP metadata searches. Because all XMP properties share the same parent elements <rdf:rdf><rdf:description>, you must specify only the real property schema and property name in the configuration file. For example, specify photoshop:category instead of rdf:rdf/rdf:description/photoshop:category. The same rule applies to XMP custom metadata also. However, for XMP structure data, you must specify the structure element in the format parent/child 1/child 2/…child N, where child N is a leaf node. For example, Iptc4xmpCore:CreatorContactInfo/Iptc4xmpCore:CiPerson. Note that the image plug-in does not validate the metadata value for XMP metadata.

XMP metatags consist of two components separated by a colon (:), for example, photoshop:Creator, which corresponds to the Author attribute (see Table 3-1). In this example, photoshop refers to the XMP schema namespace. Other common namespaces include dc, tiff, and Iptc4xmpCore.

Before defining any XMP metadata in the configuration file, you must ensure that the namespace is defined. For example, before defining the metadata photoshop:Creator, you must include the namespace photoshop in the configuration file. This rule applies to both the standard and custom XMP metadata namespaces. As a best practice, Oracle recommends that you define all the namespaces at the beginning of the configuration file. If the namespace defined in the configuration file is different from the one in the image, then Oracle SES cannot find the attributes associated with this namespace. You can define namespaces as shown:

<xmpNamespaces>
<namespace prefix="Iptc4xmpCore">http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/</namespace>
<namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
<namespace prefix="photoshop">http://ns.adobe.com/photoshop/1.0/</namespace>
<namespace prefix="xmpRights">http://ns.adobe.com/xap/1.0/rights/</namespace>
<namespace prefix="tiff">http://ns.adobe.com/tiff/1.0/</namespace>
</xmpNamespaces>

The Adobe XMP Specification requires that XMP namespaces end with a slash (/) or hash (#) character.
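
The two namespace rules described above (every prefix used in an XMP path must be declared, and every namespace must end with a slash or hash character) can be checked before the file is deployed. The following Python sketch is illustrative only. The check_xmp_namespaces function is hypothetical, and it assumes that the configuration elements carry no XML namespace of their own.

```python
import xml.etree.ElementTree as ET


def check_xmp_namespaces(xml_text: str) -> list:
    """Return a list of human-readable problems found in the fragment."""
    root = ET.fromstring(xml_text)
    declared = {
        ns.get("prefix"): (ns.text or "").strip()
        for ns in root.iter("namespace")
    }

    problems = []
    # Namespace values must end with "/" or "#".
    for prefix, uri in declared.items():
        if not uri.endswith(("/", "#")):
            problems.append(f"namespace '{prefix}' does not end with '/' or '#'")

    # Every prefix used by an XMP value must be declared. In a structure
    # path (parent/child), each segment carries its own prefix.
    for value in root.iter("value"):
        if value.get("format") != "xmp":
            continue
        for segment in (value.text or "").strip().split("/"):
            prefix = segment.split(":", 1)[0] if ":" in segment else None
            if prefix is None or prefix not in declared:
                problems.append(f"missing or undeclared prefix in '{segment}'")
    return problems
```

An empty list means that no problem was detected by these two checks. The sketch does not validate the file against ordesima.xsd.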

See Also:

Adobe Extensible Metadata Platform (XMP) Specification for the XMP metadata schema and a list of standard XMP namespace values.

http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf

Custom XMP metadata must be explicitly added to attr-config.xml. An example of custom metadata is:

<xmpNamespaces>
  <namespace prefix="hm">http://www.oracle.com/ordim/hm/</namespace>
</xmpNamespaces>
<searchAttribute>
  <displayName>CardTitle</displayName>
  <metadata>
    <value format="xmp">hm:cardtitle</value>
  </metadata>
</searchAttribute>

Supporting DICOM Metatags

Oracle SES 11g supports DICOM metatags, and these metatags are available in the default configuration file attr-config.xml.

DICOM metatags are either DICOM standard tags or DICOM value locators.

DICOM Standard Tags

DICOM standard tags are 8-digit hexadecimal numbers, represented in the format ggggeeee where gggg specifies the group number and eeee specifies the element number. For example, the DICOM standard tag for the attribute performing physician's name is represented using the hexadecimal value 00081050.

The group number gggg must be an even value other than 0000, 0002, 0004, and 0006, which are reserved group numbers.
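
The format and group-number rules above can be checked mechanically. The following Python sketch is illustrative only. The is_valid_standard_tag function is hypothetical, and it checks only the form of a tag. It does not confirm that the tag exists in the DICOM standard or that its data type is supported by the image connector.

```python
import re

# Group numbers that the text above lists as reserved.
RESERVED_GROUPS = {0x0000, 0x0002, 0x0004, 0x0006}
TAG_PATTERN = re.compile(r"[0-9A-Fa-f]{8}")


def is_valid_standard_tag(tag: str) -> bool:
    """Check the ggggeeee form and the group-number rule described above."""
    if not TAG_PATTERN.fullmatch(tag):
        return False
    group = int(tag[:4], 16)  # first four hex digits are the group number
    return group % 2 == 0 and group not in RESERVED_GROUPS
```

For example, the tag 00081050 passes the check, whereas 00020010 (reserved group) and 00091050 (odd group) do not.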

The DICOM standard defines over 2000 standard tags.

The file attr-config.xml contains a list of predefined DICOM standard metatags. You can add new metatags to the file as shown in the following example:

<searchAttribute>
      <displayName>PerformingPhysicianName</displayName>
      <metadata>
        <value format="dicom">00081050</value>
      </metadata>
</searchAttribute>

Note:

The image connector does not support SQ, UN, OW, OB, and OF data type tags. Therefore, do not define such tags in the configuration file.

See Also:

http://medical.nema.org for more information about the standard tags defined in DICOM images, and the rules for defining metatags

DICOM Value Locators

Value locators identify an attribute in the DICOM content, either at the root level or from the root level down.

A value locator contains one or more sublocators and a tag field (optional). A typical value locator is of the format:

sublocator#tag_field

Or of the format:

sublocator

Each sublocator represents a level in the tree hierarchy. DICOM value locators can include multiple sublocators, depending on the level of the attribute in the DICOM hierarchy. Multiple sublocators are separated by the dot character (.). For example, value locators can be of the format:

sublocator1.sublocator2.sublocator3#tag_field

Or of the format:

sublocator1.sublocator2.sublocator3

A tag_field is an optional string that identifies a derived value within an attribute. A tag that contains this string must be the last tag of a DICOM value locator. The default is NONE.

A sublocator consists of a tag element and can contain other optional elements. These optional elements include definer and item_num. Thus, a sublocator can be of the format:

tag

Or it can be of the format:

tag(definer)[item_num]

Table 3-2 Subcomponents of a Sublocator

Component  Description
tag        A DICOM standard tag represented as an 8-digit hexadecimal number.
definer    A string that identifies the organization creating the tag. For tags that are defined by the DICOM standard, the default value (which can be omitted) is DICOM. Oracle SES supports only DICOM standard tags. It does not support private tags.
item_num   An integer that identifies a data element within an attribute, or a wildcard character ("*") that identifies all data elements within an attribute. It takes a default value of 1, the first data element of an attribute. This parameter is optional.
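
The value locator syntax described above can be split into its parts with a short parser. The following Python sketch is illustrative only. The Sublocator and ValueLocator types and the parse_value_locator function are hypothetical names. The sketch applies the defaults stated above (DICOM for definer, 1 for item_num, NONE for tag_field) and assumes that a definer string contains no dot or parenthesis characters.

```python
import re
from typing import List, NamedTuple


class Sublocator(NamedTuple):
    tag: str
    definer: str   # defaults to "DICOM"
    item_num: str  # an integer as text, or "*"; defaults to "1"


class ValueLocator(NamedTuple):
    sublocators: List[Sublocator]
    tag_field: str  # "NONE" when no #tag_field suffix is present


# tag, then optional (definer), then optional [item_num]
SUBLOCATOR_RE = re.compile(
    r"(?P<tag>[0-9A-Fa-f]{8})"
    r"(?:\((?P<definer>[^()]*)\))?"
    r"(?:\[(?P<item>\d+|\*)\])?"
)


def parse_value_locator(text: str) -> ValueLocator:
    """Split sublocator1.sublocator2#tag_field into its components."""
    body, _, tag_field = text.strip().partition("#")
    sublocators = []
    for part in body.split("."):
        match = SUBLOCATOR_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed sublocator: {part!r}")
        sublocators.append(
            Sublocator(
                tag=match.group("tag"),
                definer=match.group("definer") or "DICOM",
                item_num=match.group("item") or "1",
            )
        )
    return ValueLocator(sublocators, tag_field or "NONE")
```

The parser checks syntax only. It does not verify that the tags exist or that the last sublocator has a supported data type.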


The following example shows how to add a value locator to the attr-config.xml file:

<searchAttribute>
  <displayName>PatientFamilyName</displayName>
  <metadata>
  <value format="dicom">00100010#UnibyteFamily</value>       
  </metadata>
</searchAttribute>

In this example, UnibyteFamily is a tag_field of the person name attribute.

The following example shows how to define a value locator from the root level.

<searchAttribute>
      <displayName>AdmittingDiagnosisCode</displayName>
      <metadata>
        <value format="dicom">00081084.00080100</value>       
      </metadata>
</searchAttribute>
<searchAttribute>
      <displayName>AdmittingDiagnosis</displayName>
      <metadata>
        <value format="dicom">00081084.00080104</value>
      </metadata>
</searchAttribute>

In the above example, the tag 00081084 represents the root tag Admitting Diagnoses Code Sequence. This tag includes four child tags: code value (0008, 0100), coding scheme designator (0008, 0102), coding scheme version (0008, 0103) and code meaning (0008, 0104). In this example, the value locators are code value: 00081084.00080100 and code meaning: 00081084.00080104.
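
The value locators in this example are formed by writing each (group, element) pair as an 8-digit tag and joining the parent and child tags with a dot. The following Python sketch illustrates that composition. The to_tag and child_locator functions are hypothetical, and the use of uppercase hexadecimal digits is an assumption.

```python
def to_tag(group: int, element: int) -> str:
    """Format a (gggg, eeee) pair as the 8-digit ggggeeee tag."""
    return f"{group:04X}{element:04X}"


def child_locator(parent_tag: str, group: int, element: int) -> str:
    """Join a parent tag and a child tag with the dot separator."""
    return f"{parent_tag}.{to_tag(group, element)}"


# Admitting Diagnoses Code Sequence (0008, 1084) and two of its child tags.
PARENT = to_tag(0x0008, 0x1084)
CODE_VALUE = child_locator(PARENT, 0x0008, 0x0100)
CODE_MEANING = child_locator(PARENT, 0x0008, 0x0104)
```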

Note:

The image connector does not support SQ, UN, OW, OB, and OF data type value locators. Therefore, ensure that the last sublocator of a value locator does not specify such data types.

See Also:

Oracle Multimedia DICOM Developer's Guide for more information about DICOM value locators

Example: Adding an Attribute to the Default attr-config.xml File

To search for information about the image caption writer:

  1. Open the Oracle SES Administration GUI and create the DescriptionWriter attribute:

    Specify DescriptionWriter as the Oracle SES attribute name (shown on the Advanced Search - Attribute Selection page).

  2. Examine the following sources for information relevant to modifying the default attr-config.xml file:

    • Oracle Multimedia IPTC schema at ORACLE_HOME/search/lib/plugins/doc/ordim/xsd/ordiptc.xsd. The IPTC metadata for image caption writer is shown as captionWriter.

    • Adobe XMP Specification for XMP Metadata. The XMP path for this property is defined as photoshop:CaptionWriter.

    • Oracle Multimedia EXIF schema. There is no caption writer metadata in EXIF.

  3. Add the following section to attr-config.xml:

    <searchAttribute>
       <displayName>DescriptionWriter</displayName>
       <metadata>
           <value format="iptc">captionWriter</value>
           <value format="xmp">photoshop:CaptionWriter</value>
       </metadata>
    </searchAttribute>
    
  4. If the photoshop XMP namespace is not registered in the configuration file, then add the namespace element to xmpNamespaces as shown here:

    <xmpNamespaces>
       <namespace prefix="photoshop">http://ns.adobe.com/photoshop/1.0/</namespace>
       <!-- existing namespaces -->
    </xmpNamespaces>
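
The file edits in the steps above can also be scripted. The following Python sketch is illustrative only and is not an Oracle-provided tool. The add_description_writer function is hypothetical. The sketch assumes that searchAttribute and xmpNamespaces are direct children of the root element, that the elements carry no XML namespace, and that mappings use the <value> element shown in the earlier examples. ElementTree discards XML comments when it rewrites a file, so review the result before using it.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

PHOTOSHOP_NS = "http://ns.adobe.com/photoshop/1.0/"


def add_description_writer(config_path: Path) -> None:
    """Back up the file, then add the DescriptionWriter attribute to it."""
    # Keep a copy of the original, as recommended before editing.
    backup = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup)

    tree = ET.parse(str(config_path))
    root = tree.getroot()

    # Register the photoshop namespace if it is missing (assumed layout).
    namespaces = root.find("xmpNamespaces")
    if namespaces is None:
        namespaces = ET.Element("xmpNamespaces")
        root.insert(0, namespaces)  # namespaces go at the beginning
    prefixes = {ns.get("prefix") for ns in namespaces.findall("namespace")}
    if "photoshop" not in prefixes:
        ns = ET.SubElement(namespaces, "namespace", prefix="photoshop")
        ns.text = PHOTOSHOP_NS

    # Append the new search attribute with its IPTC and XMP mappings.
    attr = ET.SubElement(root, "searchAttribute")
    ET.SubElement(attr, "displayName").text = "DescriptionWriter"
    metadata = ET.SubElement(attr, "metadata")
    for fmt, path in (
        ("iptc", "captionWriter"),
        ("xmp", "photoshop:CaptionWriter"),
    ):
        ET.SubElement(metadata, "value", format=fmt).text = path

    tree.write(str(config_path), encoding="utf-8", xml_declaration=True)
```

Run the function against a copy of attr-config.xml first, and confirm that the output still conforms to ordesima.xsd.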
    

Creating an Image Document Service Connector

A default Image Document Service connector instance is created during the installation of Oracle SES. You can configure the default connector or create a new one.

To create an Image Document Service instance: 

  1. In the Oracle SES Administration GUI, click Global Settings.

  2. Under Sources, click Document Services to display the Global Settings - Document Services page.

  3. To configure the default image service instance:

    1. Click Expand All.

    2. Click Edit for the default image service instance.

    or

    To create a new image service instance:

    1. Click Create to display the Create Document Service page.

    2. For Select From Available Managers, choose Secure Enterprise Search Image Document Service and click Next.

    3. Provide a name for the instance.

  4. Provide a value for the attributes configuration file parameter.

    The default value of attributes configuration file is attr-config.xml. The file is located at ORACLE_HOME/search/lib/plugins/doc/ordim/config/.

  5. Click Apply.

  6. Click Document Services in the locator links to return to the Document Services page.

  7. Add the Image Document Service plug-in to either the default pipeline or a new pipeline.

To add the default Image Document Service plug-in to the default pipeline: 

  1. Under Document Service Pipelines, click Edit for the default pipeline.

  2. Move the Image Document Service instance from Available Services to Used in Pipeline.

  3. Click Apply.

To create a new pipeline for the default Image Document Service plug-in: 

  1. Under Document Service Pipelines, click Create to display the Create Document Service Pipeline page.

  2. Enter a name and description for the pipeline.

  3. Move the Image Document Service instance from Available Services to Used in Pipeline.

  4. Click Create.

Using the Image Document Service Connector

You must either create a source to use the connector or enable the connector for an existing source.

To enable the connector for an existing source: 

  1. Click Sources on the Home page.

  2. Click the Edit icon for the desired source.

  3. Click Crawling Parameters.

  4. Select the pipeline that uses the Image Document Service and enable the pipeline for this source.

  5. Click Document Types. From the Not Processed column, select the image types to search and move them to the Processed column. The following image types are supported: JPEG, JPEG2000, GIF, TIFF, DICOM.

Searching Image Metadata

You can search image metadata from either the Oracle SES Basic Search page or the Advanced Search - Attribute Selection page.

For Basic Search, Oracle SES searches all the metadata defined in the configuration file for each supported image document (JPEG, TIFF, GIF, JPEG 2000, and DICOM). It returns the image document if any matching metadata is found.

Advanced Search enables you to search one or more specified attributes. It also supports basic operations for date and number attributes. Oracle SES returns only those image documents that contain the specified metadata.

Oracle SES does not display the Cache link for image search results.

Troubleshooting the Image Document Service Connector

If the Image Document Service Connector fails, then check the following:

  • Is the pipeline with an Image Document Service connector instance enabled for the source?

  • Are the image types added to the source?

  • For a web source, are the correct MIME types included in the HTTP server configuration file?

    For example, if you use Oracle Application Server, then check the Apache mime.types file. If the following media types are missing, then add them:

    MIME Type          Extensions
    image/jp2          jp2
    application/dicom  dcm

  • If a connection is established but not all of the image files are crawled, then check whether the recrawl policy is set to Process Documents That Have Changed. If so, change it to Process All Documents.

    To do this step, go to Home - Schedules, and under Crawler Schedules, click Edit for the specific source. This opens the Edit Schedule page. Under Update Crawler Recrawl Policy, select Process All Documents.

    You can change the recrawl policy back to Process Documents That Have Changed, after the crawler has finished crawling all the documents in the new source.
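
The MIME type check in the list above can be automated for an Apache-style mime.types file, in which each line contains a media type followed by its file extensions and the hash character (#) starts a comment. The following Python sketch is illustrative only. The missing_mime_entries function is hypothetical, and the location of the mime.types file depends on your HTTP server installation.

```python
# Media types named in the troubleshooting list above.
REQUIRED = {"image/jp2": "jp2", "application/dicom": "dcm"}


def missing_mime_entries(mime_types_text: str) -> dict:
    """Return the required type-to-extension pairs absent from the text."""
    found = {}
    for line in mime_types_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        mime_type, *extensions = line.split()
        found.setdefault(mime_type.lower(), set()).update(
            ext.lower() for ext in extensions
        )
    return {
        mime_type: ext
        for mime_type, ext in REQUIRED.items()
        if ext not in found.get(mime_type, set())
    }
```

Pass the contents of the mime.types file to the function. An empty result means that both media types are already present.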