Setting Up Database Sources

With a database source, you can crawl any JDBC-enabled database. A database source can crawl database content projected as a view or query. Each record in the view or query result set is interpreted as a document. You can create public database sources or secure database sources.

Required Columns in Database Sources

The view or query to be crawled must contain the columns described in Table 9-1. All column names must be in upper case.

Table 9-1 Database Source Required Columns

Column	Type	Description
`CONTENT`	`VARCHAR2` or `CLOB`	Document content.
`KEY`	`VARCHAR2`	Key to identify the record in the record set. You can use a custom name for this column by modifying drivers.properties. See "Configuring the JDBC Driver".
`LANG`	`VARCHAR2`	Document language in ISO 639-1 language code; for example, `en` for English or `ja` for Japanese.
`LASTMODIFIEDDATE`	`DATE`	Last modified date of the document. If you do not have a column for the mandatory LastModifiedDate attribute, use a constant date value in the SQL query for the source. Use a format that the getTimestamp method of the corresponding JDBC driver accepts without errors. Incremental changes to records are not picked up by re-crawls, so always schedule a full crawl.
`URL`	`VARCHAR2`	Display URL for the document. The value for this column cannot be null. This connector requires that there is URL-based access to the records in the result set of the view or query.

Optional Columns in Database Sources

The view or query can contain the optional columns describe in Table 9-2. Any other column is considered an attribute of the document.

If the query or view contains both content and either an attachment or attachment link, then one column (in the following order) is considered document content:

ATTACHMENT_LINK
ATTACHMENT
CONTENT

Even if the ATTACHMENT_LINK or ATTACHMENT column is specified in the query, you should include the mandatory CONTENT column. However, the content of ATTACHMENT_LINK or ATTACHMENT is indexed as document content.

Table 9-2 Database Source Optional Columns

Column	Type	Description
`ATTACHMENT`	`BLOB`	Binary attachments for the document.
`ATTACHMENT_LINK`	`VARCHAR2`	A link to the attachment for the document. `HTTP`, `HTTPS`, `FILE`, and `FTP` are valid.)
`CONTENTTYPE`	`VARCHAR2`	Content type of the document; for example, "text/html" for HTML documents, "application/pdf" for PDF documents, or "application/msword" for Microsoft Word documents. Leave blank when the content type is unknown or varied so that is it not feasible to specify the content type for each document individually.
`PATH`	`VARCHAR2`	Path to the document. This is used in the browse feature. This can be used to represent the organizational hierarchy of the document. For example, level1#level2#level3.
`TITLE`	`VARCHAR2`	Title of the document to be displayed in the Oracle SES search result page.
`LMD_TIMEZONE`	`VARCHAR2`	Specifies the time zone for the date specified in `LASTMODIFIEDDATE`. For example `CST`. Oracle SES converts the last modified date from the specified time zone to Oracle SES time zone. If the time zone is not specified, then the date is considered to be in the Oracle SES time zone. See "Searching on Date Attributes" for more details about the default Oracle SES time zone.

Configuring the JDBC Driver

Depending on your database source, you may need to configure the JDBC driver.

To crawl any third-party database:

Download the appropriate JDBC driver jar into the ORACLE_HOME/search/lib/plugins/oracleapplications directory in Oracle SES. If the JDBC drivers for JRE 1.5 and JRE 1.6 are different, then perform the following:
1. Download both the driver jars into ORACLE_HOME/search/lib/plugins/oracleapplications directory in Oracle SES.
2. Add an entry for the JRE 1.6 version of the driver jar to the CLASSPATH element of ORACLE_HOME/search/config/searchctl.conf.
3. Restart the middle tier.
Update the drivers.properties file with the following information: DatabaseName:DriverClassName.
Add the JRE 1.5 JDBC driver jar file name to the classpath in MANIFEST.MF of appsjdbc.jar and DBCrawler.jar.

For a key attribute that is not named KEY:

Change the JDBC driver information in the drivers.properties file to specify the key attribute name:

database_name: driver_class_name, key_attribute_name

For example, for a key attribute named ID:

oracle : oracle.jdbc.driver.OracleDriver, ID

In the crawling query, use key_attribute_name as the alias for the key value column name. In this example, ID is the alias for KEYVAL:

SELECT keyval id, content, url, lastmodifieddate, lang FROM sales_only

For Oracle and SQL Server databases, the following default drivers are used if none is specified in drivers.properties:

Oracle: oracle.jdbc.driver.OracleDriver
SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver

Query File XML Schema Definition

The following is the XSD that defines the format of the XML query file.

<!--[if !supportEmptyParas]-->XSD for the XML sub-queries file:<!--[endif]-->
<?xml version="1.0" encoding="windows-1252" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"  xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries" targetNamespace="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries"  elementFormDefault="qualified">
  <xsd:complexType name="sqlQueriesType">
    <xsd:annotation>
      <xsd:documentation>
        Specify detail and attribute queries as a source parameter for 
        each document fetched by the parent query.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:sequence>
      <xsd:element name="attachmentQueries" maxOccurs="1" minOccurs="0">
        <xsd:annotation>
          <xsd:documentation>
            Specify detail queries to fetch detail records for each document
            represented by the parent record. The parent records, fetched by 
            the parent query, are specified as a source parameter. Each record 
            in the document (parent) query can be associated with several detail
            (child) records. Each of these child records has a single column
           specifying the content that will be indexed as attachment to the
           parent document. The child query should select a single column, and
           the WHERE clause should have bind variables of the form 
           ##PARENT ATTR##, where the value of PARENT ATTR from the parent
           record is substituted while executing the detail query. 
         </xsd:documentation> 
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="query" maxOccurs="unbounded" minOccurs="1">
             <xsd:complexType>
                <!--Attribute to specify whether the contents retrieved by the
                  query is inline attachment or link to an attachment. The value
                  "true" specifies that the content is a link to an attachment
                  and "false" indicates inline attachment. Default value is
                  false.-->                                    
                <xsd:attribute name="link" default="false"/>
                <!--Content type of the attachment. If no value is specified, 
                  SES will auto-detect the content type.-->
                <xsd:attribute name="contenttype" default="null"/>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="attributeQueries" maxOccurs="1" minOccurs="0">
        <xsd:annotation>
          <xsd:documentation>
            Specify queries to retrieve values of attributes of the parent
            document. Use this feature if the attribute can contain multiple
            values for a document. If the attribute is a single-valued 
            attribute, then it can be specified in the parent query. The WHERE
            clause should have bind variables of the form ##PARENT ATTR##,
            where the value of PARENT ATTR from the parent record is substituted
            while executing the query.
          </xsd:documentation>
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="query" maxOccurs="unbounded" minOccurs="1"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="sqlQueries" type="sqlQueriesType"/>
</xsd:schema>

Creating Public Database Sources

Public database sources have no security implemented in Oracle SES.

To create a public database source:

Create a database source on the Home - Sources page. Select Database from the Source Type list, and click Create.
Enter the database source parameters as described in Table 9-3.
Click Next.
Set authorization to No Access Control List, and clear the authorization manager class name and jar name.
Click Create to create the database source.

Table 9-3 Database Source Parameters

Parameter	Value
Database Connection String	JDBC connection string for the database with content to be crawled. The JDBC string is driver-specific. For example, `jdbc:oracle:thin:@server:port:SID`
User ID	User ID to log in to the database specified in Database Connection String. This user ID must have access to the schema owning the view specified in View or the query specified in Query.
Password	Password to log in to the database specified in Database Connection String.
View	Table or view to be crawled. Specify either View or Query, not both.
Document Count	Maximum number of documents to be crawled before indexing. Enter `-1` to crawl all documents before indexing.
Query File	Path to the XML file specifying the subqueries to crawl attachments and attributes of documents corresponding to every record in the main query. See "Query File XML Schema Definition".
Query	Query projecting the content to be crawled. Specify either View or Query, not both.
URL Prefix	String that precedes the content of the URL column and forms a display URL for the document.
Cache File	Prefix of a local file name in which the contents can be temporarily cached while crawling.
Path Separator	The character separating the tokens in the `PATH` of the document as returned by the query or view. It must be a single character, and it cannot be a space, a single or double quote, or a control character.
Parse Attributes	Enter `true` to extract the values of the attributes from the document content specified in the `SOLUTION` or `CONTENT` column. Enter `false` otherwise, or when the content is type text/html. In this example, attr1 and attr2 are extracted as attributes of the document with values 22 and 333 respectively: <attr1>22</attr1> <attr2>333</attr2> Content up to the first attribute is interpreted as the document content. The remaining portion is used to extract attributes only. In this example, only "page" is considered document content: page<attr1>22</attr1> is <attr2>333</attr2> dispersed
Remove Deleted Documents	Enter `true` to remove deleted documents from the index; otherwise, enter `false`.
Attachment Link Authentication Type	Standard Java authentication type used by the application serving the link in the `ATTACHMENT_LINK` column. Enter one of these values: `PUBLIC`: No authentication. `DIGEST`: Digest authentication `BASIC`: Basic authentication `NATIVE`: Native authentication in the source
Attachment Link User ID	User ID for accessing the links specified in the `ATTACHMENT_LINK` column. Required when the link targets are secure.
Attachment Link Password	Password for Attachment Link User ID.
Attachment Link Realm	Realm of the application serving the link in the `ATTACHMENT_LINK` column. Required when the link targets are secure.
Grant Security Attributes	Leave blank for public sources.
Deny Security Attributes	Leave blank for public sources.

Defining User-Defined Security for Database Sources

Some attributes in the view or query being crawled must be identified as security attributes. The values of these attributes determine if a user is authorized to view a document. These attributes can be either GRANT attributes or DENY attributes.

Database Search Attributes

Database sources have no predefined attributes. The crawler collects attributes from columns defined during source creation. You must map the columns to the search attributes.

Example of Creating a Database Source With User-Defined Security

The document set to be crawled is in tables T1 and T2 as specified by the following query:

SELECT 
     T1.ID, 
     T1.DESCRIPTION, 
     T2.NAME, 
     T1.LAST_UPDATE_DATE, 
     T2.AUTH_ID, T1.HIERARCHY
FROM 
     T1, T2 
WHERE 
     T1.ID = T2.DOC_ID

The document content is provided by the T1.DESCRIPTION column.

Each document has an HTTP access URL of the form http://my.company.com/docserver?doc_id=document_identifier.

Access to a document is controlled by the value of T2.AUTH_ID. A document is accessible to a user SCOTT only if the value of T2.AUTH_ID for the document is in the list of AUTH_IDs for the user as retrieved by the following query:

SELECT AUTH_ID FROM USER_AUTH A 
     WHERE A.USER='SCOTT'

This source can be crawled as a database source type with the following source parameter values:

Database Connection String: jdbc:oracle:thin:@example:7777:ses
User ID: apps_user
Password: ******
View:
Document Count: -1

Query:

SELECT 
     'docserver?doc_id='|| T1.ID URL, 
     T1.ID "KEY", 
     'en' LANG, 
     T1.LAST_UPDATE_DATE LASTMODIFIEDDATE, 
     T1.DESCRIPTION CONTENT, 
     'text/plain' CONTENTTYPE, 
     T2.NAME CUSTOMER_NAME, 
     T2.AUTH_ID, 
     T1.HIERARCHY PATH 
FROM 
     T1, T2 
WHERE 
     T1.ID=T2.DOC_ID

Query File:

<?xml version="1.0" encoding="UTF-8" ?>
<sqlQueries xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries detail-attribute-queries.xsd"
 xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries">
   <attachmentQueries>
     <query>
       <![CDATA[SELECT COMMENTS FROM COMMENTS_TBL WHERE DOCID=##KEY##]]>
     </query>
     <query link="true" contenttype='text/html'>
            <![CDATA[SELECT NOTES FROM NOTES_LINK_TBL WHERE DOCID=##KEY##]]>
     </query>
  </attachmentQueries>
  <attributeQueries>
       <query>
         <![CDATA[SELECT AUTHOR FROM AUTHORS_TBL WHERE DOCID=##KEY##]]>
       </query>
       <query>
                <![CDATA[SELECT KEYWORD FROM KEYWORD_TBL WHERE DOCID=##KEY##]]>
       </query>
  </attributeQueries>
</sqlQueries>

AUTHID and KEY are columns in the select list of the parent query.

Note: This must be the path to a local file containing the subqueries for attributes and attachments that are currently listed directly for query file.

URL Prefix: http://my.example.com/
Cache File: /tmp/cacheFile
Path Separator: #
Parse Attributes: false
Grant Security Attributes: AUTH_ID
Deny Security Attributes:

The following are sample parameter values for authorization.

Database Connection String: jdbc:oracle:thin:@example:7777:ses
User ID: apps_user
Password: *****

Authorization Query:

SELECT AUTH_ID 
FROM USER_AUTH A 
WHERE A.USER=UPPER(?)

Single Record Query: false
Authorization User ID Format: username