Setting Up Database Sources

With a database source, you can crawl any JDBC-enabled database. A database source can crawl database content projected as a view or query. Each record in the view or query result set is interpreted as a document. You can create public database sources or secure database sources.

Required Columns in Database Sources

The view or query to be crawled must contain the columns described in Table 8-9. All column names must be in upper case.

Table 8-9 Database Source Required Columns

Column	Type	Description
`CONTENT`	`VARCHAR2` or `CLOB`	Document content.
`KEY`	`VARCHAR2` or `RAW`	Key to identify the record in the record set. You can use a custom name for this column by modifying `drivers.properties`. See "Configuring the JDBC Driver".
`LANG`	`VARCHAR2`	Document language in ISO 639-1 language code; for example, `en` for English or `ja` for Japanese.
`LASTMODIFIEDDATE`	`DATE`	Last modified date of the document. If you do not have a column for the mandatory LastModifiedDate attribute, use a constant date value in the SQL query for the source. Use a format that the getTimestamp method of the corresponding JDBC driver accepts without errors. Incremental changes to records are not picked up by re-crawls, so always schedule a full crawl.
`URL`	`VARCHAR2`	Display URL for the document. The value for this column cannot be null. This connector requires that there is URL-based access to the records in the result set of the view or query.

Optional Columns in Database Sources

The view or query can contain the optional columns describe in Table 8-10. Any other column is considered an attribute of the document.

If the query or view contains both content and either an attachment or attachment link, then one column (in the following order) is considered document content:

ATTACHMENT_LINK
ATTACHMENT
CONTENT

Even if the ATTACHMENT_LINK or ATTACHMENT column is specified in the query, you should include the mandatory CONTENT column. However, the content of ATTACHMENT_LINK or ATTACHMENT is indexed as document content.

Table 8-10 Database Source Optional Columns

Column	Type	Description
`ATTACHMENT`	`BLOB`	Binary attachments for the document.
`ATTACHMENT_LINK`	`VARCHAR2`	A link to the attachment for the document. `HTTP`, `HTTPS`, `FILE`, and `FTP` are valid.)
`CONTENTTYPE`	`VARCHAR2`	Content type of the document; for example, "text/html" for HTML documents, "application/pdf" for PDF documents, or "application/msword" for Microsoft Word documents. Leave blank when the content type is unknown or varied so that is it not feasible to specify the content type for each document individually.
`PATH`	`VARCHAR2`	Path to the document. It is used in the browse feature. It can represent the organizational hierarchy of the document. For example, level1#level2#level3.
`TITLE`	`VARCHAR2`	Title of the document to be displayed in the Oracle SES search result page.
`LMD_TIMEZONE`	`VARCHAR2`	Specifies the time zone for the date specified in `LASTMODIFIEDDATE`. For example `CST`. Oracle SES converts the last modified date from the specified time zone to Oracle SES time zone. If the time zone is not specified, then the date is considered to be in the Oracle SES time zone.

Configuring the JDBC Driver

Depending on your database source, you may need to configure the JDBC driver.

To crawl any third-party database:

Download the appropriate JDBC driver jar for JRE 1.6 into ORACLE_HOME/search/lib/plugins/oracleapplications.
Add the JRE 1.6 JDBC driver jar file name to the JDBC Driver Class parameter, as described in Table 8-11.
Add the JRE 1.6 JDBC driver jar file name to the classpath in MANIFEST.MF of appsjdbc.jar and DBCrawler.jar.
Restart the middle tier.

For a key attribute that is not named KEY:

When configuring the database connector, specify the column name in the Key Attribute Name parameter, as described in Table 8-11.
In the crawling query, use the key attribute name as the alias for the key value column name. In this example, ID was entered as the value of the Key Attribute Name parameter and is the alias for KEYVAL:
```
SELECT keyval id, content, url, lastmodifieddate, lang FROM sales_only
```

Query File XML Schema Definition

The following is the XSD that defines the format of the XML query file.

<!--[if !supportEmptyParas]-->XSD for the XML sub-queries file:<!--[endif]-->
<?xml version="1.0" encoding="windows-1252" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"  xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries" targetNamespace="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries"  elementFormDefault="qualified">
  <xsd:complexType name="sqlQueriesType">
    <xsd:annotation>
      <xsd:documentation>
        Specify detail and attribute queries as a source parameter for 
        each document fetched by the parent query.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:sequence>
      <xsd:element name="attachmentQueries" maxOccurs="1" minOccurs="0">
        <xsd:annotation>
          <xsd:documentation>
            Specify detail queries to fetch detail records for each document
            represented by the parent record. The parent records, fetched by 
            the parent query, are specified as a source parameter. Each record 
            in the document (parent) query can be associated with several detail
            (child) records. Each of these child records has a single column
           specifying the content that will be indexed as attachment to the
           parent document. The child query should select a single column, and
           the WHERE clause should have bind variables of the form 
           ##PARENT ATTR##, where the value of PARENT ATTR from the parent
           record is substituted while executing the detail query. 
         </xsd:documentation> 
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="query" maxOccurs="unbounded" minOccurs="1">
             <xsd:complexType>
                <!--Attribute to specify whether the contents retrieved by the
                  query is inline attachment or link to an attachment. The value
                  "true" specifies that the content is a link to an attachment
                  and "false" indicates inline attachment. Default value is
                  false.-->                                    
                <xsd:attribute name="link" default="false"/>
                <!--Content type of the attachment. If no value is specified, 
                  SES will auto-detect the content type.-->
                <xsd:attribute name="contenttype" default="null"/>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="attributeQueries" maxOccurs="1" minOccurs="0">
        <xsd:annotation>
          <xsd:documentation>
            Specify queries to retrieve values of attributes of the parent
            document. Use this feature if the attribute can contain multiple
            values for a document. If the attribute is a single-valued 
            attribute, then it can be specified in the parent query. The WHERE
            clause should have bind variables of the form ##PARENT ATTR##,
            where the value of PARENT ATTR from the parent record is substituted
            while executing the query.
          </xsd:documentation>
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="query" maxOccurs="unbounded" minOccurs="1"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="sqlQueries" type="sqlQueriesType"/>
</xsd:schema>

Creating Public Database Sources

Public database sources have no security implemented in Oracle SES.

To create a public database source:

Create a database source on the Home - Sources page. Select Database from the Source Type list, and click Create.
Enter the database source parameters as described in Table 8-11.
Click Next.
Set authorization to No Access Control List, and clear the authorization manager class name and jar name.
Click Create to create the database source.

Table 8-11 Database Source Parameters

Parameter	Value
Database Connection String	JDBC connection string for the database with content to be crawled. The JDBC string is driver-specific. For example, `jdbc:oracle:thin:@server:port:SID`
User ID	User ID to log in to the database specified in Database Connection String. This user ID must have access to the schema owning the view specified in View or the query specified in Query.
Password	Password to log in to the database specified in Database Connection String.
View	Table or view to be crawled. Specify either View or Query, not both.
JDBC Driver Class	JDBC driver class to connect to the database. For example, oracle.jdbc.driver.OracleDriver. Leave blank to use the default driver: Oracle Database: oracle.jdbc.driver.OracleDriver SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
Key Attribute Name	Name of the KEY attribute in the crawling query/view. The default value is KEY.
Document Count	Maximum number of documents to be crawled before indexing. Enter `-1` to crawl all documents before indexing.
Query File	Path to the XML file specifying the subqueries to crawl attachments and attributes of documents corresponding to every record in the main query. See "Query File XML Schema Definition".
Query	Query projecting the content to be crawled. Specify either View or Query, not both.
URL Prefix	String that precedes the content of the URL column and forms a display URL for the document.
Cache File	Prefix of a local file name in which the contents can be temporarily cached while crawling.
Path Separator	The character separating the tokens in the `PATH` of the document as returned by the query or view. It must be a single character, and it cannot be a space, a single or double quote, or a control character.
Parse Attributes	Enter `true` to extract the values of the attributes from the document content specified in the `SOLUTION` or `CONTENT` column. Enter `false` otherwise, or when the content is type text/html. In this example, attr1 and attr2 are extracted as attributes of the document with values 22 and 333 respectively: <attr1>22</attr1> <attr2>333</attr2> Content up to the first attribute is interpreted as the document content. The remaining portion is used to extract attributes only. In this example, only "page" is considered document content: page<attr1>22</attr1> is <attr2>333</attr2> dispersed
Remove Deleted Documents	Enter `true` to remove deleted documents from the index; otherwise, enter `false`.
Attachment Link Authentication Type	Standard Java authentication type used by the application serving the link in the `ATTACHMENT_LINK` column. Enter one of these values: `PUBLIC`: No authentication. `DIGEST`: Digest authentication `BASIC`: Basic authentication `NATIVE`: Native authentication in the source
Attachment Link User ID	User ID for accessing the links specified in the `ATTACHMENT_LINK` column. Required when the link targets are secure.
Attachment Link Password	Password for Attachment Link User ID.
Attachment Link Realm	Realm of the application serving the link in the `ATTACHMENT_LINK` column. Required when the link targets are secure.
Grant Security Attributes	Leave blank for public sources.
Deny Security Attributes	Leave blank for public sources.
JDBC Driver Class	JDBC driver class used to connect to the database. For example, oracle.jdbc.driver.OracleDriver.
Key Attribute Name	Name of the key column in the database source. The default value is KEY, as described in Table 8-9, "Database Source Required Columns".

Defining User-Defined Security for Database Sources

Some attributes in the view or query being crawled must be identified as security attributes. The values of these attributes determine if a user is authorized to view a document. These attributes can be either GRANT attributes or DENY attributes.

To create a database source with user-defined security:

On the Home - Sources page, select Database from the Source Type list and click Create.
Enter values for the parameters as described Table 8-11. Specify the GRANT and DENY attributes as values for parameters Grant Security Attributes and Deny Security Attributes respectively. If there are multiple GRANT or DENY security attributes, then separate attribute names with a space.
Click Next.
Enter values for the authorization plug-in parameters:
- Authorization Database Connection String: JDBC connection string for the authorization database. The values of the security attributes to which a given user is authorized are retrieved from this database. The JDBC string is driver-specific.
- User ID: User ID to login to the authorization database.
- Password: Password to login to the authorization database.
- Authorization Query: SQL query to retrieve the values of security attributes to which a given user is authorized. The SELECT clause of this query should have all the security attributes specified in Step 2 with identical names. This query can be of two types:
  - The query can return a single record for a given user. The value in each security attribute column should be a space-delimited list of values to which the user is authorized.
  - The query can return multiple records for a given user. The value in each security attribute column of every row of the result set of this query is interpreted as a single value.
    
    Specify a question mark (?) as the placeholder for the username in the query.
- Single Record Query: Enter true if the authorization query returns a single record for a given user.
- Authorization User ID Format: Format of the user ID to be used in the SQL query specified in Authorization Query. This format should be an authentication attribute of the active identity plug-in.
  
  For example, if Oracle SES is configured with the Oracle Internet Directory identity plug-in (which supports DN, nickname and e-mail address as authentication attributes), then this parameter can be specified as nickname. The nickname of the current user is then used in the SQL authorization query to build the security filter.
  
  If no value is specified for this parameter, then the user ID in the canonical form of the active identity plug-in is used in the authorization query to build the security filter.
Click Create to create the database source.

Database Search Attributes

Database sources have no predefined attributes. The crawler collects attributes from columns defined during source creation. You must map the columns to the search attributes.

Example of Creating a Database Source With User-Defined Security

The document set to be crawled is in tables T1 and T2 as specified by the following query:

SELECT 
     T1.ID, 
     T1.DESCRIPTION, 
     T2.NAME, 
     T1.LAST_UPDATE_DATE, 
     T2.AUTH_ID, T1.HIERARCHY
FROM 
     T1, T2 
WHERE 
     T1.ID = T2.DOC_ID

The document content is provided by the T1.DESCRIPTION column.

Each document has an HTTP access URL of the form http://my.company.com/docserver?doc_id=document_identifier.

The value of T2.AUTH_ID controls access to a document. For example, user SCOTT can access a document only if the value of T2.AUTH_ID for the document is in the list of AUTH_IDs for SCOTT as retrieved by the following query:

SELECT AUTH_ID FROM USER_AUTH A 
     WHERE A.USER='SCOTT'

This source can be crawled as a database source type with the following source parameter values:

Database Connection String: jdbc:oracle:thin:@example:7777:ses
User ID: apps_user
Password: ******
View:
Document Count: -1

Query:

SELECT 
     'docserver?doc_id='|| T1.ID URL, 
     T1.ID "KEY", 
     'en' LANG, 
     T1.LAST_UPDATE_DATE LASTMODIFIEDDATE, 
     T1.DESCRIPTION CONTENT, 
     'text/plain' CONTENTTYPE, 
     T2.NAME CUSTOMER_NAME, 
     T2.AUTH_ID, 
     T1.HIERARCHY PATH 
FROM 
     T1, T2 
WHERE 
     T1.ID=T2.DOC_ID

Query File:

<?xml version="1.0" encoding="UTF-8" ?>
<sqlQueries xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries detail-attribute-queries.xsd"
 xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries">
   <attachmentQueries>
     <query>
       <![CDATA[SELECT COMMENTS FROM COMMENTS_TBL WHERE DOCID=##KEY##]]>
     </query>
     <query link="true" contenttype='text/html'>
            <![CDATA[SELECT NOTES FROM NOTES_LINK_TBL WHERE DOCID=##KEY##]]>
     </query>
  </attachmentQueries>
  <attributeQueries>
       <query>
         <![CDATA[SELECT AUTHOR FROM AUTHORS_TBL WHERE DOCID=##KEY##]]>
       </query>
       <query>
                <![CDATA[SELECT KEYWORD FROM KEYWORD_TBL WHERE DOCID=##KEY##]]>
       </query>
  </attributeQueries>
</sqlQueries>

AUTHID and KEY are columns in the select list of the parent query.

Note: This must be the path to a local file containing the subqueries for attributes and attachments that are currently listed directly for query file.

URL Prefix: http://my.example.com/
Cache File: /tmp/cacheFile
Path Separator: #
Parse Attributes: false
Grant Security Attributes: AUTH_ID
Deny Security Attributes:

The following are sample parameter values for authorization.

Database Connection String: jdbc:oracle:thin:@example:7777:ses
User ID: apps_user
Password: *****

Authorization Query:

SELECT AUTH_ID 
FROM USER_AUTH A 
WHERE A.USER=UPPER(?)

Single Record Query: false
Authorization User ID Format: username