Oracle® Secure Enterprise Search Administrator's Guide 11g Release 2 (11.2.2) Part Number E23427-01 |
|
|
PDF · Mobi · ePub |
With a database source, you can crawl any JDBC-enabled database. A database source can crawl database content projected as a view or query. Each record in the view or query result set is interpreted as a document. You can create public database sources or secure database sources.
The view or query to be crawled must contain the columns described in Table 8-9. All column names must be in upper case.
Table 8-9 Database Source Required Columns
Column | Type | Description |
---|---|---|
|
|
Document content. |
|
|
Key to identify the record in the record set. You can use a custom name for this column by modifying |
|
|
Document language in ISO 639-1 language code; for example, |
|
|
Last modified date of the document. If you do not have a column for the mandatory LastModifiedDate attribute, use a constant date value in the SQL query for the source. Use a format that the getTimestamp method of the corresponding JDBC driver accepts without errors. Incremental changes to records are not picked up by re-crawls, so always schedule a full crawl. |
|
|
Display URL for the document. The value for this column cannot be null. This connector requires that there is URL-based access to the records in the result set of the view or query. |
The view or query can contain the optional columns describe in Table 8-10. Any other column is considered an attribute of the document.
If the query or view contains both content and either an attachment or attachment link, then one column (in the following order) is considered document content:
ATTACHMENT_LINK
ATTACHMENT
CONTENT
Even if the ATTACHMENT_LINK
or ATTACHMENT
column is specified in the query, you should include the mandatory CONTENT
column. However, the content of ATTACHMENT_LINK
or ATTACHMENT
is indexed as document content.
Table 8-10 Database Source Optional Columns
Column | Type | Description |
---|---|---|
|
|
Binary attachments for the document. |
|
|
A link to the attachment for the document. |
|
|
Content type of the document; for example, "text/html" for HTML documents, "application/pdf" for PDF documents, or "application/msword" for Microsoft Word documents. Leave blank when the content type is unknown or varied so that is it not feasible to specify the content type for each document individually. |
|
|
Path to the document. It is used in the browse feature. It can represent the organizational hierarchy of the document. For example, level1#level2#level3. |
|
|
Title of the document to be displayed in the Oracle SES search result page. |
|
|
Specifies the time zone for the date specified in |
Depending on your database source, you may need to configure the JDBC driver.
To crawl any third-party database:
Download the appropriate JDBC driver jar for JRE 1.6 into ORACLE_HOME/search/lib/plugins/oracleapplications
.
Add the JRE 1.6 JDBC driver jar file name to the JDBC Driver Class parameter, as described in Table 8-11.
Add the JRE 1.6 JDBC driver jar file name to the classpath in MANIFEST.MF of appsjdbc.jar and DBCrawler.jar.
Restart the middle tier.
For a key attribute that is not named KEY:
When configuring the database connector, specify the column name in the Key Attribute Name parameter, as described in Table 8-11.
In the crawling query, use the key attribute name as the alias for the key value column name. In this example, ID was entered as the value of the Key Attribute Name parameter and is the alias for KEYVAL:
SELECT keyval id, content, url, lastmodifieddate, lang FROM sales_only
The following is the XSD that defines the format of the XML query file.
<!--[if !supportEmptyParas]-->XSD for the XML sub-queries file:<!--[endif]--> <?xml version="1.0" encoding="windows-1252" ?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries" targetNamespace="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries" elementFormDefault="qualified"> <xsd:complexType name="sqlQueriesType"> <xsd:annotation> <xsd:documentation> Specify detail and attribute queries as a source parameter for each document fetched by the parent query. </xsd:documentation> </xsd:annotation> <xsd:sequence> <xsd:element name="attachmentQueries" maxOccurs="1" minOccurs="0"> <xsd:annotation> <xsd:documentation> Specify detail queries to fetch detail records for each document represented by the parent record. The parent records, fetched by the parent query, are specified as a source parameter. Each record in the document (parent) query can be associated with several detail (child) records. Each of these child records has a single column specifying the content that will be indexed as attachment to the parent document. The child query should select a single column, and the WHERE clause should have bind variables of the form ##PARENT ATTR##, where the value of PARENT ATTR from the parent record is substituted while executing the detail query. </xsd:documentation> </xsd:annotation> <xsd:complexType> <xsd:sequence> <xsd:element name="query" maxOccurs="unbounded" minOccurs="1"> <xsd:complexType> <!--Attribute to specify whether the contents retrieved by the query is inline attachment or link to an attachment. The value "true" specifies that the content is a link to an attachment and "false" indicates inline attachment. Default value is false.--> <xsd:attribute name="link" default="false"/> <!--Content type of the attachment. If no value is specified, SES will auto-detect the content type.--> <xsd:attribute name="contenttype" default="null"/> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="attributeQueries" maxOccurs="1" minOccurs="0"> <xsd:annotation> <xsd:documentation> Specify queries to retrieve values of attributes of the parent document. Use this feature if the attribute can contain multiple values for a document. If the attribute is a single-valued attribute, then it can be specified in the parent query. The WHERE clause should have bind variables of the form ##PARENT ATTR##, where the value of PARENT ATTR from the parent record is substituted while executing the query. </xsd:documentation> </xsd:annotation> <xsd:complexType> <xsd:sequence> <xsd:element name="query" maxOccurs="unbounded" minOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> <xsd:element name="sqlQueries" type="sqlQueriesType"/> </xsd:schema>
Public database sources have no security implemented in Oracle SES.
To create a public database source:
Create a database source on the Home - Sources page. Select Database from the Source Type list, and click Create.
Enter the database source parameters as described in Table 8-11.
Click Next.
Set authorization to No Access Control List, and clear the authorization manager class name and jar name.
Click Create to create the database source.
Table 8-11 Database Source Parameters
Parameter | Value |
---|---|
Database Connection String |
JDBC connection string for the database with content to be crawled. The JDBC string is driver-specific. For example, |
User ID |
User ID to log in to the database specified in Database Connection String. This user ID must have access to the schema owning the view specified in View or the query specified in Query. |
Password |
Password to log in to the database specified in Database Connection String. |
View |
Table or view to be crawled. Specify either View or Query, not both. |
JDBC Driver Class |
JDBC driver class to connect to the database. For example, oracle.jdbc.driver.OracleDriver. Leave blank to use the default driver:
|
Key Attribute Name |
Name of the KEY attribute in the crawling query/view. The default value is KEY. |
Document Count |
Maximum number of documents to be crawled before indexing. Enter |
Query File |
Path to the XML file specifying the subqueries to crawl attachments and attributes of documents corresponding to every record in the main query. See "Query File XML Schema Definition". |
Query |
Query projecting the content to be crawled. Specify either View or Query, not both. |
URL Prefix |
String that precedes the content of the URL column and forms a display URL for the document. |
Cache File |
Prefix of a local file name in which the contents can be temporarily cached while crawling. |
Path Separator |
The character separating the tokens in the |
Parse Attributes |
Enter In this example, attr1 and attr2 are extracted as attributes of the document with values 22 and 333 respectively: <attr1>22</attr1> <attr2>333</attr2> Content up to the first attribute is interpreted as the document content. The remaining portion is used to extract attributes only. In this example, only "page" is considered document content: page<attr1>22</attr1> is <attr2>333</attr2> dispersed |
Remove Deleted Documents |
Enter |
Attachment Link Authentication Type |
Standard Java authentication type used by the application serving the link in the
|
Attachment Link User ID |
User ID for accessing the links specified in the |
Attachment Link Password |
Password for Attachment Link User ID. |
Attachment Link Realm |
Realm of the application serving the link in the |
Grant Security Attributes |
Leave blank for public sources. |
Deny Security Attributes |
Leave blank for public sources. |
JDBC Driver Class |
JDBC driver class used to connect to the database. For example, oracle.jdbc.driver.OracleDriver. |
Key Attribute Name |
Name of the key column in the database source. The default value is KEY, as described in Table 8-9, "Database Source Required Columns". |
Some attributes in the view or query being crawled must be identified as security attributes. The values of these attributes determine if a user is authorized to view a document. These attributes can be either GRANT
attributes or DENY
attributes.
To create a database source with user-defined security:
On the Home - Sources page, select Database from the Source Type list and click Create.
Enter values for the parameters as described Table 8-11. Specify the GRANT
and DENY
attributes as values for parameters Grant Security Attributes and Deny Security Attributes respectively. If there are multiple GRANT
or DENY
security attributes, then separate attribute names with a space.
Click Next.
Enter values for the authorization plug-in parameters:
Authorization Database Connection String: JDBC connection string for the authorization database. The values of the security attributes to which a given user is authorized are retrieved from this database. The JDBC string is driver-specific.
User ID: User ID to login to the authorization database.
Password: Password to login to the authorization database.
Authorization Query: SQL query to retrieve the values of security attributes to which a given user is authorized. The SELECT
clause of this query should have all the security attributes specified in Step 2 with identical names. This query can be of two types:
The query can return a single record for a given user. The value in each security attribute column should be a space-delimited list of values to which the user is authorized.
The query can return multiple records for a given user. The value in each security attribute column of every row of the result set of this query is interpreted as a single value.
Specify a question mark (?
) as the placeholder for the username in the query.
Single Record Query: Enter true
if the authorization query returns a single record for a given user.
Authorization User ID Format: Format of the user ID to be used in the SQL query specified in Authorization Query. This format should be an authentication attribute of the active identity plug-in.
For example, if Oracle SES is configured with the Oracle Internet Directory identity plug-in (which supports DN, nickname and e-mail address as authentication attributes), then this parameter can be specified as nickname. The nickname of the current user is then used in the SQL authorization query to build the security filter.
If no value is specified for this parameter, then the user ID in the canonical form of the active identity plug-in is used in the authorization query to build the security filter.
Click Create to create the database source.
Database sources have no predefined attributes. The crawler collects attributes from columns defined during source creation. You must map the columns to the search attributes.
The document set to be crawled is in tables T1
and T2
as specified by the following query:
SELECT T1.ID, T1.DESCRIPTION, T2.NAME, T1.LAST_UPDATE_DATE, T2.AUTH_ID, T1.HIERARCHY FROM T1, T2 WHERE T1.ID = T2.DOC_ID
The document content is provided by the T1.DESCRIPTION
column.
Each document has an HTTP access URL of the form http://my.company.com/docserver?doc_id=
document_identifier
.
The value of T2.AUTH_ID
controls access to a document. For example, user SCOTT can access a document only if the value of T2.AUTH_ID
for the document is in the list of AUTH_IDs for SCOTT as retrieved by the following query:
SELECT AUTH_ID FROM USER_AUTH A WHERE A.USER='SCOTT'
This source can be crawled as a database source type with the following source parameter values:
Database Connection String: jdbc:oracle:thin:@example:7777:ses
User ID: apps_user
Password: ******
View:
Document Count: -1
Query:
SELECT 'docserver?doc_id='|| T1.ID URL, T1.ID "KEY", 'en' LANG, T1.LAST_UPDATE_DATE LASTMODIFIEDDATE, T1.DESCRIPTION CONTENT, 'text/plain' CONTENTTYPE, T2.NAME CUSTOMER_NAME, T2.AUTH_ID, T1.HIERARCHY PATH FROM T1, T2 WHERE T1.ID=T2.DOC_ID
Query File:
<?xml version="1.0" encoding="UTF-8" ?> <sqlQueries xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries detail-attribute-queries.xsd" xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries"> <attachmentQueries> <query> <![CDATA[SELECT COMMENTS FROM COMMENTS_TBL WHERE DOCID=##KEY##]]> </query> <query link="true" contenttype='text/html'> <![CDATA[SELECT NOTES FROM NOTES_LINK_TBL WHERE DOCID=##KEY##]]> </query> </attachmentQueries> <attributeQueries> <query> <![CDATA[SELECT AUTHOR FROM AUTHORS_TBL WHERE DOCID=##KEY##]]> </query> <query> <![CDATA[SELECT KEYWORD FROM KEYWORD_TBL WHERE DOCID=##KEY##]]> </query> </attributeQueries> </sqlQueries>
AUTHID
and KEY
are columns in the select list of the parent query.
Note: This must be the path to a local file containing the subqueries for attributes and attachments that are currently listed directly for query file.
URL Prefix: http://my.example.com/
Cache File: /tmp/cacheFile
Path Separator: #
Parse Attributes: false
Grant Security Attributes: AUTH_ID
Deny Security Attributes:
The following are sample parameter values for authorization.
Database Connection String: jdbc:oracle:thin:@example:7777:ses
User ID: apps_user
Password: *****
Authorization Query:
SELECT AUTH_ID FROM USER_AUTH A WHERE A.USER=UPPER(?)
Single Record Query: false
Authorization User ID Format: username