Bookshelf v7.5: Using the Data Mining Adapter

Using the Data Mining Adapter

The Data Mining Adapter is an extension of the Siebel Analytics Server XML Gateway. It allows you to selectively access external data sources by calling an executable file or DLL API for each record retrieved.

The Data Mining Adapter can be used only for a table that is in a logical join with another table acting as the driving table. The table with the Data Mining Adapter receives parameterized queries from the driving table through logical joins. This table does not physically exist in a back-end database. Instead, the adapter uses the column values in the WHERE clauses of the parameterized queries as its input column parameters, and generates values for the columns that do not appear in the WHERE clauses (the output columns). For information about how to set up the logical joins, see Specifying a Driving Table.

The Data Mining Adapter has two modes of operation:

In Process. The Data Mining Adapter allows you to specify a DLL, a shared object, or a shared library that implements the Data Mining Adapter API. At run time, the adapter loads the DLL and calls the API, which retrieves records one row at a time. The query results are returned to the XML gateway through an API parameter.

Out of Process. The Data Mining Adapter allows you to specify an executable file. At run time, the adapter executes the file and retrieves records from it one row at a time. You also specify the delimiters that demarcate the column values in the output file.

You specify one executable file or DLL for each table.

The In-Process Data Mining Adapter API

The API currently consists of only one function. It takes in the values of the input columns in the parameterized queries, plus the meta information of both the input and the output columns. On return, the API places the values of the output columns in the outputColumnValueBuffer. All buffers are allocated by the caller.

Refer to the file IterativeGatewayDll.h for the definition of the data type and structures used in this API.

extern "C" ITERATIVEGATEWAYDLL_API SiebelAnalyticIterativeExecutionStatus(
   /* [in] */      const wchar_t *                       modelId,
   /* [in] */      const int                             inputColumnCount,
   /* [in] */      const SiebelAnalyticColumnMetaInfo *  pInputColumnMetaInfoArray,
   /* [in] */      const uint8 *                         inputColumnValueBuffer,
   /* [in] */      const int                             OutputColumnCount, // actual count of columns returned
   /* [in/out] */  SiebelAnalyticColumnMetaInfo *        pOutputColumnMetaInfoArray,
   /* [out] */     uint8 *                               outputColumnValueBuffer);

Table 27 provides a description of the API elements.

Table 27.  API Elements
Element	Description
modelId	An optional argument that you can specify in the Search Utility field in the XML tab of the Physical Table dialog box.
inputColumnCount	The number of input columns.
pInputColumnMetaInfoArray	An array of meta information for the input columns. SiebelAnalyticColumnMetaInfo is declared in the public header file IterativeGatewayDll.h, which is included with Siebel Analytics.
inputColumnValueBuffer	A buffer of bytes containing the values of the input columns. The actual size of each column value is specified in the columnWidth field of the SiebelAnalyticColumnMetaInfo. The column values are placed in the buffer in the order in which the columns appear in the pInputColumnMetaInfoArray.
OutputColumnCount	The number of output columns.
pOutputColumnMetaInfoArray	An array of meta information for the output columns. SiebelAnalyticColumnMetaInfo is declared in the public header file IterativeGatewayDll.h, which is included with Siebel Analytics. The caller of the API provides the column name, and the callee sets the data type of the column (currently only VarCharData is supported) and the size of the column value.
outputColumnValueBuffer	A buffer of bytes containing the values of the output columns. The actual size of each column value is specified in the columnWidth field of the SiebelAnalyticColumnMetaInfo. The column values must be placed in the buffer in the order in which the columns appear in the pOutputColumnMetaInfoArray.
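
The buffer layout described in Table 27 can be illustrated with a short sketch. The struct below is a simplified stand-in and not the real SiebelAnalyticColumnMetaInfo from IterativeGatewayDll.h: only the columnWidth field is taken from the descriptions above, while ColumnMetaSketch, columnName, unpackValues, and packValues are hypothetical names. The sketch also assumes that values are packed back to back with no padding and can be treated as raw bytes, which the guide does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for SiebelAnalyticColumnMetaInfo; the real
// definition lives in IterativeGatewayDll.h and may differ.
struct ColumnMetaSketch {
    std::string columnName;   // assumed field name
    std::size_t columnWidth;  // size in bytes of this column's value
};

// Split a value buffer into per-column byte strings by walking the
// metadata array in order. Assumes values are packed back to back.
std::vector<std::string> unpackValues(const std::vector<ColumnMetaSketch>& meta,
                                      const std::uint8_t* buffer) {
    assert(buffer != nullptr);
    std::vector<std::string> values;
    std::size_t offset = 0;
    for (const ColumnMetaSketch& column : meta) {
        values.emplace_back(reinterpret_cast<const char*>(buffer + offset),
                            column.columnWidth);
        offset += column.columnWidth;
    }
    return values;
}

// Write output values into a caller-allocated buffer in metadata order
// and record each value's size in columnWidth. Returns false if the
// counts do not match or the values do not fit in the buffer.
bool packValues(std::vector<ColumnMetaSketch>& meta,
                const std::vector<std::string>& values,
                std::uint8_t* buffer, std::size_t capacity) {
    assert(buffer != nullptr);
    if (values.size() != meta.size()) return false;
    std::size_t total = 0;
    for (const std::string& value : values) total += value.size();
    if (total > capacity) return false;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < meta.size(); ++i) {
        std::memcpy(buffer + offset, values[i].data(), values[i].size());
        meta[i].columnWidth = values[i].size();
        offset += values[i].size();
    }
    return true;
}
```

In an actual implementation, the unpacking step would read inputColumnValueBuffer and the packing step would fill outputColumnValueBuffer before the function returns its status.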

Sample Implementation

A sample implementation of the Data Mining Adapter API is provided for all supported platforms in the Sample subdirectory of the Siebel Analytics installation folder. The following files are included in the example:

hpacc.mak (an HP-UX make file for building the sample)

IterativeGatewayDll.h (a header file to be included in your DLL)

ReadMe.txt (a text file that describes the Data Mining Adapter API)

StdAfx.cpp (a Windows-specific file)

StdAfx.h (a Windows-specific header file)

sunpro.mak (a Solaris make file for building the sample)

TestExternalGatewayDll.cpp (the sample implementation of the DLL)

TestExternalGatewayDll.dsp (a Microsoft Visual C++ project file for building the sample)

TestLibraryUnix.cpp (a test driver that loads the DLL on UNIX platforms)

xlC50.mak (an AIX make file for building the sample)

Using ValueOf() Expressions

You can use ValueOf() expressions in the command line arguments to pass any additional parameters to the executable file or DLL API.

The following example shows how to pass a user ID and password to an executable file:

executable_name valueof(USERID) valueof(PASSWORD)

Specifying Column Values

When you use the out-of-process mode, that is, when you specify an executable file, you can pass in the column values to the executable file by bracketing the column names with the $() marker.

For example, suppose there is a table containing the columns Car_Loan, Credit, Demand, Score, and Probability. The values of the input columns Car_Loan, Credit, and Demand come from other tables through join relationships. The values of the output columns Score and Probability are to be returned by the executable file. The command line would look like the following:

executable_name $(Car_Loan) $(Credit) $(Demand)

Each time the executable file is called, it returns one row of column values. The column values are output on a single line, separated by the delimiter that you specify.

By default, the executable is expected to write its output to stdout. Alternatively, you can direct the Data Mining Adapter to read the output from a temporary file by adding the placeholder $(NQ_OUT_TEMP_FILE) to the command line. When the Data Mining Adapter invokes the executable, it replaces the placeholder with a temporary filename generated at run time, and the executable writes the result line to that file. This is demonstrated in the following example:

executable_name $(Car_Loan) $(Credit) $(Demand) $(NQ_OUT_TEMP_FILE)

The values of the columns that are not inputs to the executable file will be output first, in the unsorted order in which they appear in the physical table. In the preceding example, the value of the Score column will be followed by the value of the Probability column.
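
On the executable side, this contract could be sketched as follows. The scoring arithmetic, the comma delimiter, and the names makeResultLine and runScorer are placeholders invented for illustration; only the argument order and the one-line delimited output follow the text. The sketch assumes that a fourth argument, when present, is the filename substituted for $(NQ_OUT_TEMP_FILE).

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Build the single result line: Score first, then Probability,
// separated by the delimiter configured for the table (',' assumed).
std::string makeResultLine(double carLoan, double credit, double demand,
                           char delimiter = ',') {
    assert(delimiter != '\n');  // the row must stay on one line
    // Placeholder arithmetic; a real model would be evaluated here.
    double score = carLoan + credit + demand;
    double probability = score > 0.0 ? credit / score : 0.0;
    std::ostringstream line;
    line << score << delimiter << probability;
    return line.str();
}

// args holds the command-line arguments after the executable name:
// $(Car_Loan) $(Credit) $(Demand) and optionally $(NQ_OUT_TEMP_FILE).
// Returns a process exit code (0 on success).
int runScorer(const std::vector<std::string>& args, std::ostream& standardOut) {
    if (args.size() < 3) return 1;
    double carLoan = std::strtod(args[0].c_str(), nullptr);
    double credit = std::strtod(args[1].c_str(), nullptr);
    double demand = std::strtod(args[2].c_str(), nullptr);
    std::string line = makeResultLine(carLoan, credit, demand);
    if (args.size() >= 4) {
        // Temporary-file mode: write the result line to the given path.
        std::ofstream file(args[3]);
        if (!file) return 2;
        file << line << '\n';
    } else {
        // Default mode: write the result line to standard output.
        standardOut << line << '\n';
    }
    return 0;
}
```

A main function would collect argv[1] through argv[argc - 1] into the vector, pass std::cout as the stream, and return the result of runScorer.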

If the executable file outputs more column values than the number of noninput columns, the Data Mining Adapter will attempt to read the column values according to the unsorted column order of the physical table. If these are in conflict with the values of the corresponding input columns, the values returned from the executable file will be used to override the input columns.

The data length of each column in the delimited query output must not exceed the size specified for that column in the physical table.
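
The width rule above lends itself to a quick self-check while developing an executable. The following sketch splits one output line on the delimiter and compares the length of each value with the size of its column. The helper names, the single-character delimiter, and the choice to measure length in bytes are assumptions made for illustration; this is not the adapter's own validation logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split one output line into column values on a single-character delimiter.
std::vector<std::string> splitLine(const std::string& line, char delimiter) {
    std::vector<std::string> values;
    std::size_t start = 0;
    while (true) {
        std::size_t end = line.find(delimiter, start);
        if (end == std::string::npos) {
            values.push_back(line.substr(start));
            return values;
        }
        values.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}

// Return the index of the first value longer than its column size,
// or -1 if every value fits. columnSizes lists the sizes defined in
// the physical table, in the same order as the output values.
int firstOversizedColumn(const std::vector<std::string>& values,
                         const std::vector<std::size_t>& columnSizes) {
    assert(values.size() <= columnSizes.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i].size() > columnSizes[i]) return static_cast<int>(i);
    }
    return -1;
}
```

For instance, with column sizes of 5 and 3, the line 720|0.85 would be flagged at index 1 because the second value is four bytes long.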

Configuring the Data Mining Adapter

Use this procedure to configure the Data Mining Adapter.

To configure the Data Mining Adapter

1. Using the Siebel Analytics Server Administration Tool, create a database and select XML Server as the database type.

   For information about creating a database, see Creating or Editing a Database Object.

2. Configure the connection pool:

   a. Right-click the database you created in Step 1, and then select New Object > Connection Pool.

   b. Enter a name for the connection pool.

   c. Select XML as the call interface.

   d. Enter a data source name, and then click OK.

   NOTE: Do not enter information into any field in the XML tab of the Connection Pool dialog box. The empty fields indicate to the Siebel Analytics Server that the Data Mining Adapter functionality will be invoked.

3. Right-click the database you created in Step 1, and then select New Object > Table.

   The Physical Table dialog box appears. In the XML tab you specify which mode you want to use: in process or out of process. For information about these modes, see Using the Data Mining Adapter.

4. In the XML tab of the Physical Table dialog box, do one of the following:

   Select the Executable radio button, enter the path to the executable file in the Search Utility field, and specify the delimiter for the output values.

   Select the DLL radio button and enter the path to the DLL in the Search Utility field.

   To include spaces in the path, enclose the path in quotation marks. For example:

   "D:\SiebelAnalytics\Bin\Data Mining DLL\ExternalGatewayDLL.dll"

   All characters appearing after the DLL path are passed down to the API as a modelid string. You can use the modelid string to pass static or dynamic parameters to the DLL through the API. For example:

   "D:\SiebelAnalytics\Bin\Data Mining DLL\ExternalGatewayDLL.dll VALUEOF(Model1) VALUEOF(Model2)"
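
Inside the DLL, the modelid string then has to be taken apart by your own code. The following is a minimal sketch, assuming that the VALUEOF() expressions have already been replaced by their values when the string reaches the API and that parameters are separated by whitespace; the guide specifies neither detail. The function name splitModelId is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split the modelId string into whitespace-separated parameters.
// A null pointer or an empty string yields an empty list.
std::vector<std::wstring> splitModelId(const wchar_t* modelId) {
    std::vector<std::wstring> params;
    if (modelId == nullptr) return params;
    std::wistringstream stream(modelId);
    std::wstring token;
    while (stream >> token) {
        assert(!token.empty());  // operator>> only succeeds with a non-empty token
        params.push_back(token);
    }
    return params;
}
```

A parameter value that itself contains spaces would need a different convention, such as quoting, which this sketch does not handle.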

Siebel Analytics Server Administration Guide
Published: 23 June 2003