Vectorize Relational Tables Using OML Feature Extraction Algorithms
This example shows you how to use OML's Feature Extraction algorithms in conjunction with the VECTOR_EMBEDDING() operator to vectorize sets of relational data, build similarity indexes, and perform similarity searches on the resulting vectors.
Feature Extraction algorithms reduce the dimensionality of large data sets by deriving a small set of informative features from the original columns. Principal Component Analysis, for example, identifies the principal components that capture the most variance in the data. This reduction simplifies the data set while retaining the most important information, making it easier to analyze correlations and redundancies in the data.
The Principal Component Analysis (PCA) algorithm, a widely used dimensionality reduction technique in machine learning, is used in this tutorial.
Note:
This example uses customer bank marketing data available at https://archive.ics.uci.edu/dataset/222/bank+marketing.
The relational data table includes a mix of numeric and categorical columns. It has more than 4000 records.
SELECT column_name, data_type
FROM user_tab_columns
WHERE table_name = 'BANK'
ORDER BY data_type, column_name;
Output:
COLUMN_NAME          DATA_TYPE
-------------------- --------------------
AGE                  NUMBER
CAMPAIGN             NUMBER
CONS_CONF_IDX        NUMBER
CONS_PRICE_IDX       NUMBER
DURATION             NUMBER
EMP_VAR_RATE         NUMBER
EURIBOR3M            NUMBER
ID                   NUMBER
NR_EMPLOYED          NUMBER
PDAYS                NUMBER
PREVIOUS             NUMBER
CONTACT              VARCHAR2
CREDIT_DEFAULT       VARCHAR2
DAY_OF_WEEK          VARCHAR2
EDUCATION            VARCHAR2
HOUSING              VARCHAR2
JOB                  VARCHAR2
LOAN                 VARCHAR2
MARITAL              VARCHAR2
MONTH                VARCHAR2
POUTCOME             VARCHAR2
Y                    VARCHAR2
To perform a similarity search, you need to vectorize the relational data. To do so, you can first use an OML Feature Extraction algorithm to project the data onto a more compact numeric space. In this example, you configure the Singular Value Decomposition (SVD) algorithm to perform a Principal Component Analysis (PCA) projection of the original data table. The number of features (5 in this case) is specified in the settings table. This number determines how many principal components are retained after the dimensionality reduction. Each retained feature corresponds to a direction in the feature space that captures as much of the remaining variance in the data as possible.
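The configuration described above might look like the following sketch. It assumes the settings are passed as a DBMS_DATA_MINING.SETTING_LIST to CREATE_MODEL2 rather than through a separate settings table, that the ID column of BANK serves as the case identifier, and that the model is called PCAMOD (a hypothetical name chosen for illustration). Treat it as a starting point, not as the definitive procedure.

```sql
-- Sketch only: the model name PCAMOD and the use of ID as the case
-- identifier are assumptions made for this illustration.
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  -- Use the SVD algorithm and score it as a PCA projection.
  v_setlst('ALGO_NAME')         := 'ALGO_SINGULAR_VALUE_DECOMP';
  v_setlst('SVDS_SCORING_MODE') := 'SVDS_SCORING_PCA';
  -- Retain 5 principal components, as described in the text.
  v_setlst('FEAT_NUM_FEATURES') := '5';

  DBMS_DATA_MINING.CREATE_MODEL2(
    MODEL_NAME          => 'PCAMOD',
    MINING_FUNCTION     => 'FEATURE_EXTRACTION',
    DATA_QUERY          => 'SELECT * FROM bank',
    CASE_ID_COLUMN_NAME => 'ID',
    SET_LIST            => v_setlst);
END;
/

-- Project one row of BANK onto the 5-dimensional PCA space.
SELECT id, VECTOR_EMBEDDING(pcamod USING *) AS embedding
FROM bank
WHERE ROWNUM = 1;
```

With five retained features, each row of BANK should map to a five-dimensional vector, which is what the similarity index and similarity searches operate on.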
Because you need to use the DBMS_DATA_MINING package to create the model, you need the CREATE MINING MODEL privilege in addition to the other privileges relevant to vector indexes and similarity search. For more information about the CREATE MINING MODEL privilege, see Oracle Machine Learning for SQL User’s Guide.
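As a minimal illustration, a suitably privileged administrator could grant the privilege as follows. The user name VECTOR_USER is hypothetical; substitute the schema that will create the model.

```sql
-- Sketch only: VECTOR_USER is a placeholder for the schema
-- that will call DBMS_DATA_MINING to create the model.
GRANT CREATE MINING MODEL TO vector_user;
```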
                  