Apply a Similarity Analysis Model to Your Data

Use a data flow to apply a vector embedding model to a dataset to perform similarity analysis, which identifies records that are similar to a given record.

Before you start, make sure that you have the prerequisites for performing this type of analysis. See Prerequisites to Performing Similarity Analysis in Oracle Analytics.
  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Data, select the dataset that contains the data you want to analyze, then click Add.
    Your dataset must be based on Oracle Database or Oracle Autonomous Data Warehouse.
  3. In the list of columns at the right-hand side, select the columns that you want to analyze. You must include a column with a unique ID.

  4. Click Add a step next to your data step, then click Similarity Analysis.

  5. Select a model to use, then click OK.

  6. Expand the Outputs section, and select Profile_expression.
    This adds a concatenated output column of all your selected data.
  7. Expand the Parameters section.

  8. Configure the parameters:
    • Source - Click Select a value, then select the data column and value that uniquely identify the record you want to compare with others in the dataset. For example, you might specify ID and select a patient with the ID "100002".
    • Top (closest) or Bottom (furthest) - Select "Top" to find the most similar records, or "Bottom" to find the least similar records.
    • Number of Results - Specify the number of matching records to return. For example, select "100" to find the top 100 matching records nearest to your target record.
    • (Optional) Reference Column1, 2, and 3 - Specify a column or combination of columns that uniquely identify the records with which you're comparing the Source value. For example, for medical patients, you might select "ID", "Age", and "Medication".
    • Include Reference Columns for Profiling - Choose No to exclude the reference columns specified in Reference Column1, 2, and 3 from the profiling, or Yes to include them in the profiling.
  9. Click + next to the Similarity Analysis node in the diagram, then click Add step, and select Save Data.
  10. Configure the Save Data step:

    • Dataset - Change the default value to a more meaningful name. For example, "Similarity Analysis Top 10".
    • Table - Don't change the default value. When the data flow runs, Oracle Analytics creates a new value based on the Dataset name that you specified.
    • (Optional) Default Aggregation - Change the default aggregation. For example, you might change it to Average.
  11. Click Save Model, and specify the name of the generated prediction model.
  12. Click Save and specify a name for the data flow.
  13. Click Run to analyze the data and generate a predictive model.
You can locate the dataset that Oracle Analytics generates on the Dataset tab on the Data page. See Interpreting Results from a Similarity Analysis Model.
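
To make the Source, Top (closest) or Bottom (furthest), and Number of Results parameters more concrete, the following is a minimal Python sketch of how a ranking of this kind can work in principle. It is not Oracle Analytics code and doesn't reflect the product's internal implementation: the use of cosine similarity, the exclusion of the source record from the results, the function names, the two-dimensional vectors, and every record ID other than "100002" are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_similar(embeddings, source_id, direction="Top", number_of_results=2):
    """Rank every record except the source by similarity to the source.

    direction="Top" returns the most similar records first;
    direction="Bottom" returns the least similar records first.
    """
    source = embeddings[source_id]
    scored = [
        (record_id, cosine_similarity(source, vector))
        for record_id, vector in embeddings.items()
        if record_id != source_id  # assumption: the source is not compared with itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=(direction == "Top"))
    return scored[:number_of_results]


# Hypothetical embeddings keyed by record ID; real embedding vectors
# typically have hundreds of dimensions rather than two.
embeddings = {
    "100002": [1.0, 0.0],
    "100003": [0.9, 0.1],
    "100004": [0.0, 1.0],
    "100005": [0.5, 0.5],
}

top = rank_similar(embeddings, "100002", direction="Top", number_of_results=2)
bottom = rank_similar(embeddings, "100002", direction="Bottom", number_of_results=1)

print("Most similar:", [record_id for record_id, _ in top])
print("Least similar:", [record_id for record_id, _ in bottom])
```

In this toy data, the record whose vector points in nearly the same direction as the source ranks first for "Top", and the record whose vector is orthogonal to the source is returned for "Bottom".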