Apply a Similarity Analysis Model to Your Data

Use a data flow to apply a similarity analysis to your data, which enables you to identify records that are similar to a given record.

Before you start, make sure that you have the prerequisites for performing this type of analysis. See Prerequisites to Performing Similarity Analysis in Oracle Analytics.

On your home page, click Create, and then click Data Flow.
In Add Data, select a dataset, then click Add.

You must use an Oracle Database or Oracle Autonomous Data Warehouse V23ai or later.
In the list of columns, deselect the columns that you don't want to analyze (they're all selected by default). You must include a column with a unique ID. We recommend selecting between 10 and 15 columns. Selecting more than 15 columns can adversely affect performance.

Hover over the dataset node and select Add a step, then click Similarity Analysis.

Select a model to use, then click OK.

Expand the Outputs section, and select Profile_expression.
This adds a concatenated output column of all your selected data.
Expand the Parameters section.

Configure the parameters:
- Source - Click Select a value, then select the data column and the value that uniquely identifies the record you want to compare with others in the dataset. For example, you might specify ID and select a patient with the ID "100002".
- Top (closest) or Bottom (furthest) - Select "Top" to find the most similar records, or "Bottom" to find the least similar records.
- Number of Results - Specify the number of matching records to return. For example, select "100" to find the top 100 matching records nearest to your target record.
- (Optional) Reference Column1, 2, and 3 - Specify a column or combination of columns that uniquely identifies the records with which you're comparing the Source value. For example, for medical patients, you might select "ID", "Age", and "Medication".
- Include Reference Columns for Profiling - Choose No to exclude the reference columns specified in Reference Column1, 2, and 3 from the profiling, or Yes to include them in the profiling.
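
To make the Source, Top or Bottom, and Number of Results parameters concrete, here is a minimal Python sketch of how such a ranking could work in principle. It is not how Oracle Analytics implements the step. The toy vectors, the choice of cosine similarity, the exclusion of the source record from its own results, and the names cosine_similarity and rank_similar are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_similar(vectors, source_id, direction="Top", number_of_results=3):
    """Rank every other record by similarity to the source record.

    direction="Top" returns the most similar records first;
    direction="Bottom" returns the least similar records first.
    Assumption: the source record is left out of its own results.
    """
    source = vectors[source_id]
    scored = [
        (record_id, cosine_similarity(source, vec))
        for record_id, vec in vectors.items()
        if record_id != source_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=(direction == "Top"))
    return scored[:number_of_results]

# Hypothetical two-dimensional vectors keyed by record ID (illustrative only).
toy_vectors = {
    "100002": [1.0, 0.0],
    "100003": [0.9, 0.1],
    "100004": [0.0, 1.0],
    "100005": [-1.0, 0.0],
}

top = rank_similar(toy_vectors, "100002", direction="Top", number_of_results=2)
bottom = rank_similar(toy_vectors, "100002", direction="Bottom", number_of_results=1)
print(top)
print(bottom)
```

In this sketch, "Top" and "Bottom" differ only in sort order, and Number of Results is a simple cutoff applied after ranking.
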
Click + next to the Similarity Analysis node in the diagram, then click Add step, and select Save Data.
Configure the Save Data step:

- Dataset - Change the default value to a more meaningful name. For example, "Similarity Analysis Top 10".
- Table - Don't change the default value. When the data flow runs, Oracle Analytics creates a new value based on the Dataset name that you specified.
- (Optional) Default Aggregation - Change the default aggregation. For example, you might change it to Average.
Click Save Model, and specify the name of the generated predictive model.
Click Save and specify a name for the data flow.
Click Run to analyze the data and generate a predictive model.

You can locate the dataset that Oracle Analytics generates on the Dataset tab on the Data page. See Interpreting Results from a Similarity Analysis Model.
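
The Profile_expression output mentioned in the steps above is described as a concatenated column of all your selected data. As a rough illustration of that idea only, the following Python sketch joins the selected columns of one record into a single string. The "column: value" format, the separator, the sample record, and the function name build_profile_expression are hypothetical; the actual format that Oracle Analytics produces may differ.

```python
def build_profile_expression(record, selected_columns):
    """Join the selected columns of one record into a single text value.

    Assumption: a "column: value" layout separated by commas. This layout
    is chosen for readability and is not taken from Oracle Analytics.
    """
    parts = [f"{column}: {record[column]}" for column in selected_columns]
    return ", ".join(parts)

# Hypothetical patient record (illustrative data only).
patient = {"ID": "100002", "Age": 54, "Medication": "Metformin"}

# Only the columns left selected in the dataset step contribute to the profile.
profile = build_profile_expression(patient, ["Age", "Medication"])
print(profile)
```

A sketch like this also shows why the number of selected columns matters: every additional column lengthens the text that has to be processed for each record.
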