Scoring with ESA

A typical feature extraction application of Explicit Semantic Analysis (ESA) is to identify the most relevant features of a given input and score their relevance. Scoring an ESA model produces data projections in the concept feature space.

If an ESA model is built from an arbitrary collection of documents, each document is treated as a feature, so scoring identifies the documents in the collection that are most relevant to the input. The feature extraction functions are FEATURE_DETAILS, FEATURE_ID, FEATURE_SET, FEATURE_VALUE, and FEATURE_COMPARE.

The same functions are used with ESA embeddings, but the feature space is different. Feature names for ESA embeddings are successive integers starting with 1, so the output of FEATURE_ID is numeric, and the feature IDs in the output of FEATURE_SET and FEATURE_DETAILS are also numeric.
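The idea of projecting an input into a concept feature space can be illustrated with a small, self-contained Python sketch. It is a conceptual toy and not the database implementation: the corpus, the TF-IDF weighting, and the helper names (build_index, project, feature_set, compare) are all assumptions made for illustration. The feature_set helper is loosely analogous to ranking features by relevance, and compare is loosely analogous to measuring the relatedness of two inputs in concept space.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document acts as one feature (concept).
DOCS = {
    "astronomy": "star planet orbit telescope galaxy star",
    "cooking": "recipe oven flour sugar bake recipe",
    "finance": "stock market bond interest market",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, kept deliberately simple)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {feature name: TF-IDF weight}."""
    n_docs = len(docs)
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    index = {}
    for name, counts in term_counts.items():
        for term, count in counts.items():
            idf = math.log(1 + n_docs / doc_freq[term])
            index.setdefault(term, {})[name] = count * idf
    return index


def project(text, index):
    """Project text into concept space: feature name -> relevance score."""
    scores = Counter()
    for term, count in Counter(tokenize(text)).items():
        for name, weight in index.get(term, {}).items():
            scores[name] += count * weight
    return dict(scores)


def feature_set(text, index, top_n=2):
    """Return up to top_n (feature, score) pairs, most relevant first."""
    vector = project(text, index)
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def compare(text_a, text_b, index):
    """Cosine similarity of two inputs in concept space (0.0 if either is empty)."""
    va, vb = project(text_a, index), project(text_b, index)
    dot = sum(score * vb.get(name, 0.0) for name, score in va.items())
    norm_a = math.sqrt(sum(s * s for s in va.values()))
    norm_b = math.sqrt(sum(s * s for s in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


INDEX = build_index(DOCS)
```

In this sketch, feature_set("telescope orbit", INDEX) ranks the "astronomy" document first, and compare returns a high value for two inputs that share no words but map to the same document, which is the kind of semantic relatedness that a concept-space projection is meant to capture.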

A typical classification application of ESA is to predict the classes of a given document and estimate the probabilities of those predictions. As a classification algorithm, ESA implements the following scoring functions: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_SET, PREDICTION_DETAILS, and PREDICTION_COST.
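The relationship between a predicted class, its probability, and the full set of class probabilities can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the algorithm used by the database: the training rows, the per-class term profiles, the normalization of scores into probabilities, and the helper names (class_profiles, prediction_set, prediction) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled training documents: (class label, text).
TRAINING = [
    ("sports", "goal match team score team"),
    ("sports", "team coach match"),
    ("science", "atom energy particle experiment"),
    ("science", "experiment theory energy"),
]


def class_profiles(rows):
    """Aggregate term counts per class, so each class acts as one concept."""
    profiles = {}
    for label, text in rows:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def prediction_set(text, profiles):
    """Return [(label, probability)] sorted by descending probability."""
    words = Counter(text.lower().split())
    raw = {}
    for label, profile in profiles.items():
        norm = math.sqrt(sum(c * c for c in profile.values()))
        dot = sum(count * profile[word] for word, count in words.items())
        raw[label] = dot / norm if norm else 0.0
    total = sum(raw.values())
    if total == 0:
        # No overlap with any class: fall back to a uniform distribution.
        return [(label, 1 / len(raw)) for label in sorted(raw)]
    ranked = ((label, score / total) for label, score in raw.items())
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


def prediction(text, profiles):
    """Return the most probable class label."""
    return prediction_set(text, profiles)[0][0]


PROFILES = class_profiles(TRAINING)
```

Here prediction plays the role of returning the best class, the probability paired with the first entry of prediction_set corresponds to the probability of that prediction, and the full list corresponds to the set of all classes with their probabilities.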