44 Performing Similarity Search

To perform a similarity search against vectors stored in Coherence, you can use the SimilaritySearch aggregator. The easiest way to construct one is by using the Aggregators.similaritySearch factory method.

When constructing the SimilaritySearch aggregator, you need to specify three arguments:
  1. A ValueExtractor that should be used to retrieve the vector attribute from the map entries.
  2. The search vector with which to compare the extracted values.
  3. The maximum number of results to return.
For example, to search the map containing Book objects and return up to the ten most similar books, you would create a SimilaritySearch aggregator instance like this:
var searchVector = createEmbedding(searchQuery);  // outside of Coherence
var search       = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10);
By default, the aggregator will use cosine distance to calculate the distance between vectors, but you can change that by calling the fluent algorithm method on the created aggregator instance and passing an instance of a different DistanceAlgorithm implementation:
var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10)
                        .algorithm(new L2SquaredDistance());

Out of the box, Coherence provides CosineDistance, L2SquaredDistance, and InnerProductDistance implementations, but you can easily add support for additional algorithms by implementing the DistanceAlgorithm interface yourself.
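
For example, a custom Manhattan (L1) distance implementation might look something like the following sketch. The distance method signature and the Vector.get() accessor shown here are assumptions modeled on the built-in implementations, so check the DistanceAlgorithm Javadoc for the exact contract before implementing it:
public class ManhattanDistance implements DistanceAlgorithm<float[]> {
    @Override
    public double distance(Vector<float[]> v1, Vector<float[]> v2) {
        // assumes Vector.get() exposes the underlying float[] data
        float[] a = v1.get();
        float[] b = v2.get();
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

Because the aggregator, including its distance algorithm, is serialized and sent to the cluster members, a custom implementation also needs to be serializable using the serializer configured for the cluster.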

After you have an instance of a SimilaritySearch aggregator, you can perform a similarity search by calling the NamedMap.aggregate method like you normally would:
NamedMap<String, Book>          books   = session.getMap("books");
List<QueryResult<String, Book>> results = books.aggregate(search);

The result of the search is a list of up to the specified maximum number of QueryResult objects (10 in the previous example), each containing the entry key, the value, and the calculated distance between the search vector and the vector extracted from that entry. The results are sorted by distance in ascending order, from closest to farthest.
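
For example, you could iterate over the returned list to display each match (a minimal sketch: the getKey, getValue, and getDistance accessors on QueryResult, and the getTitle accessor on Book, are assumed here):
for (QueryResult<String, Book> result : results) {
    // print the matched book, its key, and how far it is from the search vector
    System.out.printf("%s (key=%s, distance=%.4f)%n",
                      result.getValue().getTitle(),
                      result.getKey(),
                      result.getDistance());
}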

Brute-Force Search

By default, if no index is defined for the vector attribute, Coherence will perform a brute-force search by deserializing every entry, extracting the vector attribute from it, and performing a distance calculation between the extracted vector and the search vector using the specified distance algorithm.

This is fine for small or medium-sized data sets, because Coherence will still perform the search in parallel across cluster members and aggregate the results. However, it can become very inefficient as the data set grows, in which case using one of the supported index types (described in the following sections) is recommended.

However, even when using indexes, it may be beneficial to run the same query using brute force, in order to test recall by comparing the results returned by the (approximate) index-based search and the (exact) brute-force search.

To accomplish that, you can configure the SimilaritySearch aggregator to ignore any configured index and to perform a brute-force search anyway, by calling the bruteForce method on the aggregator instance:
var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10)
                        .bruteForce();
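
For example, a simple recall check might run both the index-based and the brute-force variant of the same query and compare the returned keys (a sketch only, assuming the QueryResult.getKey accessor used earlier):
var indexed = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10);
var exact   = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10)
                         .bruteForce();

// keys returned by the approximate (index-based) and exact (brute-force) searches
var approxKeys = books.aggregate(indexed).stream().map(QueryResult::getKey).collect(Collectors.toSet());
var exactKeys  = books.aggregate(exact).stream().map(QueryResult::getKey).collect(Collectors.toSet());

// recall = fraction of exact results that the approximate search also returned
double recall = (double) exactKeys.stream().filter(approxKeys::contains).count() / exactKeys.size();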

Indexed Brute-Force Search

It is possible to improve the performance of a brute-force search by creating a forward-only index on the vector attribute using DeserializationAccelerator:
NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new DeserializationAccelerator(Book::getSummaryEmbedding));

This will avoid repeated deserialization of Book values when performing a brute-force search, at the cost of additional memory consumed by the indexed vector instances.

The search will still perform the exact distance calculation, so the results will be exact, just like with the non-indexed brute-force search.

Index-Based Search

While brute-force searches work fine with small data sets, as the data set grows it is highly recommended that you create a vector index for the vector property.

Out of the box, Coherence supports two vector index types: an HNSW index and a Binary Quantization index.

HNSW Index

The HNSW index performs approximate vector searches using Hierarchical Navigable Small World graphs, as described by Malkov and Yashunin.

Coherence uses an embedded native implementation of hnswlib for its HNSW index. To use the HNSW index, you need to add a dependency on the coherence-hnsw module, which contains all the Java code and pre-built native libraries you need for Linux (ARM and x86), Mac (ARM and x86), and Windows (x86 only):
<dependency>
  <groupId>${coherence.groupId}</groupId>
  <artifactId>coherence-hnsw</artifactId>
  <version>${coherence.version}</version>
</dependency>
After you add the dependency, creating the HNSW index is as simple as:
NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new HnswIndex<>(Book::getSummaryEmbedding, 768));

The first argument to the HnswIndex constructor is the extractor for the vector attribute to index, and the second is the number of dimensions each indexed vector will have (all indexed vectors must have the same number of dimensions), which allows the native index implementation to pre-allocate the memory required for the index.

By default, HnswIndex will use cosine distance to calculate vector distances, but this can be overridden by specifying the spaceName argument in the constructor:
NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new HnswIndex<>(Book::getSummaryEmbedding, "L2", 768));

The valid values for space name are COSINE, L2, and IP (inner product).

HnswIndex also provides a number of options that you can use to fine-tune its behavior, which you can specify using the fluent API:
var hnsw = new HnswIndex<>(Book::getSummaryEmbedding, 768)
                    .setEfConstr(200)
                    .setEfSearch(50)
                    .setM(16)
                    .setRandomSeed(100);
books.addIndex(hnsw);

The algorithm parameters in the preceding example are described in more detail in the hnswlib documentation.

You can also specify the maximum index size by calling the setMaxElements method. By default, the index will be created with a maximum size of 4,096 elements, and will be resized as necessary to accommodate data set growth. However, the resize operation is somewhat costly and can be avoided if you know ahead of time how many entries will be stored in the Coherence map on which you are creating the index, in which case you should configure the index size accordingly.

Note:

Remember that Coherence partitions indexes, so there will be as many instances of HNSW index as there are partitions.

This means that the ideal maxElements setting is just a bit over mapSize / partitionCount, not the actual map size, which would be far too large.
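
For example, if you expect the map to hold roughly two million entries and use the default partition count of 257, you might size each per-partition index with a little headroom (all the numbers below are illustrative only):
int expectedMapSize = 2_000_000;   // expected total number of entries across the cluster (illustrative)
int partitionCount  = 257;         // default partition count for a distributed service
int maxElements     = (int) (1.1 * expectedMapSize / partitionCount);   // roughly 10% headroom

var hnsw = new HnswIndex<>(Book::getSummaryEmbedding, 768)
                .setMaxElements(maxElements);
books.addIndex(hnsw);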

After you have created and configured the HNSW index, you can simply perform searches the same way as you did earlier using a brute-force search. If an HNSW index is available, Coherence will automatically detect and use it.

Binary Quantization

Coherence also supports the Binary Quantization-based index, which provides significant space savings (32x) compared to vector indexes that use float32 vectors, such as HNSW. It does this by converting each 32-bit float in the original vector into either 0 or 1, and representing it using a single bit in a BitSet.

The downside is that recall may be lower, especially with smaller vectors, but this can be largely addressed by oversampling and re-scoring the results, which Coherence performs automatically.

BinaryQuantIndex is implemented in pure Java and is part of the main Coherence distribution, so it requires no additional dependencies. To create it, simply call the NamedMap.addIndex method:
NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new BinaryQuantIndex<>(Book::getSummaryEmbedding));

The only option you can specify is the oversamplingFactor, which is a multiplier for the maximum number of results to return and is 3 by default. This means that if your search aggregator is configured to return 10 results, the binary quantization search will initially select 30 results based on the Hamming distance between the binary representation of the search vector and the index vectors, re-score all 30 results using an exact distance calculation, and then re-order them and return the top 10 results based on the calculated exact distance.

To change oversamplingFactor, you can specify it using the fluent API when creating an index:
NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new BinaryQuantIndex<>(Book::getSummaryEmbedding).oversamplingFactor(5));

In the preceding example, this causes the SimilaritySearch aggregator to initially select and re-score 50 results instead of 30.

Just like with the HNSW index, after you have created and configured a Binary Quantization index, you can simply perform searches the same way as you did previously with a brute-force search. Coherence will automatically detect the index and use it.

Metadata Filtering

In addition to a vector-based similarity search, you can use standard Coherence filters to perform metadata-based filtering of the results. For example, if you wanted to search books by a specific author only, you could specify a metadata filter that the SimilaritySearch aggregator should use in conjunction with a vector similarity search:
var search  = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 3)
                         .filter(Filters.equal(Book::getAuthor, "Jules Verne"));
var results = books.aggregate(search);

The preceding example should return only the top 3 books written by Jules Verne, sorted according to vector similarity.

Metadata filtering works the same regardless of whether you use a brute-force or index-based search, and will use any indexes you may have on the metadata attributes on which you are filtering, such as Book::getAuthor in this case, to speed up filter evaluation.
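
For example, you could create a standard (non-vector) Coherence index on the author attribute so the filter in the preceding example can be evaluated against an index rather than by deserializing entries:
// a regular attribute index, unrelated to the vector indexes described earlier
books.addIndex(Book::getAuthor, /* fOrdered */ false, /* comparator */ null);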

If you are a long-time Coherence user, you may be wondering why we are setting the filter on the aggregator itself and performing a filter evaluation inside the aggregator, instead of using the aggregate method that accepts a filter and allows you to pre-filter the set of entries to aggregate.

The reason is that both vector index implementations need to evaluate the filter internally and include a result only if the filter evaluates to true, so the preceding example will work in all situations.

However, if you are using a brute-force search, you may achieve the same result, and likely improve performance, by pre-filtering the entries:
var search  = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 3);
var results = books.aggregate(Filters.equal(Book::getAuthor, "Jules Verne"), search);