Performing Similarity Search
To perform a similarity search against vectors stored in Coherence, you can use the SimilaritySearch aggregator. The easiest way to construct one is by using the Aggregators.similaritySearch factory method.
To create a SimilaritySearch aggregator, you need to specify three arguments:
- A ValueExtractor that should be used to retrieve the vector attribute from the map entries.
- The search vector to compare the extracted values against.
- The maximum number of results to return.
For example, to search a map of Book objects and return up to the ten most similar books, you would create a SimilaritySearch aggregator instance like this:

var searchVector = createEmbedding(searchQuery); // outside of Coherence
var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10);
You can change the distance algorithm used for the search by calling the algorithm method on the created aggregator instance and passing an instance of a different DistanceAlgorithm implementation:

var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10)
        .algorithm(new L2SquaredDistance());
Out of the box, Coherence provides CosineDistance, L2SquaredDistance, and InnerProductDistance implementations, but you can easily add support for additional algorithms by implementing the DistanceAlgorithm interface yourself.
After you have created a SimilaritySearch aggregator, you can perform a similarity search by calling the NamedMap.aggregate method like you normally would:

NamedMap<String, Book> books = session.getMap("books");
List<QueryResult<String, Book>> results = books.aggregate(search);
The result of the search is a list of up to the specified maximum number of QueryResult objects (10, in the previous example), each of which contains the entry key, the value, and the calculated distance between the search vector and the vector extracted from that entry. The results are sorted by distance in ascending order, from closest to farthest.
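For instance, you could iterate the results like this. This is a sketch: the getKey, getValue, and getDistance accessor names are assumed from the description above rather than confirmed API, and Book::getTitle is a hypothetical getter on the example Book class:

```java
// Print each match with its distance; the closest result comes first.
// Accessor names below are assumptions, not confirmed API.
for (QueryResult<String, Book> result : results) {
    System.out.printf("%s: %s (distance=%.4f)%n",
            result.getKey(),
            result.getValue().getTitle(),
            result.getDistance());
}
```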
Brute-Force Search
By default, if no index is defined for the vector attribute, Coherence will perform a brute-force search by deserializing every entry, extracting the vector attribute from it, and performing a distance calculation between the extracted vector and the search vector using the specified distance algorithm.
This is fine for small and medium-sized data sets, because Coherence still performs the search in parallel across cluster members and aggregates the results, but it can become very inefficient as the data set grows, in which case using one of the supported index types (described in the following sections) is recommended.
However, even when using indexes, it may be beneficial to run the same query using brute force, in order to test recall by comparing the results returned by the (approximate) index-based search and the (exact) brute-force search.
You can force the SimilaritySearch aggregator to ignore any configured index and perform a brute-force search anyway, by calling the bruteForce method on the aggregator instance:

var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 10)
        .bruteForce();
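One way to quantify recall, as suggested above, is to compare the keys returned by the index-based search with those returned by the brute-force search. The helper below is our own illustration; the two key sets would be collected from the corresponding QueryResult lists:

```java
import java.util.Set;

public class RecallCheck {
    // Recall = fraction of exact (brute-force) results that the
    // approximate (index-based) search also returned.
    static double recall(Set<String> approximateKeys, Set<String> exactKeys) {
        long hits = exactKeys.stream().filter(approximateKeys::contains).count();
        return (double) hits / exactKeys.size();
    }

    public static void main(String[] args) {
        // Hypothetical keys extracted from the two result lists, e.g. via
        // results.stream().map(QueryResult::getKey).collect(toSet())
        Set<String> approximate = Set.of("b1", "b2", "b3", "b5");
        Set<String> exact       = Set.of("b1", "b2", "b3", "b4");
        System.out.println(recall(approximate, exact)); // 0.75
    }
}
```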
Indexed Brute-Force Search
To speed up brute-force searches, you can create a DeserializationAccelerator index for the vector attribute:

NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new DeserializationAccelerator(Book::getSummaryEmbedding));
This will avoid repeated deserialization of Book values when performing a brute-force search, at the cost of additional memory consumed by the indexed vector instances.
The search will still perform the exact distance calculation, so the results will be exact, just like with the non-indexed brute-force search.
Index-Based Search
While brute-force searches work fine with small data sets, as the data set gets larger, it is highly recommended that you create a vector index for a vector property.
Out of the box, Coherence supports two vector index types: HNSW index and Binary Quantization index.
HNSW Index
HNSW index performs approximate vector searches using Hierarchical Navigable Small World graphs, as described by Malkov and Yashunin.
HNSW index support is provided by the coherence-hnsw module, which contains all the Java code and the pre-built native libraries for Linux (ARM and x86), Mac (ARM and x86), and Windows (x86 only) that you need:

<dependency>
    <groupId>${coherence.groupId}</groupId>
    <artifactId>coherence-hnsw</artifactId>
    <version>${coherence.version}</version>
</dependency>
Once the dependency is in place, you can create an HNSW index by calling the NamedMap.addIndex method:

NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new HnswIndex<>(Book::getSummaryEmbedding, 768));
The first argument to the HnswIndex constructor is the extractor for the vector attribute to index, and the second is the number of dimensions each indexed vector will have (all indexed vectors must have the same number of dimensions), which allows the native index implementation to pre-allocate the memory required for the index.
By default, HnswIndex uses cosine distance to calculate vector distances, but this can be overridden by specifying the spaceName argument in the constructor:

NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new HnswIndex<>(Book::getSummaryEmbedding, "L2", 768));
The valid values for the space name are COSINE, L2, and IP (inner product).
HnswIndex also provides a number of options that you can use to fine-tune its behavior, which you can specify using the fluent API:

var hnsw = new HnswIndex<>(Book::getSummaryEmbedding, 768)
        .setEfConstr(200)
        .setEfSearch(50)
        .setM(16)
        .setRandomSeed(100);
books.addIndex(hnsw);
The algorithm parameters in the preceding example are described in more detail in the hnswlib documentation.
You can also specify the maximum index size by calling the
setMaxElements
method. By default, the index will be created
with a maximum size of 4,096 elements, and will be resized as necessary to
accommodate data set growth. However, the resize operation is somewhat costly and
can be avoided if you know ahead of time how many entries will be stored in the
Coherence map on which you are creating the index, in which case you should
configure the index size accordingly.
Note:
Remember that Coherence partitions indexes, so there will be as many instances of the HNSW index as there are partitions. This means that the ideal maxElements setting is just a bit over mapSize / partitionCount, not the actual map size, which would be far too large.
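Following the note above, a per-partition maxElements value can be derived from the expected map size and the partition count. This is a self-contained sketch; the idealMaxElements helper and the headroom factor are our own illustration, and 257 is the default Coherence partition count:

```java
// Size each partition-level HNSW index: a bit over mapSize / partitionCount,
// rather than the full map size (which would be far too large).
public class HnswSizing {
    static int idealMaxElements(long mapSize, int partitionCount, double headroom) {
        return (int) Math.ceil((double) mapSize / partitionCount * headroom);
    }

    public static void main(String[] args) {
        // 1,000,000 expected entries, default 257 partitions, 10% headroom
        int maxElements = idealMaxElements(1_000_000, 257, 1.1);
        System.out.println(maxElements); // 4281
    }
}
```

You could then pass the computed value to setMaxElements when constructing the HnswIndex.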
After you have created and configured an HNSW index, you can simply perform searches the same way as you did earlier using brute-force search. Coherence will automatically detect the HNSW index and use it if one is available.
Binary Quantization
Coherence also supports a Binary Quantization-based index, which provides significant space savings (32x) compared to vector indexes that use float32 vectors, such as HNSW. It does this by converting each 32-bit float in the original vector into either 0 or 1, and representing it using a single bit in a BitSet.
The downside is that the recall may not be as accurate, especially with smaller vectors, but that can be largely addressed by oversampling and re-scoring the results, which Coherence automatically performs.
BinaryQuantIndex is implemented in pure Java and is part of the main Coherence distribution, so it requires no additional dependencies. To create it, simply call the NamedMap.addIndex method:

NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new BinaryQuantIndex<>(Book::getSummaryEmbedding));
The only option you can specify is the oversamplingFactor, which is the multiplier for the maximum number of results to return; it is 3 by default. This means that if your search aggregator is configured to return 10 results, the binary quantization search will initially return 30 results based on the Hamming distance between the binary representation of the search vector and the index vectors, re-score all 30 results using the exact distance calculation, and then re-order and return the top 10 results based on the calculated exact distance.
To change the oversamplingFactor, you can specify it using the fluent API when creating an index:

NamedMap<String, Book> books = session.getMap("books");
books.addIndex(new BinaryQuantIndex<>(Book::getSummaryEmbedding).oversamplingFactor(5));
In the preceding example, this causes the
SimilaritySearch
aggregator to return and re-score 50 results
initially, instead of 30.
Just like with the HNSW index, after you have created and configured a Binary Quantization index, you can simply perform searches the same way as you would with a brute-force search; Coherence will automatically detect the index and use it.
Metadata Filtering
You can also specify a metadata filter that the SimilaritySearch aggregator should use in conjunction with a vector similarity search:

var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 3)
        .filter(Filters.equal(Book::getAuthor, "Jules Verne"));
var results = books.aggregate(search);
The preceding example should return only the top 3 books written by Jules Verne, sorted according to vector similarity.
Metadata filtering works the same regardless of whether you use a brute-force or index-based search, and it will use any indexes you may have on the metadata attributes on which you are filtering, such as Book::getAuthor in this case, to speed up filter evaluation.
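To create such an index on the metadata attribute, you can use the standard three-argument QueryMap.addIndex call. A sketch; whether to pass true for the ordered flag depends on the kinds of queries you run:

```java
// Index the author attribute so equality filters can be evaluated
// from the index instead of deserializing every Book entry.
// addIndex(extractor, fOrdered, comparator) is the standard QueryMap signature.
NamedMap<String, Book> books = session.getMap("books");
books.addIndex(Book::getAuthor, false, null);
```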
If you are a long-time Coherence user, you may be wondering why we are
setting the filter on the aggregator itself and performing a filter evaluation inside
the aggregator, instead of using the aggregate
method that accepts a
filter and allows you to pre-filter the set of entries to aggregate.
The reason is that both vector index implementations need to evaluate the filter internally, and only include a result if it evaluates to true, so the previous example will work in all situations. The alternative below, which pre-filters the set of entries to aggregate, will not work with index-based searches:

var search = Aggregators.similaritySearch(Book::getSummaryEmbedding, searchVector, 3);
var results = books.aggregate(Filters.equal(Book::getAuthor, "Jules Verne"), search);