A Record Store instance can save the last read generation and also track that generation for any number of unique clients. You specify a unique client by creating a client ID (which can be any string, such as forge1) and then set a value to indicate the last-read generation for that client ID.
There are two ways to set the last-read generation for a client:
Automatically, in a Forge pipeline, by using a Record Store adapter to read records from the Record Store. In the adapter, use the CLIENT_ID pass-through to specify the client ID to be set for the generation that is being read in.
Manually, by using the set-last-read-generation task of the Record Store command-line utility.
Here is an example use-case with Forge processing the records:
You run a full crawl and it writes the records to a Record Store as Generation 1.
You perform a Forge baseline update using Generation 1. The Record Store adapter to Forge uses the READ_TYPE pass-through set to BASELINE and the CLIENT_ID pass-through set to forge1. The use of the CLIENT_ID pass-through means that a client state is saved for the forge1 client.
You run a second crawl, either full or incremental, and store the records as Generation 2. Because both crawls use the same idPropertyName and the same seeds, some records are identical in both generations and the others are delta records (new, modified, or deleted records).
You perform a Forge partial update using the delta records between Generation 1 and Generation 2. For this pipeline, the Record Store adapter to Forge uses the READ_TYPE pass-through set to DELTA and the CLIENT_ID pass-through set to forge1.
The delta records are processed by Forge and uploaded to the MDEX Engine.
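The use case above can be illustrated with a small in-memory model. This is a sketch only, not the Record Store API: the ToyRecordStore class, its method names, and the sample records are hypothetical, and it assumes that a DELTA read compares the client's last-read generation with the latest generation and that any read with a client ID saves the client state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyRecordStore:
    """Hypothetical in-memory stand-in for a Record Store instance."""

    # generation ID -> {record ID -> record content}
    generations: Dict[int, Dict[str, str]] = field(default_factory=dict)
    # client ID -> last-read generation ID (the "client state")
    client_states: Dict[str, int] = field(default_factory=dict)

    def write_generation(self, records: Dict[str, str]) -> int:
        """Store a crawl's records as the next generation and return its ID."""
        gen_id = len(self.generations) + 1
        self.generations[gen_id] = dict(records)
        return gen_id

    def set_last_read_generation(self, client_id: str, gen_id: int) -> None:
        """Manually set the last-read generation for a client."""
        if gen_id not in self.generations:
            raise ValueError(f"unknown generation: {gen_id}")
        self.client_states[client_id] = gen_id

    def list_client_states(self) -> List[Tuple[str, int]]:
        """Return (client ID, last-read generation) pairs, sorted by client ID."""
        return sorted(self.client_states.items())

    def read(self, read_type: str,
             client_id: Optional[str] = None) -> Dict[str, Optional[str]]:
        """Read the latest generation as a BASELINE or as a DELTA.

        In a DELTA result, deleted records map to None (an assumption of
        this sketch, not the real record format).
        """
        if not self.generations:
            raise ValueError("no generations have been written")
        latest = max(self.generations)
        current = self.generations[latest]
        if read_type == "BASELINE":
            result: Dict[str, Optional[str]] = dict(current)
        elif read_type == "DELTA":
            if client_id not in self.client_states:
                raise KeyError(f"no saved client state for: {client_id}")
            previous = self.generations[self.client_states[client_id]]
            # New or modified records.
            result = {k: v for k, v in current.items() if previous.get(k) != v}
            # Deleted records.
            result.update({k: None for k in previous if k not in current})
        else:
            raise ValueError(f"unsupported read type: {read_type}")
        if client_id is not None:
            # Supplying a client ID saves the client state automatically.
            self.client_states[client_id] = latest
        return result


store = ToyRecordStore()
store.write_generation({"a": "v1", "b": "v1", "c": "v1"})  # Generation 1
baseline = store.read("BASELINE", client_id="forge1")       # baseline update
store.write_generation({"a": "v1", "b": "v2", "d": "v1"})  # Generation 2
delta = store.read("DELTA", client_id="forge1")             # partial update
```

In this model, record a is unchanged and therefore absent from the delta, while b (modified), d (new), and c (deleted) are the delta records that the partial update would process.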
To find out which client states are currently saved in a Record Store instance, use the list-client-states task of the Record Store command-line utility.