Block Distribution (Loading the Initial Data Set for a Sun Master Index)

Block Distribution

The master Bulk Matcher reads the input file and writes the records to block bucket files, which are then distributed to the matchers. Before writing any data, the block distributor reads the configuration of the query, the match string, and the duplicate and match thresholds. It then reads the input data and writes each record to a block file according to the defined blocking query. The number of files created depends on the total number of records, the record size, and the memory available to the processor. Once the data files have been created for all blocks, the cluster synchronizer signals that the matchers can begin the match process.
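The distribution step described above can be pictured with a short sketch. This is not the Bulk Matcher's actual implementation. It is a minimal illustration in Python that assumes a blocking key built from a few record fields, a CRC32 hash to choose the bucket, and a simple size-over-memory estimate for the number of bucket files. The names `estimate_bucket_count`, `block_key`, and `distribute_blocks`, the `bucket_N.txt` file naming, and the field names are all hypothetical.

```python
import csv
import math
import os
import tempfile
import zlib
from collections import defaultdict


def estimate_bucket_count(total_records, avg_record_bytes, memory_budget_bytes):
    """Assumed sizing rule: enough buckets that each one fits in the memory budget."""
    total_bytes = total_records * avg_record_bytes
    return max(1, math.ceil(total_bytes / memory_budget_bytes))


def block_key(record, fields):
    """Hypothetical blocking key: trimmed, lowercased values of the blocking fields."""
    return "|".join(record[f].strip().lower() for f in fields)


def distribute_blocks(records, fields, num_buckets, out_dir):
    """Write each record to a bucket file chosen by hashing its blocking key.

    Records that share a blocking key always land in the same bucket file,
    so a single matcher sees every candidate pair for that block.
    Returns a dict mapping bucket number to the number of records written.
    """
    counts = defaultdict(int)
    handles = {}
    writers = {}
    try:
        for record in records:
            key = block_key(record, fields)
            # CRC32 is stable across runs, unlike Python's built-in hash().
            bucket = zlib.crc32(key.encode("utf-8")) % num_buckets
            if bucket not in handles:
                path = os.path.join(out_dir, f"bucket_{bucket}.txt")
                handles[bucket] = open(path, "w", newline="", encoding="utf-8")
                writers[bucket] = csv.writer(handles[bucket])
            # Store the key first, then the record values in field-name order.
            row = [key] + [record[name] for name in sorted(record)]
            writers[bucket].writerow(row)
            counts[bucket] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

In this sketch, a caller would first pick the number of buckets with `estimate_bucket_count`, then pass an iterable of record dictionaries and the blocking fields to `distribute_blocks`. Hashing the blocking key rather than assigning records round-robin is the design choice that matters here: it keeps all records of a block together, which is what allows each matcher to work on its bucket files independently.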
