24 Data Quality Operators

This section discusses the mapping operators that help you achieve data quality. Because reporting on erroneous data wastes time and money, data quality is a key element of Business Intelligence. Include data quality operators in your mappings to load clean, accurate records to your targets.

This section contains the following topics:

About the Match-Merge Operator
About the Name and Address Operator
Using the Match-Merge Operator to Eliminate Duplicate Source Records
Using the Name and Address Operator to Cleanse Source Data

About the Match-Merge Operator

Duplicate records can obscure your understanding of who your customers and suppliers really are. Eliminating duplicate records is an important activity in the data correction process. The Match-Merge operator enables you to identify matching records and merge them into a single record. You can define the business rules used by the Match-Merge operator to identify records in a table that refer to the same data. Master data management working on various systems will make use of this operator to ensure that records are created and matched with a master record.

The Match-Merge operator can be used with the Name and Address operator to support householding, which is the process of identifying unique households in name and address data.

The Match-Merge operator enables you to:

Use weights to determine matches between records.
Determine matches using built-in algorithms, including the Jaro-Winkler and edit distance algorithms.
Cross reference data to track and audit matches.
Create custom rules combining built-in rules for matching and merging.

Understanding Matching Concepts

When you use Warehouse Builder to match records, you can define a single match rule or multiple match rules. If you create more than one match rule, Warehouse Builder determines two rows match if those rows satisfy any of the match rules. In other words, Warehouse Builder evaluates multiple match rules using OR logic.

Example of Matching and Merging Customer Data

Consider how you could utilize the Match-Merge operator to manage a customer mailing list. Use matching to find records that refer to the same person in a table of customer data containing 10,000 rows.

For example, you can define a match rule that screens records that have similar first and last names. Through matching, you may discover that 5 rows could refer to the same person. You can then merge those records into one new record. For example, you can create a merge rule to retain the values from the one of the five matched records with the longest address. The newly merged table now contains one record for each customer.

Table 24-1 shows records that refer to the same person prior to using the Match-Merge operator.

Table 24-1 Sample Records

Row	First Name	Last Name	SSN	Address	Unit	Zip
1	Jane	Doe	NULL	123 Main Street	NULL	22222
2	Jane	Doe	987-65-4325	NULL	NULL	22222
3	J.	Doe	NULL	123 Main Street	Apt 4	22222
4	NULL	Smith	987-65-4325	123 Main Street	Apt 4	22222
5	Jane	Smith-Doe	987-65-4325	NULL	NULL	22222

Table 24-2 shows the single record for Jane Doe after using the Match-Merge operator. Notice that the new record retrieves data from different rows in the sample.

First Name	Last Name	SSN	Address	Unit	Zip
Jane	Doe	987-65-4325	123 Main Street	Apt 4	22222

Example of Multiple Match Rules

The following example illustrates how Warehouse Builder evaluates multiple match rules using OR logic.

In the top portion of the Match Rules tab, create two match rules as described in Table 24-3:

Name	Position	Rule Type	Usage	Description
Rule_1	1	Conditional	Active	Match SSN
Rule _2	2	Conditional	Active	Match Last Name and PHN

In the lower portion of the tab, assign the details to Rule_1 as described in Table 24-4:

Attribute	Position	Algorithm	Similarity Score	Blank Matching
SSN	1	Exact	0	Do not match if either is blank

For Rule_2, assign the details as described in Table 24-5:

Attribute	Position	Algorithm	Similarity Score	Blank Matching
LastName	1	Exact	0	Do not match if either is blank
PHN	2	Exact	0	Do not match if either is blank

Assume you have the data listed in Table 24-6:

Row	First Name	Last Name	PHN	SSN
A	John	Doe	650-555-0111	NULL
B	Jonathan	Doe	650-555-0111	987-65-4328
C	John	Dough	650-555-0111	987-65-4328

According to Rule_1, rows B and C match. According to Rule_2, rows A and B match. Therefore, since Warehouse Builder handles match rules using OR logic, all three records match.

Example of Transitive Matching

The general rule is, if A matches B, and B matches C, then A matches C. Assign a conditional match rule based on similarity such as described in Table 24-7:

Table 24-7 Conditional Match Rule

Attribute	Position	Algorithm	Similarity Score	Blank Matching
LastName	1	Similarity	80	Do not match if either is blank

Assume you have the data listed in Table 24-8:

Table 24-8 Sample Data

Row	First Name	Last Name	PHN	SSN
A	John	Jones	650-555-0110	NULL
B	Jonathan	James	650-555-0110	987-65-4326
C	John	Jamos	650-555-0110	987-65-4326

Jones matches James with a similarity of 80, and James matches Jamos with a similarity of 80. Jones does not match Jamos because the similarity is 60, which is less than the threshold of 80. However, because Jones matches James, and James matches Jamos, all three records match (Jones, James, and Jamos).

Restrictions on Using the Match-Merge Operator

Because the Match-Merge operator only accepts SQL input, you cannot map the output of the Name and Address operator directly to the Match-Merge operator. You must use a staging table.
Because the Match-Merge generates only PL/SQL, you cannot map the Merge or XREF output groups of the Match-Merge operator to a SQL only operator such as a Sort operator or another Match-Merge operator.

Overview of the Matching and Merging Process

Matching determines which records refer to the same logical data. Warehouse Builder provides a variety of match rules to compare records. Match rules range from an exact match to sophisticated algorithms that can discover and correct common data entry errors.

Merging consolidates matched records into a single record that is free from duplicate records, omissions, misspellings, and unnecessary variations. You can define merge rules to select the preferred data values for use in the consolidated record.

Requirements for Matching and Merging Records

Warehouse Builder uses the following in the matching and merging process.

Match Bins

Match bins are containers for similar records and are used to identify potential matches. The match bin attributes are used to determine how records are grouped into match bins. While performing matching, only records within the same match bin are compared. Match bins limit the number of potential matches in a data set, thus improving performance of the match algorithm.

Match Bin Attributes

Before performing matching, Warehouse Builder divides the source records into smaller groups of similar records. Match bin attributes are the source attributes used to determine how records are grouped. Records having the same match bin attributes reside in the same match bin. Match bin attributes also limit match bins to manageable sets.

Select match bin attributes carefully to fulfill the following two conflicting needs:

Ensure that any records that match reside in the same match bin.
Keep the size of the match bin as small as possible.

A small match bin is desirable for efficiency.

Match Record Sets

A match record set consists of one or more similar records. After matching records, a match record set is created for each match bin. You can define the match rules that determine if two records are similar.

Merged Records

A merged record contains data that is merged using multiple records in the match record set. Each match record set generates its own merged record.

Process for Matching and Merging Records

You use the Match-Merge operator to match and merge records. This operator accepts records from an input source, determines the records that are logically the same, and constructs a new merged record from the matched records.

Figure 24-1 represents high-level tasks involved in the matching and merging process. These include the following:

Constructing Match Bins
Constructing Match Record Sets
Constructing Merge Records

Figure 24-1 Match-Merge Process

Description of "Figure 24-1 Match-Merge Process"

Constructing Match Bins

The match bin is constructed using the match bin attributes. Records with the same match bin attribute values will reside in the same match bin. A small match bin is desirable for efficiency.

Constructing Match Record Sets

Match rules are applied to all the records in each match bin to generate one or more match record sets. Match rules determine if two records match. A match rule is an n X n algorithm where all records in the match bin are compared.

One important point of this algorithm is the transitive matching. Consider three records A, B, and C. If record A is equal to record B and record B is equal to record C, this means that record A is equal to record C.

Constructing Merge Records

A single merge record is constructed from each match record set. You can create specific rules to define merge attributes by using merge rules

Match Rules

Match rules are used to determine if two records are logically similar. Warehouse Builder enables you to use different types of rules to match source records. You can define match rules using the MatchMerge Wizard or the MatchMerge Editor. Use the editor to edit existing match rules or add new rules.

Match rules can be active or passive. Active rules are generated and executed in the order specified. Passive rules are generated but not executed.

Table 24-9 describes the types of match rules.

Table 24-9 Types of Match Rules

Match Rule	Description
All Match	Matches all rows within a match bin.
None Match	Turns off matching. No rows match within the match bin.
Conditional	Matches rows based on the algorithm you set. For more information about Conditional match rules and how to create one, see "Conditional Match Rules".
Weight	Matches row based on scores that you assign to the attributes. For more information about Weight match rules and how to create one, see "Weight Match Rules".
Person	Matches records based on the names of people. For more information about Person match rules and how to create one, see "Person Match Rules".
Firm	Matches records based on the name of the organization or firm. For more information about Firm match rules and how to create one, see "Firm Match Rules".
Address	Matches records based on postal addresses. For more information about Address match rules and how to create one, see "Address Match Rules".
Custom	Matches records based on a custom comparison algorithm that you define. For more information about Custom match rules and how to create one, see "Custom Match Rules".

Conditional Match Rules

Conditional match rules specify the conditions under which records match.

A conditional match rule allows you to combine multiple attribute comparisons into one composite rule. When more than one attribute is involved in a rule, two records are considered to be a match only if all comparisons are true. Warehouse Builder displays an AND icon in the left-most column of subsequent conditions.

You can specify how attributes are compared using comparison algorithms.

Attribute

Identifies the attribute that will be tested for a particular condition. You can select from any input attribute (INGRP1).

Position

The order of execution. You can change the position of a rule by clicking on the row header and dragging the row to its new location. The row headers are the boxes to the left of the Attribute column.

Algorithm

A list of methods that can be used to determine a match. Table 24-10 describes the algorithms.

Similarity Score

The minimum similarity value required for two strings to match, as calculated by the Edit Distance, Standardized Edit Distance, Jaro-Winkler, or Standardized Jaro-Winkler algorithms. Enter a value between 0 and 100. A value of 100 indicates an exact match, and a value of 0 indicates no similarity.

Blank Matching

Lists options for handling empty strings in a match.

Comparison Algorithms

Each attribute in a conditional match rule is assigned a comparison algorithm, which specifies how the attribute values are compared. Multiple attributes may be compared in one rule with a separate comparison algorithm selected for each.

Table 24-10 describes the types of comparisons.

Table 24-10 Types of Comparison Algorithms for Conditional Match Rules

Algorithm	Description
Exact	Attributes match if their values are exactly the same. For example, "Dog" and "dog!" would not match, because the second string is not capitalized and contains an extra character. For data types other than `STRING`, this is the only type of comparison allowed.
Standardized Exact	Standardizes the values of the attributes before comparing for an exact match. With standardization, the comparison ignores case, spaces, and non-alphanumeric characters. Using this algorithm, "Dog" and "dog!" would match.
Soundex	Converts the data to a Soundex representation and then compares the text strings. If the Soundex representations match, then the two attribute values are considered matched.
Edit Distance	A "similarity score" in the range 0-100 is entered. If the similarity of the two attributes is equal or greater to the specified value, the attribute values are considered matched. The similarity algorithm computes the edit distance between two strings. A value of 100 indicates that the two values are identical; a value of zero indicates no similarity whatsoever. For example, if the string "tootle" is compared with the string "tootles", then the edit distance is 1. The length of the string "tootles" is 7. The similarity value is therefore (6/7)*100 or 85.
Standardized Edit Distance	Standardizes the values of the attribute before using the Similarity algorithm to determine a match. With standardization, the comparison ignores case, spaces, and non-alphanumeric characters.
Partial Name	The values of a string attribute are considered a match if the value of one entire attribute is contained within the other, starting with the first word. For example, "Midtown Power" would match "Midtown Power and Light", but would not match "Northern Midtown Power". The comparison ignores case and non-alphanumeric characters.
Abbreviation	The values of a string attribute are considered a match if one string contains words that are abbreviations of corresponding words in the other. Before attempting to find an abbreviation, this algorithm performs a Std Exact comparison on the entire string. The comparison ignores case and non-alphanumeric character. For each word, the match rule will look for abbreviations, as follows. If the larger of the words being compared contains all of the letters from the shorter word, and the letters appear in the same order as the shorter word, then the words are considered a match. For example, "Intl. Business Products" would match "International Bus Prd".
Acronym	The values of a string attribute are considered a match if one string is an acronym for the other. Before attempting to identify an acronym, this algorithm performs a Std Exact comparison on the entire string. If no match is found, then each word of one string is compared to the corresponding word in the other string. If the entire word does not match, each character of the word in one string is compared to the first character of each remaining word in the other string. If the characters are the same, the names are considered a match. For example, "Chase Manhattan Bank NA" matches "CMB North America". The comparison ignores case and non-alphanumeric characters.
Jaro-Wrinkler	Matches strings based on their similarity value using an improved comparison system over the Edit Distance algorithm. It accounts for the length of the strings and penalizes more for errors at the beginning. It also recognizes common typographical errors. The strings match when their similarity value is equal to or greater than the Similarity Score that you specify. A similarity value of 100 indicates that the two strings are identical. A value of zero indicates no similarity whatsoever. Note that the value actually calculated by the algorithm (0.0 to 1.0) is multiplied by 100 to correspond to the Edit Distance scores.
Standardized Jaro-Wrinkler	Eliminates case, spaces, and non-alphanumeric characters before using the Jaro-Winkler algorithm to determine a match.
Double Metaphone	Matches phonetically similar strings using an improved coding system over the Soundex algorithm. It generates two codes for strings that could be pronounced in multiple ways. If the primary codes match for the two strings, or if the secondary codes match, then the strings match. The Double Metaphone algorithm accounts for alternate pronunciations in Italian, Spanish, French, and Germanic and Slavic languages. Unlike the Soundex algorithm, Double Metaphone encodes the first letter, so that 'Kathy' and 'Cathy' evaluate to the same phonetic code.

Creating Conditional Match Rules

To define a conditional match rule, complete the following steps:

On the top portion of the Match Rules tab or the Match Rules page, select Conditional in the Rule Type column.

A Details section is displayed.
Click Add to add a new row.
Select an attribute in the Attribute column.
In the Algorithm column, select a comparison algorithm. See Table 24-10 for descriptions.
Specify a similarity score for the Edit Distance, Standardized Edit Distance, Jaro-Winkler, or Standardized Jaro-Winkler algorithms.
Select a method for handling blanks.

Weight Match Rules

A weighted match rule allows you to assign an integer weight to each attribute included in the rule. You must also specify a threshold. For each attribute, the Match-Merge operator multiplies the weight by the similarity score, and sums the scores. If the sum equals or exceeds the threshold, the two records being compared are considered a match.

Weight match rules are most useful when you need to compare a large number of attributes, without having a single attribute that is different causing a non-match, as can happen with conditional rules.

Weight rules implicitly invoke the similarity algorithm to compare two attribute values. This algorithm returns an integer, percentage value in the range 0-100, which represents the degree to which two values are alike. A value of 100 indicates that the two values are identical; a value of zero indicates no similarity whatsoever.

Similarity Algorithm

The method used to determine a match. Choose from these algorithms:

Edit Distance: Calculates the number of deletions, insertions, or substitutions required to transform one string into another.
Jaro-Winkler: Uses an improved comparison system over the Edit Distance algorithm. It accounts for the length of the strings and penalizes more for errors at the beginning. It also recognizes common typographical errors.

Attribute

Identifies the attribute that will be tested for a particular condition. You can select from any input attribute (INGRP1).

Maximum Score

The weight value for the attribute. This value should be greater than the value of Required Score to Match.

Score When Blank

The similarity value when one of the records is empty.

Required Score to Match

A value that represents the similarity required for a match. A value of 100 indicates that the two values are identical. A value of zero indicates there is no similarity.

Example of Weight Match Rules

Table 24-11 displays the attribute values contained in two separate records that are read in the following order.

Table 24-11 Example of Weight Match Rule

Record Number	First Name	Middle Name	Last Name
Record 1	Robert	Steve	Paul
Record 2		Steven	Paul

You define a match rule that uses the Edit Distance similarity algorithm. The Required Score to Match is 120. The attributes for first name and middle name are defined with a Maximum Score of 50 and Score When Blank of 20. The attribute for last name has a Maximum Score of 80 and a Score When Blank of 0.

Consider an example of the comparison of Record 1 and Record 2 using the weight match rule.

Since first name is blank for Record 2, the Blank Score = 20.
The similarity of middle name in the two records is 0.83. Since the weight assigned to this attribute is 50, the similarity score for this attribute is 43 (0.83 X 50).
Since the last name attributes are the same, the similarity score for the last name is 1. The weighted score is 80 (1 X 80).

The total score for this comparison is 143 (20+43+80). Since this is more than the value defined for Required Score to Match, the records are considered a match.

Creating Weight Match Rules

To use the Weight match rule, complete the following steps:

On the Match Rules tab or the Match Rules page, select Weight as the Rule Type.

The Details tab is displayed at the bottom of the page.
Select Add at the bottom of the page to add a new row.
For each row, select an attribute to add to the rule using the Attribute column.
In Maximum Score, assign a weight to each attribute. Warehouse Builder compares each attribute using a similarity algorithm that returns a score between 0 and 100 to represent the similarity between the rows.
In Score When Blank, assign a value to be used when the attribute is blank in one of the records.
In Required score to match, assign an overall score for the match.

For two rows to be considered a match, the total counts must be greater than the value specified in the Required score to match parameter.

Person Match Rules

Built-in Person rules provide an easy and convenient way for matching names of individuals. Person match rules are most effective when the data has first been corrected using the Name and Address operator.

When you use Person match rules, you must specify which data within the record represents the name of the person. The data can come from multiple columns. Each column must be assigned an input role that specifies what the data represents.

To define a Person match rule, you must define the Person Attributes that are part of the rule. For example, you can create a Person match rule that uses the Person Attributes first name and last name for comparison. For each Person Attribute, you must define the Person Role that the attribute uses. Next you define the rule options used for the comparison. For example, while comparing last names, you can specify that hyphenated last names should be considered a match.

Person Roles

Table 24-12 describes the roles for different parts of a name that are used for matching. On the Match Rules page or Match Rules tab, use the Roles column on the Person Attributes tab to define person details.

Table 24-12 Name Roles for Person Match Rules

Role	Description
Prename	Prenames are compared only if the following are true: The Last_name and, if present, the middle name (Middle_name_std, Middle_name_2_std, and Middle_name_3_std roles) in both records match. The "Mrs. Match" option is selected. Either record has a missing First_name_std.
First Name Standardized	Compares the first names. By default, the first names must match exactly, but you can specify other comparison options as well. First names match if both are blank. A blank first name will not match a non-blank first name unless the Prename role has been assigned and the "Mrs. Match" option is set. If a Last_name role has not been assigned, a role of First_name_std must be assigned.
Middle Name Standardized, Middle Name 2 Standardized, Middle Name 3 Standardized	Compares the middle names. By default, the middle names must match exactly, but other comparison options can be specified. If more than one middle name role is assigned, attributes assigned to the different roles are cross-compared. For example, values for Middle_name_std will be compared not only against other Middle_name_std values, but also against Middle_name_2_std, if that role is also assigned. Middle names match if either or both are blank. If any of the middle name roles are assigned, the First_name_std role must also be assigned.
Last Name	Compares the last names. By default, the last names must match exactly, but you can specify other comparison options. The last names match if both are blank, but not if only one is blank.
Maturity Post Name	Compares the post name, such as "Jr.", "III," and so on. The post names match if the values are exactly the same, or if either value is blank.

Person Details

Table 24-13 describes the options that determine a match for person match rules. Use the Details tab of the Match Rules tab or the Match Rules page to define person details.

Table 24-13 Options for Person Match Rule

Option	Description
Detect switched name order	Detects switched name orders such as matching 'Elmer Fudd' to 'Fudd Elmer'. You can select this option if you selected First Name and Last Name roles for attributes on the Person Attributes tab.
Match on initials	Matches initials to names such as 'R.' and 'Robert'. You can select this option for first name and middle name roles.
Match on substrings	Matches substrings to names such as 'Rob' to 'Robert'. You can select this option for first name and middle name roles.
Similarity Score	Records are considered a match if the similarity is greater than or equal to score. For example, "Susan" will match "Susen" if the score is less than or equal to 80. Uses a similarity score to determine a match, as calculated by the Edit Distance or Jaro-Winkler algorithms. A value of 100 requires an exact match, and a value of 0 requires no similarity whatsoever.
Match on Phonetic Codes	Determines a match using either the Soundex or the Double Metaphone algorithms.
Detect compound name	Matches compound names to names such as 'De Anne' to 'Deanne'. You can select this option for the first name role.
"Mrs" Match	Matches prenames to first and last names such as 'Mrs. Washington' to 'George Washington'. You can select this option for the prename role.
Match hyphenated names	Matches hyphenated names to unhyphenated names such as "Reese-Jones" to "Reese". You can select this option for the last name role.
Detect missing hyphen	The operator detects missing hyphens, such as matching "Hillary Rodham Clinton" to "Hillary Rodham-Clinton". You can select this option for the last name role.

Creating Person Match Rules

To define a Person match rule, complete the following steps:

On the Match Rules tab, select Person as the Rule Type.

The Person Attributes tab and Details tab are displayed at the bottom of the page.
In the left panel of the Person Attributes tab, select the attributes that describe a full name and use the right arrow to move them to Name Roles section.
For each attribute, select the role it plays in a name.

You must define either the Last Name or First Name Standardized for the match rule to be effective. See Table 24-12 for the types of roles you can assign.
Select the Details tab and select the applicable options as listed in Table 24-13.

Firm Match Rules

Built-in Firm match rules provide an easy and convenient way for matching business names. Firm match rules are most effective when the data has first been corrected using the Name and Address operator. Similar to the Person rule, this rule requires users to set what data within the record represents the name of the firm. The data can come from multiple columns and each column specified must be assigned an input role that indicates what the data represents.

Note that you need not assign a firm role to every attribute, and not every role needs to be assigned to an attribute. The attributes assigned to firm roles are used in the match rule to compare the records. The attributes are compared based on the role they have been assigned and other comparison options have you set. For a complete list of firm roles and how each role is treated in a firm match rule, see "Firm Roles".

Firm Roles

Firm roles define the parts of a firm name that are used for matching. The options you can select for firm role are Firm1 or Firm2. If you select one attribute, for firm name, select Firm1 as the role. If you selected two attributes, designate one of them as Firm1 and the other as Firm2.

Firm1: If this role is assigned, the business names represented by Firm1 are compared. Firm1 names will not be compared against Firm2 names unless if the Cross-match firm1 and firm2 box is checked. By default, the firm names must match exactly; but other comparison options can also be specified. Firm1 names do not match if either or both names are blank.
Firm2: If this role is assigned, the values of the attribute assigned to Firm2 will be compared. Firm2 names will not be compared against Firm1 names unless if the Cross-match firm1 and firm2 box is checked. By default, the firm names must match exactly; but other comparison options can also be specified. Firm2 names do not match if either or both names are blank. If a Firm1 role is not assigned, a Firm2 roles must be assigned.

Firm Details

Table 24-14 describes the rule options you can set for each component of the firm name to determine a match.

Table 24-14 Options for Firm Rules

Option	Description
Strip noise words	Removes the following words from Firm1 and Firm2 before matching: THE, AND, CORP, CORPORATION, CO, COMPANY, INC, INCORPORATED, LTD, TO, OF, and BY.
Cross-match firm 1 and firm 2	When comparing two records for matching, in addition to matching firm1 to firm1 and firm2 to firm2 of the respective records, match firm1 against firm2 for the records.
Match on partial firm name	Uses the Partial Name algorithm to determine a match. For example, match "Midtown Power" to "Midtown Power and Light".
Match on abbreviations	Uses the Abbreviation algorithm to determine a match. For example, match "International Business Machines" to "IBM".
Match on acronyms	Uses the Acronym algorithm to determine a match. For example, match "CMB, North America" to "Chase Manhattan Bank, NA".
Similarity score	Uses a similarity score to determine a match, as calculated by the Edit Distance or Jaro-Winkler algorithms. Enter a value between 0 and 100 as the minimum similarity value required for a match. A value of 100 requires an exact match, and a value of 0 requires no similarity whatsoever. Two records are considered as a match if the similarity is greater than or equal to the value of similarity score.

Creating Firm Match Rules

To define a Firm match rule, complete the following steps:

On the Match Rules tab or the Match Rules page, select Firm as the Rule Type.

The Firm Attributes tab and Details tab are displayed at the bottom of the page.
In the left panel of the Firm Attributes tab, select one or two attributes that represent the firm name and click the right shuttle button.

The attributes are moved to the Firm Roles box.
For each attribute, click Roles. From the list, select Firm 1 for the first attribute, and Firm 2 for the second attribute, if it exists.
On the Details tab, select the applicable options. For more details, see "Firm Details".

Address Match Rules

Address Match rules provide a method of matching records based on postal addresses. Address match rules are most effective when the data has first been corrected using a Name and Address operator.

Address Rules work differently depending on whether the address being processed has been corrected, using the Name and Address operator, or not. Generally, corrected addresses have already been identified in a postal matching database, and are therefore syntactically correct, legal, and existing addresses according to the Postal Service of the country containing the address. Corrected addresses can be processed more quickly, since the match rule can make certain assumptions about their format.

Uncorrected addresses may be syntactically correct, but have not been found in a postal matching database. Addresses may have not been found because they are not in the database, or because there is no postal matching database installed for the country containing the address. Address match rules determine whether an address has been corrected based on the Is_found role. If Is_found role is not assigned, then the match rule performs the comparisons for both the corrected and uncorrected addresses.

To create an Address match rule, assign address roles to the various attributes. The attributes assigned to address roles are used in the match rule to compare the records. Attributes are compared depending on which role they have been assigned, and what other comparison options have been set.

Address Roles

Table 24-15 describes the address roles you can select for each part of an address.

Table 24-15 Address Roles

Role	Description
Primary Address	Compares the primary addresses. Primary addresses can be, for example, street addresses ("100 Main Street") or PO boxes ("PO Box 100"). By default, the primary addresses must match exactly, but a similarity option can also be specified. The Primary_address role must be assigned.
Unit Number	Unit numbers (such as suite numbers, floor numbers, or apartment numbers) are compared if the primary addresses match. The unit numbers match if both are blank, but not if one is blank, unless the Match on blank secondary address option is set. If the Allow differing secondary address is set, the unit numbers are ignored.
PO Box	Compares the Post Office Boxes. The PO Box is just the number portion of the PO Box ("100"), and is a subset of the primary address, when the primary address represents a PO Box ("PO Box 100"). If the primary address represents a street address, the PO Box will be blank.
Dual Primary Address	The Dual_primary_address is compared against the other record's Dual_primary_address and Primary_address to determine a match.
Dual Unit Number	Compares the Dual_unit_number address with the Dual_unit_number and Unit_number of the other record. The unit numbers will match if one or both are blank. To assign the Dual_unit_number role, the Dual_primary_address role must also be assigned.
Dual PO Box	Dual_PO_Box address of a record is compared with the Dual_PO_Box and the PO_Box of the other record. To assign the Dual_PO_Box role, the Dual_primary_address role must also be assigned.
City	Compares the cities for uncorrected addresses. For corrected addresses, the cities are only compared if the postal codes do not match. If both City and State roles match, then the address line roles, such as Primary_address, can be compared. By default, the cities must match exactly. But you may specify a last line similarity option. The cities match if both are blank, but not if only one is blank. If the City role is assigned, then the State role must also be assigned.
State	Assign this role only when also assigning the City role. The states are compared for uncorrected addresses. For corrected addresses, the states are only compared if the postal codes do not match. If both State and City roles match, then the address line roles, such as Primary_address, can be compared. By default, the states must match exactly, but a last line similarity option may be specified. The states match if both are blank, but not if only one is blank. If the State role is assigned, then the City role must also be assigned.
Postal Code	For uncorrected address data, the operator does not use Postal Code. The postal codes are compared for corrected addresses. For uncorrected addresses, the Postal_code role is not used. To match, the postal codes must be exactly the same. The postal codes are not considered a match if one or both are blank. If the postal codes match, then the address line roles, such as Primary_address, can be compared. If the postal codes do not match, City and State roles are compared to determine whether the address line roles should be compared.
Is Found	The Is_found_flag attributes are not compared, but instead are used to determine whether an address has been found in a postal matching database, and therefore represents a legal address according to the postal service of the country containing the address. This determination is important because the type of comparison done during matching depends on whether the address has been found in the postal database or not.

Address Details

Table 24-16 describes the options for determining a match for an address rule.

Table 24-16 Options for Address Roles

Option	Description
Allow differing secondary address	Allow addresses to match even if the unit numbers are not null and are different.
Match on blank secondary address	Allow addresses to match even if exactly one unit number is null.
Match on either street or post office box	Matches records if either the street address or the post office box match.
Address line similarity	Match if address line similarity >= the score. All spaces and non-alpanumeric characters are removed before the similarity is calculated.
Last line similarity	Match is the last line similarity >= score. The last line consists of city and state. All spaces and non-alphanumeric characters are removed before the similarity is calculated.

Creating Address Match Rules

To define an Address match rule, complete the following steps:

On the Match Rules tab or the Match Rules page, select Address as the Rule Type.

The Address Attributes tab and Details tab are displayed at the bottom of the page.
In the left panel of the Address Attributes tab, select the attribute that represents the primary address. Use the right shuttle key to move it to the Address Roles Attributes column.
Click Role Required and designate that attribute as the Primary Address.

You must designate one attribute as the primary address. If you do not assign the Primary Address role, the match rule is invalid.
Add other attributes and designate their roles as necessary. See Table 24-15 for the types of roles you can assign.
Select the Details tab and select the applicable options as listed in Table 24-16.

Custom Match Rules

Custom match rules enable you to write your own comparison algorithms to match records. You can use any input attributes or match functions within this comparison. You can use an active custom rule to control the execution of passive rules.

Consider the following three passive built-in rules:

NAME_MATCH: built-in name rule.
ADDRESS_MATCH: built-in address rule.
TN_MATCH: built-in conditional rule.

You can create a custom rule to specify that two records can be considered a match if any two of these rules are satisfied. Example 5–1 describes the PL/SQL code used to create the custom match rule that implements this example.

Example 24-1 Creating a Custom Rule Using Existing Passive Rules

BEGIN
   RETURN(
       (NAME_MATCH(THIS_,THAT_) AND ADDRESS_MATCH(THIS_,THAT_))
     OR
       (NAME_MATCH(THIS_,THAT_) AND TN_MATCH(THIS_,THAT_))
     OR
       (ADDRESS_MATCH(THIS_,THAT_) AND TN_MATCH(THIS_,THAT_))
         );
END;

Creating Custom Match Rules

To define a Custom match rule, complete the following steps:

On the Match Rules tab or the Match Rules page, select Custom as the Rule Type.

A Details field is displayed at the bottom of the page with the skeleton of a PL/SQL program.
Click Edit to open the Custom Match Rules Editor.

For more information about using the editor, select Help Topic from the Help menu.
To enter PL/SQL code, use any combination of the following:
- To read in a file, select Open File from the Code menu.
- To enter text, first position the cursor using the mouse or arrow keys, then begin typing. You can also use the commands on the Edit and Search menus.
- To reference any function, parameter, or transformation in the navigation tree, first position the cursor, then double-click or drag-and-drop the object onto the Implementation field.
To validate your code, select Validate from the Test menu.

The validation results appear on the Messages tab.
To save your code, select Save from the Code menu.
To close the Custom Match Rules Editor, select Close from the Code menu.

Merge Rules

Matching produces a set of records that are logically the same. Merging is the process of creating one record from the set of matched records. A Merge rule is applied to attributes in the matched record set to obtain a single value for the attribute in the merged record.

You can define one Merge rule for all the attributes in the Merge record or define a rule for each attribute. For instance, if the merged record is a customer record, it may have attributes such as ADDRESS1, ADDRESS2, CITY, STATE, and ZIP. You can write five rules that select the value of each attribute from up to five different records, or one Record rule that selects that values of all five attributes from one record. Use record rules when multiple attributes compose a logical unit, such as an address. For example, City, State, and Zip Code might be three different attributes, but the data for these attributes should all come from the same record.

Table 24-17 describes the types of merge rules.

Merge Rule	Description
Any	Uses the first non-blank value.
Match ID	Merges records that have already been output from another Match-Merge operator.
Rank	Ranks the records from the match set. The associated attribute from the highest ranked record will be used to populate the merge attribute value.
Sequence	Specify a database sequence for this rule. The next value of the sequence will be used for the value.
Min Max	Specify an attribute and a relation to choose the record to be used as a source for the merge attribute.
Copy	Choose a value from a different previously merged value.
Custom	Create a PL/SQL package function to select the merge value. The operator will provide the signature of this function. The user is responsible for the implementation of the rule from "BEGIN" to "END;" The matched records and merge record are parameters for this function.
Any Record	Identical to the Any rule, except that an Any Record rule applies to multiple attributes.
Rank Record	Identical to the Rank rule, except that a Rank Record rule applies to multiple attributes.
Min Max Record	Identical to the Min Max rule, except that a Min Max Record rule applies to multiple attributes.
Custom Record	Identical to the Custom rule, except that a Custom Record rule applies to multiple attributes.

Match ID Merge Rule

Use the Match ID merge rule to merge records that have already been output in the XREF group from another Match-Merge operator. No other operator is valid for this type of input. For more information, see "Using Two Match-Merge Operators".

Next Value of the Sequence

Identifies the sequence that will be used by the rule.

sequences list

Lists all sequences defined in the current project.

Select Sequence

Sets the sequence for the rule to the sequence currently selected in the list. Move a sequence from the sequences list to Select Sequence.

Rank and Rank Record Merge Rules

Use the Rank and Rank Record rules when merging data from multiple sources. These rules enable you to identify your preference for certain sources. Your data must have a second input attribute on which the rule is based.

For example, the second attribute might identify the data source, and these data sources are ranked in order of reliability. The most reliable value would be used in the merged record. The merge rule might look like this:

INGRP1.SOURCE = 'Order Entry'

Name

An arbitrary name for the rule. Warehouse Builder creates a default name such as RULE_0 for each rank merge rule. You can replace these names with meaningful ones.

Position

The order of execution. You can change the position of a rule by clicking on the row header and dragging the row to its new location. The row headers are the boxes to the left of the Name column.

Expression Record Selection

The custom SQL expression used in the ranking. Click the Ellipsis button to display the Rank Rule Editor (also called the Expression Builder User Interface). Use this editor to develop the ranking expression.

Sequence Merge Rule

The Sequence rule uses the next value in a sequence.

Next Value of the Sequence

Identifies the sequence that will be used by the rule.

sequences list

Lists all sequences defined in the current project.

Select Sequence

Sets the sequence for the rule to the sequence currently selected in the list.

Min Max and Min Max Record Merge Rules

The Min Max and Min Max Record rules select an attribute value based on the size of another attribute value in the record.

For example, you might select the First Name value from the record in each bin that contains the longest Last Name value.

Selecting Attribute

Lists all input attributes. Select the attribute whose values provide the order.

Attribute Relation

Select the characteristic for choosing a value in the selected attribute.

Minimum. Selects the smallest numeric value or the oldest date value.
Maximum. Selects the largest numeric value or the most recent date value.
Shortest. Selects the shortest character value.
Longest. Selects the longest character value.

Copy Merge Rule

The Copy rule uses the values from another merged attribute.

Merged Attribute

Lists the other merged attributes, which you selected on the Merge Attributes page.

Custom and Custom Record Merge Rule s

The Custom and Custom Record rules use PL/SQL code that you provide to merge the records. The following is an example of a Custom merge rule, which returns the value of the TAXID attribute for record 1.

BEGIN
RETURN M_MATCHES(1)."TAXID";
END;

Following is an example of a Custom Record merge rule, which returns a record for record 1:

BEGIN
RETURN M_MATCHES(1);
END;

Merge Rules Detail

Displays the PL/SQL code composing your custom algorithm. You can edit code directly in this field or use the Custom Merge Rule Editor.

Edit

Displays the Custom Merge Rule Editor.

About the Name and Address Operator

After matching and merging records, you can further validate information about your customers and suppliers, and discover additional errors and inconsistencies. Warehouse Builder parses the names and addresses, and uses methods specific to this type of data, such as matching common nicknames and abbreviations. You can compare the input data to the data libraries supplied by third-party name and address cleansing software vendors, which can augment your records with information such as postal routes and geographic coordinates.

Successful delivery and lower postage rates are not the only reasons to cleanse name and address data. You will get better results from data analysis when the results are not skewed by duplicate records and incomplete information.

Warehouse Builder enables you to perform name and address cleansing on data using the Name and Address operator. The Name and Address operator identifies and corrects errors and inconsistencies in name and address source data by comparing input data to the data libraries supplied by third-party name and address cleansing software vendors. You can purchase the data libraries directly from these vendors.

Note:

The Name and Address operator requires separate licensing and installation of third-party name and address cleansing software. Refer to the Oracle Warehouse Builder Installation and Administration Guide for more information.

The errors and inconsistencies corrected by the Name and Address operator include variations in address formats, use of abbreviations, misspellings, outdated information, inconsistent data, and transposed names. The operator fixes these errors and inconsistencies by:

Parsing the name and address input data into individual elements.
Standardizing name and address data, using standardized versions of nicknames and business names and standard abbreviations of address components, as approved by the postal service of the appropriate country. Standardized versions of names and addresses facilitate matching and householding, and ultimately help you obtain a single view of your customer.
Correcting address information such as street names and city names. Filtering out incorrect or undeliverable addresses can lead to savings on marketing campaigns.
Augmenting names and addresses with additional data such as gender, postal code, country code, apartment identification, or business and consumer identification. You can use this and other augmented address information, such as census geocoding, for marketing campaigns that are based on geographical location.

Augmenting addresses with geographic information facilitates geography-specific marketing initiatives, such as marketing only to customers in large metropolitan areas (for example, within an n-mile radius from large cities); marketing only to customers served by a company's stores (within an x-mile radius from these stores). Oracle Spatial, an option with Oracle Database, and Oracle Locator, packaged with Oracle Database, are two products that you can use with this feature.

The Name and Address operator also enables you to generate postal reports for countries that support address correction and postal matching. Postal reports often qualify you for mailing discounts. For more information, see "About Postal Reporting".

Example: Correcting Address Information

This example follows a record through a mapping using the Name and Address operator. This mapping also uses a Splitter operator to demonstrate a highly recommended data quality error handling technique.

Example Input

In this example, the source data contains a Customer table with the row of data shown in Table 24-18.

Table 24-18 Sample Input to Name and Address Operator

Address Column	Address Component
Name	Joe Smith
Street Address	8500 Normandale Lake Suite 710
City	Bloomington
ZIP Code	55437

The data contains a nickname, a last name, and part of a mailing address, but it lacks the customer's full name, complete street address, and the state in which he lives. The data also lacks geographic information such as latitude and longitude, which can be used to calculate distances for truckload shipping.

Example Steps

This example uses a mapping with a Name and Address operator to cleanse name and address records, followed by a Splitter operator to load the records into separate targets depending on whether they were successfully parsed. This section explains the general steps required to design such a mapping.

To make the listed changes to the sample record:

In the Mapping Editor, begin by adding the following operators to the canvas:
- A CUSTOMERS table from which you extract the records. This is the data source. It contains the data in Table 24-18.
- A Name and Address operator. This action starts the Name and Address Wizard. Follow the steps of the wizard.
- A Splitter operator. For information on using this operator, see "Splitter Operator".
- Three target operators into which you load the successfully parsed records, the records with parsing errors, and the records whose addresses are parsed but not found in the postal matching software.
Map the attributes from the CUSTOMERS table to the Name and Address operator ingroup. Map the attributes from the Name and Address operator outgroup to the Splitter operator ingroup.

You are not required to use the Splitter operator, but it provides an important function in separating good records from problematic records.
Define the split conditions for each of the outgroups in the Splitter operator and map the outgroups to the targets.

Figure 24-2 shows a mapping designed for this example. The data is mapped from the source table to the Name and Address operator, and then to the Splitter operator. The Splitter operator separates the successfully parsed records from those that have errors. The output from OUTGRP1 is mapped to the CUSTOMERS_GOOD target. The split condition for OUTGRP2 is set such that records whose Is Parsed flag is False are loaded to the NOT_PARSED target. That is, the Split Condition for OUTGRP2 is set as INGRP1.ISPARSED='F'. The Records in the REMAINING_RECORDS group are successfully parsed, but their addresses are not found by the postal matching software. These records are loaded to the PARSED_NOT_FOUND target.

Figure 24-2 Name and Address Operator Used with a Splitter Operator in a Mapping

Diagram of Name and Address operator with Splitter operator

Description of "Figure 24-2 Name and Address Operator Used with a Splitter Operator in a Mapping"

Example Output

If you run the mapping designed in this example, the Name and Address operator standardizes, corrects, and completes the address data from the source table. In this example, the target table contains the address data as shown in Table 24-19. Compare it with the input record from Table 24-18 .

Table 24-19 Sample Output from Name and Address Operator

Address Column	Address Component
First Name Standardized	JOSEPH
Last Name	SMITH
Primary Address	8500 NORMANDALE LAKE BLVD
Secondary Address	STE 710
City	BLOOMINGTON
State	MN
Postal Code	55437-3813
Latitude	44.849194
Longitude	-093.356352
Is Parsed	True or False. Indicates whether a record can be separated into individual elements.
Is Good Name	True or False. Indicates whether the name was found in a postal database.
Is Good Address	True or False. Indicates whether the address was found in a postal database or was parsed successfully.
Is Found	True or False. Indicates whether the address was found in a postal database.
Name Warning	True or False. Indicates whether problems occurred in parsing the name.
Street Warning	True or False. Indicates whether problems occurred in parsing the address.
City Warning	True or False. Indicates whether problems occurred in parsing the city name.

In this example, the following changes were made to the input data:

Joe Smith was separated into separate columns for First_Name_Standardized and Last_Name.
Joe was standardized into JOSEPH and Suite was standardized into STE.
Normandale Lake was corrected to NORMANDALE LAKE BLVD.
The first portion of the postal code, 55437, was augmented with the ZIP+4 code to read 55437-3813.
Latitude and longitude locations were added.
The records were tested in various ways, and the good records are directed to a different target from the ones that have problems.

About Postal Reporting

All address lists used to produce mailings for discounted automation postal rates must be matched by postal report-certified software. Certifications depend on the third-party vendors of name and address software and data. The certifications may include the following:

United States Postal Service: Coding Accuracy Support System (CASS)
Canada Post: Software Evaluation and Recognition Program (SERP)
Australia Post: Address Matching Approval System (AMAS)

United States Postal Service CASS Certification

The Coding Accuracy Support System (CASS) was developed by the United States Postal Service (USPS) in cooperation with the mailing industry. The system provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of five-digit ZIP Codes, ZIP+4 Codes, delivery point codes, and carrier route codes applied to all mail. All address lists used to produce mailings for automation rates must be matched by CASS-certified software.

To meet USPS requirements, the mailer must submit a CASS report in its original form to the USPS.

Canada Post SERP Certification

Canada Post developed a testing program called Software Evaluation and Recognition Program (SERP), which evaluates software packages for their ability to validate, or validate and correct, mailing lists to Canada Post requirements. Postal programs that meet SERP requirements are listed on the Canada Post Web site.

Canadian postal customers who use Incentive Lettermail, Addressed Admail, and Publications Mail must meet the Address Accuracy Program requirements. Customers can obtain a Statement of Accuracy by comparing their databases to Canada Post's address data.

Australia Post AMAS Certification

The Address Matching Approval System (AMAS) was developed by Australia Post to improve the quality of addressing. It provides a standard by which to test and measure the ability of address-matching software to:

Correct and match addresses against the Postal Address File (PAF).
Append a unique Delivery Point Identifier (DPID) to each address record, which is a step toward barcoding mail.

AMAS enables companies to develop address matching software which:

Prepares addresses for barcode creation.
Ensures quality addressing.
Enables qualification for discounts on PreSort letter lodgements.

PreSort Letter Service prices are conditional upon customers using AMAS Approved Software with Delivery Point Identifiers (DPIDs) being current against the latest version of the PAF.

A declaration that the mail was prepared appropriately must be made when using the Presort Lodgement Document, available from post offices.

Input Role Descriptions

For each attribute you select for Name or Address cleansing, you must specify an input role that indicates the type of data that is stored in the source attribute. Warehouse Builder provides a set of predefined input roles from which you can select the most suitable one for your data.

For example, the Employees table contains the columns last_name and city. You can select the Last Name and City respectively for these columns.

Table 24-20 describes the input roles for the Name and Address Operator.

Table 24-20 Name and Address Operator Input Roles

Input Role	Description
Pass Through	Any attribute that requires no processing.
First Name	First name, nickname, or shortened version of the first name.
Middle Name	Middle name or initial. Use when there is only one middle name, or for the first of several middle names; for example, 'May' in Ethel May Roberta Louise Mertz.
Middle Name 2	Second middle name; for example, 'Roberta' in Ethel May Roberta Louise Mertz.
Middle Name 3	Third middle name; for example, 'Louise' in Ethel May Roberta Louise Mertz.
Last Name	Last name or surname.
First Part Name	First part of the Person name, including: Pre name First name Middle name(s) Use when these components are contained in one source column.
Last Part Name	Last part of Person Name, including: Last name Post Name Use when these components are all contained in one source column.
Pre Name	Information that precedes and qualifies the name; for example, Ms., Mr., or Dr.
Post Name	Generation or other information qualifying the name; for example, Jr. or Ph.D.
Person	Full person name, including: First Part Name (consisting of Pre Name, First Name, and Middle Names) Last Part Name (consisting of Last Name and Post Name) Use when these components are all contained in one source column.
Person 2	Designates a second person if the input includes multiple personal contacts.
Person 3	Designates a third person if the input includes multiple personal contacts.
Firm Name	Name of the company or organization.
Primary Address	Box, route, or street address, including: Street name House number City map grid direction; for example, SW or N Street type; for example, Avenue, Street, or Road. This does not include the Unit Designator or the Unit Number.
Secondary Address	The second part of the street address, including: Unit Designator Unit Number For example, in a secondary address of Suite 2100, the Unit Designator is STE (a standardization of 'Suite') and the Unit Number is 2100.
Address	Full address line, including: Primary Address Secondary Address Use when these components share one column.
Address 2	Generic address line.
Neighborhood	Neighborhood or barrio, common in South and Latin American addresses.
Locality Name	The city (shi) or island (shima) in Japan.
Locality 2	The ward (ku) in Japan.
Locality 3	The district (machi) or village (mura) in Japan.
Locality 4	The subdistrict (aza, bu, chiwari, or sen) in Japan.
City	Name of city.
State	Name of state or province.
Postal Code	Postal code, such as a ZIP code in the United States or a postal code in Canada.
Country Name	Full country name.
Country Code	The ISO 3166-1993 (E) two- or three-character country code. For example, US or USA for United States; CA or CAN for Canada.
Last Line	Last address line, including: City State or province Postal code Use when these components are all contained in one source column.
Last Line 2	For Japanese adaptors, specifies additional line information that appears at the end of an address.
Line1... Line10	Use for free-form name, business, personal, and address text of any type. These roles do not provide the parser with any information about the data content. Whenever possible, use the discrete input roles provided instead.

Descriptions of Output Components

Use output components to define attributes that will store data cleansed by the Name and Address operator. Any attributes with an input role of Pass Through are automatically displayed as output components. You can define additional output components to store cleansed data.

Categories of Output Components

Output components are grouped in the following categories:

Pass Through
Name
Address
Extra Vendor
Error Status
Country-Specific

Pass Through

The Pass Through output component is for any attribute that requires no processing. When you create a Pass Through input role, the corresponding Pass Through output component is created automatically. You cannot edit a Pass Through output component, but you can edit the corresponding input role.

Name

Table 24-21 describes the Name output components. Many components can be used multiple times to process a record, as noted in the table. For example, in records with two occurrences of Firm Name, you can extract both by adding two output attributes. Assign one as the First instance, and the other as the Second instance.

Table 24-21 Name Output Components

Subfolder	Output Component	Description
None	Pre Name	Title or salutation appearing before a name; for example, Ms. or Dr. Can be used multiple times.
None	First Name Standardized	Standard version of first name; for example, Theodore for Ted or James for Jim. Can be used multiple times.
None	Middle Name Standardized	Standardized version of the middle name; for example, Theodore for Ted or James for Jim. Use when there is only one middle name, or for the first of several middle names. Can be used multiple times.
None	Middle Name 2 Standardized	Standardized version of the second middle name; for example, Theodore for Ted or James for Jim. Can be used multiple times.
None	Middle Name 3 Standardized	Standardized version of the third middle name; for example, Theodore for Ted or James for Jim. Can be used multiple times.
None	Post Name	Name suffix indicating generation; for example, Sr., Jr., or III. Can be used multiple times.
None	Other Post Name	Name suffix indicating certification, academic degree, or affiliation; for example, Ph.D., M.D., or R.N. Can be used multiple times.
None	Title	Personal title, for example, Manager.
None	Name Designator	Personal name designation; for example, ATTN (to the attention of) or C/O (care of). Can be used multiple times.
None	Relationship	Information related to another person; for example, Trustee For. Can be used multiple times.
None	SSN	Social security number.
None	Email Address	E-mail address.
None	Phone Number	Telephone number.
None	Name/Firm Extra	Extra information associated with the firm or personal name.
None	Person	First name, middle name, and last name. Can be used multiple times.
Person	First Name	The first name found in the input name. Can be used multiple times.
Person	Middle Name	Middle name or initial. Use this for a single middle name, or for the first of several middle names; for example, 'May' in Ethel May Roberta Louise Mertz. Can be used multiple times.
Person	Middle Name 2	Second middle name; for example, 'Roberta' in Ethel May Roberta Louise Mertz. Can be used multiple times.
Person	Middle Name 3	Third middle name; for example, 'Louise' in Ethel May Roberta Louise Mertz. Can be used multiple times.
Person	Last Name	Last name or surname. Can be used multiple times.
Derived	Gender	Probable gender: M = Male F = Female N = Neutral (either male or female) Blank = Unknown Can be used multiple times.
Derived	Person Count	Number of persons the record references; for example, a record with a Person name of 'John and Jane Doe' has a Person Count of 2.
Business	Firm Name	Name of the company or organization, including divisions. Can be used multiple times.
Business	Firm Count	Number of firms referenced in the record. Can be used multiple times.
Business	Firm Location	Location within a firm; for example, Accounts Payable

Address

Table 24-22 describes the Address output components. In records with dual addresses, you can specify which line is used as the Normal Address (and thus assigned to the Address component) and which is used as the Dual Address for many output components, as noted in the table.

Table 24-22 Address Output Components

Subfolder	Output Component	Description
None	Address	Full address line, including: Primary Address Secondary Address Can be used as the Normal Address or the Dual Address.
None	Primary Address	Box, route, or street address, including: Street name House number City map grid direction; for example, SW or N Street type; for example, Avenue, Street, or Road. Does not include the `Unit Designator` or the `Unit Number`. Can be used as the Normal Address or the Dual Address.
Primary Address	Street Number	Number that identifies the address, such as a house or building number, sometimes referred to as the primary range. For example, in 200 Oracle Parkway, the `Street Number` value is 200. Can be used as the Normal Address or the Dual Address.
Primary Address	Pre Directional	Street directional indicator appearing before the street name; for example, in 100 N University Drive, the `Pre Directional` value is 'N'. Can be used as the Normal Address or the Dual Address.
Primary Address	Street Name	Name of street. Can be used as the Normal Address or the Dual Address.
Primary Address	Primary Name 2	Second street name, often used for addresses at a street intersection.
Primary Address	Street Type	Street identifier; for example, ST, AVE, RD, DR, or HWY. Can be used as the Normal Address or the Dual Address.
Primary Address	Post Directional	Street directional indicator appearing after the street name; for example, in 100 15th Ave. S., the `Post Directional` value is 'S'. Can be used as the Normal Address or the Dual Address.
None	Secondary Address	The second part of the street address, including: Unit Designator Unit Number For example, in a secondary address of Suite 2100, `Unit Designator` is 'STE' (a standardization of 'Suite') and `Unit Number` is '2100'. Can be used as the Normal Address or the Dual Address.
Secondary Address	Unit Designator	Type of secondary address, such as APT or STE. For example, in a secondary address of Suite 2100, `Unit Designator` is 'STE' (a standardization of 'Suite'). Can be used as the Normal Address or the Dual Address.
Secondary Address	Unit Number	A number that identifies the secondary address, such as the apartment or suite number. For example, in a secondary address of Suite 2100, `Unit Number` is '2100'. Can be used as the Normal Address or the Dual Address.
Secondary Address	Non-postal Secondary Address	A secondary address that is not in official postal format.
Secondary Address	Non-postal Unit Designator	A unit designator that is not in official postal format.
Secondary Address	Non-postal Unit Number	A unit number that is not in official postal format.
Address	Last Line	Final address line, including: City state, province, or county Formatted postal code if the address was fully assigned
Last Line	Neighborhood	Neighborhood or barrio, common in South and Latin American addresses.
Last Line	City	Name of city. The U.S. city names may be converted to United States Postal Service preferred names.
Last Line	City Abbreviated	Abbreviated city name, composed of 13 characters for the United States.
Last Line	City Abbreviated 2	Alternative abbreviation for the city name.
Last Line	Alternate City	An alternate name for a city that may be referenced by more than one name. In the United States, a city may be referenced by its actual name or the name of a larger urban area. For example, Brighton Massachusetts may have Boston as an alternate city name.
Last Line	Locality Code	The last three digits of the International Mailsort Code, which represents a geographical region or locality within each country. Locality Codes are numeric in the range 000-999.
Last Line	Locality Name	In the United Kingdom, the following address is assigned Locality Name KNAPHILL: Chobham Rd Knaphill Woking GU21 2TZ
Last Line	Locality 2	The ward (ku) in Japan.
Last Line	Locality 3	The district (machi) or village (mura) in Japan.
Last Line	Locality 4	The subdistrict (aza, bu, chiwari, or sen) in Japan.
Last Line	County Name	The name of a county in the United Kingdom, United States, or other country.
Last Line	State	Name of state or province.
Last Line	Postal Code	Full postal code with spaces and other non-alphanumeric characters removed.
Last Line	Postal Code Formatted	Formatted version of postal code that includes spaces and other non-alphanumeric characters, such as dashes.
Last Line	Delivery Point	A designation used in the United States and Australia. For the United States, this is the two-digit postal delivery point, which is combined with a full nine-digit postal code and check digit to form a delivery point bar code. For Australia, this is a nine-digit delivery point.
Last Line	Country Code	The ISO 3166-1993 (E) two-character country code, as defined by the International Organization for Standardization; for example, 'US' for United States or 'CA' for Canada.
Last Line	Country Code 3	The ISO 3166-1993 (E) three-character country code, as defined by the International Organization for Standardization; for example, 'USA' for United States, 'FRA' for France, or 'UKR' for Ukraine.
Last Line	Country Name	The full country name.
Address	Address 2	A second address line, typically used for Hong Kong addresses that have both a street address and a building or floor address.
Address	Last Line 2	Additional information that appears at the end of an address in Japan.
Other Address Line	Box Name	The name for a post office box address; for example, for 'PO Box 95', the Box Name is 'PO BOX'. Can be used as the Normal Address or the Dual Address.
Other Address Line	Box Number	The number for a post office box address; for example, for 'PO Box 95', the Box Number is '95'. Can be used as the Normal Address or the Dual Address.
Other Address Line	Route Name	Route name for a rural route address. For an address of 'Route 5 Box 10', the Route Name is 'RTE' (a standardization of 'Route'). Can be used as the Normal Address or the Dual Address.
Other Address Line	Route Number	Route number for a rural route address. For an address of 'Route 5 Box 10', the Route Number is '5'. Can be used as the Normal Address or the Dual Address.
Other Address Line	Building Name	Building name, such as 'Cannon Bridge House'. Building names are common in the United Kingdom.
Other Address Line	Complex	Building, campus, or other complex. For example, USS John F. Kennedy Shadow Green Apartments Cedarvale Gardens Concordia College You can use an the Instance field in the Output Components dialog box to specify which complex should be returned in cases where an address has more than one complex.
Other Address Line	Miscellaneous Address	Miscellaneous address information. In records with multiple miscellaneous fields, you can extract them by specifying which instance to use in the Output Components page.
Geography	Latitude	Latitude in degrees north of the equator: Positive for north of the equator; negative for south (always positive for North America).
Geography	Longitude	Longitude in degrees east of the Greenwich Meridian: positive for east of GM; negative for west (always negative for North America).
Geography	Geo Match Precision	Indicates how closely the location identified by the latitude and longitude matches the address.

Extra Vendor

Twenty components are open for vendor-specified uses.

Error Status

Table 24-23 describes the Error Status output components. Refer to "Handling Errors in Name and Address Data" for usages notes on the Error Status components.

Table 24-23 Error Status Output Components

Subfolders	Output Component	Description
Name and Address	Is Good Group	Indicates whether the name group, address group, or name and address group was processed successfully. T = For name groups, the name has been successfully parsed. For address groups, the address has been found in a postal matching database if one is available, or has been successfully parsed if no postal database is installed. For name and address groups, both the name and the address have been successfully processed. F = The group was not parsed successfully. Using this flag in conjunction with another flag, such as the `Is Parsed` flag, followed by the Splitter operator, enables you to isolate unsuccessfully parsed records in their own target, where you can address them separately.
Name and Address	Is Parsed	Indicates whether the name or address was parsed: T = The name or address was parsed successfully, although some warning conditions may have been flagged. F = The name or address cannot be parsed. Check the status of warning flags such as `Name Warning` or `City Warning`.
Name and Address	Parse Status	Postal matching software parse status code.
Name and Address	Parse Status Description	Text description of the postal matching software parse status.
Name Only	Is Good Name	Indicates whether the name was parsed successfully: T = The name was parsed successfully, although some warning conditions may have been flagged. F = The name cannot be parsed.
Name Only	Name Warning	Indicates whether the parser found unusual or possibly erroneous data in a name: T = The parser had difficulty parsing a name or found unusual data. Check the `Parse Status` component for the cause of the warning. F = No difficulty parsing name.
Address Only	Is Good Address	Indicates whether the address was processed successfully: T = Successfully processed. Either the address was found in the postal matching database or, if no postal matching database is installed for the country indicated by the address, the address was successfully parsed. F = Not successfully processed. If a postal matching database is installed for the country indicated by the address, the address was not found in the database. If no postal matching database is available for the country, the address cannot be parsed. Use this component when you have a mix of records from both postal-matched and non-postal-matched countries.
Address Only	Is Found	Indicates whether the address is listed in the postal matching database for the country indicated by the address: T = The address was found in a postal matching database. F = The address was not found in a postal matching database. This status may indicate either that the address is not a legal address, or that postal matching is not available for the country. This flag is true only if all of the other 'Found' flags are true. If postal matching is available, this flag is the best indicator of record quality.
Address Only: Is Found	City Found	T = The postal matcher found the city; otherwise, F.
Address Only: Is Found	Street Name Found	T = The postal matcher found the street name; otherwise, F.
Address Only: Is Found	Street Number Found	T = The postal matcher found the street number within a valid range of numbers for the named street, otherwise, F.
Address Only: Is Found	Street Components Found	T = The postal matcher found the street components, such as the Pre Directional or Post Directional; otherwise, F.
Address Only: Is Found	Non-ambiguous Match Found	Indicates whether the postal matcher found a matching address in the postal database: T = The postal matcher found a match between the input record and a single entry in the postal database. F = The address is ambiguous. The postal matcher found that the address matched several postal database entries and could not make a selection. For example, if the input address is '100 4th Avenue,' but the postal database contains '100 4th Ave N' and '100 4th Ave S,' the input's missing directional causes the match to fail.
Address Only	City Warning	T = The parser found unusual or possibly erroneous data in a city; otherwise, F.
Address Only	Street Warning	T = The parser found unusual or possibly erroneous data in a street address otherwise, F.
Address Only	Is Address Verifiable	T = Postal matching is available for the country of the address; otherwise, F. F does not indicate whether or not a postal matching database is installed for the country in the address. It only indicates that matching is not available for a particular address.
Address Only	Address Corrected	Indicates whether the address was corrected in any way during matching. Standardization is not considered correction in this case. T = Some component of the address was changed, aside from standardization. One of the other Corrected flags must also be true. F = No components of the address were changed, with the possible exception of standardization.
Address Only: Address Corrected	Postal Code Corrected	T = The postal code was corrected during matching, possibly by the addition of a postal extension; otherwise, F.
Address Only: Address Corrected	City Corrected	T = The city name was corrected during matching; otherwise, F. Postal code input is used to determine the city name preferred by the postal service.
Address Only: Address Corrected	Street Corrected	T = The street name was corrected during matching; otherwise, F. Some correct street names may be changed to an alternate name preferred by the postal service.
Address Only: Address Corrected	Street Components Corrected	T = One or more street components, such as `Pre Directional` or `Post Directional`, were corrected during matching.
Address Only	Address Type	Type of address. The following are common examples; actual values vary with vendors of postal matching software: B= Box F = Firm G= General Delivery H= High-rise apartment or office building HD= High-rise default, where a single Zip+4 postal code applies to the entire building. The Name and Address operator can detect a finer level of postal code assignment if a floor or suite address is provided, in which case the record is treated as an H type, with a more specific Zip+4 code for that floor or suite. M= Military P= Post Office Box R= Rural Code S= Street
Address Only	Parsing Country	Country parser that was used for the final parse of the record.

Country-Specific

Table 24-24 describes the output components that are specific to a particular country.

Table 24-24 Country-Specific Output Components

Subfolder	Output Component	Description
United States	ZIP5	The five-digit United States postal code.
United States	ZIP4	The four-digit suffix that is added to the five-digit United States postal code to further specify location.
United States	Urbanization Name	Urban unit name used in Puerto Rico.
United States	LACS Flag	T = Address requires a LACS conversion and should be submitted to a LACS vendor; otherwise, F. The Locatable Address Conversion System (LACS) provides new addresses when a 911 emergency system has been implemented. 911 address conversions typically involve changing rural-style addresses to city-style street addresses, but they may involve renaming or renumbering existing city-style addresses.
United States	CART	Four-character USPS Carrier route.
United States	DPBC Check Digit	Check digit for forming a delivery point bar code.
United States	Automated Zone Indicator	T = The mail in this zip code is sorted by bar code sorting equipment; otherwise, F.
United States	Urban Indicator	T = An address is located within an urban area; otherwise, F.
United States	Line of Travel	United States Postal Service (USPS) line of travel
United States	Line of Travel Order	United States Postal Service (USPS) line of travel order
United States: Census/Geography	Metropolitan Statistical Area	Metropolitan Statistical Area (MSA) number. For example, '0000' indicates that the address does not lie within any MSA, and typically indicates a rural area.
United States: Census/Geography	Minor Census District	Minor Census District.
United States: Census/Geography	CBSA Code	A 5-digit Core Based Statistical Area code that identifies metropolitan and micropolitan areas.
United States: Census/Geography	CBSA Descriptor	Indicates whether the CBSA is metropolitan (population of 50,000 or more) or micropolitan (population of 10,000 to 49,999).
United States: Census/Geography	FIPS Code	The complete (state plus county) code assigned to the county by the Federal Information Processing Standard (FIPS). Because FIPS county codes are unique within a state, a complete FIPS Code includes the two-digit state code followed by the three-digit county code.
United States: Census/Geography	FIPS County	The three-digit county code as defined by the Federal Information Processing Standard (FIPS).
United States: Census/Geography	FIPS Place Code	The five-digit place code as defined by the Federal Information Processing Standard (FIPS).
United States: Geography	Census ID	United States Census tract and block-group number. The first six digits are the tract number; the final digit is the block-group number within the tract. These codes are used for matching to demographic-coding databases.
Canada	Installation Type	A type of Canadian postal installation: STN= Station RPO = Retail Postal Outlet For example, for the address, 'PO Box 7010, Scarborough ON M1S 3C6,' the Installation Type is 'STN'.
Canada	Installation Name	Name of a Canadian postal installation. For example, for the address, 'PO Box 7010, Scarborough ON M1S 3C6,' the Installation Name is 'AGINCOURT'.
Hong Kong	Delivery Office Code	A mailing code used in Hong Kong. For example, the following address is assigned the Delivery Office Code 'WCH': Oracle 39/F The Lee Gardens 33 Hysan Ave Causeway Bay
Hong Kong	Delivery Beat Code	A mailing code used in Hong Kong. For example, the following address is assigned the Delivery Beat Code 'S06': Oracle 39/F The Lee Gardens 33 Hysan Ave Causeway Bay

Handling Errors in Name and Address Data

Name and Address parsing, like any other type of parsing, depends on identification of keywords and patterns containing those keywords. Free-form name and address data difficult to parse because the keyword set is large and it is never 100% complete. Keyword sets are built by analyzing millions of records, but each new data set is likely to contain some undefined keywords.

Because most free-form name and address records contain common patterns of numbers, single letters, and alphanumeric strings, parsing can often be performed based on just the alphanumeric patterns. However, alphanumeric patterns may be ambiguous or a particular pattern may not be found. Name and Address parsing errors set parsing status codes that you can use to control data mapping.

Since the criteria for quality vary among applications, numerous flags are available to help you determine the quality of a particular record. For countries with postal matching support, use the Is Good Group flag, because it verifies that an address is a valid entry in a postal database. Also use the Is Good Group flag for U.S. Coding Accuracy Support System (CASS) and Canadian Software Evaluation and Recognition Program (SERP) certified mailings.

Unless you specify postal reporting, an address does not have to be found in a postal database to be acceptable. For example, street intersection addresses or building names may not be in a postal database, but they may still be deliverable. If the Is Good Group flag indicates failure, additional error flags can help determine the parsing status.

The Is Parsed flag indicates success or failure of the parsing process. If Is Parsed indicates parsing success, you may still wish to check the parser warning flags, which indicate unusual data. You may want to check those records manually.

If Is Parsed indicates parsing failure, you must preserve the original data to prevent data loss.

Use the Splitter operator to map successful records to one target and failed records to another target.

Using the Match-Merge Operator to Eliminate Duplicate Source Records

Use the Match-Merge operator to identify matching records in a data source and merge them into a single record.

The Match-Merge operator has one input group and two output groups, Merge and Xref. The source data is mapped to the input group. The Merge group contains records that have been merged after the matching process is complete. The Xref group provides a record of the merge process. Every record in the input group will have a corresponding record in the Xref group. This record may contain the original attribute values and the merged attributes.

The Match-Merge operator uses an ordered record stream as input. From this stream, it constructs the match bins. From each match bin, matched sets are constructed. From each matched set, a merged record is created. The initial query will contain an ORDER BY clause consisting of the match bin attributes.

Steps to Use a Match-Merge Operator

To match and merge source data using the Match-Merge operator:

Drag and drop the operators representing the source data and the operator representing the merged data onto the mapping editor canvas:

For example, if your source data is stored in a table, and the merged data will be stored in another table, drag and drop two Table operators that are bound to the tables onto the canvas.
Drag and drop a Match-Merge operator onto the mapping editor canvas.

The MatchMerge wizard is displayed.
On the Name and Address page, the Name field contains a default name for the operator. You can change this name or accept the default name.

You can enter an optional description for the operator.
On the Groups page, you can rename groups or provide descriptions for them.

This page contains the following three groups:
- INGRP1: Contains input attributes.
- MERGE: Contains the merged records (usually this means fewer records than INGRP1).
- XREF: Contains the link between the original and merged data sets. This is the tracking mechanism used when a merge is performed.
On the Input Connections page, move the attributes that you want to match and merge form the Available section to the Mapped Attributes section. Click Next.

The Available Attributes section of this page displays nodes for each operator on the canvas. Expand a node to display the attributes contained in the operator, select the attributes, and use the shuttle arrows to move selected attributes to the Mapped Attributes section.

Note:
The Match-Merge operator requires an ordered input data set. If you have source data from more than one operators, use a Set Operation operator to combine the data and obtain an ordered data set.
On the Input Attributes page, review the attribute data types and lengths.

In general, if you go through the wizard, you need not change any of these values. Warehouse Builder populates them based on the output attributes.
On the Merge Output page, select the attributes to be merged from the input attributes.

These attributes appear in the Merge output group (the cleansed group). The attributes in this group retain the name and properties of the input attributes.
On the Cross Reference Output page, select attributes for the XREF output group.

The Source Attributes section contains all the input attributes and the Merge attributes you selected on the Merge Output page. The attributes from the Merge group are prefixed with MM. The other attributes define the unmodified input attribute values. Ensure that you select at least one attribute from the Merge group that will provide a link between the input and Merge groups.
On the Match Bins page, specify the match bin attributes. These attributes are used to group source data into match bins.

After the first deployment, you can choose whether to match and merge all records or only new records. To match and merge only the new records, select Match New Records Only.

You must designate a condition that identifies new records. The match-merge operator treats the new records in the following way:
- No matching is performed for any records in a match bin unless the match bin contains new record.
- Old records will not be compared with each other.
- A matched record set will not be presented to the merge processing unless the matched record set contains a new record.
- An old record will not be presented to the Xref output unless the record is matched to a new record.
For more information about match bin attributes and match bins, see "Overview of the Matching and Merging Process".
On the Define Match Rules page, define the match rules that will be used to match the source data.

Match rules can be active or passive. A passive match rule is generated but not automatically invoked. You must define at least one active match rule.

For more information about the match rules, the types of match rules you can define, and the steps used to define them, see "Match Rules".
On the Merge Rules page, define the rules that will be used to merge the sets of matched records created from the source data.

You can define Merge rules for each attribute in a record or for the entire record. Warehouse Builder provides different types of Merge rules.

For more information about the type of Merge rules and the steps to create Merge rules, see "Merge Rules".
On the Summary page, review your selections. Click Back to modify any selection you made. Click Next to complete creating the Match-Merge operator.
Map the Merge group of the Match-Merge operator to the input group of the operator that stores the merged data.

Designing Mappings with a Match-Merge Operator

Figure 24-3 shows a mapping you can design using a Match-Merge operator. Notice that the Match-Merge operator is preceded by a Name and Address operator, NAMEADDR, and a staging table, CLN_CUSTOMERS. You can design your mapping with or without a Name and Address operator. Preceding the Match-Merge operator with a Name and Address operator provides clean and standardized data before starting time consuming match and merge operations

Figure 24-3 Match-Merge Operator in a Mapping

Description of "Figure 24-3 Match-Merge Operator in a Mapping"

Whether you include a Name and Address operator or not, be aware of the following considerations as you design your mapping:

Operating modes: Operators may accept either set-based or row-based input and generate either set-based or row-based output. SQL is set-based, so a set of records is processed at one time. PL/SQL is row-based, so each row in processed separately. When the Match-Merge operator matches records, it compares each row with the subsequent row in the source and generates row-based code only. A mapping that contains a Match-Merge operator can only run in row-based mode.
SQL based operators before Match-Merge: The Match-Merge operator accepts set-based SQL input, but generates only row-based PL/SQL output. Any operators that generate only SQL code must precede the Match-Merge operator. For example, the Joiner, Key Lookup, and Set operators generate set-based SQL output, so they must precede Match-Merge. If set-based operators appear after Match-Merge, then the mapping is invalid.
PL/SQL input: The Match-Merge operator requires SQL input except from another Match-Merge operator, as described in "Using Two Match-Merge Operators". If you want to precede a Match-Merge with an operator that generates only PL/SQL output such as the Name and Address operator, you must first load the data into a staging table.
Refining Data from Match-Merge operators: To achieve greater data refinement, map the XREF output from one Match-Merge operator into another Match-Merge operator. This scenario is the one exception to the SQL input rule for Match-Merge operators. With additional design elements, the second Match-Merge operator accepts PL/SQL. For more information, see "Using Two Match-Merge Operators".

Using Two Match-Merge Operators

Most match-merge operations can be performed by a single match-merge operator. However, if you are directing the output to two different targets, then you may need to use two match-merge operators in succession.

For example, when householding name and address data, you may need to merge the data first for addresses and then again for names. Assuming you map the MERGE output to a target table, you can map the XREF group to another Match-Merge operator. Although you could map the XREF group to a staging table, this practice can lead to significant loss of performance.

Figure 24-4 shows a mapping that uses two match-merge operators. The XREF group from MM is mapped directly to MM_1. For this mapping to be valid, you must assign the Match ID generated for the first XREF group as the Match Bin rule on the second Match-Merge operator.

Figure 24-4 Householding Data: XREF Group Merged to Second Match-Merge Operator

Description of "Figure 24-4 Householding Data: XREF Group Merged to Second Match-Merge Operator"

Using the Name and Address Operator to Cleanse Source Data

The Name and Address operator accepts one PL/SQL input and generates one PL/SQL output.

If you experience time-out errors, you may need to increase the socket time-out setting of the Name and Address Server. The time-out setting is the number of seconds the server will wait for a parsing request from a mapping before the server drops a connection. The default setting is 600 seconds (10 minutes). After the server drops a connection because of inactivity, subsequent parsing requests fail with a NAS-00021 error.

For most mappings, long time lapses between parsing requests are rare. However, maps operating in row based mode with a filter operator may have long time lapses between record parsing requests, because of the inefficiency of filtering records in row based mode. For this type of mapping, you may need to increase the socket time-out value to prevent connections from being dropped.

To increase the socket time-out setting, see "Managing the Name and Address Server".

Creating a Mapping with a Name and Address Operator

The Name and Address operator has one input group and one output group.

To create a mapping with a Name and Address operator:

Drag and drop the operators representing the source data and the operator representing the cleansed data onto the mapping editor canvas:

For example, if your source data is stored in a table, and the cleansed data will be stored in another table, drag and drop two Table operators that are bound to the tables onto the canvas.
Drag and drop a Name and Address operator onto the mapping editor canvas.

The Name and Address wizard is displayed.
On the Name page, specify a name and an optional description for the Name and Address operator.

Alternately, you can choose to retain the default name displayed in the Name field.
On the Definitions page, select values that define the type of source data.

See "Specifying Source Data Details and Setting Parsing Type".
On the Groups page, optionally rename the input and output groups.

The Name and Address operator has one input group, INGRP1, and one output group, OUTGRP1. You cannot edit, add, or delete groups. If the input data requires multiple groups, create a separate Name and Address operator for each group.
On the Input Connections page, select attributes from any operator in your mapping that you want to copy and map to the Name and Address operator.

To complete the Input Connections page for an operator:
1. Select complete groups or individual attributes from the Available Attributes panel.
  
  To search for a specific attribute or group by name, type the text in Search for and click Go. To find the next match, click Go again.
  
  Hold the Shift key down to select multiple groups or attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.
  
  Note:
  If you have not created any operators for the source data, the Available Attributes section is empty.
2. Use the right arrow button between the two panels to move your selections to the Mapped Attributes panel.
  
  The Mapped Attributes section lists the attributes that will be processed by the Name and Address operator.
On the Input Attributes page, assign input roles to each attribute you selected on the Input Attributes page.

Input roles indicate the type of name and address information that resides in a line of data. Whenever possible, choose discrete roles (such as City, state, and Postal Code) rather than non-discrete ones (such as Last Line). Discrete roles help in better parsing.

See Also:
"Input Role Descriptions".

For attributes that have the input role set to Pass Through, specify the data type details using the Data Type, Length, Precision, Scale, and Seconds Precision fields.
On the Output Attributes page, define output attributes that determine how the Name and Address operator handles parsed data. The output attribute properties characterize the data extracted from the parser output.

Any attributes that have the Pass Through input role assigned are automatically listed as output attributes. You can add additional output attributes.

Note:
The attributes for output components with the Pass Through role cannot be changed

To add output attributes:
1. Click Add.
  
  A row is added for the new output attribute with a default name. You can rename the output attribute by selecting the name and typing the new name.
2. Click the Ellipses on the Output Component field to select an output component for the attribute.
  
  See Also:
  "Descriptions of Output Components" for the descriptions of output components
  
  Ensure that you add error handling flags such as Is Parsed, Is Good Name, and Is Good Address. You can use these flags with the Splitter operator to separate good records from the records with errors and load them into different targets.
3. Specify the data type details for the output attribute using the Data Type, Length, Precision, Scale, and Seconds Precision fields.
For countries that support address correction and postal matching, use the Postal Report page to specify the details for the postal report.

See "Specifying Postal Report Details".

Specifying Source Data Details and Setting Parsing Type

Use the Definitions page or the Definitions tab to provide information about your source data and to specify the type of parsing to be performed on the source data. Set the following values: Parsing Type, Primary Country, and Dual Address Assignment.

Parsing Type

Select one of the following parsing types:

Name Only: Select this option when the input data contains only name data. Names can include both personal and business names. Selecting this option instead of the more generic Name and Address option may improve performance and accuracy, depending on the adapter.
Address Only: Select this option when the input data contains only address data and no name data. Selecting this option instead of the more generic Name and Address option may improve performance and accuracy, depending on the adapter.
Name and Address: Select this option when the input data contains both name and address data.

Note:

You can only specify the parsing type when you first add the Name and Address operator to your mapping. You cannot modify the parsing type in the editor.

Primary Country

Select the country that best represents the country distribution of your data. The primary country is used by some providers of name and address cleansing software as a hint for the appropriate parser or parsing rules to use on the initial parse of the record. For other name and address service providers, external configuration of their installation controls this behavior.

Dual Address Assignment

A dual address contains both a Post Office (PO) box and a street address for the same address record. For records that have dual addresses, your selection determines which address becomes the normal address and which address becomes the dual address. A sample dual address is:

PO Box 2589
4439 Mormon Coulee Rd
La Crosse WI 54601-8231

Note that the choice for Dual Address Assignment affects which postal codes are assigned during postal code correction, because the street address and PO box address may correspond to different postal codes.

Street Assignment: The street address is the normal address and the PO Box address is the dual address. This means that the Address component is assigned the street address. In the preceding example, the Address is 4439 MORMON COULEE RD. This choice corrects the postal code to 54601-8220.
PO Box Assignment: The PO Box address is the normal address and the street address is the dual address. This means that the Address component is assigned the Post Office (PO) box address. In the preceding example, the Address is PO BOX 2589. This choice corrects the postal code to 54602-2589.
Closest to Last Line: Whichever address occurs closest to the last line is the normal address; the other is the dual address. This means that the Address component is assigned the address line closest to the last line. In the preceding example, the Address is 4439 MORMON COULEE RD. This choice corrects the postal code to 54601-8220.

This option has no effect for records having a single street or PO box address.

Note:

Dual Address Assignment may not be supported by all name and address cleansing software providers.

Specifying Postal Report Details

Country certification varies with different vendors of name and address cleansing software. The most common country certifications are United States, Canada, and Australia. The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of postal codes (in the case of the United States, of five-digit ZIP Codes and ZIP+4 Codes), delivery point codes, and carrier route codes applied to all mail. Some vendors of name and address cleansing software may ignore these parameters and require external setup for generating postal reports. For more information, see "About Postal Reporting".

To specify postal reporting, select Yes in the Postal Report files and then provide values for the fields:

Processor Name: The use of this field varies with vendors of name and address cleansing software. Typically, this value appears on the United States Coding Accuracy Support System (CASS) report.

List Name: An optional reference field that appears on the United States and United Kingdom reports under the List Name section, but is not included in other reports. The list name provides a reference for tracking multiple postal reports; for example, 'July 2005 Promotional Campaign'.

Processor Address Lines: These address lines may appear on various postal reports. Various name and address cleansing software vendors use these fields differently. They often contain the full address of your company.

Managing the Name and Address Server

An external Name and Address server provides an interface between Oracle Database and third-party name and address processing libraries. This section discusses methods of configuring, starting, and stopping the Name and Address Server.

Configuring the Name and Address Server

The Name and Address operator generates PL/SQL code, which calls the UTL_NAME_ADDR package installed in the Runtime Schema. A private synonym, NAME_ADDR, is defined in the target schema to reference the UTL_NAME_ADDR package. The UTL_NAME_ADDR package calls Java packages, which send processing requests to an external Name and Address server, which then interfaces with third-party Name and Address processing libraries, such as Trillium.

You can use the server property file, NameAddr.properties, to configure server options. This file is located in owb/bin/admin under the Oracle home you specified when installing the server components. The following code illustrates several important properties with their default settings.

TraceLevel=0
SocketTimeout=180
ClientThreads=4
Port=4040

The TraceLevel property is often changed to perform diagnostics on server communication and view output from the postal matching program parser. Other properties are rarely changed.

TraceLevel: Enables output of file NASvrTrace.log in the owb/bin/admin folder. This file shows all incoming and outgoing data, verifies that your mapping is communicating with the Name and Address Server, and that the Name and Address Server is receiving output from the service provider. The trace log shows all server input and output and is most useful for determining whether any parsing requests are being made by an executing mapping. Set TraceLevel=1 to enable logging. However, tracing degrades performance and creates a large log file. Set TraceLevel=0 to disable logging for production.
SocketTimeOut: Specifies the number of seconds the name/address server will wait for a parsing request before closing the connection. You can increase this time to 1800 (30 minutes) when running concurrent mappings to prevent timing out.
ClientThreads: Specifies the number of threads used to service client connections. One client connection is made for each database session or slave session if a map is parallelized. Most maps are parallelized, and the number of parallel processes is proportional to the number of processors. On a single processor computer, two parallel processes are spawned for large maps. On a four processor computer, up to eight processes may be spawned. Parallelism may also be controlled by database initialization settings such as Sessions.

For the best performance, set ClientThreads to the maximum number of clients that will be connected simultaneously. The actual number of connected clients is recorded in NASvr.log after a map run. You should increase the value of ClientThreads when the number of client connections shown in the log is greater.

When the number of clients exceeds the number of threads, all clients are still serviced because the threads are shared among clients.
Port: Specifies the port on which the server listens and was initially assigned by the installer. This value may be changed if the default port conflicts with another process. If the port is changed, the port attribute must also be changed in the runtime_schema.nas_connection table to enable the utl_name_addr package to establish a connection.

Starting and Stopping the Name and Address Server

Whenever you edit the properties file or perform table maintenance, you must stop and restart the Name and Address Server for the changes to take effect.

To manually stop the Name and Addresss Server:

In Windows, run OWB_ORACLE_HOME/owb/bin/win32/NAStop.bat.
In UNIX, run OWB_ORACLE_HOME/owb/bin/unix/NAStop.sh.

You can automatically restart the Name and Address Server by invoking a mapping in Warehouse Builder You can also restart the server manually.

To manually restart the Name and Address Server:

In Windows, run OWB_ORACLE_HOME/owb/bin/win32/NAStart.bat.
In UNIX, run OWB_ORACLE_HOME/owb/bin/unix/NAStart.sh.

24 Data Quality Operators

About the Match-Merge Operator

Understanding Matching Concepts

Example of Matching and Merging Customer Data

Example of Multiple Match Rules

Example of Transitive Matching

Restrictions on Using the Match-Merge Operator

Overview of the Matching and Merging Process

Requirements for Matching and Merging Records

Process for Matching and Merging Records

Constructing Match Bins

Constructing Match Record Sets

Constructing Merge Records

Match Rules

Conditional Match Rules

Comparison Algorithms

Creating Conditional Match Rules

Weight Match Rules

Example of Weight Match Rules

Creating Weight Match Rules

Person Match Rules

Person Roles

Person Details

Creating Person Match Rules

Firm Match Rules

Firm Roles

Firm Details

Creating Firm Match Rules

Address Match Rules

Address Roles

Address Details

Creating Address Match Rules

Custom Match Rules

Creating Custom Match Rules

Merge Rules

Match ID Merge Rule

Rank and Rank Record Merge Rules

Sequence Merge Rule

Min Max and Min Max Record Merge Rules

Copy Merge Rule

Custom and Custom Record Merge Rules

About the Name and Address Operator

Example: Correcting Address Information

Example Input

Example Steps

Example Output

About Postal Reporting

United States Postal Service CASS Certification

Canada Post SERP Certification

Australia Post AMAS Certification

Input Role Descriptions

Descriptions of Output Components

Pass Through

Name

Address

Extra Vendor

Error Status

Country-Specific

Handling Errors in Name and Address Data

Using the Match-Merge Operator to Eliminate Duplicate Source Records

Steps to Use a Match-Merge Operator

Designing Mappings with a Match-Merge Operator

Using Two Match-Merge Operators

Using the Name and Address Operator to Cleanse Source Data

Creating a Mapping with a Name and Address Operator

Specifying Source Data Details and Setting Parsing Type

Parsing Type

Primary Country

Dual Address Assignment

Specifying Postal Report Details

Managing the Name and Address Server

Configuring the Name and Address Server

Starting and Stopping the Name and Address Server

Custom and Custom Record Merge Rule s