Oracle® Warehouse Builder User's Guide 10g Release 1 (10.1) Part Number B12146-02 |
|
|
View PDF |
This chapter contains additional information about using the Name and Address operator, discussed in Chapter 8, "Using Mapping Operators". This chapter contains the following topics:
Input roles indicate what kind of name or address information resides in a line of data. Each input attribute in the Name and Address operator must have an input role that most closely matches the data contained in the source attribute. Input roles can either be non-discrete, or line oriented, input roles for free-form data (such as 'Line1'); or they can be discrete roles for specific input attributes (such as 'Person', 'First Name', 'Primary Address', or 'City'). Discrete roles give the Name and Address operator more information about the content of the source attribute.
Table 20-1 lists input roles and descriptions for the Name and Address Operator.
Table 20-1 Name and Address Operator Input Roles
Each output component represents a discrete name or address entity, such as a title, a standardized first name, a street number, a street name, or a street type. An output component is assigned to the corresponding output attribute, and identifies the portion of a name or address that the attribute constitutes. Table 20-2 lists the Name and Address operator output components and their descriptions.
Table 20-2 Name and Address Operator Output Components
Parent Node(s) | Output Component | Description |
---|---|---|
Name |
Pre NameFoot 1 |
Title or salutation appearing before a name; for example, Ms. or Dr. |
Name |
First Name StandardizedFootref 1 |
Standard version of first name; for example, Theodore for Ted or James for Jim. |
Name |
Middle Name StandardizedFootref 1 |
Standardized version of the middle name; for example, Theodore for Ted or James for Jim. Used when there is only one middle name, or for the first of several middle names (such as ÒHerbertÓ in George Herbert Walker Bush). |
Name |
Middle Name 2 StandardizedFootref 1 |
Standardized version of the second middle name; for example, Theodore for Ted or James for Jim. |
Name |
Middle Name 3 StandardizedFootref 1 |
Standardized version of the third middle name; for example, Theodore for Ted or James for Jim. |
Name |
Post NameFootref 1 |
Name suffix indicating generation; for example, Sr., Jr., or III. |
Name |
Other Post NameFootref 1 |
Name suffix indicating certification, academic degree, or affiliation; for example, Ph.D., M.D., or R.N. |
Name |
Name DesignatorFootref 1 |
Personal name designation; for example, ÒATTNÓ (to the attention of) or ÒC/OÓ (care of). |
Name |
RelationshipFootref 1 |
Information related to another person; for example, ÒTrustee for.Ó |
Name |
PersonFootref 1 |
First name, middle name, and last name. |
Name: Person |
First NameFootref 1 |
The first name found in the input name. |
Name: Person |
Middle NameFootref 1 |
Middle name or initial. Use this for a single middle name, or for the first of several middle names (such as ÒHerbertÓ in George Herbert Walker Bush). |
Name: Person |
Middle Name 2Footref 1 |
Second middle name; for example, ÒWalkerÓ in George Herbert Walker Bush. |
Name: Person |
Middle Name 3Footref 1 |
Third middle name; for example, ÒLouiseÓ in Ethel May Roberta Louise Mertz. |
Name: Person |
Last NameFootref 1 |
Last name, or surname. |
Name: Derived |
GenderFootref 1 |
Probable gender:
|
Name: Derived |
Person Count |
Number of persons the record references. For example, a record with a Person name of ÒJohn and Jane DoeÓ has a Person Count of 2. |
Name: Business |
Firm NameFootref 1 |
Name of the company or organization, including divisions. |
Name: Business |
Firm CountFootref 1 |
Number of firms referenced in the record. |
Address |
AddressFoot 2 |
Full address line, including:
|
Address |
Primary AddressFootref 2 |
Box, route, or street address, including:
This does not include the Unit Designator or the Unit Number. |
Address: Primary Address |
Street NumberFootref 2 |
Number that identifies the address, such as a house or building number, sometimes referred to as the primary range. For example, in 200 Oracle Parkway, the Street Number is 200. |
Address: Primary Address |
Pre DirectionalFootref 2 |
Street directional indicator appearing before the street name; for example, in 100 N University Drive, the Pre Directional is 'N'. |
Address: Primary Address |
Street NameFootref 2 |
Name of street. |
Address: Primary Address |
Street TypeFootref 2 |
Street identifier; for example, ST, AVE, RD, DR, or HWY. |
Address: Primary Address |
Post DirectionalFootref 2 |
Street directional indicator appearing after the street name; for example, in 100 15th Ave. S., the Post Directional is 'S'. |
Address |
Secondary AddressFootref 2 |
The second part of the street address, including:
For example, in a secondary address of Suite 2100, the Unit Designator is 'STE' (a standardization of ÒSuiteÓ) and the Unit Number is '2100'. |
Address: Secondary Address |
Unit DesignatorFootref 2 |
Type of secondary address, such as APT or STE. For example, in a secondary address of Suite 2100, the Unit Designator is 'STE' (a standardization of ÒSuiteÓ). |
Address: Secondary Address |
Unit NumberFootref 2 |
Number that identifies the secondary address, such as the apartment or suite number. For example, in a secondary address of Suite 2100, the Unit Number is '2100'. |
Address |
Last Line |
Final address line, including:
|
Address: Last Line |
Neighborhood |
Neighborhood or barrio, common in South and Latin American addresses. |
Address: Last Line |
City |
Name of city. The US city names may be converted to United States Postal Service preferred names. |
Address: Last Line |
State |
Name of state or province. |
Address: Last Line |
Postal Code |
Full postal code with spaces and other non-alphanumeric characters removed. |
Address: Last Line |
Postal Code Formatted |
Formatted version of postal code that includes spaces and other non-alphanumeric characters, such as dashes. |
Address: Last Line |
Delivery Point |
Applies to United States and Australia.
|
Address: Last Line |
Country Code |
The ISO 3166-1993 (E) two-character country code, as defined by the International Organization for Standardization; for example, 'US' for United States or 'CA' for Canada. |
Address: Last Line |
Country Code 3 |
The ISO 3166-1993 (E) three-character country code, as defined by the International Organization for Standardization; for example, 'USA' for United States, 'FRA' for France, or 'UKR' for Ukraine. |
Address: Last Line |
Country Name |
The full country name. |
Address: Other Address Line |
Box NameFootref 2 |
The name for a post office box address; for example, for ÒPO Box 95,Ó the Box Name is 'PO BOX'. |
Address: Other Address Line |
Box NumberFootref 2 |
The number for a post office box address; for example, for ÒPO Box 95,Ó the Box Number is '95'. |
Address: Other Address Line |
Route NameFootref 2 |
Route name for a rural route address. For an address of ÒRoute 5 Box 10,Ó the Route Name is 'RTE' (a standardization of ÒRouteÓ). |
Address: Other Address Line |
Route NumberFootref 2 |
Route number for a rural route address. For an address of ÒRoute 5 Box 10,Ó the Route Number is '5'. |
Address: Other Address Line |
Building Name |
Building name, such as ÒCannon Bridge House.Ó Building names are common in the United Kingdom. |
Address: Other Address Line |
Complex |
Building, campus, or other complex. For example, USS John F. Kennedy, Shadow Green Apartments, Cedarvale Gardens, Concordia College. You can use an the Instance field in the Output Components dialog to specify which complex should be returned in cases where an address has more than one complex. |
Address: Other Address Line |
Miscellaneous Address |
Miscellaneous address information, such as a telephone number or an e-mail address. In records with multiple miscellaneous fields, you can extract several by specifying which instance to use in the Name and Address operator Output Components screen. For example, in a record that has both an e-mail address and a telephone number, you can extract both items of information by using Miscellaneous Address 1 and Miscellaneous Address 2. |
Error Status: Name and Address |
Is Good Group |
Possible values: T or F. Indicates whether the name group, address group, or name and address group was processed successfully:
Using this flag in conjunction with another flag, such as the Is Parsed Flag, followed by the Splitter operator, enables you to isolate unsuccessfully parsed records in their own target, where you can address them separately. |
Error Status: Name and Address |
Is Parsed |
Indicates whether the name or address was parsed:
Check the status of parsing warning flags (such as Name Warning, or City Warning). |
Error Status: Name and Address |
Parse Status |
Postal matching software parse status code. |
Error Status: Name and Address |
Parse Status Description |
Text description of the postal matching software parse status. |
Error Status: Name Only |
Is Good Name |
Indicates whether the name was parsed successfully:
|
Error Status: Name Only |
Name Warning |
Indicates whether the parser found unusual or possibly erroneous data in a name:
|
Error Checking: Address Only |
Is Good Address |
Indicates whether the address was processed successfully:
This flag is easier to use if you have a mix of records from both postal matched and non postal matched countries. |
Error Checking: Address Only |
Is Found |
Indicates whether the address is listed in the postal matching database for the country indicated by the address:
This flag is true only if all of the flags listed in the following section are true. If postal matching is available, this flag is the best indicator of record quality. |
Error Checking: Address Only: Is Found |
City Found |
Indicates whether the postal matcher found the city:
|
Error Checking: Address Only: Is Found |
Street Name Found |
Indicates whether the postal matcher found the street name:
|
Error Checking: Address Only: Is Found |
Street Number Found |
Indicates whether the postal matcher found the street number within a valid range of numbers for the named street:
|
Error Checking: Address Only: Is Found |
Street Components Found |
Indicates whether the postal matcher found the street components, such as the Pre Directional or Post Directional:
|
Error Checking: Address Only: Is Found |
Non-ambiguous Match Found |
Indicates whether the postal matcher found a matching address in the postal database:
|
Error Checking: Address Only |
City Warning |
Indicates whether the parser found unusual or possibly erroneous data in a city:
|
Error Checking: Address Only |
Street Warning |
Indicates whether the parser found unusual or possibly erroneous data in a street address:
|
Error Checking: Address Only |
Is Address Verifiable |
Indicates whether postal matching is available for the country of the address, but does not indicate the outcome of the matching operation:
|
Error Checking: Address Only |
Address Corrected |
Indicates whether the address was corrected in any way during matching. Standardization is not considered correction in this case.
|
Error Checking: Address Only: Address Corrected |
Postal Code Corrected |
Indicates whether the postal code was corrected during matching. Correction can include the addition of a postal extension:
|
Error Checking: Address Only: Address Corrected |
City Corrected |
Indicates whether the city name was corrected during matching. Postal code input is used to determine the city name preferred by the postal service.
|
Error Checking: Address Only: Address Corrected |
Street Corrected |
Indicates whether the street name was corrected during matching. Some correct street names may be changed to an alternate name preferred by the postal service.
|
Error Checking: Address Only: Address Corrected |
Street Components Corrected |
Indicates whether any of the street components, such as the Pre Directional or Post Directional, were corrected during matching:
|
Error Checking: Address Only |
Address Type |
Type of address. The following are common examples; actual values vary with vendors of postal matching software:
|
Error Checking: Address Only |
Parsing Country |
Country parser that was used for the final parse of the record. |
Country Specific: United States |
ZIP 5 |
The five-digit United States postal code. This applies to United States addresses only. |
Country Specific: United States |
ZIP 4 |
The four-digit suffix that is added to the five-digit United States postal code to further specify location. This applies to United States addresses only. |
Country Specific: United States |
Urbanization Name |
Urban unit name used in Puerto Rico. This applies to United States (Puerto Rico) addresses only. |
Country Specific: United States |
LACS Flag |
Indicates whether the address requires a LACS conversion. The Locatable Address Conversion System (LACS) provides new addresses when a 911 emergency system has been implemented. 911 address conversions typically involve changing rural-style addresses to city-style addresses, but on occasion they may involve renaming or renumbering existing city-style addresses.
This applies to United States addresses only. |
Country Specific: United States |
CART |
Four-character USPS Carrier route. This applies to United States addresses only. |
Country Specific: United States |
DPBC Check Digit |
Check digit for forming a delivery point bar code. This applies to United States addresses only. |
Country Specific: United States: Geography |
Metropolitan Statistical Area |
Metropolitan Statistical Area (MSA) number. For example, '0000' indicates that the address does not lie within any MSA, and typically indicates a rural area. This applies to United States addresses only. |
Country Specific: United States: Geography |
Minor Census District |
Minor Census District. This applies to United States addresses only. |
Country Specific: United States: Geography |
Latitude |
Latitude in degrees north of the equator: positive for north of the equator; negative for south (always positive for North America). |
Country Specific: United States: Geography |
Longitude |
Longitude in degrees east of the Greenwich Meridian: positive for east of GM; negative for west (always negative for North America). |
Country Specific: United States: Geography |
FIPS County |
The three-digit county code as defined by the Federal Information Processing Standard (FIPS). This applies to United States addresses only. |
Country Specific: United States: Geography |
FIPS Code |
The complete (state plus county) code assigned to the county by the Federal Information Processing Standard (FIPS). Because FIPS county codes are unique within a state, a complete FIPS Code includes the two-digit state code followed by the three-digit county code. This applies to United States addresses only. |
Country Specific: United States: Geography |
Census ID |
United States Census tract and block-group number. The first six digits are the tract number; the final digit is the block-group number within the tract. These codes are used for matching to demographic-coding databases. This applies to United States addresses only. |
Country Specific: United Kingdom |
Locality Code |
This applies to United Kingdom addresses only. For example, the following address is assigned Locality Code 23591: Chobham Rd Knaphill Woking GU21 2TZ |
Country Specific: United Kingdom |
Locality Name |
This applies to United Kingdom addresses only. For example, the following address is assigned Locality Name KNAPHILL: Chobham Rd Knaphill Woking GU21 2TZ |
Country Specific: United Kingdom |
County Name |
This applies to United Kingdom addresses only. |
Country Specific: Canada |
Installation Type |
Type of Canadian postal installation:
For example, for the address, ÒPO Box 7010, Scarborough ON M1S 3C6,Ó the Installation Type is 'STN'. |
Country Specific: Canada |
Installation Name |
Name of Canadian postal installation. For example, for the address, ÒPO Box 7010, Scarborough ON M1S 3C6,Ó the Installation Name is 'AGINCOURT'. |
Country Specific: Hong Kong |
Delivery Office Code |
This applies to Hong Kong addresses only. For example, the following address is assigned the Delivery Office Code WCH: Oracle 39/F The Lee Gardens 33 Hysan Ave Causeway Bay |
Country Specific: Hong Kong |
Delivery Beat Code |
This applies to Hong Kong addresses only. For example, the following address is assigned the Delivery Beat Code S06: Oracle 39/F The Lee Gardens 33 Hysan Ave Causeway Bay |
Country Specific: Hong Kong |
Address 2 |
The second address line, assigned when both a street address and a building or floor address are present. This applies to Hong Kong only. |
Footnote 1 In records where this component occurs multiple times, you can specify which instance to use in the Name and Address operator Output Attributes components dialog. For example, in records with two occurrences of Firm Name, you can extract both by adding two output attributes for Firm1 and Firm2, and then assigning a First instance to one, and then a Second instance to the other, from the Instances drop list in the components dialog.
Footnote 2 In records with dual addresses, you can specify which line is used as the Normal Address (and thus assigned to the Address component) and which is used as the Dual Address. You can specify this in the Name and Address operator Output Attributes components dialog.
Table 20-3 lists supported countries for some vendors of name and address cleansing software. This list varies by vendor.
Table 20-3 Countries Supported by the Name and Address Operator
Name | Postal Code Matching Software Available for Use with the Name and Address Operator |
---|---|
Argentina |
No |
Australia |
Yes |
Belgium |
No |
Brazil |
Yes |
Canada |
Yes |
Chile |
No |
Colombia |
No |
Denmark |
No |
France |
Yes |
Germany |
Yes |
Hong Kong |
Yes |
India |
No |
Ireland |
No |
Italy |
No |
Mexico |
Yes |
Malaysia |
No |
Netherlands |
Yes |
New Zealand |
No |
Peru |
No |
Philippines |
No |
Portugal |
Yes |
Singapore |
Yes |
South Africa |
No |
Spain |
Yes |
Sweden |
No |
Switzerland |
No |
United Arab Emirates |
No |
United Kingdom |
Great Britain Only |
United States |
Yes |
Venezuela |
No |
The postal matching software used by Oracle Warehouse Builder Name and Address can be certified for various countries. The certification depends on the vendor of the postal matching software you use. Possible certifications include:
United States - CASS certification with US Postal Service
Canada - SERP certification with Canada Post
Australia - AMAS certification with Australia Post
The Coding Accuracy Support System (CASS) is a process developed by the United States Postal Service (USPS) in cooperation with the mailing industry. The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of five-digit ZIP Codes, ZIP+4 Codes, delivery point codes, and carrier route codes applied to all mail. All address lists used to produce mailings for automation rates must be matched by CASS-certified software. Oracle9i Pure Name and Address is CASS-certified. The CASS report is a text file specified by the USPS and produced by Oracle9i Pure Name and Address. To meet USPS requirements, the mailer must submit a CASS report in its original form to the USPS.
Canada Post has developed a testing program called Software Evaluation and Recognition Program (SERP) which evaluates software packages for their ability to validate or validate and correct mailing lists to Canada Post requirements. Postal programs that meet SERP requirements are listed on the Canada Post Web site.
Customers who utilize Incentive Lettermail, Addressed Admail and Publications Mail must meet the Address Accuracy Program requirements. Customers can obtain a Statement of Accuracy by comparing their database to Canada Post's address data.
The Address Matching Approval System (AMAS®) is a software approval program that has been developed by Australia Post to improve the quality of addressing. It provides a standard by which to test and measure the ability of address-matching software to:
Correct and match addresses against the Postal Address File (PAF)
Append a unique Delivery Point Identifier (DPID)® to each address record, which is a step toward the barcoding of mail.
AMAS allows companies to develop address matching software which:
Prepares addresses for barcode creation
Ensures quality addressing
Enables qualification for discounts on PreSort letter lodgements
PreSort Letter Service prices are conditional upon customers using AMAS Approved Software with Delivery Point Identifiers (DPIDs) being current against the latest version of the PAF.
A declaration that your mail was prepared appropriately must be made when using the Presort Lodgement Document, available from Post Offices.
You can only produce one postal report for each mapping. The postal report created is for the primary country specified in the Name and Address operator definition. Ideally, all mapped addresses should be located in the primary country selected in the Definitions page of the Name and Address Wizard. For more information, see "Definitions".
The Name and Address operator generates PL/SQL code, which calls the UTL_NAME_ADDR
package installed in the Runtime Schema. A private synonym, NAME_ADDR
, is defined in the target schema to reference the UTL_NAME_ADDR
package. The UTL_NAME_ADDR
package calls Java packages, which send processing requests to an external Name and Address server, which then interfaces with third-party Name and Address processing libraries, such as Trillium.
You can use the server property file, NameAddr.properties
, to configure server options. This file is located under the ORACLE_HOME
of your Oracle Warehouse Builder Server Side Install at owb/bin/admin
. The following code illustrates several important properties with their default settings.
TraceLevel=0 SocketTimeout=120 ClientThreads=16 Port=4040
The TraceLevel
property is often changed to perform diagnostics on server communication and view output from the postal matching program parser. Other properties are rarely changed.
TraceLevel: Enables output of file NASvrTrace.log
in the owb/bin/admin
folder. This file shows all incoming and outgoing data, verifies that your mapping is communicating with the Name and Address Server, and that the Name and Address Server is receiving output from the service provider. The trace log shows all server input and output and is most useful for determining whether any parse requests are being made by an executing mapping. Set TraceLevel=1
to enable logging. However, tracing degrades performance and creates a large log file. Set TraceLevel=0
to disable logging for production.
ClientThreads: This parameter is reserved for future development. Keep the default value of 16.
Port: Specifies the port on which the server listens and was initially assigned by the installer. This value may be changed if the default port conflicts with another process. If the port is changed, the port attribute must also be changed in the runtime_schema
.nas_connection
table to enable the utl_name_addr
package to establish a connection.
Whenever you edit the properties file or perform table maintenance, you must stop and restart the Name and Address Server for the changes to take effect. You must stop and restart the server manually if the auto shutdown property is FALSE
and the server is running when you edit the file or perform the table maintenance.
To manually stop the Name and Addresss Server:
In Windows, run ORACLE_HOME
/owb/bin/win32/NAStop.bat
.
In UNIX, run ORACLE_HOME
/owb/bin/unix/NAStop.sh
.
You can automatically restart the Name and Address Server by invoking a mapping in Warehouse Builder. You can also restart the server manually.
To manually restart the Name and Address Server:
In Windows, run ORACLE_HOME
/owb/bin/win32/NAStart.bat
.
In UNIX, run ORACLE_HOME
/owb/bin/unix/NAStart.sh
.
Use the following best practice tips to define and use the Name and Address Operator in your mapping.
Oracle Warehouse Builder Name and Address is built on name and address software and data supplied by third-party software vendors who specialize in name and address cleansing. This section discusses additional considerations that pertain to these products, as well as general parsing issues.
Name and Address parsing and correction can provide great improvements in data quality, but can degrade quality if not applied carefully. Name and Address parsing, postal matching, or any cleansing of incorrect data is error-prone because the problem domain of dirty data is not bounded.
Name and Address parsing, like any other type of parsing, depends on identification of keywords and patterns containing those keywords. Unlike computer languages, which contain a small set of keywords, free form Name and Address data of any region is very difficult to parse because the keyword set is so large and the set is never 100% complete. For example, one name and address cleansing software provider's pattern table for USA contains over 80,000 defined keywords and over 5,000 patterns. Keyword sets are built by analyzing millions of records, but each new data set is likely to contain some undefined keywords.
Because most free-form Name and Address record lines contain common patterns of numbers, single letters, and alphanumeric strings, parsing can often be performed based on just the alphanumeric patterns. In more difficult cases, alphanumeric patterns may be ambiguous, with possible interpretation as either an address line or name line. In other cases, a particular pattern may not be found. With computer languages, compilation errors are reported and an executable image is not produced; with Name and Address parsing errors, parsing status codes are set and Name and Address elements may not be parsed.
Because failure of the parsing engine at some point is highly probable, perform all Name and Address processing with error processing in mind. Use status codes to control the data mapping. Since the criteria for quality vary between applications, numerous flags are available to help you determine the quality of a particular record. For countries with postal matching support, the Is Good Group flag is the best measure of quality because it verifies that an address is a valid entry in a postal database. For CASS or SERP certified mailings, this flag may be the best criterion for acceptance or rejection of a record.
If you do not perform postal reporting, an address does not have to be found in a postal database to be acceptable. For example, it may not be possible to locate street intersection addresses or addresses using building names may not be locatable in a postal database, but they may still be deliverable. If the parse status is good, some benefit will be derived from standardization of address elements. If the Is Good Group flag indicates postal matching failure, several parser warning/error flags are also available to help determine the parsing status. The Is Parsed flag indicates success or failure of the parsing process. If Is Parsed indicates parsing success, you may still wish to check the parser warning flags, which indicate unusual data. If parser warnings are present, it may be desirable to check those records manually. If Is Parsed indicates parsing failure, preservation of your original data is necessary to prevent data loss.
A wide range of input roles is available to match the granularity of input data. Line based input roles are available for line oriented input, and discrete roles are available for atomic attributes such as First Name. Whenever possible, use discrete roles because they give the parser more information about the data content, and result in better parsing. Some of the discrete input roles overlap less discrete roles; for example, the combination of city, state, and postal code redefines the Last Line role. Such redefining assignments should be avoided. Where they occur, the more discrete roles take precedence because they are more specific.
Map discrete Name and Address attributes whose original value you want to retain directly to the target (map them around the Name and Address operator). This is very common for first, middle, and last names because it is not a good practice to change a first name to a standardized first name (for example, changing Peggy to Margaret). This direct mapping prevents losing name data if a personal name is parsed as a firm name. To obtain a standardized first or middle name for matching and merging (data deduplication), you can map the same attributes into the Name and Address operator. However, only the recoded outputs, such as title, gender, or standardized name, are used as output from the operator. There is no advantage to using first, middle, or last name output because it mirrors the input. This recommendation only applies to discrete data where the first, middle, and last name is already known.
Consider using the splitter operator to map good records to one target and bad records to another target. If the percentage of bad records is very low, or if the data set is small, you may consider manual correction of bad records. You can flag each record manually corrected record for later mapping to the target schema. If the IS_GOOD_GRP flag is F and the records are mapped to a special target for manual correction, you can map additional components that indicate the success level of the postal match. Components are available to indicate matching success at the city, street name, street number range, or street component (for example, the pre-directional, post-directional, or street type) level.