12.5 Data Structures for Geocoding

Oracle uses the following tables for geocoding.

  • GC_PARSER_PROFILES

  • GC_PARSER_PROFILEAFS

  • GC_COUNTRY_PROFILE

  • GC_AREA_<suffix>

  • GC_POSTAL_CODE_<suffix>

  • GC_ROAD_SEGMENT_<suffix>

  • GC_ROAD_<suffix>

  • GC_POI_<suffix>

  • GC_INTERSECTION_<suffix>

The GC_PARSER_PROFILES and GC_PARSER_PROFILEAFS tables store address format definitions of all supported counties. These tables are used by the internal address parser in parsing postal addresses into addressing fields. The data for these two tables is provided by your data provider or by Oracle. (If these tables are not supplied by your data provider, you will need to install and populate them as explained in Installing the Profile Tables.) The remaining tables store geocoding data provided by data vendors.

Each user that owns the tables containing geocoding data (that is, each user that can be specified with the username parameter in a call to an SDO_GCDR subprogram) must have one GC_PARSER_PROFILES table, one GC_PARSER_PROFILEAFS table, and one GC_COUNTRY_PROFILE table. Each such user can have multiple sets of the other tables (GC_xxx_<suffix>). Each set of tables whose names end with the same suffix stores geocoding data of a country. For example, the following set of tables can be used to store geocoding data of the United States:

  • GC_AREA_US

  • GC_POSTAL_CODE_US

  • GC_ROAD_SEGMENT_US

  • GC_ROAD_US

  • GC_POI_US

  • GC_INTERSECTION_US

Geocoding data of one country cannot be stored in more than one set of those tables. The table suffix is defined by data venders and is specified in the GC_TABLE_SUFFIX column in the GC_COUNTRY_PROFILE table (described in GC_COUNTRY_PROFILE Table).

The following sections describe the vendor-supplied tables that store geocoding data, in alphabetical order by table name.

Indexes on Tables for Geocoding describes the indexes that you must create in order to use these tables for geocoding.

12.5.1 GC_ADDRESS_POINT_<suffix> Table and Index

The GC_ADDRESS_POINT_<suffix> table (for example, GC_ADDRESS_POINT_US) stores the geographic (latitude, longitude) coordinates for addresses in the country or group of countries associated with the table-name suffix. This table is not required for geocoding (although it is required for point-based geocoding); however, it enables the geocoder to provide more accurate location results. It is automatically used when present in the schema. This table contains one row for each address stored in the table, and it contains the columns shown in Table 12-7.

Table 12-7 GC_ADDRESS_POINT_<suffix> Table

Column Name Data Type Description

ADDRESS_POINT_ID

NUMBER(10)

ID number of the address point. (Required)

ROAD_ID

NUMBER

ID number of the road on which the address point is located. (Required)

ROAD_SEGMENT_ID

NUMBER(10)

ID number of the road segment on the road on which the address point is located. (Required)

SIDE

VARCHAR2(1)

Side of the road on which the address point is located. Possible values: L (left) or R (right). (Required)

LANG_CODE

VARCHAR2(3)

3-letter ISO national language code for the language associated with the address point. (Required) point

HOUSE_NUMBER

VARCHAR2(600 CHAR)

House number of the address point; may contain non-numeric characters. (Required)

PERCENT

NUMBER

Decimal fraction of the length of the road segment on which the address point is located. It is computed by dividing the distance from the segment start point to the address point by the length of the road segment. (Required).

ADDR_LONG

NUMBER(10)

Longitude coordinate value of the address point. (Required)

ADDR_LAT

NUMBER(10)

Latitude coordinate value of the address point. (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the address point belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

If you use the GC_ADDRESS_POINT_<suffix> table, you must create an index on the table using a statement in the following form:

CREATE INDEX idx_<suffix>_addrpt_addr ON gc_address_point_<suffix> (road_segment_id, road_id, house_number, side);

12.5.2 GC_AREA_<suffix> Table

The GC_AREA_<suffix> table (for example, CG_AREA_US) stores administration area information for the country associated with the table name suffix. This table contains one row for each administration area, and it contains the columns shown in Table 12-8.

Table 12-8 GC_AREA_<suffix> Table

Column Name Data Type Description

AREA_ID

NUMBER(10)

Area ID number. (Required)

AREA_NAME

VARCHAR2(64)

Area name. (Required)

LANG_CODE

VARCHAR2(3)

3-letter ISO national language code for the language associated with the area. (Required)

ADMIN_LEVEL

NUMBER(1)

Administration hierarchy level for the area. (Required)

LEVEL1_AREA_ID

NUMBER(10)

ID of the level-1 area to which the area belongs. In the administration hierarchy, the level-1 area is the country. (Required)

LEVEL2_AREA_ID

NUMBER(10)

ID of the level-2 area to which the area belongs, if applicable. You must specify an area ID for each level in the administration hierarchy to which this area belongs. (Optional)

LEVEL3_AREA_ID

NUMBER(10)

ID of the level-3 area to which the area belongs, if applicable. You must specify an area ID for each level in the administration hierarchy to which this area belongs. (Optional)

LEVEL4_AREA_ID

NUMBER(10)

ID of the level-4 area to which the area belongs, if applicable. You must specify an area ID for each level in the administration hierarchy to which this area belongs. (Optional)

LEVEL5_AREA_ID

NUMBER(10)

ID of the level-5 area to which the area belongs, if applicable. You must specify an area ID for each level in the administration hierarchy to which this area belongs. (Optional)

LEVEL6_AREA_ID

NUMBER(10)

ID of the level-6 area to which the area belongs, if applicable. You must specify an area ID for each level in the administration hierarchy to which this area belongs. (Optional)

LEVEL7_AREA_ID

NUMBER(10)

ID of the level-7 area to which the area belongs, if applicable. You must specify an area ID for each level in the administration hierarchy to which this area belongs. (Optional)

CENTER_LONG

NUMBER

Longitude value of the center of the area. The center is set to the closest road segment to the center longitude and latitude values. Oracle recommends that these two attributes be set properly. If these values are not set, the longitude and latitude coordinates of the geocoded result of an area will be (0,0). (Optional)

CENTER_LAT

NUMBER

Latitude value of the center of the area. (See the explanation for the CENTER_LONG column.) (Optional)

ROAD_SEGMENT_ID

NUMBER(10)

ID of the road segment to which the area center is set. This value must be set correctly if the geocoder is intended to work with the Oracle Spatial routing engine (described in Routing Engine); otherwise, it can be set to any nonzero value, but it cannot be null. (Required)

POSTAL_CODE

VARCHAR2(16)

Postal code for the center of the area. Oracle recommends that this attribute be set correctly. If this value is null, the postal code attribute of the geocoded result of an area will be null. (Optional)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the area belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

REAL_NAME

VARCHAR2(64)

The real name of the area, as spelled using the local language. This column is useful for area names that are not in English. For example, the German name of city MUNICH is MÜNCHEN. It is allowed to be spelled as MUNCHEN, but its REAL_NAME value should be MÜNCHEN. In the area table for Germany, areas with name MÜNCHEN and MUNCHEN both refer to the same area, and they both have the same real name MÜNCHEN. If the area name does not have any non-English characters, set REAL_NAME to be the same as AREA_NAME. (Required)

IS_ALIAS

VARCHAR2(1)

Contains T if this area is an alias of another area that is an officially recognized administrative area; contains F if this area is not an alias of another area that is an officially recognized administrative area. For example, Manhattan is not an officially recognized administrative area, but it is used by the public to refer to a part of New York City. In this case, Manhattan is an alias of New York City. (Required)

NUM_STREETS

NUMBER

The number of streets inside this area. (Optional)

12.5.3 GC_COUNTRY_PROFILE Table

The GC_COUNTRY_PROFILE table stores country profile information used by the geocoder. This information includes administrative-area hierarchy definitions, the national languages, and the table-name suffix used by the data tables and their indexes. This table contains one row for each supported country, and it contains the columns shown in Table 12-9.

Table 12-9 GC_COUNTRY_PROFILE Table

Column Name Data Type Description

COUNTRY_NAME

VARCHAR2(60)

Country name. (Required)

COUNTRY_CODE_3

VARCHAR2(3)

3- letter ISO country code. (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code. (Required)

LANG_CODE_1

VARCHAR2(3)

3-letter ISO national language code. Some countries might have multiple national languages, in which case LANG_CODE_2 and perhaps other LANG_CODE_n columns should contain values. (Required)

LANG_CODE_2

VARCHAR2(3)

3-letter ISO national language code. (Optional)

LANG_CODE_3

VARCHAR2(3)

3-letter ISO national language code. (Optional)

LANG_CODE_4

VARCHAR2(3)

3-letter ISO national language code. (Optional)

NUMBER_ADMIN_LEVELS

NUMBER(1)

Number of administration hierarchy levels. A country can have up to 7 administration area levels, numbered from 1 to 7 (largest to smallest). The top level area (country) is level 1. For the United States, the administration hierarchy is as follows: level 1 = country, level 2 = state, level 3 = county, level 4 = city. (Required)

SETTLEMENT_LEVEL

NUMBER(1)

Administration hierarchy level for a settlement, which is the lowest area level used in addressing. In the United States, this is the city level; in Europe, this is generally a subdivision of a city (level 5). (Required)

MUNICIPALITY_LEVEL

NUMBER(1)

Administration hierarchy level for a municipality, which is the second-lowest area level used in addressing. In the United States, this is the county (level 3); in Europe, this is generally a city (level 4). (Optional)

REGION_LEVEL

NUMBER(1)

Administrative level for the region, which is above the municipality level. In the United States, this is the state or third-lowest area level used in addressing (level 2); in Europe, this is a recognized subdivision of the country (level 2 or level 3). (Optional)

SETTLEMENT_IS_OPTIONAL

VARCHAR2(1)

Contains T if settlement information is optional in the address data; contains F if settlement information is not optional (that is, is required) in the address data. (Required)

MUNICIPALITY_IS_OPTIONAL

VARCHAR2(1)

Contains T if municipality information is optional in the address data; contains F if municipality information is not optional (that is, is required) in the address data. (Required)

REGION_IS_OPTIONAL

VARCHAR2(1)

Contains T if region information is optional in the address data; contains F if region information is not optional (that is, is required) in the address data. (Required)

POSTCODE_IN_SETTLEMENT

VARCHAR(1)

Contains T if each postal code must be completely within a settlement area; contains F if a postal code can include areas from multiple settlements. (Required)

SETTLEMENT_AS_CITY

VARCHAR(1)

Contains T if a city name can identify both a municipality and a settlement; contains F if a city name can identify only a settlement. For example, in the United Kingdom, London can be both the name of a municipality area and the name of a settlement area, which is inside the municipality of London. This is common in large cities in some European countries, such as the UK and Belgium. (Required)

CACHED_ADMIN_AREA_LEVEL

NUMBER

(Reserved for future use.)

GC_TABLE_SUFFIX

VARCHAR2(5)

Table name suffix identifying the country for the GC_* data tables. For example, if the value of GC_TABLE_SUFFIX is US, the names of tables with geocoding data for this country end with _US (for example, CG_AREA_US). (Required)

CENTER_LONG

NUMBER

Longitude value of the center of the area. (Optional)

CENTER_LAT

NUMBER

Latitude value of the center of the area. (Optional)

SEPARATE_PREFIX

VARCHAR2(1)

Contains T if the street name prefix is a separate word from the street name; contains F if the street name prefix is in the same word with the street name. For example, in an American street address of 123 N Main St, the prefix is N, and it is separate from the street name, which is Main. (Optional; not currently used by Oracle)

SEPARATE_SUFFIX

VARCHAR2(1)

Contains T if the street name suffix is a separate word from the street name; contains F if the street name suffix is in the same word with the street name. For example, in an American street address of 123 Main St NW, the suffix is NW, and it is separate from the street name, which is Main, and from the street type, which is St. (Optional; not currently used by Oracle)

SEPARATE_STYPE

VARCHAR2(1)

Contains T if the street type is a separate word from the street name; contains F if the street type is in the same word with the street name. For example, in a German street address of 123 Beethovenstrass, the type is strass, and it is in the same word with the street name, which is Beethoven. (Optional; not currently used by Oracle)

AREA_ID

NUMBER

Not currently used by Oracle. (Optional)

VERSION

VARCHAR2(10)

Version of the data. The first version should be 1.0. (Required)

12.5.4 GC_INTERSECTION_<suffix> Table

The GC_INTERSECTION_<suffix> table (for example, GC_INTERSECTION_US) stores information on road intersections for the country or group of countries associated with the table-name suffix. An intersection occurs when roads meet or cross each other. This table contains the columns shown in Table 12-10.

Table 12-10 GC_INTERSECTION_<suffix> Table

Column Name Data Type Description

ROAD_ID_1

NUMBER

ID number of the first road on which the intersection is located. (Required)

ROAD_SEGMENT_ID_1

NUMBER

ID number of the road segment on the first road on which the intersection is located. (Required)

ROAD_ID_2

NUMBER

ID number of the second road on which the intersection is located. (Required)

ROAD_SEGMENT_ID_2

NUMBER

ID number of the road segment on the second road on which the intersection is located. (Required)

INTS_LONG

NUMBER

Longitude coordinate value of the intersection. (Required)

INTS_LAT

NUMBER

Latitude coordinate value of the intersection. (Required)

HOUSE_NUMBER

NUMBER

The leading numerical part of the house number at the intersection. (See the explanation of house numbers after Table 12-16 in GC_ROAD_SEGMENT_<suffix> Table.) (Required)

HOUSE_NUMBER_2

VARCHAR2(10)

The second part of the house number at the intersection. (See the explanation of house numbers after Table 12-16 in GC_ROAD_SEGMENT_<suffix> Table.) (Required)

SIDE

VARCHAR2(1)

Side of the street on which the house at the intersection is located. Possible values: L (left) or R (right). (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the house at the intersection belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

12.5.5 GC_PARSER_PROFILES Table

The GC_PARSER_PROFILES table stores information about keywords typically found in postal addresses. The geocoder uses keywords to identify address fields, such as house number, road name, city name, state name, and postal code. A keyword can be the type of street (such as road, street, drive, or avenue) or the prefix or suffix of a street (such as north, south, east, or west). This table contains the columns shown in Table 12-11.

Table 12-11 GC_PARSER_PROFILES Table

Column Name Data Type Description

COUNTRY_CODE

VARCHAR2(2)

2- letter ISO country code of the country for the keyword. (Required)

KEYWORDS

SDO_KEYWORDARRAY

A single array of keywords for a specific address field. The array may contain a single word, or a group of words and abbreviations that can be used with the same meaning; for example, United States of America, USA, and United States all refer to the US. The first word of this array should be the official full name of the keyword, if there is any. The US uses over 400 keywords in parsing addresses. The following are some examples of keyword arrays and keywords from the US data set; however, only a single SDO_KEYWORDARRAY object is stored in each row:

SDO_KEYWORDARRAY( 'UNITED STATES OF AMERICA','US', 'USA', 'UNITED STATES', 'U.S.A.', 'U.S.')

SDO_KEYWORDARRAY('AVENUE','AV', 'AVE', 'AVEN', 'AVENU', 'AVN', 'AVNUE', 'AV.','AVE.')

SDO_KEYWORDARRAY('40TH', 'FORTIETH')

SDO_KEYWORDARRAY('NEW YORK','NY')

SDO_KEYWORDARRAY('LIBRARY')

OUTPUT_KEYWORD

VARCHAR2(2000)

A keyword used in the geocoder data to represent an address field. It must be the same as one of the keywords used in the keyword array. The output keyword is used to match the addresses stored in the geocoding data tables to the user's input, for example, if the output keyword AV is used for street type Avenue in the GC_ROAD_US table, wherever a user enters an address containing any of the keywords (AVENUE, AV, AVE, AVEN, AVENU, AVN, AVNUE, AV., AVE.), the keyword will be interpreted and matched to the output keyword AV to help find the address in the database The following are some examples of output keywords; however, only a single output keyword is stored in each row:

US

AV

40TH

NY

LIBRARY

SECTION_LABEL

VARCHAR2(30)

A label used to identify the type of keyword represented in the KEYWORDS and OUTPUT_KEYWORD columns. There are the multiple different section labels; however, only a single section label for each row is used in identifying the type of keywords:

COUNTRY_NAME: Identifies keywords that are used to represent country names.

LOCALITY_KEYWORD_DICTIONARY: Identifies keywords that are used to replace words in a locality (city, state, province, and so on) with a standardized form of the word. For example, Saint is replaced by St; and by doing so, the city names Saint Thomas and St. Thomas will be standardized to St Thomas, which is stored in the database.

PLACE_NAME_KEYWORD: Identifies a point of interest (POI) name keyword, such as for a restaurant or a hotel.

REGION_LIST: Identifies keywords that are known names of regions, such as NY, New York, NH, and New Hampshire. The regions identified must be administrative areas that belong to the third-lowest area level or third-smallest area used in addressing. In the US this is the state level (the lowest area level or smallest area is the city level).

SECOND_UNIT_KEYWORD: Identifies keywords used in second-unit descriptions, such as Floor, #, Suite, and Apartment.

STREET_KEYWORD_DICTIONARY: Identifies keywords used to replace non-street-type keywords in street names (such as 40TH and Fortieth) with a standardized form.

STREET_PREFIX_KEYWORD: Identifies street name prefix keywords, such as South, North, West, and East.

STREET_TYPE_KEYWORD: Identifies street type keywords, such as Road, Street, and Drive.

IN_LINE_STREET_TYPE_KEYWORD: Identifies street type keywords that are attached to street names, such as strasse in the German street name Steinstrasse.

POSITION

VARCHAR2(1)

The position of the keyword relative to a street name. It indicates whether the keyword can precede (P) or follow (F) the actual street name, or both (B). Thus, P, F, and B are the only valid entries. In the US, most street type keywords follow the street names, for example, the street type Blvd in Hollywood Blvd. In France, however, street type keywords usually precede the street names, for example, the street type Avenue in Avenue De Paris.

SEPARATENESS

VARCHAR2(1)

Indicates whether the keyword is separate from a street name. Keywords are either separable (S) or non-separable (N). Thus, S and N are the only valid entries. In the US, all street-type keywords are separate words from the street name, for example, the street type Blvd in Hollywood Blvd. In Germany, however, the street-type keywords are not separate from the street name, for example, the street type strasse in Augustenstrasse.

12.5.6 GC_PARSER_PROFILEAFS Table

The GC_PARSER_PROFILEAFS table stores the XML definition of postal-address formats. An XML string describes each address format for a specific country. In the Oracle Geocoder 10g and earlier, the J2EE geocoder uses a country_name.ppr file instead of this table. The content of the country_name.ppr file is equivalent to the content of the ADDRESS_FORMAT_STRING attribute. This table contains the columns shown in Table 12-12.

Table 12-12 GC_PARSER_PROFILEAFS Table

Column Name Data Type Description

COUNTRY_CODE

VARCHAR2(2)

2- letter ISO country code of the country. (Required)

ADDRESS_FORMAT_STRING

CLOB

XML string describing the address format for the country specified in the COUNTRY_CODE column. (Example 12-4 shows the XML definition for the US address format, and ADDRESS_FORMAT_STRING Description explains the elements used in the US address format definition.).

Example 12-4 shows the ADDRESS_FORMAT_STRING definition for the US address format.

Example 12-4 XML Definition for the US Address Format

<address_format unit_separator="," replace_hyphen="true">
 <address_line>
   <place_name />
 </address_line>
 <address_line>
   <street_address>
     <house_number>
       <format form="0*" effective="0-1" output="$" />
       <format form="0*1*" effective="0-1" output="$">
         <exception form="0*TH" />
         <exception form="0*ST" />
         <exception form="0*2ND" />
         <exception form="0*3RD" />
       </format>
       <format form="0*10*" effective="0-1" output="$" />
       <format form="0*-0*" effective="0-1" output="$" />
       <format form="0*.0*" effective="0-1" output="$" />
       <format form="0* 0*/0*" effective="0-1" output="$" />
     </house_number>
     <street_name>
       <prefix />
       <base_name />
    	<suffix />
       <street_type />
       <special_format>
   <format form="1* HWY 0*" effective="7-8" addon_effective="0-1" addon_output="$ HWY"/>
   <format form="1* HIGHWAY 0*" effective="11-12" addon_effective="0-1" addon_output="$ HWY"/>
   <format form="1* HWY-0*" effective="7-8" addon_effective="0-1" addon_output="$ HWY"/>
   <format form="1* HIGHWAY-0*" effective="11-12" addon_effective="0-1" addon_output="$ HWY"/>
         <format form="HWY 0*" effective="4-5" addon_output="HWY" />
         <format form="HIGHWAY 0*" effective="8-9" addon_output="HWY" />
         <format form="ROUTE 0*" effective="6-7" addon_output="RT" />
         <format form="I 0*" effective="2-3" addon_output="I" />
         <format form="11 0*" effective="3-4" addon_effective="0-1" />
         <format form="I0*" effective="1-2" addon_output="I" />
         <format form="I-0*" effective="2-3" addon_output="I" />
         <format form="11-0*" effective="3-4" addon_effective="0-1" />
         <format form="ROUTE-0*" effective="6-7" addon_output="RT" />
         <format form="US0*" effective="2-3" addon_output="US" />
         <format form="HWY-0*" effective="2-3" addon_output="US" />
         <format form="HIGHWAY-0*" effective="8-9" addon_output="HWY" />
       </special_format>
     </street_name>
     <second_unit>
       <special_format>
         <format form="# 0*" effective="2-3" output="APT $" />
         <format form="#0*" effective="1-2" output="APT $" />
       </special_format>
     </second_unit>
   </street_address>
 </address_line>
 <address_line>
   <po_box>
     <format form="PO BOX 0*" effective="7-8" />
     <format form="P.O. BOX 0*" effective="9-10" />
     <format form="PO 0*" effective="3-4" />
     <format form="P.O. 0*" effective="5-6" />
     <format form="POBOX 0*" effective="6-7" />
   </po_box>
 </address_line>
 <address_line>
   <city optional="no" />
   <region optional="no" order="1" />
   <postal_code>
     <format form="00000" effective="0-4" />
     <format form="00000-0000" effective="0-4" addon_effective="6-9" />
     <format form="00000 0000" effective="0-4" addon_effective="6-9" />
   </postal_code>
 </address_line>
</address_format>

12.5.6.1 ADDRESS_FORMAT_STRING Description

The ADDRESS_FORMAT_STRING column of the GC_PARSER_PROFILEAFS table describes the format of address fields and their positioning in valid postal addresses. The address format string is organized by address lines, because postal addresses are typically written in multiple address lines.

The address parser uses the format description defined in the XML address format, combined with the keyword definition for each address field defined in the GC_PARSER_PROFILES table, to parse the input address and identify individual address fields.

<address_format> Element

The <address_format> element includes the unit_separator and replace_hyphen attributes. The unit_separator attribute is used to separate fields in the stored data. By default it is a comma (unit_separator=","). The replace_hyphen attribute specifies whether to replace all hyphens in the user's input with a space. By default it is set to true (replace_hyphen="true"), that is, it is expected that all names in the data tables will contain a space instead of a hyphen.

If replace_hyphen="true", administrative-area names in the data tables containing hyphens will not be matched during geocoding if replace_hyphen="true"; however, these area names with hyphens can be placed in the REAL_NAME column of the GC_AREA table to be returned as the administrative-area name in the geocoded result. Road names in the NAME column of the GC_ROAD table containing hyphens will, however, be matched during geocoding, but the matching performance will be degraded

<address_line> Elements

Each <address_line> element in the XML address format string describes the format of an address line. Each <address_line> element can have one or more child elements describing the individual address fields, such as street address, city, state (region or province), and postal code. These address field elements are listed in the order that the address fields appear in valid postal addresses. The optional attribute of the address field element is set to "no" if the address field is mandatory. By default, address field elements are optional.

<format> Elements

The format descriptions for house number, special street name, post box, and postal code elements are specified with a single or multiple <format> elements. Each <format> element specifies a valid layout and range of values for a particular address field. The following example illustrates the format used to define a special street name:

<format 
     form="1* HWY 0*" 
     effective="7-8" 
     output="$"
     addon_effective="0-1" 
     addon_output="$ HIGHWAY" />

The form attribute uses a regular expression-like string to describe the format: 1 stands for any alphabetic letter; 0 stands for any numerical digit; 2 stands for any alphabetic letter or any numerical digit; 1* specifies a sting consisting of all alphabetic letters; 0* specifies a string consisting of all numerical digits; 2* specifies a string consisting of any combination of numerical digits and alphabetic letters. All other symbols represent themselves.

Any string matching the pattern specified by the form attribute is considered to be a valid string for its (parent) address field. A valid string can then be broken down into segments specified by the attributes effective and addon_effective. The effective attribute specifies the more important, primary piece of the address string; the addon_effective attribute specifies the secondary piece of the address string.

  • The effective attribute specifies a substring of the full pattern using the start and end positions for the end descriptor of the form attribute. In the preceding example, effective="7-8" retrieves the substring (counting from position 0) starting at position 7 and ending at position 8, which is the substring defined by 0*, at the end of the form attribute.

  • The addon_effective attribute specifies a substring of the full pattern using the start and end positions for the start descriptor of the form attribute. In the preceding example, addon_effective="0-1" retrieves the substring, (counting from position 0) starting at position 0 and ending at position 1, which is the substring defined by 1*, at the beginning of the form attribute.

The output and addon_output attributes specify the output form of the address string for segments specified by the effective and addon_effective attributes, respectively. These output forms are used during address matching. The symbol $ stands for the matched string, and other symbols represent themselves. In the preceding example:

  • In output="$", the $ stands for the substring that was matched in the effective attribute.

  • In addon_output="$ HIGHWAY", the $ HIGHWAY stands for the substring that was matched in the addon_effective attribute, followed by a space, followed by the word HIGHWAY.

Using the <format> element in the preceding example, with form="1* HWY 0*", the input string 'STATE HWY 580' will have effective=580, output=580, addon_effective=STATE, and addon_output=STATE HIGHWAY.

The <format> element may also contain an <exception> element. The <exception> element specifies a string that has a valid form, but must be excluded from the address field. For example, in a <house_number> element with valid numbers 0*1* (that is, any numeric digits followed by any alphabetic letters), specifying <exception form="0*TH" /> means that any house number with (or without) numeric digits and ending with "TH" must be excluded.

12.5.7 GC_POI_<suffix> Table

The GC_POI_<suffix> table (for example, GC_POI_US) stores point of interest (POI) information for the country or group of countries associated with the table name suffix. POIs include features such as airports, monuments, and parks. This table contains one or more rows for each point of interest. (For example, it can contain multiple rows for a POI if the POI is associated with multiple settlements.) The GC_POI_<suffix> table contains the columns shown in Table 12-13.

Table 12-13 GC_POI_<suffix> Table

Column Name Data Type Description

POI_ID

NUMBER

ID number of the POI. (Required)

NAME

VARCHAR2(64)

Name of the POI. (Required)

LANG_CODE

VARCHAR2(3)

3-letter ISO national language code for the language for the POI name. (Required)

FEATURE_CODE

NUMBER

Feature code for the POI, if the data vendor classifies POIs by category. (Optional)

HOUSE_NUMBER

VARCHAR2(10)

House number of the POI; may contain non-numeric characters. (Required)

STREET_NAME

VARCHAR2(80)

Road name of the POI. (Required)

SETTLEMENT_ID

NUMBER(10)

ID number of the settlement to which the POI belongs. (Required if the POI is associated with a settlement)

MUNICIPALITY_ID

NUMBER(10)

ID number of the municipality to which the POI belongs. (Required if the POI is associated with a municipality)

REGION_ID

NUMBER(10)

ID number of the region to which the POI belongs. (Required if the POI is associated with a region)

SETTLEMENT_NAME

VARCHAR2(64)

Name of the settlement to which the POI belongs. (Required if the POI is associated with a settlement)

MUNICIPALITY_NAME

VARCHAR2(64)

Name of the municipality to which the POI belongs. (Required if the POI is associated with a municipality)

REGION_NAME

VARCHAR2(64)

Name of the region to which the POI belongs. (Required if the POI is associated with a region)

POSTAL_CODE

VARCHAR2(16)

Postal code of the POI. (Required)

VANITY_CITY

VARCHAR2(35)

Name of the city popularly associated with the POI, if it is different from the actual city containing the POI. For example, the London Heathrow Airport is actually located in a town named Hayes, which is part of greater London, but people tend to associate the airport only with London. In this case, the VANITY_CITY value is London. (Optional)

ROAD_SEGMENT_ID

NUMBER

ID of the road segment on which the POI is located. (Required)

SIDE

VARCHAR2(1)

Side of the street on which the POI is located. Possible values: L (left) or R (right). (Required)

PERCENT

NUMBER

Percentage value at which the POI is located on the road. It is computed by dividing the distance from the street segment start point to the POI by the length of the street segment. (Required)

TELEPHONE_NUMBER

VARCHAR2(20)

Telephone number of the POI. (Optional)

LOC_LONG

NUMBER

Longitude coordinate value of the POI. (Required)

LOC_LAT

NUMBER

Latitude coordinate value of the POI. (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the POI belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

12.5.8 GC_POSTAL_CODE_<suffix> Table

The GC_POSTAL_CODE_<suffix> table (for example, GC_POSTAL_CODE_US) stores postal code information for the country or group of countries associated with the table-name suffix, if postal codes are used in the address format. This table contains one or more rows for each postal code; it may contain multiple rows for a postal code when the postal code is associated with multiple settlements. The GC_POSTAL_CODE_<suffix> table contains the columns shown in Table 12-14.

Table 12-14 GC_POSTAL_CODE_<suffix> Table

Column Name Data Type Description

POSTAL_CODE

VARCHAR2(16)

Postal code for the postal code area. (Required)

SETTLEMENT_NAME

VARCHAR2(64)

Name of the settlement to which the postal code belongs. (Required if the postal code is associated with a settlement)

MUNICIPALITY_NAME

VARCHAR2(64)

Name of the municipality to which the postal code belongs. (Required if the postal code is associated with a municipality)

REGION_NAME

VARCHAR2(64)

Name of the region to which the postal code belongs. (Required if the postal code is associated with a region)

LANG_CODE

VARCHAR2(3)

3-letter ISO national language code for the language associated with the area. (Required)

SETTLEMENT_ID

NUMBER(10)

ID number of the settlement to which the postal code belongs. (Required if the postal code is associated with a settlement)

MUNICIPALITY_ID

NUMBER(10)

ID number of the municipality to which the postal code belongs. (Required if the postal code is associated with a municipality)

REGION_ID

NUMBER(10)

ID number of the region to which the postal code belongs. (Required if the postal code is associated with a region)

CENTER_LONG

NUMBER

Longitude value of the center of the postal-code area. The center (longitude, latitude) value is set to the start- or end-point of the closest road segment to the center, depending on which point is closer. Oracle recommends that the CENTER_LONG and CENTER_LAT values be correctly set. If these values are not set, the longitude, latitude values of the geocoded result for an area will be (0,0). (Optional)

CENTER_LAT

NUMBER

Latitude value of the center of the area. (See the explanation for the CENTER_LONG column.) (Optional)

ROAD_SEGMENT_ID

NUMBER(10)

ID of the road segment to which the area center is set. This value must be set correctly if the geocoder is intended to work with the Oracle Spatial routing engine (described in Routing Engine); otherwise, it can be set to any nonzero value, but it cannot be null. (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the area belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

NUM_STREETS

NUMBER

The number of streets inside this postal code area. (Optional)

12.5.9 GC_ROAD_<suffix> Table

The GC_ROAD_<suffix> table (for example, GC_ROAD_US) stores road information for the country associated with the table name suffix. A road is a collection of road segments with the same name in the same settlement area; a road segment is defined in GC_ROAD_SEGMENT_<suffix> Table. The GC_ROAD_<suffix> table contains one or more rows for each road. (For example, it can contain multiple rows for a road if the road is associated with multiple settlements.) The GC_ROAD_<suffix> table contains the columns shown in Table 12-15.

Table 12-15 GC_ROAD_<suffix> Table

Column Name Data Type Description

ROAD_ID

NUMBER

ID number of the road. (Required)

SETTLEMENT_ID

NUMBER(10)

ID number of the settlement to which the road belongs. (Required if the road is associated with a settlement)

MUNICIPALITY_ID

NUMBER(10)

ID number of the municipality to which the road belongs. (Required if the road is associated with a municipality)

PARENT_AREA_ID

NUMBER(10)

ID number of the parent area of the municipality to which the road belongs. (Required if the road is associated with a parent area)

LANG_CODE

VARCHAR2(3)

3-letter ISO national language code for the language for the road name. (Required)

NAME

VARCHAR2(64)

Name of the road, including the type (if any), the prefix (if any), and the suffix (if any). For example, N Main St as NAME. (Required)

BASE_NAME

VARCHAR2(64)

Name of the road, excluding the type (if any), the prefix (if any), and the suffix (if any). For example, N Main St as NAME, with Main as BASE_NAME. (Required)

PREFIX

VARCHAR2(32)

Prefix of the road name. For example, N Main St as NAME, with N as PREFIX. (Required if the road name has a prefix)

SUFFIX

VARCHAR2(32)

Suffix of the road name. For example, Main St NW as NAME, with NW as SUFFIX. (Required if the road name has a suffix)

STYPE_BEFORE

VARCHAR2(32)

Street type that precedes the base name. For example, Avenue Victor Hugo as NAME, with Avenue as STYPE_BEFORE and Victor Hugo as BASE_NAME. (Required if the road type precedes the base name)

STYPE_AFTER

VARCHAR2(32)

Street type that follows the base name. For example, Main St as NAME, with St as STYPE_AFTER and Main as BASE_NAME. (Required if the road type follows the base name)

STYPE_ATTACHED

VARCHAR2(1)

Contains T if the street type is in the same word with the street name; contains F if the street type is a separate word from the street name. For example, in a German street address of 123 Beethovenstrass, the street type is strass, and it is in the same word with the street name, which is Beethoven. (Required)

START_HN

NUMBER(5)

The lowest house number on the road. It is returned when a specified house number is lower than this value.

CENTER_HN

NUMBER(5)

Leading numerical part of the center house number. The center house number is the left side house number at the start point of the center road segment, which is located in the center of the whole road. (See the explanation of house numbers after Table 12-16 in GC_ROAD_SEGMENT_<suffix> Table.) It is returned when no house number is specified in an input address. (Required)

END_HN

NUMBER(5)

The highest house number on the road. It is returned when a specified house number is higher than this value.

START_HN_SIDE

VARCHAR2(1)

Side of the road of the lowest house number: L for left or R for right.

CENTER_HN_SIDE

VARCHAR2(1)

Side of the road of the center house number: L for left or R for right. The center house number is the left side house number at the start point of the center road segment, which is located in the center of the whole road. (See the explanation of house numbers after Table 12-16 in GC_ROAD_SEGMENT_<suffix> Table.) (Required if there are houses on the road)

END_HN_SIDE

VARCHAR2(1)

Side of the road of the highest house number: L for left or R for right.

START_LONG

NUMBER

Longitude value of the lowest house number.

START_LAT

NUMBER

Latitude value of the lowest house number.

CENTER_LONG

NUMBER

Longitude value of the center house number. The center house number is the left side house number at the start point of the center road segment, which is located in the center of the whole road. (See the explanation of house numbers after Table 12-16 in GC_ROAD_SEGMENT_<suffix> Table.) (Required)

CENTER_LAT

NUMBER

Latitude value of the center house number. (See also the explanation of the CENTER_LONG column.) (Required)

END_LONG

NUMBER

Longitude value of the highest house number.

END_LAT

NUMBER

Latitude value of the highest house number.

START_ROAD_SEG_ID

NUMBER(5)

ID number of the road segment at the start of the road.

CENTER_ROAD_SEG_ID

NUMBER(5)

ID number of the road segment at the center point of the road. (Required)

END_ROAD_SEG_ID

NUMBER(5)

ID number of the road segment at the end of the road.

POSTAL_CODE

VARCHAR2(16)

Postal code for the road. (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the road belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

CENTER_HN2

VARCHAR2(10)

The second part of the center house number. (See the explanation of house numbers after Table 12-16 in GC_ROAD_SEGMENT_<suffix> Table.) (Required)

12.5.10 GC_ROAD_SEGMENT_<suffix> Table

The GC_ROAD_SEGMENT_<suffix> table (for example, GC_ROAD_SEGMENT_US) stores road segment information for the country associated with the table name suffix. A road segment is the portion of a road between two continuous intersections along the road; an intersection occurs when roads meet or cross each other. A road segment can also be the portion of a road between the start (or end) of the road and its closest intersection along the road, or it can be the entire length of a road if there are no intersections along the road. The GC_ROAD_SEGMENT_<suffix> table contains one row for each road segment, and it contains the columns shown in Table 12-16.

Table 12-16 GC_ROAD_SEGMENT_<suffix> Table

Column Name Data Type Description

ROAD_SEGMENT_ID

NUMBER

ID number of the road segment. A positive value, as explained in Relationship between Routing Engine and Geocoder. (Required)

ROAD_ID

NUMBER

ID number of the road containing this road segment. (Required)

L_ADDR_FORMAT

VARCHAR2(1)

Left side address format. Specify N if there are one or more house numbers on the left side of the road segment; leave null if there is no house number on the left side of the road segment. (Required)

R_ADDR_FORMAT

VARCHAR2(1)

Right side address format. Specify N if there are one or more house numbers on the right side of the road segment; leave null if there is no house number on the right side of the road segment. (Required)

L_ADDR_SCHEME

VARCHAR2(1)

Numbering scheme for house numbers on the left side of the road segment: O (all odd numbers), E (all even numbers), or M (mixture of odd and even numbers). (Required)

R_ADDR_SCHEME

VARCHAR2(1)

Numbering scheme for house numbers on the right side of the road segment: O (all odd numbers), E (all even numbers), or M (mixture of odd and even numbers). (Required)

START_HN

NUMBER(5)

The lowest house number on this road segment. (Required)

END_HN

NUMBER(5)

The highest house number on this road segment. (Required)

L_START_HN

NUMBER(5)

The leading numerical part of the left side starting house number. (See the explanation of house numbers after this table.) (Required)

L_END_HN

NUMBER(5)

The leading numerical part of the left side ending house number. (See the explanation of house numbers after this table.) (Required)

R_START_HN

NUMBER(5)

The leading numerical part of the right side starting house number. (See the explanation of house numbers after this table.) (Required)

R_END_HN

NUMBER(5)

The leading numerical part of the right side ending house number. (See the explanation of house numbers after this table.) (Required)

POSTAL_CODE

VARCHAR2(16)

Postal code for the road segment. If the left side and right side of the road segment belong to two different postal codes, create two rows for the road segment with identical values in all columns except for POSTAL_CODE. (Required)

GEOMETRY

SDO_GEOMETRY

Spatial geometry object representing the road segment. (Required)

COUNTRY_CODE_2

VARCHAR2(2)

2- letter ISO country code of the country to which the road segment belongs. (Required)

PARTITION_ID

NUMBER

Partition key used for partitioning geocoder data by geographic boundaries. If the data is not partitioned, set the value to 1. (Required)

L_START_HN2

VARCHAR2(10)

The second part of the left side starting house number. (See the explanation of house numbers after this table.) (Required if the left side starting house number has a second part)

L_END_HN2

VARCHAR2(10)

The second part of the left side ending house number. (See the explanation of house numbers after this table.) (Required if the left side ending house number has a second part)

R_START_HN2

VARCHAR2(10)

The second part of the right side starting house number. (See the explanation of house numbers after this table.) (Required if the right side starting house number has a second part)

R_END_HN2

VARCHAR2(10)

The second part of the right side ending house number. (See the explanation of house numbers after this table.) (Required if the right side ending house number has a second part)

A house number is a descriptive part of an address that helps identify the location of a establishment along a road segment. A house number is divided into two parts: the leading numerical part and the second part, which is the rest of the house number. The leading numerical part is the numerical part of the house number that starts from the beginning of the complete house number string and ends just before the first non-numeric character (if present). If the house number contains non-numeric characters, the second part of the house number is the portion from the first non-numeric character through the last character of the string. For example, if the house number is 123, the leading numerical part is 123 and the second part is null; however, if the house number is 123A23, the leading numerical part is 123 and the second part is A23.

The starting house number is the house number at the start point of a road segment; the start point of the road segment is the first shape point of the road segment geometry. The ending house number is the house number at the end point of a road segment; the end point of the road segment is the last shape point of the road segment geometry. The left and right side starting house numbers do not need to be lower than the left and right side ending house numbers. The house number attributes in the data tables follow these conventions in locating establishments along road segments.

12.5.11 Indexes on Tables for Geocoding

To use the vendor-supplied tables for geocoding, indexes must be created on many of the tables, and the names of these indexes must follow certain requirements.

Example 12-5 lists the format of CREATE INDEX statements that create the required indexes. In each statement, you must use the index name, table name, column names, and (if multiple columns are indexed) sequence of column names as shown in Example 12-5, except that you must replace all occurrences of <suffix> with the appropriate string (for example, US for the United States). Note that the first index in the example is a spatial index. Optionally, you can also include other valid keywords and clauses in the CREATE INDEX statements.

Example 12-5 Required Indexes on Tables for Geocoding

CREATE INDEX idx_<suffix>_road_geom ON gc_road_segment_<suffix> (geometry) INDEXTYPE IS mdsys.spatial_index_v2;
CREATE INDEX idx_<suffix>_road_seg_rid ON gc_road_segment_<suffix> (road_id, start_hn, end_hn);
CREATE INDEX idx_<suffix>_road_id ON gc_road_<suffix> (road_id);
CREATE INDEX idx_<suffix>_road_setbn ON gc_road_<suffix> (settlement_id, base_name);
CREATE INDEX idx_<suffix>_road_munbn ON gc_road_<suffix> (municipality_id, base_name);
CREATE INDEX idx_<suffix>_road_parbn ON gc_road_<suffix> (parent_area_id, country_code_2, base_name);
CREATE INDEX idx_<suffix>_road_setbnsd ON gc_road_<suffix> (settlement_id, soundex(base_name));
CREATE INDEX idx_<suffix>_road_munbnsd ON gc_road_<suffix> (municipality_id, soundex(base_name));
CREATE INDEX idx_<suffix>_road_parbnsd ON gc_road_<suffix> (parent_area_id, country_code_2, soundex(base_name));
CREATE INDEX idx_<suffix>_inters ON gc_intersection_<suffix> (country_code_2, road_id_1, road_id_2);
CREATE INDEX idx_<suffix>_area_name_id ON gc_area_<suffix> (country_code_2, area_name, admin_level);
CREATE INDEX idx_<suffix>_area_id_name ON gc_area_<suffix> (area_id, area_name, country_code_2);
CREATE INDEX idx_<suffix>_poi_name ON gc_poi_<suffix> (country_code_2, name);
CREATE INDEX idx_<suffix>_poi_setnm ON gc_poi_<suffix> (country_code_2, settlement_id, name);
CREATE INDEX idx_<suffix>_poi_ munnm ON gc_poi_<suffix> (country_code_2, municipality_id, name);
CREATE INDEX idx_<suffix>_poi_ regnm ON gc_poi_<suffix> (country_code_2, region_id, name);
CREATE INDEX idx_<suffix>_ postcode ON gc_postal_code_<suffix> (country_code_2, postal_code);
CREATE INDEX idx_<suffix>_addrpt_addr ON gc_address_point_<suffix> (road_segment_id, road_id, house_number, side);