Address GeoTagger

The Address GeoTagger returns geographical information for a valid global address.

The geographical information includes all of the possible administrative divisions for a specific address, as well as the latitude and longitude of that address. The Address GeoTagger runs only on valid, unambiguous addresses that correspond to a city. In addition, the input text must be no longer than 350 characters.
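The length limit can be checked before a value is submitted for enrichment. The following is a minimal sketch, not part of the product: the function name within_length_limit is hypothetical, and it assumes that "characters" means Unicode code points as counted by Python's len().

```python
MAX_INPUT_LENGTH = 350  # limit stated in the text above


def within_length_limit(address: str) -> bool:
    """Return True if the address text is at most 350 characters long.

    Assumption: length is measured in Unicode code points, which is
    what len() returns for a Python str.
    """
    return len(address) <= MAX_INPUT_LENGTH


if __name__ == "__main__":
    print(within_length_limit("400 Oracle Parkway, Redwood City, CA 94065"))
```

A value that fails this check is too long for the module, so it can be truncated or skipped upstream.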

To be triggered during auto-enrichment, the Address GeoTagger requires at least two match points in the input. For a postcode to count as a match, it must be accompanied by a country.

Some valid formats are:
  • City + State
  • City + State + Postcode
  • City + Postcode
  • Postcode + Country
  • City + State + Country
  • City + Country (if the country has multiple cities of that name, information is returned for the city with the largest population)

For example, these inputs generate geographical information for the city of Boston, Massachusetts:
  • Boston, MA (or Boston, Massachusetts)
  • Boston, Massachusetts 02116
  • 02116 US
  • Boston, MA US
  • Boston US

The final example ("Boston US") returns information for Boston, Massachusetts. Although several cities and towns in the US are named "Boston", Boston, Massachusetts has the largest population among them.
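The largest-population rule for City + Country inputs can be illustrated with a short sketch. This is not the module's implementation: the CityCandidate type and the pick_most_populous function are hypothetical, and the population values are placeholder ranks rather than real figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CityCandidate:
    name: str
    region: str
    country: str
    population: int  # placeholder rank in this example, not a real count


def pick_most_populous(
    candidates: List[CityCandidate], city: str, country: str
) -> Optional[CityCandidate]:
    """Return the matching city with the largest population, or None."""
    matches = [
        c
        for c in candidates
        if c.name.lower() == city.lower() and c.country.upper() == country.upper()
    ]
    return max(matches, key=lambda c: c.population, default=None)


# Placeholder data: only the relative ordering matters here.
candidates = [
    CityCandidate("Boston", "Massachusetts", "US", 3),
    CityCandidate("Boston", "Georgia", "US", 1),
    CityCandidate("Boston", "New York", "US", 2),
]

best = pick_most_populous(candidates, "Boston", "US")
```

Here best refers to the Massachusetts entry, because it has the highest placeholder value among the US candidates named "Boston".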

Note that for this module to run automatically, the minimum requirement is that a city is specified together with either a state or a postcode.
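The minimum requirement for an automatic run can be expressed as a simple predicate. The sketch below assumes the address has already been split into components. The function name and parameters are hypothetical, and it encodes only the city-plus-state-or-postcode rule from the note above, not the module's actual matching logic.

```python
from typing import Optional


def meets_auto_run_minimum(
    city: Optional[str],
    state: Optional[str] = None,
    postcode: Optional[str] = None,
) -> bool:
    """Return True if a city is present with a state or a postcode."""

    def present(value: Optional[str]) -> bool:
        # Treat None, empty, and whitespace-only values as missing.
        return bool(value and value.strip())

    return present(city) and (present(state) or present(postcode))


if __name__ == "__main__":
    print(meets_auto_run_minimum("Boston", state="MA"))
    print(meets_auto_run_minimum("Boston"))
```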

Keep in mind that regardless of the input address, the geographical resolution does not get finer than the city level. For example, this module will not resolve down to the street level if given a full address. In other words, this full address input:
400 Oracle Parkway, Redwood City, CA 94065
produces the same results as supplying only the city and state:
Redwood City, CA

GeoNames data

The information returned by this geocode tagger comes from the GeoNames geographical database, which is included as part of the Data Enrichment package in Big Data Discovery.

Configuration options

This module is run (on well-formed addresses) during a Data Processing sampling operation. There are no configuration options for this operation.

Output

The output information includes the latitude and longitude, as well as all levels of administrative areas.

Depending on the country, the output attributes consist of these administrative divisions, as well as the geocode of the address:
  • <attribute>_geo_geocode — the latitude and longitude values of the address (such as "42.35843 -71.05977").
  • <attribute>_geo_city — corresponds to a city (such as "Boston").
  • <attribute>_geo_country — the country code (such as "US").
  • <attribute>_geo_postcode — corresponds to a postcode, such as a zip code in the US (such as "02117").
  • <attribute>_geo_region — corresponds to a geographical region, such as a state in the US (such as "Massachusetts").
  • <attribute>_geo_regionid — the ID of the region in the GeoNames database (such as "6254926" for Massachusetts).
  • <attribute>_geo_subregion — corresponds to a geographical sub-region, such as a county in the US (such as "Suffolk County").
  • <attribute>_geo_subregionid — the ID of the sub-region in the GeoNames database (such as "4952349" for Suffolk County in Massachusetts).

All are output as single-assign string (mdex:string) attributes, except for <attribute>_geo_geocode, which is a single-assign geocode (mdex:geocode) attribute.
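When working with exported values, it can help to derive the output attribute names from the source attribute and to split the geocode value into numeric coordinates. The following sketch is illustrative only: the helper names are hypothetical, and it assumes the geocode is rendered as "latitude longitude" separated by whitespace, as in the sample value above.

```python
from typing import Dict, Tuple

# Suffixes taken from the attribute list above.
GEO_SUFFIXES = (
    "geocode",
    "city",
    "country",
    "postcode",
    "region",
    "regionid",
    "subregion",
    "subregionid",
)


def geo_attribute_names(attribute: str) -> Dict[str, str]:
    """Map each suffix to its full name, such as 'ext_geo_city'."""
    return {suffix: f"{attribute}_geo_{suffix}" for suffix in GEO_SUFFIXES}


def parse_geocode(value: str) -> Tuple[float, float]:
    """Split a 'latitude longitude' string into a pair of floats."""
    lat_text, lon_text = value.split()
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


if __name__ == "__main__":
    print(geo_attribute_names("ext")["geocode"])
    print(parse_geocode("42.35843 -71.05977"))
```

Because invalid input may produce NULL output, callers should check that a geocode value is present before passing it to parse_geocode.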

Note that if an invalid input is provided (such as a zip code that is not valid for a city and state), the output may be NULL.

Examples

The following output might be returned for the "Boston, Massachusetts USA" address:
ext_geo_city              Boston
ext_geo_country           US
ext_geo_geocode           42.35843 -71.05977
ext_geo_postcode          02117
ext_geo_region            Massachusetts
ext_geo_regionid          6254926
ext_geo_subregion         Suffolk County
ext_geo_subregionid       4952349

This sample output is for the "London England" address:
ext_geo_city              City of London
ext_geo_country           GB
ext_geo_geocode           51.51279 -0.09184
ext_geo_postcode          ec4r
ext_geo_region            England
ext_geo_regionid          6269131
ext_geo_subregion         Greater London
ext_geo_subregionid       2648110