The Address GeoTagger returns geographical information for a valid global address.
The geographical information includes all of the possible administrative divisions for a specific address, as well as the latitude and longitude information for that address. The Address GeoTagger only runs on valid, unambiguous addresses which correspond to a city. In addition, the length of the input text must be less than or equal to 350 characters.
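The length limit can be checked before text is handed to the module. The following is a minimal sketch, not part of the product: the function name `within_length_limit` is hypothetical, and it assumes that "characters" corresponds to what Python's `len()` counts for a string.

```python
MAX_INPUT_LENGTH = 350  # limit stated in the text above


def within_length_limit(address: str) -> bool:
    """Return True if the text is short enough for the Address GeoTagger.

    Hypothetical helper: this checks only the length, not whether the
    address is valid or unambiguous.
    """
    return len(address) <= MAX_INPUT_LENGTH
```

A text of exactly 350 characters passes this check; 351 characters does not.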
To be triggered during auto-enrichment, the Address GeoTagger requires at least two match points. For a postcode to count as a match, it must be accompanied by a country.
The final example ("Boston US") returns information for Boston, Massachusetts: although several cities and towns in the US are named "Boston", Boston, Massachusetts has the highest population among them.
Note that for this module to run automatically, the minimum requirement is that the city plus either a state or a postcode are specified.
400 Oracle Parkway, Redwood City, CA 94065 produces the same results as supplying only the city and state:
Redwood City, CA
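The minimum requirement for automatic runs can be read as a simple rule over the address components. The sketch below is one interpretation, not the module's actual logic: the function name and the dictionary keys (`city`, `state`, `postcode`, `country`) are hypothetical, and it assumes the components have already been extracted from the address text. It combines the two statements above by counting a postcode only when a country is also present.

```python
from typing import Mapping, Optional


def meets_auto_run_minimum(parts: Mapping[str, Optional[str]]) -> bool:
    """Return True if the components satisfy the stated minimum.

    Assumed reading of the rules: a city is required, plus either a state
    or a postcode; a postcode counts only when a country accompanies it.
    """
    has_city = bool(parts.get("city"))
    has_state = bool(parts.get("state"))
    # Assumption: a postcode without a country does not count as a match.
    has_postcode = bool(parts.get("postcode")) and bool(parts.get("country"))
    return has_city and (has_state or has_postcode)
```

Under this reading, `{"city": "Redwood City", "state": "CA"}` qualifies, while a city on its own does not.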
GeoNames data
The information returned by this geocode tagger comes from the GeoNames geographical database, which is included as part of the Data Enrichment package in Big Data Discovery.
Configuration options
This module is run (on well-formed addresses) during a Data Processing sampling operation. However, there are no configuration options for such an operation.
Output
The output information includes the latitude and longitude, as well as all levels of administrative areas.
<attribute>_geo_geocode: the latitude and longitude values of the address (such as "42.35843 -71.05977").
<attribute>_geo_city: corresponds to a city (such as "Boston").
<attribute>_geo_country: the country code (such as "US").
<attribute>_geo_postcode: corresponds to a postcode, such as a zip code in the US (such as "02117").
<attribute>_geo_region: corresponds to a geographical region, such as a state in the US (such as "Massachusetts").
<attribute>_geo_regionid: the ID of the region in the GeoNames database (such as "6254926" for Massachusetts).
<attribute>_geo_subregion: corresponds to a geographical sub-region, such as a county in the US (such as "Suffolk County").
<attribute>_geo_subregionid: the ID of the sub-region in the GeoNames database (such as "4952349" for Suffolk County in Massachusetts).
All are output as single-assign string (mdex:string) attributes, except for Geocode, which is a single-assign geocode (mdex:geocode) attribute.
Note that if an invalid input is provided (such as a zip code that is not valid for a city and state), the output may be NULL.
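For downstream processing, the output attribute names and the geocode value can be handled with a few lines of code. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the suffixes are taken from the list above, and the geocode is assumed to arrive as a "latitude longitude" string separated by whitespace, as in the examples on this page.

```python
from typing import List, Optional, Tuple

# Suffixes taken from the output attribute list above.
GEO_SUFFIXES = (
    "geocode", "city", "country", "postcode",
    "region", "regionid", "subregion", "subregionid",
)


def geo_attribute_names(attribute: str) -> List[str]:
    """Build the output attribute names for a given source attribute."""
    return [f"{attribute}_geo_{suffix}" for suffix in GEO_SUFFIXES]


def parse_geocode(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Split a "lat lon" string into floats; a NULL (None) passes through."""
    if value is None:
        return None
    lat_text, lon_text = value.split()
    return float(lat_text), float(lon_text)
```

For a source attribute named `ext`, the first generated name is `ext_geo_geocode`. Passing `None` through mirrors the NULL output described above for invalid input.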
Examples
ext_geo_city         Boston
ext_geo_country      US
ext_geo_geocode      42.35843 -71.05977
ext_geo_postcode     02117
ext_geo_region       Massachusetts
ext_geo_regionid     6254926
ext_geo_subregion    Suffolk County
ext_geo_subregionid  4952349

ext_geo_city         City of London
ext_geo_country      GB
ext_geo_geocode      51.51279 -0.09184
ext_geo_postcode     ec4r
ext_geo_region       England
ext_geo_regionid     6269131
ext_geo_subregion    Greater London
ext_geo_subregionid  2648110