Automated probabilistic address standardisation and verification
In ‘Australasian Data Mining Conference’ (AusDM’05), 2005
Cited by 3 (3 self)
Abstract. Addresses are a key part of many records containing information about people and organisations, and it is therefore important that accurate address information is available before such data is mined or stored in data warehouses. Unfortunately, addresses are often captured in non-standard and free-text formats, usually with some degree of spelling and typographical errors. Additionally, addresses change over time, for example when people move, when streets are renamed, or when new suburbs are built. Cleaning and standardising addresses, as well as verifying whether they really exist, are therefore important steps in data mining pre-processing. In this paper we present an automated probabilistic approach based on a hidden Markov model (HMM), which uses national address guidelines and a comprehensive national address database to clean, standardise and verify raw input addresses. Initial experiments show that our system can correctly standardise even complex and unusual addresses.
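To illustrate how an HMM can assign address components to raw tokens, here is a minimal Viterbi sketch. It is not the authors' model: the four hidden states, the coarse token classes (`NUM`, `TYPE`, `WORD`), the street-type word list and every probability below are hypothetical values chosen only for illustration. A real system would train these from data and use far richer state and observation sets.

```python
import math

# Hypothetical hidden states for a toy address HMM (not the paper's actual model).
STATES = ["number", "street_name", "street_type", "locality"]

# Hypothetical list of street-type words used by the toy token classifier.
STREET_TYPES = {"st", "street", "rd", "road", "ave", "avenue"}

def token_class(token: str) -> str:
    """Map a raw token to a coarse observation symbol."""
    t = token.lower().strip(".,")
    if t.isdigit():
        return "NUM"
    if t in STREET_TYPES:
        return "TYPE"
    return "WORD"

# Toy start, transition and emission probabilities (illustrative only).
START = {"number": 0.7, "street_name": 0.2, "street_type": 0.05, "locality": 0.05}
TRANS = {
    "number":      {"number": 0.05, "street_name": 0.85, "street_type": 0.05, "locality": 0.05},
    "street_name": {"number": 0.01, "street_name": 0.30, "street_type": 0.60, "locality": 0.09},
    "street_type": {"number": 0.01, "street_name": 0.04, "street_type": 0.05, "locality": 0.90},
    "locality":    {"number": 0.01, "street_name": 0.04, "street_type": 0.05, "locality": 0.90},
}
EMIT = {
    "number":      {"NUM": 0.95, "TYPE": 0.01, "WORD": 0.04},
    "street_name": {"NUM": 0.05, "TYPE": 0.10, "WORD": 0.85},
    "street_type": {"NUM": 0.01, "TYPE": 0.95, "WORD": 0.04},
    "locality":    {"NUM": 0.05, "TYPE": 0.05, "WORD": 0.90},
}

def viterbi(tokens):
    """Return the most likely hidden-state sequence for the tokens (log-space Viterbi)."""
    obs = [token_class(t) for t in tokens]
    # best[i][s] = (log prob of the best path ending in state s at step i, previous state)
    best = [{s: (math.log(START[s]) + math.log(EMIT[s][obs[0]]), None) for s in STATES}]
    for o in obs[1:]:
        prev = best[-1]
        row = {}
        for s in STATES:
            # Pick the predecessor that maximises path score into state s.
            score, back = max((prev[p][0] + math.log(TRANS[p][s]), p) for p in STATES)
            row[s] = (score + math.log(EMIT[s][o]), back)
        best.append(row)
    # Backtrack from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return list(reversed(path))

print(viterbi(["42", "Miller", "St", "Canberra"]))
# → ['number', 'street_name', 'street_type', 'locality']
```

Once each token carries a component label, the labelled fields can be normalised (for example, "St" to "Street") and looked up in an address database for verification. That lookup step is not shown here.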
Crosslingual Location Search
Address geocoding, the process of finding the map location for a structured postal address, is a relatively well-studied problem. In this paper we consider the more general problem of crosslingual location search, where the queries are not limited to postal addresses, and the language and script used in the search query are different from those in which the underlying data is stored. To the best of our knowledge, our system is the first crosslingual location search system that is able to geocode complex addresses. We use a statistical machine transliteration system to convert location names from the script of the query to that of the stored data. However, we show that it is not sufficient to simply feed the resulting transliterations into a monolingual geocoding system, as the ambiguity inherent in the conversion drastically expands the location search space and significantly lowers the quality of results. The strength of our approach lies in its integrated, end-to-end nature: we use abstraction and fuzzy search (in the text domain) to achieve maximum coverage despite transliteration ambiguities, while applying spatial constraints (in the geographic domain) to focus only on viable interpretations of the query. Our experiments with structured and unstructured queries in a set of diverse languages and scripts (Arabic, English, Hindi and Japanese), searching for locations in different regions of the world, show full crosslingual location search accuracy at levels comparable to those of commercial monolingual systems. We achieve these levels of performance using techniques that may be applied to crosslingual searches in any language/script, and over arbitrary spatial data.
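The interplay between fuzzy text matching, which widens coverage, and spatial constraints, which narrow it, can be sketched as follows. This is not the authors' system. The transliteration variants are hard-coded rather than produced by a transliteration model, `difflib` string similarity stands in for their fuzzy search, and the tiny gazetteer, the 0.8 similarity threshold and the 20 km radius are all hypothetical.

```python
import math
from difflib import SequenceMatcher

# Hypothetical gazetteer entries: (name, kind, latitude, longitude).
GAZETTEER = [
    ("Springfield", "city", 39.80, -89.64),
    ("Springfeld", "city", 48.00, 11.00),
    ("Main Street", "street", 39.80, -89.65),
    ("Maine Street", "street", 45.00, -69.00),
]

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for fuzzy search)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_candidates(variants, kind, threshold=0.8):
    """Gazetteer entries of the given kind that closely match any transliteration variant."""
    return [e for e in GAZETTEER
            if e[1] == kind and max(similarity(v, e[0]) for v in variants) >= threshold]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def search(street_variants, city_variants, max_km=20.0):
    """Keep only (street, city) pairs whose locations are spatially consistent."""
    streets = fuzzy_candidates(street_variants, "street")
    cities = fuzzy_candidates(city_variants, "city")
    return [(s[0], c[0]) for s in streets for c in cities
            if haversine_km(s[2], s[3], c[2], c[3]) <= max_km]

# Ambiguous (hypothetical) transliterations yield 2 streets x 2 cities textually,
# but only one combination lies within max_km of each other.
print(search(["Main Street", "Mein Street"], ["Springfield", "Springfeld"]))
```

Fuzzy matching alone yields four textual interpretations here. The distance check is what reduces them to the single geographically plausible pair.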
SEMANTIC SELECTION OF GEOREFERENCING SERVICES FOR URBAN MANAGEMENT
SUMMARY: Geocoding has become one of the most popular on-line services. Nowadays, many Web Services provide geocoding functionality. They differ not only in technological aspects (interface, invocation style, etc.) and terms of use, but also in the type of geographic information provided and in spatial data quality. Currently, the difficulty lies not in finding geocoding providers but in choosing the appropriate one, which depends on the user’s requirements. The public administration is responsible for providing official and appropriate information to citizens and for dealing with risk and health management issues; therefore, urban management systems require geocoding services of high quality in terms of both quality of service (QoS) and spatial data. However, the present public Web geocoding market is dominated by geocoding services for average users, i.e. users who can accept low QoS or are not interested in the lineage of the data. In this situation, a compound geocoder can join data from several geographic information services and provide services oriented to users who demand high quality, for example urban management systems. This paper presents an architectural approach for adaptive compound geocoding Web services built on diverse geographic information Web services, such as gazetteers, cadastral services and address geocoding services. The proposed architecture is characterized by extensibility and adaptivity thanks to the application of ontologies (Administrative Unit Applied Ontology of Spain and Service Characteristic Ontology) and advances in the Semantic Web related to strategies for source selection and
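The following is a minimal sketch of how a compound geocoder might filter and rank candidate services against a user's quality requirements. It is not the paper's architecture. The service profile fields (`coverage`, `availability`, `accuracy_m`, `documents_lineage`), the service names and all numeric values are hypothetical stand-ins for what the Service Characteristic Ontology might describe.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeocodingService:
    """Hypothetical service profile; fields are illustrative, not the paper's ontology."""
    name: str
    coverage: frozenset      # administrative units the service covers
    availability: float      # fraction of successful requests, 0..1
    accuracy_m: float        # typical positional error in metres
    documents_lineage: bool  # whether data provenance is published

@dataclass(frozen=True)
class Requirements:
    min_availability: float
    max_accuracy_m: float
    needs_lineage: bool

def select_services(services, admin_unit, req):
    """Return services covering admin_unit that meet req, most accurate first."""
    eligible = [
        s for s in services
        if admin_unit in s.coverage
        and s.availability >= req.min_availability
        and s.accuracy_m <= req.max_accuracy_m
        and (s.documents_lineage or not req.needs_lineage)
    ]
    # Ties on accuracy are broken by higher availability.
    return sorted(eligible, key=lambda s: (s.accuracy_m, -s.availability))

# Hypothetical services for illustration.
SERVICES = [
    GeocodingService("cadastre", frozenset({"Zaragoza"}), 0.97, 2.0, True),
    GeocodingService("gazetteer", frozenset({"Zaragoza", "Madrid"}), 0.99, 250.0, True),
    GeocodingService("consumer_geocoder", frozenset({"Zaragoza", "Madrid"}), 0.999, 15.0, False),
]

urban = Requirements(min_availability=0.95, max_accuracy_m=50.0, needs_lineage=True)
print([s.name for s in select_services(SERVICES, "Zaragoza", urban)])
```

In this toy setup, the strict, lineage-aware requirements of an urban management client leave only the cadastral service for one administrative unit. Relaxing `needs_lineage` admits the consumer geocoder as a second, less accurate option.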

