@MISC{Peng01automaticmulti-lingual, author = {Fuchun Peng}, title = {Automatic Multi-Lingual Information Extraction}, year = {2001} }
Abstract
Information extraction (IE) is a rapidly growing field, driven by the explosion of text on the Internet. So far, most IE systems focus on English text. Most use a supervised learning framework, which requires a large amount of human labor, and most work only in a narrow domain, which makes them domain dependent. Because of these inherent shortcomings, such systems are difficult to port to other languages and other domains. Beyond Western languages such as English, there are many Asian languages that differ greatly from English. In English, words are delimited by white space, so a computer can easily tokenize the input text string. Many languages, such as Chinese, Japanese, Thai, and Korean, do not mark boundaries between words. This poses a difficult problem for information extraction in those languages.
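The tokenization contrast described in the abstract can be illustrated with a minimal Python sketch. The function name `whitespace_tokenize` and both sample sentences are assumptions made for illustration only; they are not taken from the paper, and the sketch does not represent the author's method.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split text on runs of white space (hypothetical helper for illustration)."""
    return text.split()


# Illustrative sentences, not from the paper.
english = "Information extraction finds facts in text"
chinese = "信息抽取从文本中发现事实"  # roughly the same meaning, written without spaces

english_tokens = whitespace_tokenize(english)
chinese_tokens = whitespace_tokenize(chinese)

# English yields one token per word; the Chinese sentence stays a single
# unsegmented string because it contains no white space to split on.
print(len(english_tokens), english_tokens)
print(len(chinese_tokens), chinese_tokens)
```

The same splitting rule that recovers each English word returns the whole Chinese sentence as one token, so a separate word segmentation step would be needed before word-level extraction.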