Efficient support vector classifiers for named entity recognition (2002)

by Hideki Isozaki, Hideto Kazawa

Results 1 - 10 of 19

Towards large-scale, open-domain and ontology-based named entity classification

by Johanna Völker - In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’05), 2005
Cited by 29 (6 self)

SVM Based Learning System For Information Extraction

by Yaoyong Li, Kalina Bontcheva, Hamish Cunningham - In Proceedings of Sheffield Machine Learning Workshop, Lecture Notes in Computer Science, 2005
Cited by 16 (7 self)
This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.

A Search Result Clustering Method using Informatively Named Entities

by Hiroyuki Toda - In Proceedings of the 7th ACM International Workshop on Web Information and Data Management (WIDM), 2005
Cited by 12 (1 self)
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we regard the clustering task as indexing the search results. Here, an index means a structured label list that makes it easier for the user to comprehend the labels and search results. To realize this goal, we make three proposals. The first is to use Named Entity Extraction for term extraction. The second is a new label selection criterion based on importance in the search result and the relation between terms and search queries. The third is label categorization using category information of labels, which is generated by NE extraction. We implement a prototype system based on these proposals and find that it offers much higher performance than existing methods; we focus on news articles in this paper.

Machine Learning for Information Extraction in Genomics - State of the Art and Perspectives, Text Mining and its Applications

by C. Nédellec - Results of the NEMIS Launch Conference Series: Studies in Fuzziness and Soft Computing, Sirmakessis, Spiros (Ed.), Springer Verlag. Nédellec C., Ould Abdel Vetah M. and Bessières P, 2004
Cited by 4 (1 self)
The considerable development of multimedia communication goes along with an exponentially increasing volume of ...

Thai Named Entity Extraction by incorporating Maximum Entropy Model with Simple Heuristic Information

by Hutchatai Chanlekha, Asanee Kawtrakul - Proc. IJCNLP, 2004
Cited by 4 (4 self)
Named entity (NE) extraction plays an important role in many NLP tasks, such as information extraction. In Thai, NE extraction is much more difficult because the language lacks orthographic cues that signal NEs and has no boundary indicator between words. In this paper, we present a Thai NE extraction system that uses a Maximum Entropy model together with heuristic information and a dictionary. Our system is divided into three steps. The first step identifies the boundaries of candidate NEs composed of several words, using heuristic rules, a dictionary, and word co-occurrence statistics. The second step extracts NEs using the Maximum Entropy model. The final step extracts previously undiscovered NEs by matching the extracted NEs against the rest of the document. On Thai political news test data, the system achieves F-measures of 90.44%, 82.16%, and 89.87% for person, location, and organization names, respectively.
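
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-measure is conventionally derived from precision and recall. It assumes the balanced form (beta = 1); the abstract does not say which weighting was used, and the precision and recall values below are made up for illustration, since the abstract reports only the final F-measures.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not state the weighting it used).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical inputs, not taken from the paper.
example = f_measure(0.5, 0.5)
```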

splitSVM: Fast, space-efficient, non-heuristic, polynomial kernel computation for NLP applications

by Yoav Goldberg, Michael Elhadad - In Proceedings of the 46th Annual Meeting of the Association of Computational Linguistics (ACL), 2008
Cited by 4 (0 self)
We present a fast, space-efficient and non-heuristic method for calculating the decision function of polynomial kernel classifiers for NLP applications. We apply the method to the MaltParser system, resulting in a Java parser that parses over 50 sentences per second on modest hardware without loss of accuracy (a 30-fold speedup over existing methods). The method implementation is available as the open-source splitSVM Java library.
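
As context for the abstract above, here is a minimal sketch of the quantity being sped up: the decision function of a polynomial kernel classifier, computed naively with one kernel evaluation per support vector. This is the baseline computation only, not the splitSVM method, and the feature names, weights, and kernel parameters are hypothetical.

```python
def poly_kernel(x, s, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel between two sparse binary feature sets.

    For binary features the dot product is the size of the intersection.
    """
    return (gamma * len(x & s) + coef0) ** degree


def decision_function(x, support_vectors, weights, bias=0.0, degree=2):
    """Naive decision value: cost grows with the number of support vectors."""
    total = sum(w * poly_kernel(x, s, degree)
                for s, w in zip(support_vectors, weights))
    return total + bias


# Hypothetical model: each weight stands for alpha_i * y_i.
support_vectors = [frozenset({"w=Paris", "cap"}), frozenset({"w=the", "lower"})]
weights = [1.0, -0.5]
score = decision_function(frozenset({"w=Paris", "cap", "prev=in"}),
                          support_vectors, weights)
```

The loop over all support vectors is what makes the naive approach slow for NLP models, which typically retain many of them.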

Improving NER in Arabic Using a Morphological Tagger

by Benjamin Farber, Dayne Freitag, Nizar Habash, Owen Rambow
Cited by 4 (0 self)
We discuss a named entity recognition system for Arabic, and show how we incorporated the information provided by MADA, a full morphological tagger which uses a morphological analyzer. Surprisingly, the relevant features used are the capitalization of the English gloss chosen by the tagger, and the fact that an analysis is returned (that a word is not OOV to the morphological analyzer). The use of the tagger also improves over a third system which just uses a morphological analyzer, yielding a 14% reduction in error over the baseline. We conduct a thorough error analysis to identify sources of success and failure among the variations, and show that by combining the systems in simple ways we can significantly influence the precision-recall trade-off.

A Clustering Method for News Articles Retrieval System

by Hiroyuki Toda, Ryoji Kataoka, 2005
Cited by 4 (0 self)
Organizing the results of a search helps the user get an overview of the information returned. We regard the clustering task as the task of making labels for a list of items. We focus on news articles and propose a clustering method that uses named entity extraction.

Weighted Kernel Functions for SVM Learning in String Domains: A Distance Function Viewpoint

by Bram Vanschoenwinkel, Feng Liu, Bernard Manderick - In Proceedings of ICMLC (International Conference on Machine Learning and Cybernetics), 2005
Cited by 2 (2 self)
This paper extends the idea of weighted distance functions to kernels and support vector machines. Here, we focus on applications that rely on sliding a window over a sequence of string data. For this type of problem it is argued that a symbolic, context-based representation of the data should be preferred over a continuous, real format, as this is a much more intuitive setting for working with (weighted) distance functions. It is shown how a weighted string distance can be decomposed and subsequently used in different kernel functions, and how these kernel functions correspond to inner products between real vectors. As a case study, named entity recognition is used with information gain ratio as a weighting scheme.
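
The idea of plugging a weighted string distance into a kernel can be illustrated with a minimal sketch. It assumes a weighted overlap distance over fixed-size context windows and an RBF-style kernel built on that distance; the abstract does not give the exact definitions, so the function names, the kernel form, and the position weights below are all assumptions (the paper derives its weights from information gain ratio).

```python
import math


def weighted_overlap_distance(x, y, weights):
    """Sum the weights of the window positions where the symbols differ."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("windows and weights must have the same length")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def overlap_rbf_kernel(x, y, weights, gamma=1.0):
    """RBF-style kernel that uses the weighted overlap distance."""
    return math.exp(-gamma * weighted_overlap_distance(x, y, weights))


# Two context windows that differ only in the last position.
window_a = ("in", "New", "York")
window_b = ("in", "New", "Jersey")
weights = (0.2, 0.5, 0.3)  # hypothetical position weights
distance = weighted_overlap_distance(window_a, window_b, weights)
```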

NIL is not Nothing: Recognition of Chinese Network Informal Language Expressions

by Yunqing Xia, Kam-fai Wong, Wei Gao - 4th SIGHAN Workshop at IJCNLP'05, 2005
Cited by 2 (2 self)
Informal language is actively used in network-mediated communication, e.g. chat rooms, BBS, email and text messages. We refer to the anomalous terms used in such contexts as network informal language (NIL) expressions. For example, “偶 (ou3)” is used to replace “我 (wo3)” in Chinese ICQ. Without unconventional resources, knowledge and techniques, existing natural language processing approaches are less effective in dealing with NIL text. We propose to study NIL expressions with a NIL corpus and investigate techniques for processing NIL expressions. Two methods for Chinese NIL expression ...