Results 1 - 6 of 6
A Survey of Automatic Query Expansion in Information Retrieval
Abstract

Cited by 55 (2 self)
The relative ineffectiveness of information retrieval systems is largely caused by the inaccuracy with which a query formed by a few keywords models the actual user information need. One well-known method to overcome this limitation is automatic query expansion (AQE), whereby the user’s original query is augmented by new features with a similar meaning. AQE has a long history in the information retrieval community, but only in recent years has it reached a level of scientific and experimental maturity, especially in laboratory settings such as TREC. This survey presents a unified view of a large number of recent approaches to AQE that leverage various data sources and employ very different principles and techniques. The following questions are addressed: Why is query expansion so important for improving search effectiveness? What are the main steps involved in the design and implementation of an AQE component? What approaches to AQE are available, and how do they compare? Which issues must still be resolved before AQE becomes a standard component of large operational information retrieval systems (e.g., search engines)?
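Many AQE approaches follow the pseudo-relevance feedback pattern: run the original query, then expand it with terms drawn from the top-ranked documents. A minimal sketch of that idea, assuming a toy raw-frequency scorer (real systems use weighted models such as Rocchio or RM3; the function names and corpus here are illustrative, not from the survey):

```python
from collections import Counter

def expand_query(query_terms, top_docs, k=3):
    """Pseudo-relevance feedback sketch: add the k most frequent terms
    from the top-ranked documents that are not already in the query.
    (Toy frequency scoring for illustration only.)"""
    counts = Counter()
    for doc in top_docs:
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(k)]
    return list(query_terms) + expansion

docs = ["query expansion improves retrieval recall",
        "retrieval systems expand the user query with related terms"]
expanded = expand_query(["query", "expansion"], docs, k=2)
```

Here "retrieval" would be appended first, since it occurs in both pseudo-relevant documents.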
Collecting novel technical terms from the Web by estimating the domain specificity of a term
2006
Abstract

Cited by 2 (0 self)
This paper proposes a method for estimating the domain specificity of technical terms using the Web. In the proposed method, it is assumed that, for a certain technical domain, a list of known technical terms of the domain is given. Technical documents of the domain are collected through a Web search engine and are then used for generating a vector space model for the domain. The domain specificity of a target term is estimated according to the distribution of the domains of the term's sample pages. We apply this technique of estimating the domain specificity of a term to the task of discovering novel technical terms that are not included in any existing lexicon of technical terms of the domain. Out of 1,000 randomly selected technical-term candidates per domain, we discovered about 100 to 200 novel technical terms.
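One simple way to realize the vector-space step described in the abstract is to build a centroid vector from the documents collected for the domain and score a candidate term by the average similarity of its sample pages to that centroid. A hedged sketch, assuming cosine similarity over raw term frequencies (all names and the toy data are illustrative, not the paper's actual formulation):

```python
import math
from collections import Counter

def tf_vector(text):
    """Raw term-frequency vector of a document (toy tokenization)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def domain_specificity(term_pages, domain_docs):
    """Score a term by the average similarity of its sample pages
    to the centroid of the documents collected for the domain."""
    centroid = Counter()
    for doc in domain_docs:
        centroid.update(tf_vector(doc))
    sims = [cosine(tf_vector(page), centroid) for page in term_pages]
    return sum(sims) / len(sims) if sims else 0.0
```

A term whose sample pages share vocabulary with the domain documents scores higher than one whose pages do not.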
RESEARCH STATEMENT
Abstract
My primary research interests in algebraic geometry lie in the Minimal Model Program and its applications, moduli spaces of stable maps (curves), and moduli spaces of branchvarieties. Recently I have also paid close attention to the new advances in computational algebraic geometry. 1 Minimal Model Program and its applications In many branches of mathematics, classifications among objects up to certain relations are central themes. For example, topologists can classify topological spaces up to homeomorphism or up to the weaker relation of homotopy. Similarly, in algebraic geometry, we classify algebraic varieties up to either isomorphism or a weaker relation, birational equivalence (two varieties X and Y are birationally equivalent if there exist rational maps f: X → Y and g: Y → X such that g ∘ f and f ∘ g are identity maps on some open subsets U ⊂ X and V ⊂ Y). Among algebraic varieties in the same birational equivalence class, we want to single out some “good” representatives. Such good representatives are called minimal models. It is well known that every surface has a minimal model. Is there a minimal model for every higher-dimensional algebraic variety? The answer was unknown for a long period of time, even for threefolds. At first people tried to find a minimal model in the smooth category, but this turned out to be impossible. Gradually people realized that one can only
Web Mining for Unsupervised Classification
Abstract
Data acquisition is a major concern in text classification. The excessive human effort required by conventional methods to build a quality training collection might not always be available to research workers. In this paper, we look into possibilities to automatically collect training data by sampling the Web with a set of given class names. The basic idea is to populate appropriate keywords and submit them as queries to search engines for acquiring training data. Two methods are presented in this study: one based on sampling the common concepts among the classes, and the other based on sampling the discriminative concepts for each class. A series of experiments were carried out independently on two different datasets, and the results show that the proposed methods significantly improve classifier performance even without using manually labeled training data. We find that our strategy for retrieving Web samples is substantially helpful in conventional document classification in terms of accuracy and efficiency.
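The sampling idea described above, using class names and populated keywords as search queries and treating the returned pages as pseudo-labeled training data, can be sketched as follows. `web_search` is a stand-in for a real search engine API, and all names and the toy corpus are illustrative, not the paper's implementation:

```python
def web_search(query, corpus):
    """Toy stand-in for a search engine: return corpus documents
    that contain every word of the query."""
    words = query.lower().split()
    return [doc for doc in corpus if all(w in doc.lower() for w in words)]

def collect_training_data(class_keywords, corpus):
    """Map each class to documents sampled with its keywords,
    yielding a pseudo-labeled training collection."""
    data = {}
    for cls, keywords in class_keywords.items():
        docs = []
        for kw in keywords:
            docs.extend(web_search(kw, corpus))
        data[cls] = docs
    return data

corpus = [
    "python programming tutorial for beginners",
    "stock market prices fell sharply today",
    "new programming language released",
]
train = collect_training_data(
    {"tech": ["programming"], "finance": ["stock market"]}, corpus)
```

The resulting `train` dictionary can then feed any conventional classifier in place of manually labeled data.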
Article in Systems and Computers in Japan, December 2007. DOI: 10.1002/scj.20852. Source: DBLP. Citations: 8.
unknown title
Abstract
This paper proposes a method of domain classification of technical terms using the Web. In the proposed method, it is assumed that, for a certain technical domain, a list of known technical terms of the domain is given. Technical documents of the domain are collected through a Web search engine and are then used for generating a vector space model for the domain. The domain specificity of a target term is estimated according to the distribution of the domains of the term's sample pages. Experimental evaluation results show that the proposed method of domain classification of a technical term achieved roughly 90% precision/recall. We then apply this technique of estimating the domain specificity of a term to the task of discovering novel technical terms that are not included in any existing lexicon of technical terms of the domain. Out of 1,000 randomly selected technical-term candidates per domain, we discovered about 100 to 200 novel technical terms.
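The precision/recall figure quoted above is the standard set-based evaluation: comparing the terms the method assigns to a domain against a gold-standard lexicon. A small sketch of that computation (illustrative, not the paper's evaluation code):

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set of domain terms
    against a gold-standard set of domain terms."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: terms in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

For example, predicting {"neuron", "synapse", "stock"} against a gold lexicon {"neuron", "synapse", "axon"} yields precision 2/3 and recall 2/3.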