Results 1-10 of 41
Context in Web Search
IEEE Data Engineering Bulletin, 2000
Cited by 100 (0 self)
Web search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user or the context in which the user made the request. Next-generation search engines will make increasing use of context information, either by using explicit or implicit context information from users, or by implementing additional functionality within restricted contexts. Greater use of context in web search may help increase competition and diversity on the web.
Using Web Structure for Classifying and Describing Web Pages
2002
Cited by 70 (3 self)
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
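The expected entropy loss ranking mentioned above can be sketched as follows. This is a minimal Python illustration, assuming the standard formulation (prior class entropy minus the expected class entropy once a word's presence is known, which coincides with information gain) and treating each document as a set of words. The function names, the toy data, and the filter that keeps only words more common among positive examples are assumptions for illustration, not the authors' code.

```python
import math


def _entropy(p):
    """Binary entropy in bits; 0 * log 0 is taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_loss(word, positives, negatives):
    """Prior class entropy minus expected class entropy given whether `word` occurs."""
    n_pos, n_neg = len(positives), len(negatives)
    total = n_pos + n_neg
    prior = _entropy(n_pos / total)
    pos_with = sum(word in d for d in positives)
    neg_with = sum(word in d for d in negatives)
    with_word = pos_with + neg_with
    without_word = total - with_word
    expected = 0.0
    if with_word:
        expected += (with_word / total) * _entropy(pos_with / with_word)
    if without_word:
        expected += (without_word / total) * _entropy((n_pos - pos_with) / without_word)
    return prior - expected


def name_cluster(positives, negatives, top_k=3):
    """Rank candidate cluster names; assumed filter keeps words more frequent in positives."""
    def rate(word, docs):
        return sum(word in d for d in docs) / len(docs)

    vocab = set().union(*positives)
    candidates = [w for w in vocab if rate(w, positives) > rate(w, negatives)]
    # Highest loss first; ties broken alphabetically for a deterministic order.
    ranked = sorted(candidates, key=lambda w: (-expected_entropy_loss(w, positives, negatives), w))
    return ranked[:top_k]
```

A word that appears in every positive document and no negative one separates the classes perfectly, so its loss equals the full prior entropy (1 bit for a balanced sample).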
Personalized web search by mapping user queries to categories
2002
Cited by 61 (1 self)
Current web search engines are built to serve all users, independent of the needs of any individual user. Personalization of web search means carrying out retrieval for each user in a way that incorporates his or her interests. We propose a novel technique to map a user query to a set of categories, which represent the user's search intention. This set of categories can serve as a context to disambiguate the words in the user's query. A user profile and a general profile are learned from the user's search history and a category hierarchy, respectively. These two profiles are combined to map a user query into a set of categories. Several learning and combining algorithms are evaluated and found to be effective. Among the algorithms to learn a user profile, we choose the Rocchio-based method for its simplicity, efficiency and ability to adapt. Experimental results indicate that our technique to personalize web search is both effective and efficient.
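A Rocchio-style user profile of the kind described above might look like the following minimal sketch. It assumes a bag-of-words representation with length-normalized term frequencies, builds each category's vector as the centroid of the history entries labeled with that category, and maps a query to the categories with the highest cosine similarity. The category names and helper functions are hypothetical, and the paper's actual term weighting is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def rocchio_profile(history):
    """history: list of (category, text). Returns category -> centroid term weights."""
    sums = defaultdict(Counter)
    counts = Counter()
    for category, text in history:
        tf = Counter(text.lower().split())
        norm = math.sqrt(sum(v * v for v in tf.values()))
        for term, v in tf.items():
            sums[category][term] += v / norm  # add the length-normalized vector
        counts[category] += 1
    # Rocchio centroid: average of the normalized vectors in each category.
    return {c: {t: w / counts[c] for t, w in vec.items()} for c, vec in sums.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_query(query, profile, top_k=2):
    """Return the top_k categories most similar to the query."""
    q = Counter(query.lower().split())
    scores = {c: cosine(q, vec) for c, vec in profile.items()}
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]
```

With one history entry about laptops and one about pies, the ambiguous word "apple" is resolved by its neighbour: "apple laptop" maps to the hardware category and "apple recipe" to the cooking category. Because each category vector is only an average, folding in new history is cheap, which fits the adaptivity the abstract cites.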
Probabilistic question answering on the Web
Journal of the American Society for Information Science and Technology, 2002
Cited by 42 (1 self)
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this paper we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of 0.20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
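The total reciprocal document rank (TRDR) figure quoted above can be computed as in the sketch below. We assume TRDR sums 1/rank over every returned document that contains a correct answer (not only the first such document), and that a corpus-level figure is the mean over questions; the document IDs are made up for illustration.

```python
def total_reciprocal_document_rank(ranked_docs, relevant):
    """Sum of 1/rank (1-based) over all returned documents that contain a correct answer."""
    return sum(1.0 / rank for rank, doc in enumerate(ranked_docs, start=1) if doc in relevant)


def mean_trdr(runs):
    """runs: list of (ranked_docs, relevant_set), one per question; returns the average TRDR."""
    return sum(total_reciprocal_document_rank(docs, rel) for docs, rel in runs) / len(runs)
```

For example, correct answers at ranks 2 and 4 give 1/2 + 1/4 = 0.75 for that question, while a question with no answer-bearing document contributes 0 to the average.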
Personalized Web search for improving retrieval effectiveness
IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 38 (1 self)
Current Web search engines are built to serve all users, independent of the special needs of any individual user. Personalization of Web search means carrying out retrieval for each user in a way that incorporates his or her interests. We propose a novel technique to learn user profiles from users' search histories. The user profiles are then used to improve retrieval effectiveness in Web search. A user profile and a general profile are learned from the user's search history and a category hierarchy, respectively. These two profiles are combined to map a user query into a set of categories which represent the user's search intention and serve as a context to disambiguate the words in the user's query. Web search is conducted based on both the user query and the set of categories. Several profile learning and category mapping algorithms and a fusion algorithm are provided and evaluated. Experimental results indicate that our technique to personalize Web search is both effective and efficient.
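One simple way to combine the user profile and the general profile when mapping a query to categories is a weighted sum of their per-category similarity scores, sketched below. The linear combination and the weight `alpha` are assumptions for illustration (the abstract says several combining algorithms were evaluated without naming them), and the category names are hypothetical.

```python
def combine_profiles(user_scores, general_scores, alpha=0.5, top_k=3):
    """Weighted sum of per-category scores from the user and general profiles.

    alpha is an assumed weight: 1.0 uses only the user profile, 0.0 only the general one.
    Categories missing from one profile score 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    categories = set(user_scores) | set(general_scores)
    combined = {
        c: alpha * user_scores.get(c, 0.0) + (1 - alpha) * general_scores.get(c, 0.0)
        for c in categories
    }
    # Highest combined score first; ties broken by name for determinism.
    return sorted(combined, key=lambda c: (-combined[c], c))[:top_k]
```

Moving `alpha` toward 1 trusts the individual's history more, while moving it toward 0 falls back on the general category hierarchy, which is the only evidence available for a user with little history.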
Building Minority Language Corpora by Learning to Generate Web Search Queries
Knowledge and Information Systems, 2000
Cited by 18 (2 self)
The Web is an obvious source of valuable information, but the process of collecting, organizing and utilizing these resources is difficult. We describe CorpusBuilder, an approach for automatically generating Web search queries for collecting documents matching a minority concept. As the target concept, we use text documents written in a minority natural language on the Web. Individual documents are automatically labeled as relevant or non-relevant using a language filter, and the feedback is used to learn what query lengths and inclusion/exclusion term-selection methods are helpful for finding previously unseen documents in the target language. Our system learns to select good query terms using a variety of term scoring methods. We find that scoring terms by odds ratio, calculated over the documents acquired so far, is one of the most consistently accurate query-generation methods. We also parameterize the query length using a Gamma distribution and present empirical results with learning methods that vary the time horizon used when learning from the results of past queries. We find that our system performs well whether we initialize it with a whole document or with a handful of words elicited from a user. Experiments applying the same approach to multiple languages are also presented, showing that our approach generalizes well across several languages regardless of the initial conditions.
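The odds-ratio term scoring and Gamma-distributed query length described above can be combined into a small query generator, sketched below under stated assumptions. Documents are token lists labeled by the language filter; the additive smoothing constant, the Gamma `shape`/`scale` values, the cap `max_len`, and the rule of excluding only the single lowest-scoring term are all hypothetical choices, not CorpusBuilder's published settings.

```python
import math
import random
from collections import Counter


def odds_ratio_scores(relevant_docs, nonrelevant_docs, smoothing=0.5):
    """Log odds ratio of each term's document frequency in relevant vs. non-relevant docs.

    `smoothing` is an assumed additive constant that keeps the ratio finite
    for terms seen in only one of the two sets.
    """
    rel_df = Counter(t for d in relevant_docs for t in set(d))
    non_df = Counter(t for d in nonrelevant_docs for t in set(d))
    n_rel, n_non = len(relevant_docs), len(nonrelevant_docs)
    scores = {}
    for term in rel_df.keys() | non_df.keys():
        p = (rel_df[term] + smoothing) / (n_rel + 2 * smoothing)  # P(term | relevant)
        q = (non_df[term] + smoothing) / (n_non + 2 * smoothing)  # P(term | non-relevant)
        scores[term] = math.log((p * (1 - q)) / ((1 - p) * q))
    return scores


def generate_query(scores, rng, shape=2.0, scale=1.5, max_len=6):
    """Return (inclusion terms, exclusion terms) for the next search query."""
    # Number of inclusion terms drawn from a Gamma distribution, clipped to [1, max_len].
    length = max(1, min(max_len, round(rng.gammavariate(shape, scale))))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    include = ranked[:length]
    # Exclude the lowest-scoring term if it is negatively associated with the target language.
    exclude = [t for t in ranked[-1:] if scores[t] < 0 and t not in include]
    return include, exclude
```

A term found in every relevant document and no non-relevant one gets the highest score, so it is always the first inclusion term; a term concentrated in non-relevant (for example, English) documents scores below zero and becomes a candidate exclusion term.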
Extracting Query Modifications from Nonlinear SVMs
2002
Cited by 18 (0 self)
When searching the WWW, users often desire results restricted to a particular document category. Ideally, a user would be able to filter results with a text classifier to minimize false positive results; however, current search engines allow only simple query modifications. To automate the process of generating effective query modifications, we introduce a sensitivity-analysis-based method for extracting rules from nonlinear support vector machines. The proposed method allows the user to specify a desired precision while attempting to maximize the recall. Our method performs several levels of dimensionality reduction and is vastly faster than searching the space of feature combinations; moreover, it is very effective on real-world data.
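To give a feel for what sensitivity analysis of a nonlinear SVM involves, the sketch below uses a hand-specified RBF decision function (hypothetical support vectors and coefficients, not a trained model). For each binary word feature it measures the average change in the decision value when that word is switched on versus off over a sample of documents. Words with the largest positive effect become candidate required query terms, and the most negative become candidate excluded terms. This is a simplified illustration of the general idea under those assumptions, not the paper's rule-extraction procedure, which also targets a user-specified precision.

```python
import math


def rbf_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """f(x) = sum_i c_i * exp(-gamma * ||x - sv_i||^2) + b, where c_i = alpha_i * y_i."""
    return sum(
        c * math.exp(-gamma * sum((xi - si) ** 2 for xi, si in zip(x, sv)))
        for sv, c in zip(support_vectors, dual_coefs)
    ) + bias


def feature_sensitivity(samples, n_features, decision):
    """Average of decision(x with feature j = 1) - decision(x with feature j = 0)."""
    sens = []
    for j in range(n_features):
        total = 0.0
        for x in samples:
            on, off = list(x), list(x)
            on[j], off[j] = 1.0, 0.0
            total += decision(on) - decision(off)
        sens.append(total / len(samples))
    return sens


def query_modification(vocab, sens, n_include=1, n_exclude=1):
    """Most positive features become required terms, most negative become excluded terms."""
    order = sorted(range(len(vocab)), key=lambda j: sens[j])
    include = [vocab[j] for j in reversed(order[-n_include:]) if sens[j] > 0]
    exclude = [vocab[j] for j in order[:n_exclude] if sens[j] < 0]
    return include, exclude
```

With a positive support vector containing "recipe" and "oven" and a negative one containing "stock" and "price", switching on a cooking word always moves a document toward the positive side, so those words get positive sensitivity and the finance words get negative sensitivity.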
PEBL: Web Page Classification without Negative Examples
IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 18 (0 self)
Web page classification is one of the essential techniques for Web mining because classifying Web pages of a class of interest is often the first step of mining the Web. However, constructing a classifier for a class of interest requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. This paper presents a framework, called Positive Example Based Learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called Mapping-Convergence (M-C), to achieve classification accuracy (with positive and unlabeled data) as high as that of a traditional SVM (with positive and negative data). M-C runs in two stages: the mapping stage and the convergence stage. In the mapping stage, the algorithm uses a weak classifier that draws an initial approximation of "strong" negative data. Based on the initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) which maximizes margins to progressively improve the approximation of negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as the traditional SVMs.
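The two-stage Mapping-Convergence loop described above can be sketched as follows. Two substitutions are assumptions made to keep the example self-contained: the mapping stage uses a simple "positive feature" rule (unlabeled documents containing no word that is more frequent in the positive set than in the unlabeled set are taken as strong negatives), and a difference-of-centroids linear scorer stands in for the margin-maximizing SVM the abstract mentions. Documents are sets of words; the toy data is invented.

```python
from collections import Counter


def _centroid(docs):
    """Fraction of documents containing each word (empty dict for no documents)."""
    if not docs:
        return {}
    df = Counter(t for d in docs for t in d)
    return {t: c / len(docs) for t, c in df.items()}


def train_centroid_classifier(pos, neg):
    """Stand-in for an SVM: positive if the doc leans toward the positive centroid."""
    cp, cn = _centroid(pos), _centroid(neg)

    def classify(doc):
        return sum(cp.get(t, 0.0) - cn.get(t, 0.0) for t in doc) > 0

    return classify


def mapping_convergence(positives, unlabeled):
    # Mapping stage: words more frequent in positives than in unlabeled data.
    p_rate, u_rate = _centroid(positives), _centroid(unlabeled)
    pos_features = {t for t, r in p_rate.items() if r > u_rate.get(t, 0.0)}
    negatives = [d for d in unlabeled if not (d & pos_features)]  # "strong" negatives
    remaining = [d for d in unlabeled if d & pos_features]
    # Convergence stage: retrain, move newly rejected docs into the negative set, repeat.
    while True:
        classify = train_centroid_classifier(positives, negatives)
        newly_negative = [d for d in remaining if not classify(d)]
        if not newly_negative:  # nothing changed, so the boundary has converged
            return classify, negatives
        negatives = negatives + newly_negative
        remaining = [d for d in remaining if classify(d)]
```

The loop always terminates because `remaining` strictly shrinks on every iteration that adds negatives. Each pass pushes the boundary inward from the strong negatives toward the positive class, which mirrors the convergence argument in the abstract.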
Further Experiments on Collaborative Ranking in Community-Based Web Search
Artificial Intelligence Review, 2004
Cited by 14 (5 self)
As the search engine arms race continues, search engines are constantly looking for ways to improve the manner in which they respond to user queries. Given the vagueness of Web search queries, recent research has focused on ways to introduce context into the search process as a means of clarifying vague, underspecified or ambiguous query terms. In this paper we describe a novel approach to using context in Web search that seeks to personalize the results of a generic search engine for the needs of a specialist community of users. In particular, we describe two separate evaluations in detail that demonstrate how the collaborative search method has the potential to deliver significant search-performance benefits to end users while avoiding many of the privacy and security concerns that are commonly associated with related personalization research.
Domain-Specific Web Search with Keyword Spices
IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 13 (0 self)
Domain-specific web search engines are effective tools for reducing the difficulty of acquiring information from the web. Existing methods for building domain-specific web search engines require human expertise or specific facilities. However, we can build a domain-specific search engine simply by adding domain-specific keywords, called "keyword spices", to the user's input query and forwarding it to a general-purpose web search engine. Keyword spices can be effectively discovered from web documents using machine learning technologies. This paper describes domain-specific web search engines that use keyword spices for locating cooking recipes, restaurants, and used cars. To fully automate the construction of domain-specific search engines, we also present trials of using web pages in an existing web directory as training examples.
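At query time, the keyword-spice idea reduces to rewriting the user's query before forwarding it, as in the minimal sketch below. The spice shown is an invented example for a recipe domain rather than one learned by the authors' method. The `-term` exclusion syntax and the `https://search.example.com/search` endpoint with its `q` parameter are placeholders, and real engines differ in their Boolean operator syntax.

```python
from urllib.parse import urlencode

# Hypothetical spice for a cooking-recipe domain (illustrative, not learned from data).
RECIPE_SPICE = "(tablespoon OR teaspoon) -restaurant"


def spice_query(user_query: str, spice: str) -> str:
    """Append the domain's keyword spice to the user's query."""
    return f"{user_query.strip()} {spice}"


def build_search_url(user_query: str, spice: str,
                     endpoint: str = "https://search.example.com/search") -> str:
    """Build the URL forwarded to a general-purpose engine (placeholder endpoint)."""
    return endpoint + "?" + urlencode({"q": spice_query(user_query, spice)})
```

For the query "pumpkin soup", the forwarded URL is `https://search.example.com/search?q=pumpkin+soup+%28tablespoon+OR+teaspoon%29+-restaurant`. The user types an ordinary query, and the spice steers the general-purpose engine toward recipe pages.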

