Bringing order to the web: Automatically categorizing search results
, 2000
Cited by 109 (2 self)
We developed a user interface that organizes Web search results into hierarchical categories. Text classification algorithms were used to automatically classify arbitrary search results into an existing category structure on-the-fly. A user study compared our new category interface with the typical ranked list interface of search results. The study showed that the category interface is superior both in objective and subjective measures. Subjects liked the category interface much better than the list interface, and they were 50% faster at finding information that was organized into categories. Organizing search results allows users to focus on items in categories of interest rather than having to browse through all the results sequentially.
Web Page Classification: Features and Algorithms
, 2007
Cited by 16 (0 self)
Classification of web page content is essential to many tasks in web information retrieval such as maintaining web directories and focused crawling. The uncontrolled nature of web content presents additional challenges to web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in web page classification, we note the importance of these web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
Query enrichment for web-query classification
- ACMTOIS
, 2006
Cited by 13 (2 self)
Web search queries are typically short and ambiguous. Classifying these queries into target categories is a difficult but important problem. In this paper, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to the target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision; the F1 measure, which combines precision and recall; and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this paper, we show that, despite the abundance of ambiguous queries and the lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best F1 and precision results are 42.4% and 44.4%, respectively, which are 9.6%
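As a rough illustration of the two-phase idea described in this abstract, the sketch below first maps a query to intermediate categories and then maps those to target categories by simple voting. The lookup tables, the category names, and the function `classify_query` are hypothetical stand-ins; in the paper the intermediate objects come from an ensemble of search engines, which is not reproduced here, and the voting rule is an assumption rather than the authors' method.

```python
from collections import Counter

# Hypothetical phase-1 output: intermediate categories that search engines
# might return for each query (made-up data for illustration only).
INTERMEDIATE = {
    "jaguar": ["Autos/Makes", "Science/Animals", "Autos/Makes"],
    "python": ["Computers/Programming", "Science/Animals"],
}

# Hypothetical phase-2 mapping from intermediate to target categories.
TO_TARGET = {
    "Autos/Makes": "Living/Car & Garage",
    "Science/Animals": "Living/Pets & Animals",
    "Computers/Programming": "Computers/Software",
}


def classify_query(query, top_k=2):
    """Return up to top_k target categories ranked by vote count."""
    votes = Counter()
    for intermediate in INTERMEDIATE.get(query, []):
        target = TO_TARGET.get(intermediate)
        if target is not None:
            votes[target] += 1  # one vote per intermediate object
    return [category for category, _ in votes.most_common(top_k)]
```

A query with no intermediate objects simply yields an empty list, which mirrors the difficulty the abstract mentions when little evidence is available for a short query.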
Q2C@UST: our winning solution to query classification in KDDCUP 2005
- SIGKDD Explor. Newsl
, 2005
Cited by 12 (3 self)
In this paper, we describe our ensemble-search based approach,
Extending Ontology Tree Using NLP Technique
Cited by 4 (0 self)
This paper proposes a method of creating a web document representation using web ontology concepts instead of a 'bag-of-words'. However, since the web domain has a very small vocabulary, we are unable to transform all or most of the keywords of the web document into web ontology concepts. This particular problem is solved by creating an extended part of the web ontology with words obtained from an external linguistic knowledge base. The promising outcome of merging the Natural Language Processing (NLP) and Information Retrieval (IR) fields convinces us to create the extended ontology using an NLP technique.
directory construction using lexical chains
- In Proceedings of the 10th NLDB Conference 2005
, 2005
Cited by 2 (2 self)
Web Directories provide a way of locating relevant information on the Web. Typically, Web Directories rely on humans putting significant time and effort into finding important pages on the Web and categorizing them in the Directory. In this paper we present a way of automating the creation of a Web Directory. At a high level, our method takes as input a subject hierarchy and a collection of pages. We first leverage a variety of lexical resources from the Natural Language Processing community to enrich our hierarchy. After that, we process the pages and identify sequences of important terms, which are referred to as lexical chains. Finally, we use the lexical chains to decide where in the enriched subject hierarchy we should assign every page. Our experimental results with real Web data show that our method is quite promising for assisting humans during page categorization.
A Security Incident Sharing and Classification System for Building Trust in Cross Media Enterprises
- Presented at International Conference on Cross-Media Service Delivery (CMSD-2003)
, 2003
Cited by 1 (1 self)
Trust in cross-media applications is essential to successful collaboration. Cross-media service delivery encompasses different types of security incidents and assumes a level of trust on the part of the participants of any one transaction. As enterprises and participants of cross-media transactions become more susceptible to security risks facilitated by the heterogeneity of data being exchanged, it is important to develop protective infrastructures. Such infrastructures should enable reporting of security violations or misconduct on a regular basis with effortless incident submission, automatic classification of reported incidents, searching and collective knowledge extraction from similar incidents, and sharing of information by authorized users. We report on such a system currently being developed. The Security Incident Sharing and Classification system (SISC) collects incidents in a database through its incident submission interface and classifies them according to different parameters. We demonstrate an automatic classification scheme based on the level of incident severity, where severe incidents are processed faster. The system builds trust through its monitoring and recommendation capabilities, thus preparing enterprises to encounter new security incidents that may arise. This is an open, customizable, self-standing risk monitoring system which can be built into any enterprise. The recommendation component of SISC extracts solution scenarios from the gathered knowledge of classified incidents and makes them available to SISC users.
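The severity-based scheme mentioned in this abstract, where severe incidents are processed faster, could be sketched with a priority queue. Everything below is an assumption made for illustration: the severity levels, the keyword lists, and the names `classify_severity` and `IncidentQueue` do not come from SISC, whose actual classification parameters are not given in the abstract.

```python
import heapq
import itertools

# Hypothetical severity levels and trigger keywords; the abstract does not
# specify how SISC assigns severity.
SEVERITY_KEYWORDS = {
    3: ("breach", "intrusion"),
    2: ("malware", "phishing"),
}


def classify_severity(description):
    """Return the highest severity level whose keyword appears, else 1."""
    text = description.lower()
    for level in sorted(SEVERITY_KEYWORDS, reverse=True):
        if any(word in text for word in SEVERITY_KEYWORDS[level]):
            return level
    return 1


class IncidentQueue:
    """Serve the most severe incident first; ties go to the earliest report."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, description):
        severity = classify_severity(description)
        # heapq is a min-heap, so severity is negated to pop the largest first.
        heapq.heappush(self._heap, (-severity, next(self._counter), description))
        return severity

    def next_incident(self):
        _, _, description = heapq.heappop(self._heap)
        return description
```

The submission counter keeps the ordering stable among incidents of equal severity, so older reports are not starved by newer ones at the same level.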
Using Text Analysis to Inform Clients of the Subject of a Document
, 2003
Cited by 1 (0 self)
Contemporary information databases contain many millions of electronic documents. Locating information on the Internet today is problematic, due to the enormous number of documents it contains. Several other studies have found that associating documents with a subject or list of topics can improve locatability of information on the Internet (Drori, 2000a, 2000b, 2000c). Effective cataloguing of information is performed manually, requiring extensive resources. Consequently, most information is currently not catalogued. This paper aims to present a software tool that automatically locates the subject of a document and to show the results of a test performed using the software tool, TextAnalysis, specially developed for this purpose. The main purpose of this study is to inform clients of the subject of the corpus of texts it obtains from search engines as a search results list.
Automatic Topic Identification
- In the Proc. of the 2nd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2001)
, 2001
This paper proposes a method of using an ontology hierarchy in automatic topic identification. The fundamental idea behind this work is to exploit an ontology's hierarchical structure in order to find the topic of a text. The keywords extracted from a given text are mapped onto their corresponding concepts in the ontology. By optimizing the corresponding concepts, we pick a single node among the concept nodes which we believe is the topic of the target text. However, a limited vocabulary problem is encountered while mapping the keywords onto their corresponding concepts. This situation forces us to extend the ontology by enriching each of its concepts with new concepts using an external linguistic knowledge base (WordNet). Our intuition is that the topic identification technique performs best when a high number of keywords are mapped onto the ontology concepts.
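The keyword-to-concept mapping described in this abstract can be sketched as follows. The toy ontology, the lexicon, and the selection rule (pick the deepest node whose subtree covers at least a given share of the mapped keywords) are all assumptions made for illustration; the abstract does not say how the single topic node is actually optimized, and the WordNet extension is not reproduced.

```python
from collections import Counter

# Hypothetical toy ontology as child -> parent links (the root maps to None).
PARENT = {
    "science": None,
    "biology": "science",
    "physics": "science",
    "genetics": "biology",
    "ecology": "biology",
}

# Hypothetical keyword -> concept lexicon.
LEXICON = {
    "gene": "genetics",
    "dna": "genetics",
    "habitat": "ecology",
    "quark": "physics",
}


def ancestors(concept):
    """Yield the concept itself, then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def depth(concept):
    """Number of edges between the concept and the root."""
    return sum(1 for _ in ancestors(concept)) - 1


def identify_topic(keywords, min_share=0.6):
    """Pick the deepest node covering at least min_share of mapped keywords."""
    # Keywords missing from the lexicon are dropped: the limited vocabulary
    # problem that the abstract addresses by extending the ontology.
    mapped = [LEXICON[k] for k in keywords if k in LEXICON]
    if not mapped:
        return None
    coverage = Counter()
    for concept in mapped:
        for node in ancestors(concept):
            coverage[node] += 1
    candidates = [n for n, c in coverage.items() if c / len(mapped) >= min_share]
    return max(candidates, key=depth)
```

In this sketch the root always covers every mapped keyword, so a topic is returned whenever at least one keyword maps onto the ontology; the more keywords that map, the more specific the chosen node can be.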

