Results 1 - 8 of 8
Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition
, 2004
Cited by 20 (3 self)
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo, a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss results acquired from an empirical evaluation of the algorithm.
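The frequent phrase extraction step mentioned in this abstract can be illustrated with a minimal sketch. It is not Lingo's actual implementation: it builds a naive word-level suffix array by sorting, and the function name `frequent_phrases` and the parameters `min_freq` and `max_len` are hypothetical names chosen for the example.

```python
def frequent_phrases(snippets, min_freq=2, max_len=3):
    """Count word n-grams (up to max_len words) that occur at least min_freq times.

    Illustrative sketch only: uses a naive suffix array built by sorting,
    not the construction used in the paper.
    """
    tokens = []
    for i, text in enumerate(snippets):
        tokens.extend(text.lower().split())
        # Unique boundary marker so phrases never span two snippets.
        tokens.append(f"\x00{i}")
    n = len(tokens)

    # Word-level suffix array: suffix start positions, sorted by their
    # first max_len tokens (enough to group phrases of up to max_len words).
    suffix_array = sorted(range(n), key=lambda i: tokens[i:i + max_len])

    counts = {}
    for k in range(1, max_len + 1):
        run_start = 0
        for pos in range(1, n + 1):
            first = suffix_array[run_start]
            prev = tokens[first:first + k]
            cur = tokens[suffix_array[pos]:suffix_array[pos] + k] if pos < n else None
            if cur != prev:
                # Suffixes sharing the same k-word prefix are adjacent,
                # so the run length is the phrase frequency.
                run_len = pos - run_start
                is_clean = len(prev) == k and not any(t.startswith("\x00") for t in prev)
                if run_len >= min_freq and is_clean:
                    counts[" ".join(prev)] = run_len
                run_start = pos
    return counts


phrases = frequent_phrases(["apple pie recipe", "best apple pie", "apple pie history"])
```

Because equal prefixes sit next to each other in the sorted suffix order, a single pass per phrase length is enough to count repetitions. A production version would use a linear-time suffix array construction and an LCP array instead of re-slicing tokens.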
Text-mining based journal splitting
- Proceedings of ICDAR 2003
, 2003
Cited by 5 (3 self)
Keywords: table of contents, OCR, journal splitting, text mining, text chunking, document understanding. This paper introduces a novel journal splitting algorithm. It takes full advantage of various kinds of information, such as text match, layout and page numbers. The core procedure is a highly efficient text-mining algorithm that detects matching phrases between the contents pages and the title pages of individual articles. Experiments show that this algorithm is robust and able to split a wide range of journals, magazines and books.
Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data
- In IIPWM04
, 2004
Cited by 3 (0 self)
The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search hits list returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project (ODP) as a source of high-quality, narrow-topic document references and mix them into several multi-topic test sets for the algorithm. We then compare the clusters acquired from Lingo to the expected set of ODP categories mixed in the input. Finally, we discuss observations from the experiment, highlighting the algorithm's strengths and weaknesses, and conclude with research directions for the future.
A topology-driven approach to the design of web meta-search clustering engines
- In Theory and Practice of Computer Science (SOFSEM ’05), volume 3381 of Lecture Notes in Computer Science
, 2005
Cited by 2 (1 self)
The paradigm adopted by classical Web search engines to output the results of a query is often inadequate. It typically consists of a ranked list of URLs, which may be very long and difficult to browse for the interested user. Recently, a lot of attention has been devoted to the design of Web meta-search clustering engines. These systems support the user by grouping the URLs returned by a search engine into distinct semantic categories, which are organized in a hierarchy; each category is labeled with a sentence that reflects its topics. However, even the most effective Web meta-search engines usually end up presenting many “meaningful” categories together with a few “inexpressive” categories on some specific queries. In this paper we describe a novel topology-driven approach to the design of a Web meta-search clustering engine. In this approach the set of URLs is modeled as a suitable graph, and the hierarchy of categories is obtained by variants of classical graph-clustering algorithms. The topology-driven approach turns out to be comparable with traditional text-based strategies for the definition of the cluster hierarchy. In addition, our approach makes it natural to use graph visualization techniques to support the user in handling inexpressive labels: categories with inexpressive labels can be visually related to more meaningful ones.
Personalized profile based search interface with ranked and clustered display
- In 2001 International Conference on Intelligent Agents Web Technologies and Internet Commerce - IAWTIC’2001
, 2001
Cited by 2 (1 self)
We have developed an experimental meta-search engine, which takes the snippets from traditional search engines and presents them to the user in the form of clusters, indices or a re-ranked list, optionally based on the user’s profile. The system also allows the user to give positive or negative feedback on the documents, clusters and indices. The architecture allows different algorithms for each of the features to be plugged in easily, i.e. various clustering, indexing and relevance feedback algorithms, and profiling methods.
ExtMiner: Combining Multiple Ranking and Clustering Algorithms for Structured Document Retrieval
- In: Proceedings of International workshop on Integrating Data Mining, Databases and Information Retrieval (IDDI’05), 16th International Workshop on Database and Expert Systems Applications
, 2005
Cited by 2 (1 self)
This paper introduces ExtMiner, a platform and potential tool for information management in SMEs (small and medium-sized enterprises) or for organizational workgroups. ExtMiner supports interactive and iterative clustering of documents. It provides users with visual cluster and list views at the same time, supporting an iterative search process. ExtMiner may also be applied as a platform for research on retrieval fusion, since it combines search, clustering and visualization algorithms. ExtMiner was evaluated with three document collections. Although the findings were encouraging, the user interface and the performance with large document repositories need further development.
Inter-document similarity in web searches
, 2004
Cited by 1 (1 self)
are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address. Orientador: Júri:
Carrot 2 and Language Properties
- In Proceedings of AWIC-2003, First International Atlantic Web Intelligence Conference
, 2003
This paper relates to a technique for improving results visualization in Web search engines known as search results clustering. We introduce Carrot 2, an open, extensible research system for the examination and development of search results clustering algorithms. We also discuss attempts at measuring the quality of discovered clusters and demonstrate the results of our experiments with quality assessment when an inflectionally rich language (Polish) is clustered using a representative algorithm, Suffix Tree Clustering.

