Experiments with citation mining and key-term extraction for Prior Art Search
- Cited by 3 (1 self)
This technical note presents the system built for the IP track of CLEF 2010 based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially realized for CLEF IP 2009. We largely reused the system of the previous CLEF IP, but at a relatively smaller scale and with the improvement of three main components:
• A new citation mining tool based on Conditional Random Fields (CRF).
• A key-term extraction module developed for technical and scientific documents and adapted to patent document structures using a vast range of metrics, features and a bagged decision tree.
• An improvement of our multi-domain terminological database called GRISP.
We used the Okapi BM25 and the Indri retrieval models for the prior art task and a KNN model for the automatic classification task under the IPC subclasses. In both tasks, specific final re-ranking techniques were used, including multiple regression models based on SVM. Although the Prior Art task was more challenging and we used a more limited number of retrieval models, we maintained results similar to last year's. We performed, however, miserably at the classification task, and we consider that an instance-based KNN algorithm is not competitive with standard classifiers based on preliminary large-scale training.
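The abstract above names Okapi BM25 as one of the retrieval models but gives no parameters or implementation details. As a rough illustration only, the following is a minimal sketch of textbook BM25 scoring over tokenized documents. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are assumptions made here, not details taken from PATATRAS.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    `query` is a list of tokens and `docs` a non-empty list of token lists.
    k1 and b are conventional defaults (an assumption, not from the source).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Commonly used IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["patent", "citation", "mining"],
        ["key", "term", "extraction"],
        ["patent", "search"],
    ]
    print(bm25_scores(["citation", "mining"], corpus))
```

Only the first toy document contains the query terms, so it is the only one with a non-zero score. A real prior art search would also need tokenization, field weighting, and the re-ranking stages the abstract mentions.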
Conceptual language models for domain-specific retrieval
- INFORMATION PROCESSING AND MANAGEMENT
Context-Sensitive Semantic Smoothing using Semantically Relatable Sequences
We propose a novel approach to context-sensitive semantic smoothing by making use of an intermediate, "semantically light" representation for sentences, called Semantically Relatable Sequences (SRS). SRSs of a sentence are tuples of words appearing in the semantic graph of the sentence as linked nodes depicting dependency relations. In contrast to patterns based on consecutive words, SRSs make use of groupings of non-consecutive but semantically related words. Our experiments on the TREC AP89 collection show that the mixture model of the SRS translation model and the Two Stage Language Model (TSLM) of Lafferty and Zhai achieves better MAP scores than the mixture model of the MultiWord Expression (MWE) translation model and TSLM. Furthermore, a system that, for each test query, selects either the SRS or the MWE mixture model based on the better query MAP score shows significant improvements over the individual mixture models.
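The per-query selection described at the end of this abstract can be read as an oracle over two systems: for each query, keep whichever system has the higher average precision, then average over queries. The sketch below follows that reading. The function names and the toy scores are hypothetical, and the abstract does not say how the selection was actually implemented.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_ap(ap_by_query):
    """MAP: the mean of per-query average precision values."""
    values = list(ap_by_query.values())
    return sum(values) / len(values)


def oracle_selection(ap_a, ap_b):
    """Per query, keep the better of two systems' average precision."""
    return {q: max(ap_a[q], ap_b[q]) for q in ap_a}


if __name__ == "__main__":
    # Hypothetical per-query scores for the two mixture models.
    ap_srs = {"q1": 0.5, "q2": 0.2}
    ap_mwe = {"q1": 0.3, "q2": 0.4}
    print(mean_ap(ap_srs), mean_ap(ap_mwe))
    print(mean_ap(oracle_selection(ap_srs, ap_mwe)))
```

Because the choice uses the relevance judgments themselves, a selector of this kind can never score below either individual system. It is best read as an upper bound on what query-level model selection could achieve.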
Theme Creation for Digital Collections
- 2008 Proc. Int’l Conf. on Dublin Core and Metadata Applications
This paper presents an approach for integrating multiple sources of semantics for creating metadata. A new framework is proposed to define topics and themes with both manually and automatically generated terms. The automatically generated terms include terms from a semantic analysis of the collections and terms from previous users' queries. An interface is developed to facilitate the creation and use of such topics and themes for metadata creation. The framework and the interface promote human-computer collaboration in metadata creation. Several principles underlying such an approach are also discussed.
Experiments with citation mining and key-term extraction for Prior Art Search
- Author manuscript, published in "CLEF 2010 - Conference on Multilingual and Multimodal Information Access Evaluation, Padua, Italy (2010)", 2010
This technical note presents the system built for the IP track of CLEF 2010 based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially realized for CLEF IP 2009. We largely reused the system of the previous CLEF IP, but at a relatively smaller scale and with the improvement of three main components:
• A new citation mining tool based on Conditional Random Fields (CRF).
• A key-term extraction module developed for technical and scientific documents and adapted to patent document structures using a vast range of metrics, features and a bagged decision tree.
• An improvement of our multi-domain terminological database called GRISP.
We used the Okapi BM25 and the Indri retrieval models for the prior art task and a KNN model for the automatic classification task under the IPC subclasses. In both tasks, specific final re-ranking techniques were used, including multiple regression models based on SVM. Although the Prior Art task was more challenging and we used a more limited number of retrieval models, we maintained results similar to last year's. We performed, however, miserably at the classification task, and we consider that an instance-based KNN algorithm is not competitive with standard classifiers based on preliminary large-scale training.
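For the classification task, the abstract mentions an instance-based KNN model without describing its features or voting scheme. The following is a minimal sketch of one common form: cosine similarity over bag-of-words vectors with similarity-weighted voting. The function names, the value of `k`, and the toy training data (including the IPC subclass labels used as examples) are illustrative assumptions, not the authors' setup.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_tokens, training, k=3):
    """Label a document by similarity-weighted vote of its k nearest neighbours.

    `training` is a non-empty list of (tokens, label) pairs.
    """
    q = Counter(query_tokens)
    scored = sorted(
        ((cosine(q, Counter(tokens)), label) for tokens, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim  # closer neighbours count for more
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training set; the labels are example IPC subclass codes.
    training = [
        (["engine", "piston", "fuel"], "F02B"),
        (["engine", "valve", "fuel"], "F02B"),
        (["protein", "sequence", "gene"], "C12N"),
    ]
    print(knn_classify(["fuel", "engine", "injection"], training, k=2))
```

An instance-based scheme like this does no training up front and compares each query against stored examples, which is the property the abstract contrasts with classifiers trained at large scale.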
Entity based Q&A retrieval
Bridging the lexical gap between the user’s question and the question-answer pairs in the Q&A archives has been a major challenge for Q&A retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus, in the absence of which they are troubled by noise issues. Moreover, these models perform word-based expansion in a context-agnostic manner, resulting in translations that might be mixed and fairly general. This results in degraded retrieval performance. In this work we address the above issues by extending the lexical word-based translation model to incorporate semantic concepts (entities). We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and a popular entity catalog. Experiments conducted on large-scale real data show that the proposed techniques are promising.
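To make the baseline concrete, the word-based translation language model that this abstract extends is usually written as a smoothed mixture: each query word can be generated by "translating" any word in the archived question, with a collection model as fallback. The sketch below shows only that generic baseline with Jelinek-Mercer smoothing. The function name, the `lam` value, and the tiny translation table are hypothetical, and the entity extension proposed in the paper is not reproduced here.

```python
import math
from collections import Counter


def translation_lm_score(query, doc, trans, collection, lam=0.2):
    """Log-probability of `query` given `doc` under a word translation model.

    trans[(w, t)] is an assumed probability P(w | t) of generating query word w
    from document word t; self-translations must be listed explicitly.
    `doc` and `collection` are non-empty token lists.
    """
    tf = Counter(doc)
    coll_tf = Counter(collection)
    log_p = 0.0
    for w in query:
        # Sum over document words t of P(w | t) * P_ml(t | doc).
        p_trans = sum(trans.get((w, t), 0.0) * c / len(doc) for t, c in tf.items())
        p_coll = coll_tf[w] / len(collection)
        p = (1 - lam) * p_trans + lam * p_coll
        log_p += math.log(p) if p > 0 else float("-inf")
    return log_p


if __name__ == "__main__":
    # Hypothetical translation table linking two related words.
    trans = {
        ("laptop", "laptop"): 0.7,
        ("laptop", "notebook"): 0.3,
        ("notebook", "notebook"): 0.7,
        ("notebook", "laptop"): 0.3,
    }
    doc_a = ["my", "notebook", "battery"]
    doc_b = ["my", "garden", "hose"]
    collection = doc_a + doc_b + ["laptop"]
    print(translation_lm_score(["laptop"], doc_a, trans, collection))
    print(translation_lm_score(["laptop"], doc_b, trans, collection))
```

In this toy case the document mentioning "notebook" outscores the unrelated one for the query "laptop", even though neither contains the query word. That is the lexical-gap effect the abstract describes.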
A Dynamic Visualization Interface for Search Service
Visualization methods such as node-link trees and space-filling representations expose semantic relationships using spatial arguments to communicate information in ways that text cannot. In this paper, we describe a prototype system that visualizes semantic relationships of search results from the XML-based search service APIs of a large database. OSTI, the source selected for our proof-of-concept prototype, is a major government energy database offering broad coverage of alternative energy resource information including solar, wind, hydroelectric and geothermal topics, among others. The intent is to expand this prototype to facilitate document retrieval clustering around subject terms from any of several large databases with XML-enabled APIs.

