Video Search Reranking via Information Bottleneck Principle
- In ACM Multimedia
, 2006
Cited by 26 (4 self)
We propose a novel and generic video/image reranking algorithm, IB reranking, which reorders results from text-only searches by discovering the salient visual patterns of relevant and irrelevant shots from the approximate relevance provided by text results. The IB reranking method, based on a rigorous Information Bottleneck (IB) principle, finds the optimal clustering of images that preserves the maximal mutual information between the search relevance and the high-dimensional low-level visual features of the images in the text search results. Evaluating the approach on the TRECVID 2003-2005 data sets shows significant improvement upon the text search baseline, with relative increases in average performance of up to 23%. The method requires no image search examples from the user, but is competitive with other state-of-the-art example-based approaches. The method is also highly generic and performs comparably with sophisticated models which are highly tuned for specific classes of queries, such as named-persons. Our experimental analysis has also confirmed that the proposed reranking method works well when there exist sufficient recurrent visual patterns in the search results, as is often the case in multi-source news videos.
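For readers unfamiliar with the principle named in this abstract, the standard Information Bottleneck objective can be written as below. The symbol mapping is an assumption based only on the abstract (X for the visual features of the returned shots, Y for the approximate search relevance, C for the cluster assignment, and β for a trade-off parameter); the paper's own notation and optimization procedure may differ.

```latex
% Standard IB objective: compress X into clusters C while
% preserving as much information about Y as possible.
\min_{p(c \mid x)} \; I(X; C) \;-\; \beta \, I(C; Y),
\qquad
I(C; Y) \;=\; \sum_{c}\sum_{y} p(c, y)\,\log\frac{p(c, y)}{p(c)\,p(y)}
```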
Automatic discovery of query-class-dependent models for multimodal search
- in Proc. 13th Annu. ACM Int. Conf. Multimedia, 2005
Cited by 17 (4 self)
We develop a framework for the automatic discovery of query classes for query-class-dependent search models in multimodal retrieval. The framework automatically discovers useful query classes by clustering queries in a training set according to the performance of various unimodal search methods, yielding classes of queries which have similar fusion strategies for the combination of unimodal components for multimodal search. We further combine these performance features with the semantic features of the queries during clustering in order to make discovered classes meaningful. The inclusion of the semantic space also makes it possible to choose the correct class for new, unseen queries, which have unknown performance space features. We evaluate the system against the TRECVID 2004 automatic video search task and find that the automatically discovered query classes give an improvement of 18% in MAP over hand-defined query classes used in previous works. We also find that some hand-defined query classes, such as “Named Person” and “Sports”, do indeed have similarities in search method performance and are useful for query-class-dependent multimodal search, while other hand-defined classes, such as “Named Object” and “General Object”, do not have consistent search method performance and should be split apart or replaced with other classes. The proposed framework is general and can be applied to any new domain without expert domain knowledge.
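The MAP figure quoted in this abstract refers to mean average precision. The sketch below shows one common, non-interpolated way to compute it and to express a relative gain. The function names are hypothetical, and the choice of denominator (relevant items found in the list unless a total is supplied) is an assumption; the official TRECVID evaluation may differ in details such as result-list depth.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision for one ranked result list."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Assumption: fall back to the relevant items seen in the list
    # when the collection-wide count is not supplied.
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline` (0.18 means 18%)."""
    return (improved - baseline) / baseline
```

For example, `mean_average_precision([[True, False], [False, True]])` averages the two per-query values 1.0 and 0.5.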
Diagnosing meaning errors in short answers to reading comprehension questions
- Proceedings of the 3rd Workshop on Innovative Use of NLP for Building Educational Applications, held at ACL 2008. Columbus, Ohio: Association for Computational Linguistics
, 2008
Cited by 14 (12 self)
A common focus of systems in Intelligent Computer-Assisted Language Learning (ICALL) is to provide immediate feedback to language learners working on exercises. Most of this research has focused on providing feedback on the form of the learner input. Foreign language practice and second language acquisition research, on the other hand, emphasize the importance of exercises that require the learner to manipulate meaning. The ability of an ICALL system to diagnose and provide feedback on the meaning conveyed by a learner response depends on how well it can deal with the response variation allowed by an activity. We focus on short-answer reading comprehension questions, which have a clearly defined target response but allow the learner to convey the meaning of the target in multiple ways. As the empirical basis of our work, we collected an English as a Second Language (ESL) learner corpus of short-answer reading comprehension questions, for which two graders provided target answers and correctness judgments. On this basis, we developed a Content-Assessment Module (CAM), which performs shallow semantic analysis to diagnose meaning errors. It reaches an accuracy of 88% for semantic error detection and 87% for semantic error diagnosis on a held-out test data set.
Automated Duplicate Detection for Bug Tracking Systems
Cited by 11 (1 self)
Bug tracking systems are important tools that guide the maintenance activities of software developers. The utility of these systems is hampered by an excessive number of duplicate bug reports: in some projects as many as a quarter of all reports are duplicates. Developers must manually identify duplicate bug reports, but this identification process is time-consuming and exacerbates the already high cost of software maintenance. We propose a system that automatically classifies duplicate bug reports as they arrive to save developer time. This system uses surface features, textual semantics, and graph clustering to predict duplicate status. Using a dataset of 29,000 bug reports from the Mozilla project, we perform experiments that include a simulation of a real-time bug reporting environment. Our system is able to reduce development cost by filtering out 8% of duplicate bug reports while allowing at least one report for each real defect to reach developers.
Application of a probability-based algorithm to extraction of product features from online reviews
, 2006
Cited by 5 (1 self)
Prior research has demonstrated the viability of automatically extracting product features from online reviews. This paper presents a probability-based algorithm and compares it to an existing support-based approach. Specifically, I used each algorithm to extract features from 7 Amazon.com product categories and then asked end users to rate the features in terms of helpfulness for choosing products. The end users preferred the features identified by the probability-based algorithm. This probability-based algorithm can identify features that comprise a single noun or two successive nouns (which end users rated as more helpful than features comprising only one noun), yet even for collections of tens of thousands of reviews, it still executes fast enough (at around 1 ms per review) for practical use. Over one dozen colleagues helped pre-test early versions of the survey. Norman Sadeh and George Duncan provided valuable input concerning the study’s design and analysis. This work has been funded in part by the EUSES Consortium via the National Science Foundation (ITR-0325273) and by the National Science Foundation under Grant CCF-0438929. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the sponsors.
Can common sense uncover cultural differences in computer applications
- In Proc. IFIP WCC2006, Springer-Verlag
, 2006
Cited by 5 (4 self)
Cultural differences play a very important role in matching computer interfaces to the expectations of users from different national and cultural backgrounds. But to date, there has been little systematic research as to the extent of such differences, and how to produce software that automatically takes into account these differences. We are studying these issues using a unique resource: Common Sense knowledge bases in different languages. Our research points out that this kind of knowledge can help computer systems to consider cultural differences. We describe our experiences with knowledge bases containing thousands of sentences describing people and everyday activities, collected from volunteer Web contributors in three different cultures: Brazil, Mexico and the USA, and software which automatically searches for cultural differences amongst the three cultures, alerting the user to potential differences.
Bootstrapping ontology learning for information retrieval using formal concept analysis and information anchors
- 14TH INTERNATIONAL CONFERENCE ON CONCEPTUAL STRUCTURES
Cited by 3 (0 self)
We present an innovative approach to information retrieval for domain-specific digital library collections. We use a combination of Formal Concept Analysis (FCA) and a notion of information anchors to facilitate information delivery to the end user. This approach (1) uses ranked objects in attribute concepts to facilitate topical queries for experts and expertise profiles; (2) formulates (keyword by keyword) context for concept lattice construction via a set of heuristics, including those based on information anchors for selecting descriptive phrases; (3) bootstraps the learning of domain-specific concept hierarchies using FCA; and (4) incorporates the learnt concept hierarchies and WordNet for content-based document classification. To demonstrate the feasibility and utility of this approach, we implemented a prototype online information retrieval system, memsworldonline.case.edu (MWOL), for the emerging engineering discipline of MEMS (microelectromechanical systems) incorporating these ideas. MWOL has been actively used by a non-trivial group of MEMS practitioners; all user queries are processed in a fraction of a second as a result of an inverse indexing strategy using Berkeley DB. Voluntary user feedback using online forms has been encouraging. However, no other systems with similar features are available for a comparative study at this point.
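The fast query processing mentioned in this abstract is attributed to an inverse (inverted) indexing strategy. As a rough illustration of that general idea only, the sketch below builds an in-memory index mapping each term to the documents that contain it. It uses a plain Python dictionary rather than Berkeley DB, and the function names, whitespace tokenization, and AND-style matching are all assumptions, not details of MWOL.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased token to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # naive whitespace tokenization
            index[token].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

Because each lookup touches only the posting sets of the query terms, the cost of a query depends on those sets rather than on the size of the whole collection.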
Tell me more, not just “more of the same”
- In IUI ’10: Proceeding of the 14th international conference on Intelligent user interfaces, 81–90
, 2010
Cited by 3 (2 self)
The Web makes it possible for news readers to learn more about virtually any story that interests them. Media outlets and search engines typically augment their information with links to similar stories. It is up to the user to determine what new information is added by them, if any. In this paper we present Tell Me More, a system that performs this task automatically: given a seed news story, it mines the web for similar stories reported by different sources and selects snippets of text from those stories which offer new information beyond the seed story. New content may be classified as supplying: additional quotes, additional actors, additional figures and additional information depending on the criteria used to select it. In this paper we describe how the system identifies new and informative content with respect to a news story. We also show that providing an explicit categorization of new information is more useful than a binary classification (new/not-new). Lastly, we show encouraging results from a preliminary evaluation of the system that validates our approach and encourages further study.
JETCAT – Japanese-English Translation Using Corpus-Based Acquisition of Transfer Rules
Cited by 1 (0 self)
In this paper we present a rule-based formalism for the acquisition, representation, and application of the transfer knowledge used in a Japanese-English machine translation system. The transfer knowledge is learnt automatically from a parallel corpus by using structural matching between the parse trees of translation pairs. The user can customize the rule base by simply correcting translation results. We have extended the machine translation system with two user-friendly front ends: an MS Word interface and a Web interface. Since our system is mainly intended as a tool for language students to convey a better understanding of Japanese, we also offer the display of detailed information about lexical, syntactic, and transfer knowledge. The system has been implemented in Amzi! Prolog, using the Amzi! Logic Server Visual Basic Module and the Amzi! Logic Server CGI Interface to develop the front ends. Index Terms: natural language processing, machine translation, linguistic knowledge acquisition, parallel corpora, logic programming.
WETCAT -- Web-Enabled Translation Using Corpus-Based Acquisition of Transfer Rules
, 2006
Cited by 1 (0 self)
In this paper we present a Web interface to a Japanese-English rule-based machine translation system. One main feature of our translation system is that the transfer rules have not been designed by hand but are learnt automatically from a parallel corpus. The user can customize the rule base by simply correcting translation results. In addition, it is possible to display token lists, parse trees, and transfer rules, which makes our system also a very useful tool for language students. The system has been implemented in Amzi! Prolog, using the Amzi! Logic Server CGI Interface to develop the Web application.

