Conditional Random Fields: An introduction (0)

by Hanna M Wallach

Results 1 - 10 of 28

BANNER: An executable survey of advances in biomedical named entity recognition

by Robert Leaman, Graciela Gonzalez - In Pac Symp Biocomput, 2008
Cited by 21 (2 self)
There has been an increasing amount of research on biomedical named entity recognition, the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freely-available, open source system implementing the advances described in the literature. In this paper we present BANNER, an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. BANNER is implemented in Java as a machine-learning system based on conditional random fields and includes a wide survey of the best techniques recently described in the literature. It is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps, and achieves significantly better performance than existing baseline systems. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text. BANNER is available for download at

Machine Learning Based on Attribute Interactions

by Aleks Jakulin, 2005
Cited by 7 (0 self)
Abstract not found

Global Ranking by Exploiting User Clicks

by Shihao Ji, Gui-rong Xue, Ke Zhou, O. Chapelle, Gordon Sun, Ciya Liao, Zhaohui Zheng, Hongyuan Zha
Cited by 5 (3 self)
It is now widely recognized that user interactions with search results can provide substantial relevance information on the documents displayed in the search results. In this paper, we focus on extracting relevance information from one source of user interactions, i.e., user click data, which records the sequence of documents being clicked and not clicked in the result set during a user search session. We formulate the problem as a global ranking problem, emphasizing the importance of the sequential nature of user clicks, with the goal of predicting the relevance labels of all the documents in a search session. This is distinct from conventional learning-to-rank methods, which usually design a ranking model defined on a single document; in contrast, our model exploits the relational information among the documents, as manifested by an aggregation of user clicks, to rank all the documents jointly. In particular, we adapt several sequential supervised learning algorithms, including the conditional random field (CRF), the sliding window method and the recurrent sliding window method, to the global ranking problem. Experiments on the click data collected from a commercial search engine demonstrate that our methods can outperform the baseline models for search results re-ranking.

A Language-Independent Transliteration Schema Using Character Aligned Models At NEWS 2009

by Praneeth Shishtla, Surya Ganesh V, Sethuramalingam Subramaniam, Vasudeva Varma
Cited by 3 (0 self)
In this paper we present a statistical transliteration technique that is language independent. This technique uses statistical alignment models and Conditional Random Fields (CRF). Statistical alignment models maximize the probability of the observed (source, target) word pairs using the expectation maximization algorithm, and the character-level alignments are then set to the maximum posterior predictions of the model. The CRF has efficient training and decoding processes, is conditioned on both source and target languages, and produces a globally optimal solution.

Assessing Map Quality using Conditional Random Fields

by Manjari Ch, Paul Newman
Cited by 2 (0 self)
This paper is concerned with assessing the quality of work-space maps. While there has been much work in recent years on building maps of field settings, little attention has been given to endowing a machine with introspective competencies which would allow assessing the reliability/plausibility of the representation. We classify regions in 3D point-cloud maps into two classes: “plausible” or “suspicious”. In this paper we concentrate on the classification of urban maps and use a Conditional Random Field to model the intrinsic qualities of planar patches and, crucially, their relationship to each other. A bipartite labelling of the map is acquired via application of the Graph Cut algorithm. We present results using data gathered by a mobile robot equipped with a 3D laser range sensor while operating in a typical urban setting.

An Analysis of Logistic Models: Exponential Family Connections and Online Performance

by Arindam Banerjee - In SIAM International Conference on Data Mining, 2002
Cited by 2 (0 self)
Logistic models are arguably one of the most widely used data analysis techniques. In this paper, we present analyses focusing on two important aspects of logistic models: their relationship with exponential family based generative models, and their performance in online and potentially adversarial settings. In particular, we present two new theoretical results on logistic models, one for each aspect. First, we establish an exact connection between logistic models and exponential family based generative models, resolving a long-standing ambiguity over their relationship. Second, we show that online Bayesian logistic models are competitive with the best batch models, even in potentially adversarial settings. Further, we discuss relevant connections of our analysis to the literature on integral transforms, and also present a new optimality result for Bayesian models. The analysis makes a strong case for using logistic models and partly explains the success of such models for a wide range of practical problems.

Towards a Unified Architecture for in-RDBMS Analytics

by Xixuan Feng, Arun Kumar, Benjamin Recht, Christopher Ré
Cited by 2 (2 self)
The increasing use of statistical data analysis in enterprise applications has created an arms race among database vendors to offer ever more sophisticated in-database analytics. One challenge in this race is that each new statistical technique must be implemented from scratch in the RDBMS, which leads to a lengthy and complex development process. We argue that the root cause for this overhead is the lack of a unified architecture for in-database analytics. Our main contribution in this work is to take a step towards such a unified architecture. A key benefit of our unified architecture is that performance optimizations for analytics techniques can be studied generically instead of in an ad hoc, per-technique fashion. In particular, our technical contributions are theoretical and empirical studies of two key factors that we found impact performance: the order in which data is stored, and parallelization of computations on a single-node multicore RDBMS. We demonstrate the feasibility of our architecture by integrating several popular analytics techniques into two commercial and one open-source RDBMS. Our architecture requires changes to only a few dozen lines of code to integrate a new statistical technique. We then compare our approach with the native analytics tools offered by the commercial RDBMSes on various analytics tasks, and validate that our approach achieves competitive or higher performance, while still achieving the same quality.

Conditional random fields for transmembrane helix prediction

by Lior Lukov, Sanjay Chawla, W. Bret Church - In 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2005
Cited by 1 (1 self)
It is estimated that 20% of genes in the human genome code for integral membrane proteins (IMPs), and some estimates are much higher. IMPs control a broad range of events essential to the proper functioning of cells, tissues and organisms. IMPs include the most common targets of clinically useful drugs, such as the G protein coupled receptors (GPCR), the target for more than 50% of prescription drugs [1]. However, there is a dearth of high-resolution 3D structural information on IMPs. The number of IMP depositions in the major structural holding, the Protein Data Bank, is less than 0.4% of the collection [2]. Therefore, good methods for predicting IMP structures are highly valued. In this paper we apply Conditional Random Fields (CRFs) to build a probabilistic model to segment and label sequence data to solve the membrane protein helix prediction problem. The advantage of CRFs is that they allow seamless integration of biological domain knowledge into the model. Our results show that the CRF model outperforms other well known helix prediction approaches on several important measures.

Extracting Relevant Named Entities for Automated Expense Reimbursement

by Guangyu Zhu
Cited by 1 (0 self)
Expense reimbursement is a time-consuming and labor-intensive process across organizations. In this paper, we present a prototype expense reimbursement system that dramatically reduces the elapsed time and costs involved, by eliminating paper from the process life cycle. Our complete solution involves (1) an electronic submission infrastructure that provides multi-channel image capture, secure transport and centralized storage of paper documents; (2) an unconstrained data mining approach to extracting relevant named entities from unstructured document images; (3) automation of auditing procedures that enables automatic expense validation with minimum human interaction. Extracting relevant named entities robustly from document images with unconstrained layouts and diverse formatting is a fundamental technical challenge to image-based data mining, question answering, and other information retrieval tasks. In many applications that require such capability, applying traditional language modeling techniques to the stream of OCR text does not give satisfactory results due to the absence of linguistic context. We present an approach for extracting relevant named entities from document images by combining rich page layout features in the image space with language content in the OCR text using a discriminative conditional random field (CRF) framework. We integrate this named entity extraction engine into our expense reimbursement solution and evaluate the system performance on large collections of real-world receipt images.

Classical Probabilistic Models and Conditional Random Fields

by Roman Klinger, Katrin Tomanek, 2007
Cited by 1 (0 self)
Abstract not found