Identifying comparative sentences in text documents
- In Proc. of the 29th SIGIR
, 2006
Cited by 25 (2 self)
This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification or classification. Sentiment classification studies the problem of classifying a document or a sentence based on the subjective opinion of the author. An important application area of sentiment/opinion identification is business intelligence, as a product manufacturer always wants to know consumers’ opinions on its products. Comparisons, on the other hand, can be subjective or objective. Furthermore, a comparison is not concerned with an object in isolation. Instead, it compares the object with others. An example opinion sentence is “the sound quality of CD player X is poor”. An example comparative sentence is “the sound quality of CD player X is not as good as that of CD player Y”. Clearly, these two sentences give different information. Their language constructs are quite different too. Identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing ways of evaluation, which may even be more important than opinions on each individual object. This paper proposes to study the comparative sentence identification problem. It first categorizes comparative sentences into different types, and then presents a novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents. Experimental results using three types of documents (news articles, consumer reviews of products, and Internet forum postings) show a precision of 79% and recall of 81%. More detailed results are given in the paper.
Discriminative subsequence mining for action classification
- IN: 11TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, LOS ALAMITOS
, 2007
Cited by 24 (1 self)
Recent approaches to action classification in videos have used sparse spatio-temporal words encoding local appearance around interesting movements. Most of these approaches use a histogram representation, discarding the temporal order among features. But this ordering information can contain important information about the action itself, e.g. consider the sport disciplines of hurdle race and long jump, where the global temporal order of motions (running, jumping) is important to discriminate between the two. In this work we propose to use a sequential representation which retains this temporal order. Further, we introduce Discriminative Subsequence Mining to find optimal discriminative subsequence patterns. In combination with the LPBoost classifier, this amounts to simultaneously learning a classification function and performing feature selection in the space of all possible feature sequences. The resulting classifier linearly combines a small number of interpretable decision functions, each checking for the presence of a single discriminative pattern. The classifier is benchmarked on the KTH action classification data set and outperforms the best known results in the literature.
Mining minimal distinguishing subsequence patterns with gap constraints
- In ICDM
, 2005
Cited by 15 (3 self)
Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in one class of sequences and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets and can be useful in applications such as protein comparison, document comparison and building sequential classification models. Mining MDS patterns is a challenging task and is significantly different from mining contrasts between relational/transactional data. One particularly important type of constraint that can be integrated into the mining process is the maximum gap constraint. We present an efficient algorithm called ConSGapMiner to mine all MDSs according to a maximum gap constraint. It employs highly efficient bitset and boolean operations for powerful gap based pruning within a prefix growth framework. A performance evaluation with both sparse and dense datasets demonstrates the scalability of ConSGapMiner and shows its ability to mine patterns from high dimensional datasets at low supports.
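The idea of a distinguishing subsequence under a maximum gap constraint can be illustrated with a small sketch. This is not ConSGapMiner: it uses no bitsets, no prefix growth and no minimality check, and it only tests a single given pattern. The function names, the thresholds `min_pos_support` and `max_neg_support`, and the reading of "gap" as the number of items skipped between two consecutive matched items are all assumptions made for illustration.

```python
def occurs_with_max_gap(pattern, sequence, max_gap):
    """True if `pattern` occurs in `sequence` as a subsequence with at most
    `max_gap` skipped items between consecutive matched items."""
    if not pattern:
        return True
    # All positions where the first pattern item can be matched.
    ends = {i for i, item in enumerate(sequence) if item == pattern[0]}
    for symbol in pattern[1:]:
        next_ends = set()
        for end in ends:
            # The next match must lie within max_gap skipped items of `end`.
            stop = min(len(sequence), end + 2 + max_gap)
            for j in range(end + 1, stop):
                if sequence[j] == symbol:
                    next_ends.add(j)
        ends = next_ends
        if not ends:
            return False
    return bool(ends)


def support(pattern, sequences, max_gap):
    """Fraction of sequences that contain the pattern under the gap constraint."""
    hits = sum(occurs_with_max_gap(pattern, s, max_gap) for s in sequences)
    return hits / len(sequences)


def is_distinguishing(pattern, positives, negatives, max_gap,
                      min_pos_support, max_neg_support):
    """Frequent in the positive class and infrequent in the negative class."""
    return (support(pattern, positives, max_gap) >= min_pos_support
            and support(pattern, negatives, max_gap) <= max_neg_support)
```

Tracking every possible end position, instead of matching greedily, matters here: in "axxab" with a maximum gap of 1, the first "a" cannot be extended to a "b", but the second one can.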
Traffic Density-Based Discovery of Hot Routes in Road Networks
, 2007
Cited by 9 (1 self)
Finding hot routes (traffic flow patterns) in a road network is an important problem. They are beneficial to city planners, police departments, real estate developers, and many others. Knowing the hot routes allows the city to better direct traffic or analyze congestion causes. In the past, this problem has largely been addressed with domain knowledge of the city. But in recent years, detailed information about vehicles in the road network has become available. With the development and adoption of RFID and other location sensors, an enormous number of moving object trajectories are being collected and can be used towards finding hot routes. This is a challenging problem due to the complex nature of the data. If objects traveled in organized clusters, it would be straightforward to use a clustering algorithm to find the hot routes. But, in the real world, objects move in unpredictable ways. Variations in speed, time, route, and other factors cause them to travel in rather fleeting “clusters.” These properties make the problem difficult for a naive approach. To this end, we propose a new density-based algorithm named FlowScan. Instead of clustering the moving objects, road segments are clustered based on the density of common traffic they share. We implemented FlowScan and tested it under various conditions. Our experiments show that the system is both efficient and effective at discovering hot routes.
Measuring the Similarity between Implicit Semantic Relations from the Web
- WWW 2009 MADRID! TRACK: SEMANTIC/DATA WEB / SESSION: MINING FOR SEMANTICS
, 2009
Cited by 7 (6 self)
Measuring the similarity between semantic relations that hold among entities is an important and necessary step in various Web related tasks such as relation extraction, information retrieval and analogy detection. For example, consider the case in which a person knows a pair of entities (e.g. Google, YouTube), between which a particular relation holds (e.g. acquisition). The person is interested in retrieving other such pairs with similar relations (e.g. Microsoft, Powerset). Existing keyword-based search engines cannot be applied directly in this case because, in keyword-based search, the goal is to retrieve documents that are relevant to the words used in a query – not necessarily to the relations implied by a pair of words. We propose a relational similarity measure, using a Web search engine, to compute the similarity between semantic relations implied by two pairs of words. Our method has three components: representing
Verifying and mining frequent patterns from large windows over data streams
, 2007
Cited by 7 (4 self)
Mining frequent itemsets from data streams has proved to be very difficult because of computational complexity and the need for real-time response. In this paper, we introduce a novel verification algorithm which we then use to improve the performance of monitoring and mining tasks for association rules. Thus, we propose a frequent itemset mining method for sliding windows, which is faster than the state-of-the-art methods. In fact, its running time, which is nearly constant with respect to the window size, enables the mining of much larger windows than was possible before. The performance of other frequent itemset mining methods (including those on static data) can be improved likewise, by replacing their counting methods (e.g., those using hash trees) with our verification algorithm.
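The setting of verifying a given set of itemsets over a sliding window can be sketched as follows. This is a naive baseline for illustration only, not the verification algorithm the abstract introduces (whose details are not given here): it rechecks every candidate against each arriving and expiring transaction. The class name `SlidingWindowVerifier` and its methods are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowVerifier:
    """Maintains support counts of fixed candidate itemsets over the last
    `window_size` transactions of a stream (naive, illustrative baseline)."""

    def __init__(self, candidates, window_size):
        self.candidates = [frozenset(c) for c in candidates]
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def _update(self, transaction, delta):
        # Adjust the count of every candidate contained in the transaction.
        for candidate in self.candidates:
            if candidate <= transaction:
                self.counts[candidate] += delta

    def add(self, transaction):
        """Slide the window forward by one transaction."""
        transaction = frozenset(transaction)
        self.window.append(transaction)
        self._update(transaction, +1)
        if len(self.window) > self.window_size:
            # The oldest transaction expires and its contribution is removed.
            self._update(self.window.popleft(), -1)

    def frequent(self, min_support):
        """Candidates that occur in at least `min_support` window transactions."""
        return {c for c in self.candidates if self.counts[c] >= min_support}
```

Because counts are updated only for the entering and leaving transaction, the per-step cost of this sketch does not depend on the window size, though it still grows with the number of candidates.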
Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web
, 2010
Cited by 7 (4 self)
Extracting semantic relations among entities is an important first step in various tasks in Web mining and natural language processing such as information extraction, relation detection, and social network mining. A relation can be expressed extensionally by stating all the instances of that relation or intensionally by defining all the paraphrases of that relation. For example, consider the ACQUISITION relation between two companies. An extensional definition of ACQUISITION contains all pairs of companies in which one company is acquired by another (e.g. (YouTube, Google) or (Powerset, Microsoft)). On the other hand we can intensionally define ACQUISITION as the relation described by lexical patterns such as X is acquired by Y, or Y purchased X, where X and Y denote two companies. We use this dual representation of semantic relations to propose a novel sequential co-clustering algorithm that can
Frequent Closed Sequence Mining without Candidate Maintenance
, 2007
Cited by 7 (0 self)
Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only a more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method. A thorough performance study with both sparse and dense, real, and synthetic data sets has demonstrated that BIDE significantly outperforms the previous algorithm: It consumes an order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.
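What "closed" means for frequent sequences can be shown with a brute-force sketch: a frequent sequence is closed if no proper super-sequence has the same support. This is not BIDE; it enumerates every subsequence (exponential cost) and keeps all frequent patterns in memory, which is exactly the kind of overhead the abstract says BIDE avoids. Function names are hypothetical, and sequences are assumed to be made of single items.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database)


def frequent_sequences(database, min_support):
    """All subsequences with support >= min_support (brute force)."""
    candidates = set()
    for s in database:
        for length in range(1, len(s) + 1):
            for idx in combinations(range(len(s)), length):
                candidates.add(tuple(s[i] for i in idx))
    supports = {p: support(p, database) for p in candidates}
    return {p: n for p, n in supports.items() if n >= min_support}


def closed_sequences(database, min_support):
    """Frequent sequences that have no super-sequence with equal support."""
    frequent = frequent_sequences(database, min_support)
    closed = {}
    for p, n in frequent.items():
        absorbed = any(
            len(q) > len(p) and m == n and is_subsequence(p, q)
            for q, m in frequent.items()
        )
        if not absorbed:
            closed[p] = n
    return closed
```

For the database ["abc", "abc", "ab"] with a minimum support of 2, seven sequences are frequent but only ("a", "b") with support 3 and ("a", "b", "c") with support 2 are closed; every other frequent pattern and its support can be recovered from those two.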
Exploiting syntactic and semantic information for relation extraction from wikipedia
- In IJCAI07-TextLinkWS
, 2007
Cited by 6 (0 self)
The exponential growth of Wikipedia has recently attracted the attention of a large number of researchers and practitioners. However, one of the current challenges on Wikipedia is to make the encyclopedia processable for machines. In this paper, we deal with the problem of extracting relations between entities from Wikipedia’s English articles, which can straightforwardly be transformed into Semantic Web meta data. We propose a novel method to exploit syntactic and semantic information for relation extraction. We mine frequent subsequences from the path between an entity pair in the syntactic and semantic structure in order to explore key patterns reflecting the relationship between the pair. In addition, our method can utilize the nature of Wikipedia to automatically obtain training data. The preliminary results of our experiments strongly support our hypothesis that analyzing language at a higher level is better for relation extraction on Wikipedia and show that our method is promising for text understanding.
A Framework for Mining Sequential Patterns from Spatio-Temporal Event Data Sets
Cited by 6 (1 self)
Given a large spatio-temporal database of events, where each event consists of the fields event ID, time, location, and event type, mining spatio-temporal sequential patterns identifies significant event-type sequences. Such spatio-temporal sequential patterns are crucial to the investigation of spatial and temporal evolutions of phenomena in many application domains. Recent research literature has explored the sequential patterns on transaction data and trajectory analysis on moving objects. However, these methods cannot be directly applied to mining sequential patterns from a large number of spatio-temporal events. Two major research challenges still remain: 1) the definition of significance measures for spatio-temporal sequential patterns to avoid spurious ones and 2) the algorithmic design under the significance measures, which may not guarantee the downward closure property. In this paper, we propose a sequence index as the significance measure for spatio-temporal sequential patterns, which is meaningful due to its interpretability using spatial statistics. We propose a novel algorithm called Slicing-STS-Miner to tackle the algorithmic design challenge using the spatial sequence index, which does not preserve the downward closure property. We compare the proposed algorithm with a simple algorithm called STS-Miner that utilizes the weak monotone property of the sequence index. Performance evaluations using both synthetic and real-world data sets show that the Slicing-STS-Miner is an order of magnitude faster than STS-Miner for large data sets. Index Terms: Spatio-temporal sequential pattern, density ratio, sequence index, slicing, performance.

