Results 1 - 5 of 5
Stochastic grammatical inference with Multinomial Tests
2002
Cited by 10 (1 self)
Abstract
We present a new statistical framework for stochastic grammatical inference algorithms based on a state-merging strategy. We propose using multinomial statistical tests to decide which states should be merged. This approach has three main advantages. First, since it does not rely on asymptotic results, the small-sample case can be handled specifically. Second, all the probabilities associated with a state are included in a single test, so that statistical evidence is accumulated. Third, a statistical score is associated with each possible merge operation and can be used for a best-first strategy. An improvement over classical stochastic grammatical inference algorithms is shown on artificial data.
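A minimal sketch of how a candidate merge could be scored with an exact, non-asymptotic multinomial test. It is not the paper's own test. Several assumptions are made:

- Each state is summarized by a vector of counts over its next symbols plus an end-of-string outcome.
- The hypothetical helper `merge_score` tests one state's counts against the pooled estimate of both states.
- The exact p-value is computed by enumerating every count vector of the same size, which is only practical for small samples and alphabets.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with `probs`."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    return coef * prod(p ** c for p, c in zip(probs, counts))

def compositions(n, k):
    """All k-tuples of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def exact_multinomial_pvalue(observed, probs):
    """Exact test: total probability of outcomes no more likely than `observed`."""
    p_obs = multinomial_pmf(observed, probs)
    total = 0.0
    for outcome in compositions(sum(observed), len(observed)):
        p = multinomial_pmf(outcome, probs)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return total

def merge_score(counts_a, counts_b):
    """Hypothetical merge score: test state a against the pooled distribution,
    so that all outcome probabilities of the state enter one single test."""
    pooled = [a + b for a, b in zip(counts_a, counts_b)]
    n = sum(pooled)
    return exact_multinomial_pvalue(counts_a, [c / n for c in pooled])

# Best-first use: score every candidate pair, try the highest p-value first,
# and merge only if it exceeds a chosen significance level (e.g. 0.05).
candidates = {("q1", "q2"): ([3, 1, 0], [2, 2, 1]),
              ("q1", "q3"): ([3, 1, 0], [0, 0, 4])}
ranked = sorted(candidates, key=lambda pair: merge_score(*candidates[pair]),
                reverse=True)
```

In this toy example, q1 and q3 disagree sharply on the third outcome, so the pair (q1, q2) is ranked first. The state names and counts are made up for illustration.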
FlowCube: Constructing RFID FlowCubes for Multi-Dimensional Analysis of Commodity Flows
In: VLDB 2006
Cited by 5 (0 self)
Abstract
With the advent of RFID (Radio Frequency Identification) technology, manufacturers, distributors, and retailers will be able to track the movement of individual objects throughout the supply chain. The volume of data generated by a typical RFID application will be enormous, as each item will generate a complete history of all the individual locations that it occupied at every point in time, possibly from a specific production line at a given factory, through multiple warehouses, all the way to a particular checkout counter in a store. The movement trails of such RFID data form a gigantic commodity flowgraph representing the locations and durations of the path stages traversed by each item. This commodity flow contains rich multi-dimensional information on the characteristics, trends, changes, and outliers of commodity movements. In this paper, we propose a method to construct a warehouse of commodity flows, called a flowcube. As in standard OLAP, the model is composed of cuboids that aggregate item flows at a given abstraction level. The flowcube differs from the traditional data cube in two major ways. First, the measure of each cell is not a scalar aggregate but a commodity flowgraph that captures the major movement trends and significant deviations of the items aggregated in the cell. Second, each flowgraph can itself be viewed at multiple levels by changing the level of abstraction of path stages. We motivate the importance of the model and present an efficient method to compute it by (1) performing simultaneous aggregation of paths to all interesting abstraction levels, (2) pruning low-support path segments along the item and path stage abstraction lattices, and (3) compressing the cube by removing rarely occurring cells and cells whose commodity flows can be inferred from higher-level cells.
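A minimal sketch of steps (1) and (2) above, under simplifying assumptions that do not come from the paper:

- A flowgraph is modeled as a prefix tree of location sequences, where each node stores the durations spent at its last stage.
- Only the path-stage (location) dimension is abstracted, through a hypothetical two-level hierarchy `LOCATION_PARENT`.
- Low-support prefixes are pruned after counting rather than during lattice traversal.
- Cube compression, step (3), is not shown.

```python
from collections import defaultdict

# Hypothetical location hierarchy: raw reader location -> higher-level concept.
LOCATION_PARENT = {"dock_A": "warehouse", "dock_B": "warehouse",
                   "shelf": "store", "checkout": "store"}

def abstract_path(path, level):
    """Rewrite (location, duration) stages at an abstraction level.
    Level 0 keeps raw locations; level 1 maps them through LOCATION_PARENT
    and merges consecutive stages that collapse to the same location."""
    stages = []
    for loc, dur in path:
        if level == 1:
            loc = LOCATION_PARENT.get(loc, loc)
        if stages and stages[-1][0] == loc:
            stages[-1] = (loc, stages[-1][1] + dur)  # merged stage: sum durations
        else:
            stages.append((loc, dur))
    return stages

def build_flowgraphs(paths, levels=(0, 1), min_support=2):
    """Aggregate all paths into prefix-tree flowgraphs for every level in one pass.
    Each node is a location prefix; it stores the durations spent at its last stage."""
    durations = {lvl: defaultdict(list) for lvl in levels}
    for path in paths:
        for lvl in levels:
            stages = abstract_path(path, lvl)
            for i in range(1, len(stages) + 1):
                prefix = tuple(loc for loc, _ in stages[:i])
                durations[lvl][prefix].append(stages[i - 1][1])
    # Prune low-support prefixes; keep (item count, mean duration) per node.
    return {
        lvl: {prefix: (len(ds), sum(ds) / len(ds))
              for prefix, ds in nodes.items() if len(ds) >= min_support}
        for lvl, nodes in durations.items()
    }

paths = [
    [("dock_A", 2), ("dock_B", 1), ("shelf", 3)],
    [("dock_A", 1), ("shelf", 2), ("checkout", 1)],
    [("dock_B", 4), ("shelf", 1)],
]
flowgraphs = build_flowgraphs(paths)
```

At the raw level only the prefix `("dock_A",)` survives the support threshold. At the coarser level all three items share the flow warehouse → store. This illustrates how a higher abstraction level concentrates support.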
Learning Hidden Markov Models to Fit Long-Term Dependencies
2005
Cited by 2 (2 self)
Abstract
This report presents a novel approach to the induction of the structure of Hidden Markov Models (HMMs). The notion of partially observable Markov models (POMMs) is introduced. POMMs form a particular case of HMMs in which each state emits a single letter with probability one, but several states can emit the same letter. It is shown that any HMM can be represented by an equivalent POMM. The proposed induction algorithm aims at finding a POMM that fits the dynamics of the target machine, that is, one that best approximates the stationary distribution and the mean first passage times observed in the sample. The induction relies on non-linear optimization and iterative state splitting from an initial first-order Markov chain. Experimental results illustrate the advantages of the proposed approach compared to Baum-Welch HMM estimation and to back-off smoothed N-grams, which are equivalent to variable-order Markov chains.
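The sketch below computes the two quantities the induction tries to match, the stationary distribution and mean first passage times, for the state chain of a small POMM. It does not implement the induction itself; the non-linear optimization and state splitting are omitted. All helper names are hypothetical.

Mean first passage time to a letter is modeled here as first passage to the set of states that emit it, which is an assumption. Exact `Fraction` arithmetic keeps the results readable.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly with Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def stationary(P):
    """Stationary distribution pi of an irreducible chain: pi P = pi, sum(pi) = 1."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)  # replace one redundant balance equation by normalization
    return solve(A, [0] * (n - 1) + [1])

def mean_first_passage(P, target):
    """Expected steps to first reach any state in `target`, from each other state:
    m_i = 1 + sum over non-target k of P[i][k] * m_k."""
    others = [i for i in range(len(P)) if i not in target]
    A = [[(1 if i == k else 0) - P[i][k] for k in others] for i in others]
    return dict(zip(others, solve(A, [1] * len(others))))

def letter_distribution(pi, labels):
    """In a POMM, a letter's stationary probability sums over the states emitting it."""
    dist = {}
    for p, letter in zip(pi, labels):
        dist[letter] = dist.get(letter, 0) + p
    return dist

# Hypothetical 3-state POMM: states 0 and 1 both emit "a", state 2 emits "b",
# and the chain cycles 0 -> 1 -> 2 -> 0, generating (aab)*.
P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
labels = ["a", "a", "b"]
pi = stationary(P)
```

Because two states emit "a", this POMM deterministically generates "aab" repeatedly. A first-order Markov chain over letters cannot do this: after an "a" it can only assign one fixed probability to the next letter.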
Grammatical Inference as a Principal Component Analysis Problem
Cited by 1 (0 self)
Abstract
One of the main problems in probabilistic grammatical inference is to infer a stochastic language, i.e. a probability distribution, in some class of probabilistic models, from a sample of strings independently drawn according to a fixed unknown target distribution $p$. Here, we consider the class of rational stochastic languages, composed of the stochastic languages that can be computed by multiplicity automata, which can be viewed as a generalization of probabilistic automata. Rational stochastic languages $p$ have a useful algebraic characterization: all the mappings $\dot{u}p : v \mapsto p(uv)$ lie in a finite-dimensional vector subspace $V_p^*$ of the vector space $\mathbb{R}\langle\langle\Sigma\rangle\rangle$ composed of all real-valued functions defined over $\Sigma^*$. Hence, a first step in the grammatical inference process can consist in identifying the subspace $V_p^*$. In this paper, we study the possibility of using Principal Component Analysis to achieve this task. We provide an inference algorithm that computes an estimate of this space and then builds a multiplicity automaton that computes an estimate of the target distribution. We prove some theoretical properties of this algorithm and provide results from numerical simulations that confirm the relevance of our approach.
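A minimal sketch of the first step, estimating the dimension of $V_p^*$. It does not build the multiplicity automaton. The sketch makes these assumptions:

- Each mapping $\dot{u}p$ is restricted to suffixes $v$ up to a fixed length and estimated from empirical frequencies.
- PCA is taken as uncentered, i.e. the eigenvalues of $H H^\top$ for the empirical matrix $H[u][v] = \hat{p}(uv)$. The eigenvalues are computed with Jacobi rotations to stay dependency-free.
- The cut-off `rel_tol` and all helper names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from itertools import product

def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    words = [""]
    for n in range(1, max_len + 1):
        words.extend("".join(t) for t in product(alphabet, repeat=n))
    return words

def empirical_hankel(sample, alphabet, max_len):
    """Row u holds the empirical values of v -> p(uv) on a finite set of suffixes v."""
    counts = Counter(sample)
    words = strings_up_to(alphabet, max_len)
    return [[counts[u + v] / len(sample) for v in words] for u in words]

def jacobi_eigenvalues(S, sweeps=50, eps=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    A = [row[:] for row in S]
    n = len(A)
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < eps:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if A[p][q] == 0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((A[i][i] for i in range(n)), reverse=True)

def estimate_dimension(H, rel_tol=1e-6):
    """Uncentered PCA of the rows of H: count eigenvalues of H H^T that exceed
    rel_tol times the largest one."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in H] for r1 in H]
    eigenvalues = jacobi_eigenvalues(gram)
    if not eigenvalues or eigenvalues[0] <= 0:
        return 0, eigenvalues
    return sum(1 for lam in eigenvalues if lam > rel_tol * eigenvalues[0]), eigenvalues

sample = ["a", "b", "a", "b"]
H = empirical_hankel(sample, "ab", max_len=1)
dim, eigenvalues = estimate_dimension(H)
```

Here the empirical distribution puts 1/2 on "a" and on "b". The rows for $u = \varepsilon$ and $u = a$ are linearly independent, while the row for $u = b$ equals the row for $u = a$, so two principal components carry all the variance. With a real noisy sample, `rel_tol` would need to be much larger than in this exact toy case.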
Negative Feedback: The Forsaken Nature Available for Re-ranking
Abstract
Re-ranking for Information Retrieval aims to elevate relevant feedbacks and depress negative ones in the initial retrieval result list. In contrast to the relevance-feedback-based re-ranking methods widely adopted in the literature, this paper proposes a new method that makes good use of three features of known negative feedbacks to identify and depress unknown negative feedbacks. The features are: 1) the minor (lower-weighted) terms in negative feedbacks; 2) the hierarchical distance (HD) among feedbacks in a hierarchical clustering tree; and 3) the obstinateness strength of negative feedbacks. We evaluate the method on the TDT4 corpus, which is made up of news topics and their relevant stories. Experimental results show that the new scheme substantially outperforms its counterparts.
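A minimal sketch of how the second feature, hierarchical distance, could be used to demote documents that cluster near known negative feedbacks. The minor-term and obstinateness features and the paper's actual combination of the three are not reproduced. The sketch makes these assumptions:

- HD is defined as the number of tree edges between two documents through their lowest common ancestor.
- Each document's score is reduced by `penalty / HD` to its closest known negative.
- The clustering tree, scores, and helper names are all hypothetical.

```python
def path_to_root(node, parent):
    """Node, its parent, grandparent, ... up to the root of the clustering tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def hierarchical_distance(a, b, parent):
    """Number of tree edges between a and b through their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    raise ValueError("documents are not in the same tree")

def rerank(scores, known_negatives, parent, penalty=1.0):
    """Sink known negatives; demote other documents by penalty / (closest HD)."""
    if not known_negatives:
        return sorted(scores, key=scores.get, reverse=True)

    def adjusted(doc):
        if doc in known_negatives:
            return float("-inf")
        closest = min(hierarchical_distance(doc, neg, parent) for neg in known_negatives)
        return scores[doc] - penalty / closest

    return sorted(scores, key=adjusted, reverse=True)

# Hypothetical clustering tree over four retrieved documents.
parent = {"d1": "c1", "d2": "c1", "d3": "c2", "d4": "c2", "c1": "root", "c2": "root"}
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.6}
ranking = rerank(scores, {"d1"}, parent)
```

In this toy tree, d2 shares a cluster with the known negative d1. Despite its higher initial score, it falls below d3 and d4, giving the ranking d3, d4, d2, d1.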

