RoadRunner: Towards Automatic Data Extraction from Large Web Sites
, 2001
Cited by 248 (6 self)
The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
Discovering Models of Software Processes from Event-Based Data
- ACM Transactions on Software Engineering and Methodology
, 1998
Cited by 187 (7 self)
this article we describe a Markov method that we developed specifically for process discovery, as well as describe two additional methods that we adopted from other domains and augmented for our purposes. The three methods range from the purely algorithmic to the purely statistical. We compare the methods and discuss their application in an industrial case study.
Workflow Mining: Discovering process models from event logs
- IEEE Transactions on Knowledge and Data Engineering
, 2003
Cited by 159 (28 self)
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and typically there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we have developed techniques for discovering workflow models. The starting point for such techniques is a so-called "workflow log" containing information about the workflow process as it is actually being executed. We present a new algorithm to extract a process model from such a log and represent it in terms of a Petri net. However, we will also demonstrate that it is not possible to discover arbitrary workflow processes. In this paper we explore a class of workflow processes that can be discovered. We show that the α-algorithm can successfully mine any workflow represented by a so-called SWF-net. Key words: Workflow mining, Workflow management, Data mining, Petri nets.
XTRACT: A System for Extracting Document Type Descriptors from XML Documents
- In ACM SIGMOD
, 2000
Cited by 85 (4 self)
XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable information on the structure of documents and thus have a crucial role in the efficient storage of XML data, as well as the effective formulation and optimization of XML queries. In this paper, we propose XTRACT, a novel system for inferring a DTD schema for a database of XML documents. Since the DTD syntax incorporates the full expressive power of regular expressions, naive approaches typically fail to produce concise and intuitive DTDs. Instead, the XTRACT inference algorithms employ a sequence of sophisticated steps that involve: (1) finding patterns in the input sequences and replacing them with regular expressions to generate "general" candidate DTDs, (2) factoring candidate DTDs using adaptations of algorithms from the logic optimization...
Learning Models of Intelligent Agents
, 1996
Cited by 80 (2 self)
Agents that operate in a multi-agent system need an efficient strategy to handle their encounters with other agents involved. Searching for an optimal interactive strategy is a hard problem because it depends mostly on the behavior of the others. In this work, interaction among agents is represented as a repeated two-player game, where the agents' objective is to look for a strategy that maximizes their expected sum of rewards in the game. We assume that agents' strategies can be modeled as finite automata. A model-based approach is presented as a possible method for learning an effective interactive strategy. First, we describe how an agent should find an optimal strategy against a given model. Second, we present an unsupervised algorithm that infers a model of the opponent's automaton from its input/output behavior. A set of experiments that show the potential merit of the algorithm is reported as well. Introduction In recent years, a major research effort has been invested in desi...
Diversity-based Inference of Finite Automata
- Journal of the ACM
, 1994
Cited by 63 (1 self)
We present new procedures for inferring the structure of a finite-state automaton (FSA) from its input/output behavior, using access to the automaton to perform experiments. Our procedures use a new representation for finite automata, based on the notion of equivalence between tests. We call the number of such equivalence classes the diversity of the automaton; the diversity may be as small as the logarithm of the number of states of the automaton. For the special class of permutation automata, we describe an inference procedure that runs in time polynomial in the diversity and log(1/δ), where δ is a given upper bound on the probability that our procedure returns an incorrect result. (Since our procedure uses randomization to perform experiments, there is a certain controllable chance that it will return an erroneous result.) We also discuss techniques for handling more general automata. We present evidence for the practical efficiency of our approach. For example, our procedure is able to infer the structure of an automaton based on Rubik's Cube (which has approximately 10^19 states) in about 2 minutes on a DEC MicroVax. This automaton is many orders of magnitude larger than possible with previous techniques, which would require time proportional at least to the number of global states. (Note that in this example, only a small fraction (10^-14) of the global
Probably Approximately Correct Learning
- Proceedings of the Eighth National Conference on Artificial Intelligence
, 1990
Cited by 37 (1 self)
This paper surveys some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory. Introduction It's a dangerous thing to try to formalize an enterprise as complex and varied as machine learning so that it can be subjected to rigorous mathematical analysis. To be tractable, a formal model must be simple. Thus, inevitably, most people will feel that important aspects of the activity have been left out of the theory. Of course, they will be right. Therefore, it is not advisable to present a theory of machine learning as having reduced the entire field to its bare essentials. All ...
Opponent Modeling in a Multi-agent System
- Adaptation and Learning in Multi-agent Systems, Lecture Notes in Artificial Intelligence 1042
, 1995
Cited by 30 (0 self)
Agents that operate in a multi-agent system need an efficient strategy to handle their encounters with other agents involved in that system. Searching for an optimal interactive strategy is a hard problem because it depends mostly on the behavior of the others. In this work, interaction among agents is represented as a repeated two-player game, where an agent's objective is to look for a strategy that maximizes its expected sum of rewards in the game. We assume that agents' strategies can be modeled as finite automata. A model-based reasoning approach is presented as a possible method for learning an efficient interactive strategy. First, we describe how an agent should find an optimal strategy against a given model. Second, we present a heuristic algorithm that infers a model of the opponent's automaton from its input/output behavior. A set of experiments that show the potential merit of the algorithm is reported as well. Keywords: Opponent modeling, Model based reasoning, Finite au...
Efficient Reinforcement Learning
- In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory
, 1994
Cited by 28 (3 self)
In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework. In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state and the learner is provided with a "reset" operation that interrupts the current sequence of experiments and starts a new one (from the initial state). We do not require the agent to learn the optimal policy but only a good approximation of it with high probability. More precisely, we require the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability no less than 1 − δ. For this model, we describe an algorithm that produces such an (ε,δ)-optimal policy, for any environment, in time polynomial in N, K, 1/ε, 1/δ, 1/(1 − β) and r_max, where N is the number of states of the environment, K is the maximum number of actions in a...
Learning Shallow Context-Free Languages under Simple Distributions
, 1999
"... this paper I present the EMILE 3.0 algorithm ..."

