Predestination: Inferring Destinations from Partial Trajectories
- In Ubicomp, 2006
Cited by 58 (11 self)
Abstract. We describe a method called Predestination that uses a history of a driver’s destinations, along with data about driving behaviors, to predict where a driver is going as a trip progresses. Driving behaviors include types of destinations, driving efficiency, and trip times. Beyond considering previously visited destinations, Predestination leverages an open-world modeling methodology that considers the likelihood of users visiting previously unobserved locations based on trends in the data and on the background properties of locations. This allows our algorithm to transition smoothly from “out of the box” with no training data to more fully trained with increasing numbers of observations. Multiple components of the analysis are fused via Bayesian inference to produce a probabilistic map of destinations. Our algorithm was trained and tested on hold-out data drawn from a database of GPS driving data gathered from 169 different subjects who drove 7,335 different trips.
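The fusion step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the map is a set of discrete cells, that each component supplies a per-cell likelihood, and that the components are conditionally independent given the destination. The function name `fuse_destination_posteriors` and the cell labels are hypothetical.

```python
def fuse_destination_posteriors(prior, likelihoods):
    """Combine a prior over destination cells with component likelihoods.

    Assumption (not from the abstract): components are conditionally
    independent, so posterior[cell] is proportional to
    prior[cell] * product of likelihood_k[cell].
    """
    unnormalized = {}
    for cell, p in prior.items():
        score = p
        for likelihood in likelihoods:
            # A cell missing from a component gets zero likelihood.
            score *= likelihood.get(cell, 0.0)
        unnormalized[cell] = score

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all cells have zero posterior mass")
    # Normalize so the result is a probability map over cells.
    return {cell: s / total for cell, s in unnormalized.items()}


# Hypothetical two-cell map with two evidence components.
prior = {"A": 0.5, "B": 0.5}
components = [{"A": 0.8, "B": 0.2}, {"A": 0.5, "B": 0.5}]
posterior = fuse_destination_posteriors(prior, components)
```

With a non-uniform prior, cells that were never visited can still receive probability mass, which is the role the abstract assigns to open-world modeling.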
Augmenting Naive Bayes Classifiers with Statistical Language Models
- 2003
Cited by 38 (0 self)
We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier
ACAS: Automated construction of application signatures
- In SIGCOMM’05 MineNet Workshop, 2005
Cited by 38 (1 self)
An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However, this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately, deriving the signatures manually is very time consuming and difficult. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular, we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signatures for unencrypted handshakes negotiating the encryption parameters of a particular connection.
Combining Naive Bayes and n-Gram Language Models for Text Classification
- In 25th European Conference on Information Retrieval Research (ECIR), 2003
Cited by 26 (2 self)
We augment the naive Bayes model with an n-gram language model to address two shortcomings of naive Bayes text classifiers.
A learning-based approach for IP geolocation
- In Proceedings of the Passive and Active Measurement Conference (PAM), 2010
Cited by 8 (2 self)
Abstract. The ability to pinpoint the geographic location of IP hosts is compelling for applications such as on-line advertising and network attack diagnosis. While prior methods can accurately identify the location of hosts in some regions of the Internet, they produce erroneous results when the delay or topology measurement on which they are based is limited. The hypothesis of our work is that the accuracy of IP geolocation can be improved through the creation of a flexible analytic framework that accommodates different types of geolocation information. In this paper, we describe a new framework for IP geolocation that reduces to a machine-learning classification problem. Our methodology considers a set of lightweight measurements from a set of known monitors to a target, and then classifies the location of that target based on the most probable geographic region given probability densities learned from a training set. For this study, we employ a Naive Bayes framework that has low computational complexity and enables additional environmental information to be easily added to enhance the classification process. To demonstrate the feasibility and accuracy of our approach, we test IP geolocation on over 16,000 routers given ping measurements from 78 monitors with known geographic placement. Our results show that the simple application of our method improves geolocation accuracy for over 96% of the nodes identified in our data set, with estimates on average 70 miles closer to the true geographic location than prior constraint-based geolocation. These results highlight the promise of our method and indicate how future expansion of the classifier can lead to further improvements in geolocation accuracy.
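The classification step described in this abstract can be sketched as a small naive Bayes classifier over per-monitor round-trip times. This is an illustration under stated assumptions, not the paper's method: the abstract does not say which density family is learned, so a Gaussian per region and monitor is assumed here, and the names `fit_gaussian_nb` and `classify_region`, the region labels, and the latency values are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, min_var=1e-6):
    """Fit per-region, per-monitor Gaussian densities.

    samples: list of (region, [rtt_ms for each monitor]).
    Assumption: Gaussian densities; the abstract only says densities
    are learned from a training set.
    """
    by_region = defaultdict(list)
    for region, rtts in samples:
        by_region[region].append(rtts)

    model = {}
    for region, rows in by_region.items():
        n = len(rows)
        columns = list(zip(*rows))  # one column per monitor
        means = [sum(col) / n for col in columns]
        # Floor the variance so a single sample does not divide by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, min_var)
            for col, m in zip(columns, means)
        ]
        log_prior = math.log(n / len(samples))
        model[region] = (log_prior, means, variances)
    return model


def classify_region(model, rtts):
    """Return the region with the highest log posterior for one target."""
    def log_posterior(params):
        log_prior, means, variances = params
        total = log_prior
        for x, m, v in zip(rtts, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    return max(model, key=lambda region: log_posterior(model[region]))


# Hypothetical training data: two monitors, two regions.
training = [
    ("east", [10.0, 80.0]), ("east", [12.0, 85.0]),
    ("west", [80.0, 10.0]), ("west", [78.0, 14.0]),
]
model = fit_gaussian_nb(training)
```

Because each feature contributes an independent term to the log posterior, extra measurements of the kind the abstract calls environmental information could be added as further terms in the same sum.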
Inducing optimal emotional state for learning in Intelligent Tutoring Systems
- International Conference on Intelligent Tutoring Systems, Maceió, 2004
Cited by 6 (1 self)
Abstract. Emotions play an important role in cognitive processes and especially in learning tasks. Moreover, there is some evidence that the emotional state of the learner is correlated with his performance. Furthermore, it’s important that new Intelligent Tutoring Systems involve this emotional aspect; they may be able to recognize the emotional state of the learner, and to change it so as to be in the best conditions for learning. In this paper we describe such an architecture developed in order to determine the optimal emotional state for learning and to induce it. Based on experimentation, we have used the Naïve Bayes classifier to predict the optimal emotional state according to the personality and then we induce it using a hybrid technique which combines the guided imagery technique, music and images.
Realistic driving trips for location privacy
- In Proc. Pervasive, 2009
Cited by 3 (0 self)
Abstract. Simulated, false location reports can be an effective way to confuse a privacy attacker. When a mobile user must transmit his or her location to a central server, these location reports can be accompanied by false reports that, ideally, cannot be distinguished from the true one. The realism of the false reports is important, because otherwise an attacker could filter out all but the real data. Using our database of GPS tracks from over 250 volunteer drivers, we developed probabilistic models of driving behavior and applied the models to create realistic driving trips. The simulations model realistic start and end points, slightly non-optimal routes, realistic driving speeds, and spatially varying GPS noise.
Mining Log Files for Data-Driven System Management
- SIGKDD Explorations, Volume 7, issue, 2005
Cited by 3 (1 self)
are becoming increasingly complex with an increasing variety of heterogeneous software and hardware components. They are thus becoming increasingly difficult to monitor, manage and maintain. Traditional approaches to system management have been largely based on domain experts through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This has been well known and experienced as a cumbersome, labor intensive, and error prone process. In addition, it is difficult for this process to keep up with rapidly changing environments. There is thus a pressing need for automatic and efficient approaches to monitor and manage complex computing systems. A popular approach to system management is based on analyzing system log files. However, some new aspects of the log files have been less emphasized in existing methods from the data mining and machine learning communities. The various formats and relatively short text messages of log files, and temporal characteristics in data representation, pose new challenges. In this paper, we will describe our research efforts on mining system log files for automatic management. In particular, we apply text mining techniques to categorize messages in log files into common situations, improve categorization accuracy by considering the temporal characteristics of log messages, and utilize visualization tools to evaluate and validate the interesting temporal patterns for system management.
Activity recognition from on-body sensors by classifier fusion: Sensor scalability and robustness
- International Conference on Intelligent Sensors, Sensor Networks, and Information Processing, 2007
Cited by 2 (1 self)
Activity recognition from on-body sensors is affected by sensor degradation, interconnection failures, and jitter in sensor placement and orientation. We investigate how this may be balanced by exploiting redundant sensors distributed on the body. We recognize activities by a meta-classifier that fuses the information of simple classifiers operating on individual sensors. We investigate the robustness to faults and sensor scalability which follows from classifier fusion. We compare a reference majority voting and a naive Bayesian fusion scheme. We validate this approach by recognizing a set of 10 activities carried out by workers in the quality assurance checkpoint of a car assembly line. Results show that classification accuracy greatly increases with additional sensors (50% with 1 sensor, 80% and 98% with 3 and 57 sensors), and that sensor fusion implicitly compensates for typical faults up to high fault rates. These results highlight the benefit of a large on-body sensor network rather than a minimum set of sensors for activity recognition and prompt further investigation.
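The two fusion schemes compared in this abstract can be contrasted in a short sketch. It is illustrative only: the abstract does not give the fusion formulas, so the naive Bayesian variant below assumes each per-sensor classifier is summarized by a confusion table P(output | true activity) and that sensors are conditionally independent. The function names, activity labels, and probabilities are hypothetical.

```python
import math
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def naive_bayes_fusion(predictions, confusions, priors, floor=1e-9):
    """Fuse per-sensor labels using per-sensor confusion tables.

    confusions[i][true][pred] is the assumed probability that sensor
    classifier i outputs `pred` when the true activity is `true`.
    Sensors are assumed conditionally independent given the activity.
    """
    scores = {}
    for activity, prior in priors.items():
        score = math.log(prior)
        for pred, table in zip(predictions, confusions):
            # Floor avoids log(0) for outputs never seen in training.
            score += math.log(max(table[activity].get(pred, 0.0), floor))
        scores[activity] = score
    return max(scores, key=scores.get)


# Hypothetical setup: two noisy sensors and one reliable sensor.
noisy = {"walk": {"walk": 0.6, "sit": 0.4}, "sit": {"walk": 0.5, "sit": 0.5}}
reliable = {"walk": {"walk": 0.99, "sit": 0.01}, "sit": {"walk": 0.01, "sit": 0.99}}
outputs = ["walk", "walk", "sit"]
priors = {"walk": 0.5, "sit": 0.5}

voted = majority_vote(outputs)
fused = naive_bayes_fusion(outputs, [noisy, noisy, reliable], priors)
```

In this toy case the vote follows the two noisy sensors, while the Bayesian scheme weights each sensor by its reliability and sides with the reliable one.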
The Modules and Methods of Topic Detection and Tracking
- Proceedings of the 2nd student conference on IT, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
Cited by 1 (0 self)
This report presents the methods and techniques used to perform the tasks of Topic Detection and Tracking (TDT). It starts with an introduction to TDT and its five tasks: Story

