Results 1 -
5 of
5
LogTree: A Framework for Generating System Events from Raw Textual Logs
"... Abstract—Modern computing systems are instrumented to generate huge amounts of system logs and these data can be utilized for understanding and complex system behaviors. One main fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Recent works ap ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Abstract—Modern computing systems are instrumented to generate huge amounts of system logs and these data can be utilized for understanding and complex system behaviors. One main fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Recent works apply clustering techniques to translate the raw log messages into system events using only the word/term information. In this paper, we first illustrate the drawbacks of existing techniques for event generation from system logs. We then propose LogTree, a novel and algorithm-independent framework for events generation from raw system log messages. LogTree utilizes the format and structural information of the raw logs in the clustering process to generate system events with better accuracy. In addition, an indexing data structure, Message Segment Table, is proposed in LogTree to significantly improve the efficiency of events generation. Extensive experiments on real system logs demonstrate the effectiveness and efficiency of LogTree. Keywords-event generation; mining system log data; LogTree; log message clustering. I.
Mining Temporal Invariants from Partially Ordered Logs
"... A common assumption made in log analysis research is that the underlying log is totally ordered. For concurrent systems, this assumption constrains the generated log to either exclude concurrency altogether, or to capture a particular interleaving of concurrent events. This paper argues that capturi ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
A common assumption made in log analysis research is that the underlying log is totally ordered. For concurrent systems, this assumption constrains the generated log to either exclude concurrency altogether, or to capture a particular interleaving of concurrent events. This paper argues that capturing concurrency as a partial order is useful and often indispensable for answering important questions about concurrent systems. To this end, we motivate a family of event ordering invariants over partially ordered event traces, give three algorithms for mining these invariants from logs, and evaluate their scalability on simulated distributed system logs. 1
LogSig: Generating system events from raw textual logs
- in Proceedings of ACM CIKM
, 2011
"... Modern computing systems generate large amounts of log data. System administrators or domain experts utilize the log data to understand and optimize system behaviors. Most system logs are raw textual and unstructured. One main fundamental challenge in automated log analysis is the generation of syst ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Modern computing systems generate large amounts of log data. System administrators or domain experts utilize the log data to understand and optimize system behaviors. Most system logs are raw textual and unstructured. One main fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Log messages are relatively short text messages but may have a large vocabulary, which often result in poor performance when applying traditional text clustering techniques to the log data. Other related methods have various limitations and only work well for some particular system logs. In this paper, we propose a message signature based algorithm logSig to generate system events from textual log messages. By searching the most representative message signatures, logSig categorizes log messages into a set of event types. logSig can handle various types of log data, and is able to incorporate human’s domain knowledge to achieve a high performance. We conduct experiments on five real system log data. Experiments show that logSig outperforms other alternative algorithms in terms of the overall performance.
LearnPADS ++ Incremental Inference of Ad Hoc Data Formats
"... Abstract. An ad hoc data source is any semi-structured, non-standard data source. The format of such data sources is often evolving and frequently lacking documentation. Consequently, off-the-shelf tools for processing such data often do not exist, forcing analysts to develop their own tools, a cost ..."
Abstract
- Add to MetaCart
Abstract. An ad hoc data source is any semi-structured, non-standard data source. The format of such data sources is often evolving and frequently lacking documentation. Consequently, off-the-shelf tools for processing such data often do not exist, forcing analysts to develop their own tools, a costly and time-consuming process. In this paper, we present an incremental algorithm that automatically infers the format of large-scale data sources. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale or streaming data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions. 4 1
Bistro Data Feed Management System
"... Data feed management is a critical component of many data intensive applications that depend on reliable data delivery to support real-time data collection, correlation and analysis. Data is typically collected from a wide variety of sources and organizations, using a range of mechanisms- some data ..."
Abstract
- Add to MetaCart
Data feed management is a critical component of many data intensive applications that depend on reliable data delivery to support real-time data collection, correlation and analysis. Data is typically collected from a wide variety of sources and organizations, using a range of mechanisms- some data are streamed in real time, while other data are obtained at regular intervals or collected in an ad hoc fashion. Individual applications are forced to make separate arrangements with feed providers, learn the structure of incoming files, monitor data quality, and trigger any processing necessary. The Bistro data feed manager, designed and implemented at AT&T Labs-Research, simplifies and automates this complex task of data feed management: efficiently handling incoming raw files, identifying data feeds and distributing them to remote subscribers. Bistro supports a flexible specification language to define logical data feeds using the naming structure of physical data files, and to identify feed subscribers. Based on the specification, Bistro matches data files to feeds, performs file normalization and compression, efficiently delivers files, and notifies subscribers using a trigger mechanism. We describe our feed analyzer that discovers the naming structure of incoming data files to detect new feeds, dropped feeds, feed changes, or lost data in an existing feed. Bistro is currently deployed within AT&T Labs and is responsible for the real-time delivery of over 100 different raw feeds, distributing data to several large-scale stream warehouses.

