Results 1–10 of 12
Evaluation of Traditional, Orthogonal, and Radial Tree Diagrams by an Eye Tracking Study
Abstract

Cited by 9 (3 self)
Abstract—Node-link diagrams are an effective and popular visualization approach for depicting hierarchical structures and for showing parent-child relationships. In this paper, we present the results of an eye tracking experiment investigating traditional, orthogonal, and radial node-link tree layouts as a piece of empirical basis for choosing between those layouts. Eye tracking was used to identify visual exploration behaviors of participants who were asked to solve a typical hierarchy exploration task by inspecting a static tree diagram: finding the least common ancestor of a given set of marked leaf nodes. To uncover exploration strategies, we examined fixation points, fixation durations, and saccades of participants' gaze trajectories. For the non-radial diagrams, we additionally investigated the effect of diagram orientation by switching the position of the root node to each of the four main orientations. We also recorded and analyzed correctness of answers as well as completion times in addition to the eye movement data. We found that traditional and orthogonal tree layouts significantly outperform radial tree layouts for the given task. Furthermore, by applying trajectory analysis techniques we uncovered that participants cross-checked their task solution more often in the radial than in the non-radial layouts. Index Terms—Hierarchy visualization, node-link layout, eye tracking, user study.
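The study's task, finding the least common ancestor of a set of marked leaves, can be sketched in a few lines; the parent-map tree representation and the example tree below are illustrative assumptions, not taken from the paper.

```python
def least_common_ancestor(parent, leaves):
    """Return the deepest node that is an ancestor of every leaf.
    `parent` maps each node to its parent (the root maps to None)."""
    def root_path(node):
        chain = []
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain[::-1]  # root-first path

    paths = [root_path(leaf) for leaf in leaves]
    lca = None
    for nodes in zip(*paths):  # walk down while all paths agree
        if len(set(nodes)) == 1:
            lca = nodes[0]
        else:
            break
    return lca

# Hypothetical example tree:
#        A
#       / \
#      B   C
#     / \   \
#    D   E   F
parent = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
print(least_common_ancestor(parent, ["D", "E"]))  # B
print(least_common_ancestor(parent, ["D", "F"]))  # A
```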
Unsupervised Detection of Anomalous Text
, 2008
Abstract

Cited by 5 (0 self)
This thesis describes work on the detection of anomalous material in text without the use of training data. We use the term anomalous to refer to text that is irregular, or deviates significantly from its surrounding context. In this thesis we show that identifying such abnormalities in text can be viewed as a type of outlier detection, because these anomalies will differ significantly from the writing style in the majority of the data. We consider segments of text which are anomalous with respect to topic (i.e., about a different subject), author (written by a different person), or genre (written for a different audience or from a different source) and experiment with whether it is possible to identify these anomalous segments automatically. Five different innovative approaches to this problem are introduced and assessed using many experiments over large document collections, created to contain randomly inserted anomalous segments. In order to identify anomalies in text successfully, we investigate and evaluate 166 stylistic and linguistic features used to characterize writing, some of which are well-established stylistic determiners, but many of which are original. Using these features with each of our methods, we examine the effect of segment size on our ability to detect anomalies, allowing segments of 100 words, 500 words, and 1000 words. We show substantial improvements over a baseline in all cases for all methods, and identify a novel method which performs consistently better than the others, as well as the features that contribute most to unsupervised anomaly detection.
ON AN EXTREME VALUE VERSION OF THE BIRNBAUM–SAUNDERS DISTRIBUTION
Abstract

Cited by 3 (1 self)
The Birnbaum–Saunders model is a life distribution that originated from a problem of material fatigue and has been widely studied and applied in recent decades. A random variable following the Birnbaum–Saunders distribution can be stochastically represented by another random variable used as a basis. The Birnbaum–Saunders model can therefore be generalized by switching the distribution of the basis variable, using diverse arguments, allowing more general classes of models to be constructed. Extreme value distributions are useful for determining the probability of events that are larger or smaller than others previously observed. In this paper, we propose, characterize, implement and apply an extreme value version of the Birnbaum–Saunders distribution. Keywords: domain of attraction; extreme data; likelihood method; R computer language.
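For context, the standard stochastic representation alluded to above (a well-known identity, not quoted from the paper) builds a Birnbaum–Saunders variable $T$ with shape $\alpha$ and scale $\beta$ from a standard normal basis variable $Z$:

```latex
% T ~ BS(alpha, beta) constructed from Z ~ N(0, 1)
T = \beta \left[ \frac{\alpha Z}{2}
    + \sqrt{\left(\frac{\alpha Z}{2}\right)^{2} + 1} \right]^{2},
\qquad Z \sim \mathrm{N}(0, 1).
```

Generalizations of the kind the abstract describes replace the normal law of $Z$ by another distribution, here an extreme value one.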
A ROBUSTIFICATION OF THE CHAIN-LADDER METHOD
 NORTH AMERICAN ACTUARIAL JOURNAL
Abstract

Cited by 2 (0 self)
In a non-life insurance business an insurer often needs to build up a reserve to be able to meet his or her future obligations arising from claims that have been incurred but not yet completely reported. To forecast these claims reserves, a simple but generally accepted algorithm is the classical chain-ladder method. Recent research has essentially focused on the underlying model for the claims reserves in order to arrive at appropriate bounds for the estimates of future claims reserves. Our research concentrates on scenarios with outlying data. On closer examination it is demonstrated that the forecasts for future claims reserves are very dependent on outlying observations. The paper focuses on two approaches to robustify the chain-ladder method: the first method detects and adjusts the outlying values, whereas the second method is based on a robust generalized linear model technique. In this way insurers will be able to find a reserve that is similar to the reserve they would have found had the data contained no outliers. Because the robust method flags the outliers, these observations can be singled out for further examination. The bootstrap technique is applied to obtain the corresponding standard errors. The robust chain-ladder method is applied to several run-off triangles with and without outliers, showing its excellent performance.
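As context for the robustification, the classical chain-ladder method estimates, for each development period, a factor as the ratio of column sums of the cumulative run-off triangle and uses it to project the unfinished accident years. A minimal sketch with a made-up toy triangle (the figures are illustrative only):

```python
def chain_ladder(triangle):
    """Complete a cumulative run-off triangle with chain-ladder forecasts.
    triangle[i] holds the observed cumulative claims of accident year i;
    later accident years have fewer observed development periods."""
    n = len(triangle)
    completed = [row[:] for row in triangle]
    for j in range(n - 1):
        # Development factor: ratio of column sums, over accident
        # years where both periods j and j+1 are observed.
        num = sum(row[j + 1] for row in triangle if len(row) > j + 1)
        den = sum(row[j] for row in triangle if len(row) > j + 1)
        f = num / den
        for row in completed:
            if len(row) == j + 1:       # project the next missing entry
                row.append(row[j] * f)
    return completed

triangle = [
    [100, 150, 165],   # fully developed accident year
    [110, 160],
    [120],
]
ultimates = [row[-1] for row in chain_ladder(triangle)]
```

The robustified variants the abstract describes would, for example, replace or downweight outlying entries before these column-sum ratios are formed.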
Feature Scattering in the Large: A Longitudinal Study of Linux Kernel Device Drivers
Abstract

Cited by 2 (2 self)
Feature code is often scattered across wide parts of the code base. But scattering is not necessarily bad if used with care; in fact, systems with highly scattered features have evolved successfully over years. Among other things, feature scattering allows developers to circumvent limitations in programming languages and system design. Still, little is known about the characteristics governing scattering, which factors influence it, and its practical limits in the evolution of large and long-lived systems. We address this issue with a longitudinal case study of feature scattering in the Linux kernel. We quantitatively and qualitatively analyze almost eight years of its development history, focusing on the scattering of device-driver features. Among other results, we show that, while scattered features are regularly added, their proportion is lower than that of non-scattered ones, indicating that the kernel architecture allows most features to be integrated in a modular manner. The median scattering degree of features is constant and low, but the scattering-degree distribution is heavily skewed; thus, the arithmetic mean is not a reliable threshold for monitoring the evolution of feature scattering. When investigating influencing factors, we find that platform-driver features are 2.5 times more likely to be scattered across architectural (subsystem) boundaries than non-platform ones. Their use illustrates a maintenance-performance trade-off in creating architectures such as that of the Linux kernel device drivers.
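The point that a heavily skewed scattering-degree distribution makes the arithmetic mean an unreliable threshold is easy to reproduce with synthetic numbers (the values below are illustrative, not kernel data):

```python
import statistics

# Hypothetical scattering degrees: most features touch one or two
# locations, a few are scattered across many (heavy right skew).
scattering = [1] * 80 + [2] * 15 + [40, 60, 90, 120, 200]

med = statistics.median(scattering)  # robust: stays at the typical value
avg = statistics.mean(scattering)    # pulled far up by a handful of outliers
print(med, avg)
```

A mean-based threshold here would flag almost nothing below it as unusual while the median reflects the typical feature, which is why the study monitors medians and distribution shape instead.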
OUTLIER DETECTION FOR SKEWED DATA
, 2007
Abstract

Cited by 1 (0 self)
Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of the underlying distribution. We propose an outlier detection method which does not need the assumption of symmetry and does not rely on visual inspection. Our method is a generalization of the Stahel–Donoho outlyingness. The latter approach assigns to each observation a measure of outlyingness, which is obtained by projection pursuit techniques that only use univariate robust measures of location and scale. To allow for skewness in the data, we adjust this measure of outlyingness by using a robust measure of skewness as well. The observations corresponding to an outlying value of the adjusted outlyingness are then considered as outliers. For bivariate data, our approach leads to two graphical representations. The first one is a contour plot of the adjusted outlyingness values, and can be considered as an approximation of the density contours of the underlying distribution. We also construct an extension of the boxplot for bivariate data, in the spirit of the bagplot [1], which is based on the concept of halfspace depth. We illustrate our outlier detection method on several simulated and real data sets.
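The Stahel–Donoho outlyingness that the method generalizes can be approximated by projection pursuit over random directions. The sketch below implements the plain median/MAD version, not the skewness-adjusted variant the abstract proposes, and the data set is made up:

```python
import random
import statistics

def sd_outlyingness(X, n_dirs=500, seed=0):
    """Approximate Stahel-Donoho outlyingness of each row of X by
    maximizing |projection - median| / MAD over random unit directions."""
    rng = random.Random(seed)
    p = len(X[0])
    out = [0.0] * len(X)
    for _ in range(n_dirs):
        a = [rng.gauss(0, 1) for _ in range(p)]
        norm = sum(v * v for v in a) ** 0.5
        a = [v / norm for v in a]
        proj = [sum(xi * ai for xi, ai in zip(x, a)) for x in X]
        med = statistics.median(proj)
        mad = statistics.median(abs(v - med) for v in proj)
        if mad == 0:
            continue  # degenerate direction, skip it
        for i, v in enumerate(proj):
            out[i] = max(out[i], abs(v - med) / mad)
    return out

# Tight bivariate cluster plus one clear outlier
X = [(0.1, 0.2), (-0.2, 0.1), (0.0, -0.1), (0.2, 0.0), (5.0, 5.0)]
scores = sd_outlyingness(X)
```

The adjusted version replaces the symmetric median/MAD scaling by separate scales for the two sides of the median, driven by a robust skewness measure such as the medcouple.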
Outlier Detection in Test and Questionnaire Data
Abstract

Cited by 1 (0 self)
Classical methods for detecting outliers deal with continuous variables. These methods are not readily applicable to categorical data, such as incorrect/correct scores (0/1) and ordered rating scale scores (e.g., 0, …, 4) typical of multi-item tests and questionnaires. This study proposes two definitions of outlier scores suited for categorical data. One definition combines information on outliers from the scores on all the items in the test, and the other combines information from all pairs of item scores. For a particular item-score vector, an outlier score expresses the degree to which the item-score vector is unusual. For ten real-data sets, the distribution of each of the two outlier scores is inspected by means of Tukey's fences and the extreme studentized deviate procedure. It is investigated whether the outliers that are identified are influential with respect to the statistical analysis performed on these data. Recommendations are given for outlier identification and accommodation in test and questionnaire data.
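Tukey's fences, one of the two inspection devices mentioned, flag values outside [Q1 - k·IQR, Q3 + k·IQR]. A minimal sketch with hypothetical outlier scores (k = 1.5 is the conventional choice; the data are made up):

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return the values lying outside Tukey's fences
    [Q1 - k*IQR, Q3 + k*IQR], with IQR = Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Hypothetical outlier scores for ten respondents
scores = [0.8, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 6.0]
print(tukey_fences(scores))  # [6.0]
```

In the study's setting the values fed in would be the proposed item- or pair-based outlier scores, one per respondent.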
The Shape of Feature Code: An Analysis of Twenty C-Preprocessor-Based Systems (Software and Systems Modeling)
Abstract
Abstract Feature annotations (e.g., code fragments guarded by #ifdef C-preprocessor directives) control code extensions related to features. Feature annotations have long been said to be undesirable. When maintaining features that control many annotations, there is a high risk of ripple effects. Also, excessive use of feature annotations leads to code clutter, hindering program comprehension and hampering maintenance. To prevent such problems, developers should monitor the use of feature annotations, for example by setting acceptable thresholds. Interestingly, little is known about how to extract thresholds in practice, and which values are representative for feature-related metrics. To address this issue, we analyze the statistical distribution of three feature-related metrics collected from a corpus of 20 well-known and long-lived C-preprocessor-based systems from different domains. We consider three metrics: the scattering degree of feature constants, the tangling degree of feature expressions, and the nesting depth of preprocessor annotations. Our findings show that feature scattering is highly skewed; in 14 systems (70%), the scattering distributions match a power law, making averages and standard deviations unreliable limits. Regarding tangling and nesting, the values tend to follow a uniform distribution; although outliers exist, they have little impact on the mean, suggesting that central statistical measures are reliable thresholds for tangling and nesting. Following our findings, we then propose thresholds from our benchmark data as a basis for further investigations.
Robust Independent Component Analysis
Abstract
Independent Component Analysis (ICA) is a statistical method for transforming multidimensional random vectors into components which are statistically as independent from each other as possible. It can be seen as a generalization of Principal Component Analysis, which seeks uncorrelated factors. In recent years, many algorithms have been proposed that perform well in many situations, but …
Monetaire en Informatie
Abstract
In this paper we give the outline of a research project developed in cooperation between the actuarial, financial and statistical research groups of the Faculty of Economics and Applied Economics and the research group on statistics in the Mathematical Department. The main purpose consists in determining quantitative tools for managing uncertainty (in a financial insurance environment).
I. FINANCIAL, ACTUARIAL AND ECONOMIC ASPECTS OF MANAGING UNCERTAINTY
A. Risk measures, valuation principles, and Value-at-Risk
We will introduce a distinction between risk measures and valuation principles such as the premium principle, the allocation principle, and the solvency or capital principle, where the capital might be a regulatory, a management, or a rating capital. The difference between a risk measure and a principle stems from the different levels on which they operate. Indeed, a risk measure is a functional …