## A comparative survey of business process similarity measures

Venue: | Computers in Industry |

Citations: | 5 - 0 self |

### Citations

2252 | WordNet: A lexical database for English
- Miller
- 1995
Citation Context ...g et al. [6] measure the similarity of PMs based on so-called semantic business process models - predicate transition Petri nets transformed in an ontological representation. Let $A_0 = \{a^0_1, a^0_2, \ldots, a^0_{\#A_0}\}$ and $A_1 = \{a^1_1, a^1_2, \ldots, a^1_{\#A_1}\}$ be the sets of activities of the models. Ehrig et al. describe three measures for similarity between the label of an activity $a^0 \in A_0$ and the label of an activity $a^1 \in A_1$. First, the Levenshtein string edit distance [23] is used to calculate the syntactic similarity. Second, linguistic similarity handles synonyms in element labels using WordNet [24] and takes the meaning of the words in the label into account. Finally, structural similarity compares the context (such as attribute names and values or succeeding nodes) of single activities. The similarity measure $corr(a^0, a^1)$ between activity $a^0 \in A_0$ and activity $a^1 \in A_1$ is aggregated by the weighted combination of the three similarity types. The similarity of the models as a whole is calculated as $sim(M_0, M_1) = \frac{1}{\#A_0} \sum_{i=1}^{\#A_0} \max_{j=1,\ldots,\#A_1} corr(a^0_i, a^1_j)$ (2). Table 5. Adherence to properties and similarity values of [6]: Syntactic, Linguistic and Structural Similarity of Activity Labels... |
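
Equation (2) from the context above can be sketched in a few lines of Python. This is only an illustration: `exact_label_corr` is a hypothetical stand-in for corr that compares labels for equality, and the weighted combination of syntactic, linguistic and structural similarity described by Ehrig et al. is not reproduced here. Returning 0 for empty models is likewise an assumption.

```python
from typing import Callable, Sequence


def exact_label_corr(a: str, b: str) -> float:
    """Hypothetical stand-in for corr: 1.0 for identical labels, else 0.0."""
    return 1.0 if a == b else 0.0


def activity_similarity(
    a0: Sequence[str],
    a1: Sequence[str],
    corr: Callable[[str, str], float] = exact_label_corr,
) -> float:
    """Average, over the activities of the first model, of the best
    correspondence found among the activities of the second model (Eq. 2)."""
    if not a0 or not a1:
        return 0.0  # assumption: empty models are treated as dissimilar
    return sum(max(corr(x, y) for y in a1) for x in a0) / len(a0)


# The measure is not symmetric: extra activities in the second model are ignored.
print(activity_similarity(["A", "B"], ["A", "B", "C"]))  # 1.0
print(activity_similarity(["A", "B", "C"], ["A", "B"]))  # 2/3
```

Because the sum runs only over the activities of the first model, swapping the arguments can change the result.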

1975 | Binary codes capable of correcting deletions, insertions, and reversals
- Levenshtein
- 1966
Citation Context ...eans that instead of corr, the function $corr'(x, y)$ is used, where $corr'(x, y) = corr(x, y)$ if $corr(x, y) \geq threshold$ and $corr'(x, y) = 0$ otherwise. The label matching similarity is defined as $sim(M_0, M_1) = \frac{2 \sum_{x \in A_0} corr(x, map(x))}{\#A_0 + \#A_1}$. Table 4. Adherence to properties and similarity values of [4] (Label Matching Similarity). Adherence to property: 1 yes, 2 no, 3 no, 3a no, 5 yes, 6 yes, 7 yes, 8 yes. Similarity between V0 and V0, ..., V7: 1.00, 1.00, 1.00, 0.82, 1.00, 1.00, 1.00, 1.00. Calculation Settings and Results: To establish correspondences between nodes, we use the Levenshtein string edit distance [23] as function corr. Furthermore, to match two activities, we use a threshold of 0.5. Since we only have single-letter words in our example models, the resulting similarities between individual activities are either 0 or 1. The resulting similarity values are shown in Tab. 4. Discussion: As can be seen from Tab. 4, the similarity values for the approach proposed by [4] match the results established by [22] (Sect. 6.1.I). This is due to the fact that the measure of [4] also does not take into account any information about the order of nodes. For example, sim(V0, V5) would be 1 while sim(V0, V3) ... |

1452 | Features of similarity
- Tversky
- 1977
Citation Context ...uirement: Property 3a: dist(M0,M1) = 0⇔ Σ(M0) ≡ Σ(M1). Property 4, the triangle inequality, is not essential for measuring the dissimilarity (distance) between PMs (or for (dis)similarity measures in general, see [11]). Therefore, we will not examine the suggested measures with respect to this property. It is a useful property anyway, because a distance measure that fulfills all four properties given above allows to organise a PM repository using data structures in which the search for similar models is very fast [12]. From an information-theoretic discussion of the concept of similarity (see [11, 13]), one more requirement for a similarity measure can be derived: Such a measure should take into consideration both the commonality between two models and their differences (Property 5). For example, we would not get a good similarity measure by just counting the number of activities that are shared among two models without relating this number to the overall number of activities in the models: If two models with 20 nodes have 15 node names in common, it would be reasonable to say that they are more similar to each other than two models with 200 nodes from which 15 node names can be found in b... |

1241 | An information-theoretic definition of similarity
- Lin
- 1998
Citation Context ...f traces Σ(M0) and Σ(M1) are considered as being the same (symbol: Σ(M0) ≡ Σ(M1)) if 〈s1, s2, . . . 〉 ∈ Σ(M0) implies that 〈map(s1),map(s2), . . . 〉 ∈ Σ(M1) and vice versa, 〈t1, t2, . . . 〉 ∈ Σ(M1) implies that there is a 〈s1, s2, . . . 〉 ∈ Σ(M0) such that map(si) = ti ∀i. With this interpretation of equality between sets of traces, Property 3 can be substituted by the less strict requirement: Property 3a: dist(M0,M1) = 0⇔ Σ(M0) ≡ Σ(M1). Property 4, the triangle inequality, is not essential for measuring the dissimilarity (distance) between PMs (or for (dis)similarity measures in general, see [11]). Therefore, we will not examine the suggested measures with respect to this property. It is a useful property anyway, because a distance measure that fulfills all four properties given above allows to organise a PM repository using data structures in which the search for similar models is very fast [12]. From an information-theoretic discussion of the concept of similarity (see [11, 13]), one more requirement for a similarity measure can be derived: Such a measure should take into consideration both the commonality between two models and their differences (Property 5). For example, we would ... |

425 | ADEPTflex - supporting dynamic changes of workflows without losing control
- Reichert, Dadam
- 1998
Citation Context ...imilarity between V0 and the other PMs as well as the adherence of the measure proposed by Li et al. to the properties given in Sect. 3. We identified the necessary number of change operations by hand. For example, to transform V0 into V2 we have to change the types of the four connectors from XOR to OR. Since we also take start and stop events into account, $sim(V_0, V_2) = 1 - \frac{4}{15+15-11}$. The other similarity values are calculated analogously. Discussion: The similarity measure based on high-level change operations has been developed in the context of the process-aware information system ADEPT [32]. This framework makes it possible to construct sound PMs by starting from an empty model and repeatedly applying high-level change operations. In this context, the question about the difference between model variants V0 and V1 is not relevant, because the construction algorithm would ensure that only one of the models would occur in practice. A remarkable property of this measure is that when calculating sim(V0, V6) and sim(V0, V7), we have to regard only one high-level change operation. This is different from other similarity measures based on graph edit distances that we have discussed so far. V. Tree... |

240 | Similarity measures.
- Santini, Jain
- 1999
Citation Context ...ess models. A distance measure dist is a function dist : M × M → R+ ∪ {0}. A similarity measure is a function sim : M × M → [0, 1]. The formula $sim(x, y) = \frac{1}{1 + dist(x, y)}$ (1) can be used for transforming a distance measure into a similarity measure or vice versa. In Sect. 6, we will discuss several alternatives to define the functions map, corr, dist and sim, i.e. in each subsection those functions will be defined differently. Throughout this article, the symbol #K will be used to denote the number of elements of a set K. 3 Desirable Properties of Distance and Similarity Measures. Santini and Jain [10] point out that a number of dissimilarity measures proposed in the literature assume that those measures are distance measures in a metric space. (M, dist), the set of all process models M with a distance measure dist, becomes a metric space if the following properties hold. Property 1: dist(M0, M1) ≥ 0 ∀ M0, M1 ∈ M (non-negativity). Property 2: dist(M0, M1) = dist(M1, M0) ∀ M0, M1 ∈ M (symmetry). Property 3: dist(M0, M1) = 0 ⇔ M0 ≡ M1. Property 4: dist(M0, M2) ≤ dist(M0, M1) + dist(M1, M2) (triangle inequality). For measuring the “dissimilarity” distance between PMs, it is reasonable to require Property 1 and Prope... |
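
Formula (1) and its inverse can be written down directly. The sketch below uses hypothetical helper names; the input checks are assumptions added for illustration, reflecting that a distance is non-negative and that a similarity of exactly 0 has no finite distance under this formula.

```python
def distance_to_similarity(dist: float) -> float:
    """sim = 1 / (1 + dist); maps [0, inf) onto (0, 1]."""
    if dist < 0:
        raise ValueError("a distance must be non-negative (Property 1)")
    return 1.0 / (1.0 + dist)


def similarity_to_distance(sim: float) -> float:
    """Inverse of formula (1): dist = 1 / sim - 1, defined for 0 < sim <= 1."""
    if not 0.0 < sim <= 1.0:
        raise ValueError("similarity must lie in (0, 1] to have a finite distance")
    return 1.0 / sim - 1.0


print(distance_to_similarity(0.0))   # 1.0: identical models
print(distance_to_similarity(3.0))   # 0.25
print(similarity_to_distance(0.25))  # 3.0
```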

195 | Change patterns and change support features enhancing flexibility in process-aware information systems.
- Weber, Reichert, et al.
- 2008
Citation Context ...ations that is necessary to transform model M0 = (N0, E0) into model M1 = (N1, E1). A corresponding similarity measure is introduced as $sim(M_0, M_1) = 1 - \frac{dist(M_0, M_1)}{\#N_0 + \#N_1 - \#(N_0 \cap N_1)}$. Table 14. Adherence to properties and similarity values of [31] (Graph-Edit Distance by High-Level Change Operations). Adherence to property: 1 yes, 2 yes, 3 no, 3a yes, 5 yes, 6 no, 7 n/a, 8 yes. Similarity between V0 and V0, ..., V7: 1.00, 1.00, 0.79, 0.79, 0.87, 0.33, 0.93, 0.93. Calculation Settings and Results: For our calculations, we referred to the set of high-level change operations described in [19]. Tab. 14 shows the similarity between V0 and the other PMs as well as the adherence of the measure proposed by Li et al. to the properties given in Sect. 3. We identified the necessary amount of change operations by hand. For example, to transform V0 into V2 we have to change the types of the four connectors from XOR to OR. Since we also take start and stop events into account, $sim(V_0, V_2) = 1 - \frac{4}{15+15-11}$. The other similarity values are calculated analogously. Discussion: The similarity measure based on high-level change operations has been developed in the context of the process-aware info... |
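
The worked example for sim(V0, V2) in the context above can be written out step by step. This assumes the flattened expression stands for four change operations divided by 15 + 15 − 11, i.e. #N0 = #N1 = 15 and #(N0 ∩ N1) = 11, an interpretation that agrees with the value 0.79 listed for V2 in Table 14.

```latex
\[
sim(V_0, V_2)
  = 1 - \frac{dist(V_0, V_2)}{\#N_0 + \#N_1 - \#(N_0 \cap N_1)}
  = 1 - \frac{4}{15 + 15 - 11}
  = 1 - \frac{4}{19}
  \approx 0.79
\]
```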

181 | A survey on tree edit distance and related problems
- Bille
- 2005
Citation Context ...on. This is different from other similarity measures based on graph edit distances that we have discussed so far. V. Tree Edit Distance Between PMs Represented as Trees. In [33], Bae et al. transform a PM into an ordered tree. A sequential PM (without any splits and joins) would become a tree of depth one; all activities would be leaves that are children of the root node. A split node in the PM would correspond to a node in the tree which is the parent of several subtrees which correspond to the outgoing arcs of this split. After translating a PM into a tree this way, algorithms for comparing trees [34] are used. Table 15. Adherence to properties and similarity values of [33] (Tree Edit Distance Between PMs Represented as Trees). Adherence to property: 1 yes, 2 yes, 3 no, 3a no, 5 yes, 6 no, 7 yes, 8 yes. Similarity between V0 and V0, ..., V7: 1.00, 1.00, 1.00, 0.08, 0.13, 0.06, 0.14, 0.11. Calculation Settings and Results: Since the original paper does not describe whether Bae et al. distinguish between different connector types, we transform our models V0, ..., V7 into their graph representation using method 1 from Fig. 4. Therefore, information about connector types gets lost r... |

112 | A vector space model for automatic indexing.
- Salton, Wong, et al.
- 1975
Citation Context ...ce of 6 must be preceded by the occurrence of one activity from the set {1, 9}; the presence of “9” is in fact irrelevant, but allowed according to the definition). Analogously, the set $L_{la}$ of look-ahead links is defined such that a pair (a, S) from A × ℘(A) is in $L_{la}$ if and only if each occurrence of activity a in a trace of the PM must be followed by an occurrence of an activity that is contained in the set S. For measuring the similarity of two PMs, a similarity measure for their causal footprints is calculated. For this purpose, the causal footprints are regarded as documents in a document vector space [45], a concept that is widely used in the field of information filtering and information retrieval. Causal footprints (the “documents”) are represented as vectors of index terms. Let’s assume that we have to calculate the similarity between the causal footprints of two models M0 = (N0, E0) and M1 = (N1, E1) whose sets of look-ahead links are $L^{M_0}_{la}$ and $L^{M_1}_{la}$ and whose sets of look-back links are $L^{M_0}_{lb}$ and $L^{M_1}_{lb}$. The set of index terms is defined as $\Theta = N_0 \cup N_1 \cup L^{M_0}_{la} \cup L^{M_1}_{la} \cup L^{M_0}_{lb} \cup L^{M_1}_{lb}$, i.e. Θ contains all nodes as well as all look-ahead and look-back links of both M0 and M1. Let λ : Θ... |

105 | A survey of longest common subsequence algorithms.
- Bergroth, Hakonen, et al.
- 2000
Citation Context ...ower model PM1 of Fig. 1 contains exactly one trace, 〈n0, n1, n2, n3, n4, n5〉. Traces of process models can be represented as a string, i.e. a sequence of symbols. Based on this representation, it is possible to calculate the longest common subsequence. The longest common subsequence of two strings is a subsequence of both strings that contains the maximum number of symbols (preserving the symbol order). For example for the strings “123456” and “1x2y3z”, the longest common subsequence is “123”. A more formal definition and algorithms for calculating longest common subsequences can be found in [9]. We denote the length of the longest common subsequence of traces σ1 and σ2 as len(lcs(σ1, σ2)). 2.3 Mapping Function In most approaches, the algorithm for calculating similarity measures starts with establishing a mapping between the nodes in M0 and M1. Such a mapping describes which activity in M1 “corresponds” to an activity in M0. Formally, a mapping is described by a partial function that assigns nodes of M0 = (N0, E0) to the “corresponding” nodes of M1 = (N1, E1). Throughout this article, we will denote this mapping function with map : N0 → N1. In many approaches, only the activities ar... |
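
The quantity len(lcs(σ1, σ2)) from the context above can be computed with the standard dynamic-programming recurrence. The sketch below is one textbook formulation, not necessarily the algorithm preferred in [9]; the function name `lcs_length` is chosen here for illustration.

```python
from typing import Sequence


def lcs_length(s1: Sequence, s2: Sequence) -> int:
    """Length of the longest common subsequence of two traces.

    Uses the classic O(len(s1) * len(s2)) dynamic programme and keeps only
    the previous row of the table in memory.
    """
    prev = [0] * (len(s2) + 1)
    for x in s1:
        curr = [0]
        for j, y in enumerate(s2, start=1):
            if x == y:
                # Extend the common subsequence found for both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of either s1 or s2.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Example from the text: the longest common subsequence is "123".
print(lcs_length("123456", "1x2y3z"))  # 3
```

Traces given as lists of node identifiers work the same way as strings, since only element equality is used.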

92 | The refined process structure tree
- Vanhatalo, Völzer, et al.
- 2008
Citation Context ... 1 and sim(V0, V2) = 1, too as can be seen in Tab. 15. By taking connectors into account, the similarity between V0 and V1 and V2 respectively would be slightly lower. Discussion: The approach described in [33] ignores loops in a PM which is a severe limitation. We cannot agree to the statement made in [33] that “cycles are not used in the distance measure because the cycle does not affect the structure of a process”. Additional research would be necessary on extending tree-based similarity measures to the more general case of expressing PMs with loops as trees (preliminaries are discussed in [35, 36]). VI. Edit Distance Between Reduced Models Facilitating queries on process model repositories is in the focus of the approach of Lu and Sadiq in [37]. A query is represented as a partial process model having the desired process structure, e.g. the order of activities. Given a query model M0 = (N0, E0) and a process model M1 = (N1, E1) with the sets A0 ⊆ N0 and A1 ⊆ N1 of activities, the mapping map : A0 → A1 is established by label equivalence. The approach is limited by the assumption that A0 ⊆ A1. Similar processes can be in either of two relations with each other. M1 is equivalent to M0 wh... |

69 | Semantics-based code search
- Reiss
Citation Context ...When business units of different organisations are consolidated, it can be assumed that process overlaps exist. For example, [15] reported of an organisation having several subsidiaries where every subsidiary managed its own ERP system resulting in more than 200,000 PMs. During integration it is necessary to integrate these systems and to identify process overlaps. C: Facilitate reuse A cross-sectional goal that can be achieved by targeting various other application areas for similarity calculations is to facilitate reuse of PMs. Similar to reusing components in software engineering (see e.g. [16] for code reuse), reusing PMs promises to reduce time and costs. Therefore, it is necessary to find existing PMs and reuse them in the right context. D: Manage PM repositories Due to the vast amount of existing PMs, organisations usually store these models in process repositories. These repositories provide various functions, such as adding and removing models, annotating models and searching for models [3]. Before new models can be added to a repository, it is useful to check whether similar or even identical models are already stored in the repository. Furthermore, repositories are useless w... |

64 | Graph matching algorithms for business process model similarity search
- Dijkman, Dumas, et al.
- 2009
Citation Context ...ch presented in [27] are very close to each other. This is due to the fact, that the approach allows an n:m mapping of edges based on the simple comparison of their connected nodes. However, the decisive disadvantage of the approach from [27] is that it does not calculate a similarity of 1 for equivalent models. This drawback results from the multiplication of weights (which is 1 only for sequential activities). Accordingly, this approach ranks model V5 as very similar, since all the weights in this model are 1. 6.2 Edit Distance Between Graphs I. Graph Edit Distance Similarity Dijkman et al. [4, 28] try to capture structural similarity as follows: As described in Sect. 6.1.II, they derive a mapping function map from a function corr that measures the similarities between nodes in A0 and nodes in A1. The nodes in a0 ∈ A0 for which map(a) is not defined and the nodes a1 ∈ A1 for which there is no a0 ∈ A0 such that map(a0) = a1 are regarded as “inserted or deleted nodes” (because they appear in one model and not in the other one). Similarly, an edge (x, y) ∈ E0 is called “inserted or deleted edge” if either map(x) or map(y) is undefined or (map(x),map(y)) /∈ E1. Inserted or deleted edges in ... |

59 | Measuring Similarity between Semantic Business Process Models
- Ehrig, Koschmider, et al.
- 2007
Citation Context ... or 1. The resulting similarity values are shown in Tab. 4. Discussion: As can be seen from Tab. 4, the similarity values for the approach proposed by [4] match with the results established by [22] (Sect. 6.1.I) This is due to the fact that the measure of [4] also does not take into account any information about the order of nodes. For example, sim(V0, V5) would be 1 while sim(V0, V3) would be 18 21 < 1 only – a result that is counter-intuitive if we are interested in the similarity of the modelled behaviour. III. Syntactic, Linguistic and Structural Similarity of Activity Labels Ehrig et al. [6] measure the similarity of PMs based on so-called semantic business process models - predicate transition Petri nets transformed in an ontological representation. Let A0 = { a01, a 0 2, . . . a 0 #A0 } and A1 = { a11, a 1 2, . . . a 1 #A1 } be the sets of activities of the models. Ehrig et al. describe three measures for similarity between the label of an activity a0 ∈ A0 and the label of an activity a1 ∈ A1. First, the Levenshtein string edit distance [23] is used to calculate the syntactic similarity. Second, linguistic similarity handles synonyms in element labels using WordNet [24] and tak... |

59 | The ProM framework: A New Era in Process Mining Tool Support.
- Dongen, Medeiros, et al.
- 2005
Citation Context ... on the set of traces (see Sect. 2.2) in Sect. 6.4. Every presentation is enriched by a table containing information about the adherence to the properties in Sect. 3 and the absolute similarity values for the similarity between model V0 and the models V1 ... V7 from Sect. 5. Furthermore, we give a brief explanation of the parameters and (if necessary) adaptations used in our calculation of the similarity values and discuss each measure. To enhance the reproducibility of our findings we developed a publicly available application (https://sourceforge.net/projects/prom-similarity/). It is based on the well-known ProM framework for process mining [21] and provides an extensible API. Currently, 15 of the presented measures are implemented, and we will add missing measures in the future. Using this application, it is possible to analyse the impact of various parameters when calculating similarity (e.g. size of models, amount of text in models). The source code contains detailed comments on the parameters and strategies for those measures whose original description allows some degree of freedom in the implementation. 6.1 Correspondence Between Nodes and Edges in the PM. I. Similarity Score Ba... |

58 | Similarity of business process models: Metrics and evaluation
- Dijkman, Dumas, et al.
- 2011
Citation Context ...or drawback is that changing the structure of processes (e.g. changing the order of activities or inserting connectors) does not influence the similarity measure in any way. Furthermore, its application is limited to a domain with controlled vocabulary (as the authors of [22] state, too). The approach fails both in multilingual and in inter-organisational environments due to different vocabularies. In Sect. 6.1.III we discuss a similar approach that avoids the restriction of activity labels to a controlled vocabulary. II. Label Matching Similarity. Dijkman et al. study several similarity measures in [4]. The first and simplest one is called label matching similarity. It builds on a function corr that calculates a label-based similarity score between nodes in A0 and A1. The mapping function map : M0 → M1 is defined such that $\sum_{x \in A_0} corr(x, map(x))$ takes its maximum value. Optionally, a threshold can be used that disregards a similarity score if corr(x, y) is smaller than a given value. This means that instead of corr, the function $corr'(x, y)$ is used, where $corr'(x, y) = corr(x, y)$ if $corr(x, y) \geq threshold$ and $corr'(x, y) = 0$ otherwise. The label matching similarity is defined as $sim(M_0, M_1) = \frac{2 \sum_{x \in A_0} corr(x, map(x))}{\#A_0 + \#A_1}$.... |
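
The label matching similarity described above can be sketched as follows. Several choices here are assumptions made for illustration, not taken from [4]: corr is normalised as 1 − editDistance / max(label lengths), the mapping is treated as a partial injective function, and the optimal mapping is found by brute force over permutations, which is only feasible for very small models. All function names are hypothetical.

```python
from itertools import permutations
from typing import Sequence


def levenshtein(s: str, t: str) -> int:
    """Levenshtein string edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corr(x: str, y: str, threshold: float = 0.5) -> float:
    """Thresholded label similarity corr'; the normalisation is an assumption."""
    longest = max(len(x), len(y))
    score = 1.0 if longest == 0 else 1.0 - levenshtein(x, y) / longest
    return score if score >= threshold else 0.0


def label_matching_similarity(
    a0: Sequence[str], a1: Sequence[str], threshold: float = 0.5
) -> float:
    """2 * (best total correspondence score) / (#A0 + #A1)."""
    if not a0 and not a1:
        return 1.0  # assumption: two empty models count as identical
    # corr is symmetric, so the smaller label set can be assigned into the larger.
    small, large = (a0, a1) if len(a0) <= len(a1) else (a1, a0)
    best = max(
        sum(corr(x, y, threshold) for x, y in zip(small, chosen))
        for chosen in permutations(large, len(small))
    )
    return 2.0 * best / (len(a0) + len(a1))


# Reordering activities does not change the result, as noted in the discussion.
print(label_matching_similarity(["A", "B", "C"], ["C", "A", "B"]))  # 1.0
print(label_matching_similarity(["A", "B"], ["A", "X", "Y"]))       # 0.4
```

For realistic model sizes the permutation search would be replaced by an assignment algorithm; the brute-force version is kept here only because it needs nothing beyond the standard library.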

57 | Measuring similarity between business process models.
- Dongen, Dijkman, et al.
- 2008
Citation Context ...dex terms. Let’s assume that we have to calculate the similarity between the causal footprints of two models M0 = (N0, E0) and M1 = (N1, E1) whose sets of look-ahead links are $L^{M_0}_{la}$ and $L^{M_1}_{la}$ and whose sets of look-back links are $L^{M_0}_{lb}$ and $L^{M_1}_{lb}$. The set of index terms is defined as $\Theta = N_0 \cup N_1 \cup L^{M_0}_{la} \cup L^{M_1}_{la} \cup L^{M_0}_{lb} \cup L^{M_1}_{lb}$, i.e. Θ contains all nodes as well as all look-ahead and look-back links of both M0 and M1. Let λ : Θ → N be an indexing function that assigns a running number to each index term. The model $M_i$ (i ∈ {0, 1}) is represented as a vector $g_i = (g_{i1}, g_{i2}, \ldots, g_{i\#\Theta})$. In [46], its coordinates are defined as: $g_{i\lambda(t)} = 0$ if $t \notin N_i \cup L^{M_i}_{la} \cup L^{M_i}_{lb}$; $g_{i\lambda(t)} = 1$ if $t \in N_i$; $g_{i\lambda(t)} = \frac{1}{2^{len(t)-1}}$ if $t \in L^{M_i}_{la} \cup L^{M_i}_{lb}$, where len(t) is the number of set elements in the look-ahead or look-back link. For example, len(({1}, 12)) = #{1} = 1 and len(({9, 10, 11}, 12)) = #{9, 10, 11} = 3. This way, a greater weight is given to the look-back link ({1}, 12), following the rationale that links with fewer activities in the set are more informative and therefore more important for the comparison. The similarity of M0 and M1 is calculated as the cosine of the angle between the corresponding vecto... |
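
The weighting scheme and the cosine comparison described above can be sketched as follows. The encoding of a link as a `(kind, activity, activity_set)` tuple and the sparse dictionary representation of the vectors are assumptions made for this illustration; only the weights (1 for nodes, 1 / 2^(len(t) − 1) for links, 0 for absent terms) come from the text. Link sets are assumed to be non-empty.

```python
import math
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

# Hypothetical encoding: ("la" or "lb", activity, set of activities in the link).
Link = Tuple[str, Hashable, FrozenSet[Hashable]]


def footprint_vector(nodes: Iterable[Hashable], links: Iterable[Link]) -> Dict:
    """Sparse vector of index-term weights; absent terms implicitly weigh 0."""
    weights: Dict = {("node", n): 1.0 for n in nodes}
    for link in links:
        _, _, activities = link
        # Links with fewer activities are more informative and weigh more.
        weights[link] = 1.0 / 2 ** (len(activities) - 1)
    return weights


def cosine_similarity(g0: Dict, g1: Dict) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * g1.get(term, 0.0) for term, w in g0.items())
    norm0 = math.sqrt(sum(w * w for w in g0.values()))
    norm1 = math.sqrt(sum(w * w for w in g1.values()))
    if norm0 == 0.0 or norm1 == 0.0:
        return 0.0  # assumption: an empty footprint is similar to nothing
    return dot / (norm0 * norm1)


narrow = ("lb", 12, frozenset({1}))
wide = ("lb", 12, frozenset({9, 10, 11}))
g0 = footprint_vector({1, 12}, {narrow})
g1 = footprint_vector({9, 10, 11, 12}, {wide})
print(g0[narrow], g1[wide])  # 1.0 0.25
print(cosine_similarity(g0, g1))
```

Storing only non-zero coordinates avoids building the indexing function λ explicitly; the dictionary keys play the role of the index terms in Θ.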

49 | On measuring process model similarity based on high-level change operations.
- Li, Reichert, et al.
- 2008
Citation Context ...and Weske show that their distance measure fulfills Properties 1-4, i.e. it is a metric. This allows storing a set of models in a repository organised as a metric tree. For searching a model that is similar to a given query model, it is not necessary to compare the query model with each model from the repository. The main benefit from the paper by Kunze and Weske is their description of the indexing approach based on metric trees which leads to a remarkable improvement of the search for similar models within a model repository. IV. Graph-Edit Distance by High-Level Change Operations Li et al. [31] present an approach to calculate similarity between process models based on so-called high level change operations. They identify different types of high level change operations such as inserting an activity between existing activities, deleting an activity from the model, moving an activity from its original position to another one, and replacing an activity. A high-level change operation encapsulates a number of primitive graph-based operations (deleting an edge, inserting a node, etc.). The authors state that by constructing a PM using high level change operations only, it can be guarantee... |

39 | Efficient consistency measurement based on behavioural profiles of process models.
- Weidlich, Mendling, et al.
- 2010
Citation Context ...on for such results lies in the fact that the TAR relation contains only information about direct precedence. It should be appealing to include information about the transitive closure of TAR into the calculation of similarity measures. Instead of analysing information such as “activity AI can be followed directly by activity AII”, we would also take into account information such as: “After executing activity AI , it will be possible to execute AII later”. Approaches which use this kind of information are discussed in the following subsections. III. Causal Behavioural Profiles Weidlich et al. [43, 44] capture the behaviour of a PM by examining dependencies between the execution of an activity AI and the execution of activity AII . Such dependencies are expressed by means of four relations: – AI and AII are in strict order relation, if and only if it is possible that AI is executed before AII is executed, but it is not possible that AII is executed before AI is executed (i.e., there is a trace 〈. . . AI . . . AII . . . 〉 but no trace 〈. . . AII . . . AI . . . 〉 – AI and AII are in exclusiveness relation, if and only if it is not possible that both AI and AII are executed in the same process... |

32 | The ICoP framework: Identification of correspondences between process models.
- Weidlich, Dijkman, et al.
- 2010
Citation Context ... 1.00 1.00 Calculation Settings and Results: Tab. 5 contains the similarity values for our variants. We defined map such that activities having the same label are mapped to each other. Since the activity labels of the variants shown in Fig. 3 are single-letter words, there is (with the exception of variant V3) always a pair of activities with a syntactic similarity of 1. As a result, linguistic and structural similarity are not taken into account. By using a non-injective mapping, sim(V0, V3) would be higher because we could establish a mapping map such that map(1) = (1, A), map(5) = (B, 5), map(6) = (6, C), and map(9) = (D, 9). Discussion: The approach of Ehrig et al. focuses to a great extent on the similarity of activity labels. Structural similarity is only taken into account if labels are not equal or if they are not synonyms of each other. As a result, PMs with the same activity names will always have similarity 1 independently of any structural changes. Following equation 2, the approach of Ehrig et al. maps the activities of A0 to activities of A1 and ignores activities in A1 that are not contained in A0. This results in a similarity of 1 when comparin... |

30 | On managing process variants as an information resource.
- Lu, Sadiq
- 2006
Citation Context ...A0 ⊆ N0 and A1 ⊆ N1 of activities, the mapping map : A0 → A1 is established by label equivalence. The approach is limited by the assumption that A0 ⊆ A1. Similar processes can be in either of two relations with each other. M1 is equivalent to M0 when A0 = A1 and E0 = E1. M1 is subsumed by M0 when A0 ⊆ A1 and the order of activities in M0 is preserved in M1. If models are not in any of those relations, they are not regarded as similar to each other. To identify the relations it is attempted to transform M1 into the query graph M0 using graph reduction rules. The reduction rules can be found in [38]. Simply stated, the reduction first removes activities from M1 that are not contained in the query graph M0. After these activities are removed, edges that are not required for statements about the order of activities are removed, too. Eventually, M1 will be transformed into a reduced model $M_1^{red} = (N_1^{red}, E_1^{red})$. A similarity metric is defined as $sim(M_0, M_1) = \frac{\#(E_1^{red} \cap E_0)}{\#E_1^{red}}$. Table 16. Adherence to properties and similarity values of [37] (Edit Distance Between Reduced Models). Adherence to property: 1 yes, 2 no, 3 no, 3a no, 5 yes, 6 no, 7 yes, 8 yes. Similarity between V0 and ... |

26 | Representation and structure-based similarity assessment for agile workflows.
- Minor, Tartakovski, et al.
- 2007
Citation Context ... models (with the meaning of “similar behaviour”), but rather “related” models. Given the fact that they present a fast implementation of their algorithm, the benefit of the approach is that it can be used as a first step to filter potentially relevant models from a large model collection. In a second step, it is possible to calculate similarity measures only for those models from the collection that have not been disregarded as irrelevant. V. Percentage of Common Nodes and Edges in the Graph Similar to the label matching similarity measures discussed in the previous subsections, Minor et al. [26] suggest a measure that relates the number of nodes and edges that can be found in both M0 = (N0, E0) and M1 = (N1, E1) to the overall number of nodes and edges in both models. As the purpose of the work by Minor et al. is to compare models that are adapted versions of a “template” model, it can be assumed that two nodes can be regarded as identical if and only if they have exactly the same label. This means that the Function map is simply the identity. For a strictly sequential PM (i.e. one which has only activities, but no split or join nodes), the similarity measure is defined as: sim(x, y)... |

24 | Reconstructing the giant: On the importance of rigour in documenting the literature search process.
- Brocke, Simons, et al.
- 2013
Citation Context ...sure is 0 only if the models have the same set of traces (up to mapping by map); 4: Distance measure fulfills the triangle inequality; 5: Distance measure considers both commonalities and differences; 6: Distance measure takes similarity measure between activities into account; 7: Distance measure is defined for arbitrary process models; 8: Distance measure can be computed efficiently. 4 Literature Research. 4.1 Methods of the Literature Research. The findings we present in this article are based on an extensive literature review conducted between March 2010 and May 2011. According to the taxonomy given in [14], our review can be classified as follows. Scope: state-of-the-art presentation concerning approaches to calculate PM similarity. Focus: comprehend research methods and technologies. Goal: summarise findings. Organisation: conceptual, using the different types of similarity measures established in Sect. 6.1 to 6.5. Perspective: neutral. Audience: specialised scholars can identify similarities and differences between existing approaches to calculate PM similarity. Coverage: The central starting points for our survey were the digital libraries of ACM, IEEE, Springer, and Elsevier (SciVers... |

24 | Quantifying process equivalence based on observed behavior.
- Medeiros, Aalst, et al.
- 2008
Citation Context ...0 V1 V2 V3 V4 V5 V6 V7: 1.00, 1.00, 0.83, 0.61, 0.84, 0.20, 0.85, 0.83. Calculation Settings and Results: The values for the similarity between V0 and the rest of our example variants according to [50] are shown in Tab. 24. Discussion: The approach from Wang et al. makes it possible to calculate a similarity measure based on sets of traces even between PMs with infinite sets of traces. However, the generation of the sub-traces to compare still requires a symbolic exploration of the sets of traces, i.e. Property 8 is not fulfilled. III. Similarity of process models based on observed behaviour. De Medeiros et al. [51] present a method to calculate the similarity of PMs that is based on comparing traces obtained from actual process executions or by simulation. They point out that comparing the sets of traces directly leads to problems when a set of traces becomes infinite and that such a comparison would not take into account the real-world application of processes in practice, where certain traces occur more frequently than others. It could be added that dealing with the whole sets of traces could become computationally inefficient if the sets of traces are very large. De Medeiros et al. define a log L as a... |

23 | A Classification of Differences between Similar Business Processes. - Dijkman - 2007 |

21 |
Business process reference models: Survey and classification.
- Fettke, Loos, et al.
- 2006
Citation Context ...ies are useless without efficient querying (provided by similarity searches) and browsing facilities. E: Automate Process Execution Automation is usually a concern in SOA applications. During execution, services may be called depending on user requirements established at runtime. Furthermore, existing services may fail, e.g. due to a computer failure. In this case, it may be necessary to find similar services that are able to provide the same or similar functionality. F: Assure Compliance with normative models Reference models are a common approach to improve the process of developing new PMs [17]. Based on a given reference process, application-specific processes can be established. Reference models often contain necessary legal requirements for specific domains. Therefore, it is often necessary to measure the degree of compliance between a given reference model and its application-specific implementation. G: Discover Services Closely connected to the goal of automation is service discovery. In SOA applications, one common task is to search for services satisfying specific user requirements. If this task can be automated it is possible to call services dynamically and to make reuse of e... |

21 |
On the Discovery of Preferred Work Practice Through Business Process Variants,
- Sadiq
- 2007
Citation Context ...ightly lower. Discussion: The approach described in [33] ignores loops in a PM which is a severe limitation. We cannot agree to the statement made in [33] that “cycles are not used in the distance measure because the cycle does not affect the structure of a process”. Additional research would be necessary on extending tree-based similarity measures to the more general case of expressing PMs with loops as trees (preliminaries are discussed in [35, 36]). VI. Edit Distance Between Reduced Models Facilitating queries on process model repositories is in the focus of the approach of Lu and Sadiq in [37]. A query is represented as a partial process model having the desired process structure, e.g. the order of activities. Given a query model M0 = (N0, E0) and a process model M1 = (N1, E1) with the sets A0 ⊆ N0 and A1 ⊆ N1 of activities, the mapping map : A0 → A1 is established by label equivalence. The approach is limited by the assumption that A0 ⊆ A1. Similar processes can be in either of two relations with each other. M1 is equivalent to M0 when A0 = A1 and E0 = E1. M1 is subsumed by M0 when A0 ⊆ A1 and the order of activities in M0 is preserved in M1. If models are not in any of those rela... |

21 | Automatic control of workflow processes using ECA rules.
- Bae, Bae, et al.
- 2004
Citation Context ... 0.97 0.82 Calculation Settings and Results: The results of the similarity calculations and the adherence to the properties given in Sect. 3 for the approach presented in [39] are shown in Tab. 17. As stated in the description of the measure, dependency graphs do not distinguish between different connector types. Thus, to calculate similarity, we transform a given PM into its graph representation using approximation method 1 from Fig. 4. To automatically establish the execution probabilities of activities for the approach of Jung et al., we use the so-called branch-water algorithm presented in [42]. As described above, activities following an OR-split are assigned a probability of $\frac{1}{2}$, activities following an AND-split a probability of 1, and activities following an XOR-split a probability of $\frac{1}{n}$, where $n$ is the number of outgoing arcs. Since Jung et al. do not describe how cycles should be handled, we remove these cycles from our PMs (as this is a necessary requirement for the branch-water algorithm). On the assumption that processes do not run into deadlocks, cycles have no influence on the execution probabilities of individual activities. These preparations result in the... |
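The probability assignment described in this excerpt can be sketched as follows. This is only an illustration of the three stated rules (OR-split: 1/2, AND-split: 1, XOR-split: 1/n); it is not the branch-water algorithm of [42], whose details are not given here. The function name `branch_probability` and its parameters are hypothetical.

```python
from fractions import Fraction


def branch_probability(split_type: str, outgoing_arcs: int) -> Fraction:
    """Probability assigned to an activity directly following a split node.

    Hypothetical helper: it encodes only the three rules quoted in the
    excerpt, not the full branch-water algorithm.
    """
    if outgoing_arcs < 1:
        raise ValueError("a split needs at least one outgoing arc")
    kind = split_type.upper()
    if kind == "AND":
        # Every branch of an AND-split is executed.
        return Fraction(1)
    if kind == "OR":
        # Fixed value 1/2, regardless of the number of outgoing arcs.
        return Fraction(1, 2)
    if kind == "XOR":
        # Exactly one of n branches is taken, assumed equally likely.
        return Fraction(1, outgoing_arcs)
    raise ValueError(f"unknown split type: {split_type!r}")


if __name__ == "__main__":
    for kind, n in [("AND", 2), ("OR", 3), ("XOR", 4)]:
        print(kind, n, branch_probability(kind, n))
```

Exact fractions are used instead of floats so that the assigned values can be compared directly with the 1/2 and 1/n stated in the text.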

20 | Similarity search of business process models.
- Dumas, García-Bañuelos, et al.
- 2009
Citation Context ...e some recommendations which type of measure is useful for which kind of application. 1 Introduction Business process models, or just process models (PMs), are nowadays a common approach to analyse existing business processes and to create new processes in a structured way. They are used for purposes like supporting communication in organisations, documentation in projects, and training of employees [1]. This wide area of application has led to the existence of a tremendous amount of PMs. Large scale enterprises often own process repositories consisting of hundreds or even thousands of models [2], usually developed by different persons. A variety of techniques to manage these repositories are conceivable. They range from intelligent process repositories [3] to similarity search over the models. So far, several approaches that follow the latter idea have been proposed. They aim to find PMs in a PM repository that are similar to a given query model. For this purpose, there is a need of a similarity measure that quantifies the similarity between models. The goal of our article is to provide a comprehensive survey on techniques to define and calculate similarity measures between PMs. Furt... |

19 | Evaluation of workflow similarity measures in service discovery. In: Service Oriented Electronic Commerce:
- Wombacher, Rozie
- 2006
Citation Context ...ld be set for Lla. Another, less severe, disadvantage lies in the fact that in the published algorithm for calculating causal footprints OR-connectors are dealt with in the same way as XOR-connectors. Hence, the change between model variants V0 and V2 will remain undetected; sim(V0, V2) is 1. A possible solution of this problem would be to consider other types of look-ahead/look-back links such as L′la as the set of all pairs (a, S) such that every execution of activity a can be followed by a state where all activities in S are running in parallel. V. String Edit Distance of Sets of Traces In [47] Wombacher and Rozie compare several approaches to calculate the similarity of process models based on a comparison of their sets of traces. First, they analyse the Levenshtein string edit distance [23] between traces. However, a set of traces of process models with loops is infinite. Thus, this simple idea is not applicable. To handle infinite traces as well, [47] presents a second approach based on n-grams. These n-grams are defined as sub-traces of length n. For example, possible traces from process variant V0 are 〈1, 2, 3, 4, 5, 6, 7, 8, 9〉 and 〈1, 2, 3, 4, 5, 6, 5, 6, 5, 6, 5, 6, 7, 8, 9〉... |

18 | On the challenges of business modeling in large-scale reengineering projects.
- Gulla, Brasethvik
- 2000
Citation Context ...ve from a theoretical point of view and analysed how these properties are fulfilled by the different measures. Our results show that there are remarkable differences among existing measures. We give some recommendations which type of measure is useful for which kind of application. 1 Introduction Business process models, or just process models (PMs), are nowadays a common approach to analyse existing business processes and to create new processes in a structured way. They are used for purposes like supporting communication in organisations, documentation in projects, and training of employees [1]. This wide area of application has led to the existence of a tremendous amount of PMs. Large scale enterprises often own process repositories consisting of hundreds or even thousands of models [2], usually developed by different persons. A variety of techniques to manage these repositories are conceivable. They range from intelligent process repositories [3] to similarity search over the models. So far, several approaches that follow the latter idea have been proposed. They aim to find PMs in a PM repository that are similar to a given query model. For this purpose, there is a need of a simil... |

15 |
A workflow net similarity measure based on transition adjacency relations.
- Zha, Wang, et al.
- 2010
Citation Context ...ed that are all more or less similar to each other. To simplify management and facilitate reuse of these process variants, it is necessary to establish a measure that captures their similarity. Therefore, it is possible to react on new user requirements by searching for process variants that satisfied similar requirements in the past. B: Merge processes Merging PMs is a common activity executed in the case of company mergers and in collaborations beyond company borders. When business units of different organisations are consolidated, it can be assumed that process overlaps exist. For example, [15] reported of an organisation having several subsidiaries where every subsidiary managed its own ERP system resulting in more than 200,000 PMs. During integration it is necessary to integrate these systems and to identify process overlaps. C: Facilitate reuse A cross-sectional goal that can be achieved by targeting various other application areas for similarity calculations is to facilitate reuse of PMs. Similar to reusing components in software engineering (see e.g. [16] for code reuse), reusing PMs promises to reduce time and costs. Therefore, it is necessary to find existing PMs and reuse th... |

15 | Efficient Computation of Causal Behavioural Profiles Using Structural Decomposition, in:
- Weidlich, Polyvyanyy, et al.
- 2010
Citation Context ...on for such results lies in the fact that the TAR relation contains only information about direct precedence. It should be appealing to include information about the transitive closure of TAR into the calculation of similarity measures. Instead of analysing information such as “activity $A_I$ can be followed directly by activity $A_{II}$”, we would also take into account information such as: “After executing activity $A_I$, it will be possible to execute $A_{II}$ later”. Approaches which use this kind of information are discussed in the following subsections. III. Causal Behavioural Profiles Weidlich et al. [43, 44] capture the behaviour of a PM by examining dependencies between the execution of an activity $A_I$ and the execution of activity $A_{II}$. Such dependencies are expressed by means of four relations: – $A_I$ and $A_{II}$ are in strict order relation, if and only if it is possible that $A_I$ is executed before $A_{II}$ is executed, but it is not possible that $A_{II}$ is executed before $A_I$ is executed (i.e., there is a trace $\langle \ldots A_I \ldots A_{II} \ldots \rangle$ but no trace $\langle \ldots A_{II} \ldots A_I \ldots \rangle$) – $A_I$ and $A_{II}$ are in exclusiveness relation, if and only if it is not possible that both $A_I$ and $A_{II}$ are executed in the same process... |

14 | Process mining, discovery, and integration using distance measures.
- Bae, Liu, et al.
- 2006
Citation Context ...tivities are removed during transformation. Tab. 16 also shows that models V3 and V4 get a similarity score of 1 when compared to model V0. This is straightforward, since these variants do not change the order of activities. This behaviour is motivated by the goal of the measure as it is applied during search in process repositories. However, a drawback that cannot be ignored is that the measure of Lu and Sadiq does not take the similarity of activities into account and only counts the amount of common edges. 6.3 Causal Dependencies Between Activities I. Dependency Graph Comparison Bae et al. [39, 40] build a so-called “dependency graph” for a PM. The activities of the PM become the nodes in the dependency graph. In the dependency graph, there is an arc between two activities if one activity directly depends on data that have to be produced by another activity, i.e. if one activity is the direct predecessor of another one. For the dependency graph, it does not make a difference which type of connector (AND, XOR, inclusive OR) is located between activities. As an example, the dependency graph of V0 (which coincides with the dependency graphs of V1 and V2) is shown as the topmost graph in Fi... |

13 |
Fast business process similarity search with feature-based similarity estimation. In: On the Move to Meaningful Internet Systems -
- Yan, Dijkman, et al.
- 2010
Citation Context ...1 when comparing V0 with V3 (activities A, B, C, and D are ignored). On the other hand, sim(V3, V0) is not 1 but 9/13 because nine activities are matched with a similarity of 1 and the additional four activities are not matched and have a similarity of 0 (assuming a one-to-one mapping). This means that Property 2 is violated – the measure is not symmetric. We believe the value of [6] lies more in a discussion of concepts to define the mapping function map between activities than in the definition of similarity measures between PMs as a whole. IV. Feature-Based Similarity Estimation Yan et al. [25] address the problem of searching a collection of PMs for models that are similar to a query model. They point out that it is inefficient to compare each model in the collection with the query model. As a solution, Yan et al. suggest building computationally efficient indices for quickly finding models that have many features in common with the query model. In this procedure, features are defined as activity labels as well as the position that a node has within the structure of the PM graph. First, the Levenshtein string edit distance [23] is used for computing the similarity of activity labels.... |

12 |
Measuring the compliance of processes with reference models. In:
- Gerke, Cardoso, et al.
- 2009
Citation Context ..., 1 (map(a1) = a2 xor map(b1) = b2), or 2 (neither a1 nor b1 are mapped to a2 or b2). Discussion: Using a bigram representation of process models is similar to the TAR-approach (see Sect. 6.3.II). As stated there, it takes only information about direct precedence of activities into account. Therefore, the similarity values for (V0, V2) and (V0, V3) are very low in comparison with the other values shown in Tab. 22. 6.4 Approaches Based on the Sets of Traces I. Longest Common Subsequence of Traces To calculate compliance and maturity of an actual process model to a reference model, Gerke et al. [49] compare the sets of traces of both models. In this context, compliance is the extent to which a process model adheres to ordering rules of activities (e.g. activity A must always be executed before activity B). Maturity measures to what extent the process model recalls activities of the reference model. In order to avoid problems with infinite traces and infinite sets of traces, it is assumed that the possible executions are restricted by the constraint that there is a maximum number of possible repetitions of each loop in a model. Gerke et al. use a non-injective mapping function map : N0 → ... |

11 |
Ranking BPEL processes for service discovery.
- Grigori, Corrales, et al.
- 2010
Citation Context ...gously. With sn being the set of inserted or deleted nodes and se being the set of inserted or deleted edges, Dijkman et al. define a graph edit distance as: $\mathrm{dist}(M_0, M_1) = \#sn + \#se + 2 \sum_{a \in A_0,\ \mathrm{map}(a)\ \text{is defined}} \mathrm{corr}(a, \mathrm{map}(a))$. By dividing the terms in the above sum by the total numbers of nodes, arcs and nodes that are not inserted or deleted nodes resp., three quotients can be derived. A similarity measure called graph edit distance similarity is calculated as the weighted average of these three quotients. The idea of a graph edit distance is also used for comparing processes by Grigori et al. [29]. They use a distance measure for searching a service repository for services that match a given query. The basic ideas for the measure are the same as described above; but two remarkable differences should be noted: First, Grigori et al. do not relate the number of change operations to the graph size and thus violate Property 5. And second, the approach supports a non-injective mapping function map which is helpful when models on different levels of abstraction have to be compared. Table 10. Adherence to properties and similarity values of [4] Graph Edit Distance [4] Adherence to property . . ... |

11 | Federal Information Processing Standards Publication 183 : Integration Definition for Function Modeling - Draft - 1993 |

9 | Business process model repositories - framework and survey. Working Papers 292, Technische Universiteit Eindhoven,
- Yan, Dijkman, et al.
- 2009
Citation Context ...days a common approach to analyse existing business processes and to create new processes in a structured way. They are used for purposes like supporting communication in organisations, documentation in projects, and training of employees [1]. This wide area of application has led to the existence of a tremendous amount of PMs. Large scale enterprises often own process repositories consisting of hundreds or even thousands of models [2], usually developed by different persons. A variety of techniques to manage these repositories are conceivable. They range from intelligent process repositories [3] to similarity search over the models. So far, several approaches that follow the latter idea have been proposed. They aim to find PMs in a PM repository that are similar to a given query model. For this purpose, there is a need of a similarity measure that quantifies the similarity between models. The goal of our article is to provide a comprehensive survey on techniques to define and calculate similarity measures between PMs. Furthermore, we will study the question how the different measures rank “similarity” within the same set of PMs. In our study, we investigated, how different kinds of c... |

9 | Vertical alignment of process models - how can we get there? In: - Weidlich, Barros, et al. - 2009 |

8 | Development of distance measures for process mining, discovery and integration.
- Bae, Liu, et al.
- 2007
Citation Context ...tivities are removed during transformation. Tab. 16 also shows that models V3 and V4 get a similarity score of 1 when compared to model V0. This is straightforward, since these variants do not change the order of activities. This behaviour is motivated by the goal of the measure as it is applied during search in process repositories. However, a drawback that cannot be ignored is that the measure of Lu and Sadiq does not take the similarity of activities into account and only counts the amount of common edges. 6.3 Causal Dependencies Between Activities I. Dependency Graph Comparison Bae et al. [39, 40] build a so-called “dependency graph” for a PM. The activities of the PM become the nodes in the dependency graph. In the dependency graph, there is an arc between two activities if one activity directly depends on data that have to be produced by another activity, i.e. if one activity is the direct predecessor of another one. For the dependency graph, it does not make a difference which type of connector (AND, XOR, inclusive OR) is located between activities. As an example, the dependency graph of V0 (which coincides with the dependency graphs of V1 and V2) is shown as the topmost graph in Fi... |

5 |
An algorithm for calculating process similarity to cluster open-source process designs. In: Grid and Cooperative Computing
- Huang, Zhou, et al.
- 2004
Citation Context ... along a path from one activity to another as attributes to the arcs between activities. Method 3 uses exactly one “artificial” node per type of split- or join node, regardless of the number of occurrences of this kind of split- or join node in the PM. Fig. 4 illustrates the three methods of transforming a PM into its approximation graph for our example model V0. After transforming the original PM, Equation 3 is used for comparing the approximation graphs. (Fig. 4. Three methods to transform model M0 from Fig. 3(a) into its approximation graph.) A similar approach is presented by Huang et al. in [27]. First, the function corr is calculated; Huang et al. do not suggest any specific corr-measure. Let $A_0 = \{a^0_1, a^0_2, \ldots, a^0_{\#A_0}\}$ and $A_1 = \{a^1_1, a^1_2, \ldots, a^1_{\#A_1}\}$ be the sets of activities of the models to compare and $E_0 = \{e^0_1, e^0_2, \ldots, e^0_{\#E_0}\}$ and $E_1 = \{e^1_1, e^1_2, \ldots, e^1_{\#E_1}\}$ the sets of its edges. The overall similarity between the activity sets $A_0$ and $A_1$ is then defined as: $\frac{\sum_{i=1}^{\#A_0} \max_{j=1,\ldots,\#A_1} \mathrm{corr}(a^0_i, a^1_j) + \sum_{j=1}^{\#A_1} \max_{i=1,\ldots,\#A_0} \mathrm{corr}(a^0_i, a^1_j)}{\#A_0 + \#A_1}$ (4) Second, the PMs to compare are transformed into a weighted graph representation simila... |
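Equation (4) in this excerpt can be illustrated with a small sketch. Since Huang et al. are reported not to prescribe a specific corr measure, exact label equality is used below purely as a placeholder assumption; the names `activity_set_similarity` and `exact_match` are hypothetical, and the activity labels are made up to mirror the example variants V0 (nine activities) and V3 (the same nine plus A, B, C and D).

```python
from typing import Callable, Collection


def exact_match(x: str, y: str) -> float:
    """Placeholder corr function (an assumption): 1 for equal labels, else 0."""
    return 1.0 if x == y else 0.0


def activity_set_similarity(
    a0: Collection[str],
    a1: Collection[str],
    corr: Callable[[str, str], float] = exact_match,
) -> float:
    """Average best-match correspondence in both directions, as in Equation (4)."""
    if not a0 or not a1:
        # The quoted formula does not cover empty activity sets; returning 0
        # is a choice made for this sketch only.
        return 0.0
    best_for_a0 = sum(max(corr(x, y) for y in a1) for x in a0)
    best_for_a1 = sum(max(corr(x, y) for x in a0) for y in a1)
    return (best_for_a0 + best_for_a1) / (len(a0) + len(a1))


if __name__ == "__main__":
    v0 = [str(i) for i in range(1, 10)]   # activities 1..9
    v3 = v0 + ["A", "B", "C", "D"]        # four additional activities
    print(activity_set_similarity(v0, v0))
    print(activity_set_similarity(v0, v3))  # (9 + 9) / (9 + 13)
```

Because both sums are taken, the value is the same when the two arguments are swapped, in contrast to a one-directional best-match average.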

5 | Process mining by measuring process block similarity. In:
- Bae, Caverlee, et al.
- 2006
Citation Context ...tarting from an empty model and repeatedly applying high-level change operations. In this context, the question about the difference between model variants V0 and V1 is not relevant, because the construction algorithm would ensure that only one of the models would occur in practice. A remarkable property of this measure is that when calculating sim(V0, V6) and sim(V0, V7), we have to regard only one high-level change operation. This is different from other similarity measures based on graph edit distances that we have discussed so far. V. Tree Edit Distance Between PMs Represented as Trees In [33], Bae et al. transform a PM into an ordered tree. A sequential PM (without any splits and joins) would become a tree of depth one; all activities would be leafs that are children of the root node. A split node in the PM would correspond to a node in the tree which is parent of several subtrees which correspond to the outgoing arcs of this split. After translating a PM into a tree this way, algorithms for comparing trees [34] are used. Table 15. Adherence to properties and similarity values of [33] Tree Edit Distance Between PMs Represented as Trees [33] Adherence to property . . . 1–yes 2–yes ... |

5 |
Visualization and clustering of business process collections based on process metric values. Symbolic and Numeric Algorithms for Scientific Computing, International Symposium on
- Melcher, Seese
- 2008
Citation Context ...hat the approach is limited to comparing two models with respect to a given log. For example, if subparts of a model are never executed, they are not in the log and therefore, differences in these subparts can not be identified. For this reason, the authors emphasise that the logs must reflect typical behaviours of the models. 6.5 Similarity of Structural Complexity Metrics In this subsection, we shortly mention another approach for defining similarity between PMs that will not be discussed in detail, because it follows a different understanding of the concept of similarity. Melcher and Seese [52] aim to find structurally similar PMs within model collections by comparing the values of several complexity metrics for the PMs. The models are clustered such that PMs with similar metrics values can be identified. While this could be useful for gaining insights into the distribution of metric values, it is not possible to draw conclusions about behavioural similarity or relatedness among PMs. For this reason, the approach will not be discussed here further. 7 Discussion Tab. 26 shows the similarity values we have computed between our example model V0 and its variants V1 . . . V7. For measure... |

4 |
Discovering business process similarities: An empirical study with sap best practice business processes.
- Akkiraju, Ivan
- 2010
Citation Context ...of the presented measures are implemented, and we will add missing measures in the future. Using this application (footnote 1: https://sourceforge.net/projects/prom-similarity/), it is possible to analyse the impact of various parameters when calculating similarity (e.g. size of models, amount of text in models). The source code contains detailed comments on the parameters and strategies for those measures whose original description allows some degree of freedom in the implementation. 6.1 Correspondence Between Nodes and Edges in the PM I. Similarity Score Based on Common Activity Names Akkiraju and Ivan [22] measure similarity of process models solely based on the number of equally labelled activities, i.e. on the number of activities that occur in both models. The so-called semantic similarity score between model M0 with the set of activities A0 and model M1 with the set of activities A1 is defined as $\mathrm{sim}(M_0, M_1) = 2 \cdot \frac{\#(A_0 \cap A_1)}{\#A_0 + \#A_1}$. Any two of the example models Vi (with the only exception V3) have a similarity score of $2 \cdot \frac{9}{9+9} = 1$ irrespective of the calculation order. Table 3. Adherence to properties and similarity values of [22] Similarity Score Based on Common Activity Names [22] Adheren... |
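The semantic similarity score quoted in this excerpt depends only on the two sets of activity labels, so it can be sketched in a few lines. The function name `common_activity_similarity` is hypothetical, and the labels below are stand-ins for the example variants (V0 with activities 1 to 9, V3 with four additional activities A to D).

```python
from typing import Iterable


def common_activity_similarity(a0: Iterable[str], a1: Iterable[str]) -> float:
    """Score 2 * #(A0 ∩ A1) / (#A0 + #A1) on sets of activity labels."""
    s0, s1 = set(a0), set(a1)
    if not s0 and not s1:
        # Not covered by the quoted definition; treating two empty models
        # as identical is an assumption of this sketch.
        return 1.0
    return 2 * len(s0 & s1) / (len(s0) + len(s1))


if __name__ == "__main__":
    v0 = {str(i) for i in range(1, 10)}
    v3 = v0 | {"A", "B", "C", "D"}
    print(common_activity_similarity(v0, v0))  # 2 * 9 / (9 + 9)
    print(common_activity_similarity(v0, v3))  # 2 * 9 / (9 + 13)
```

Since only label sets enter the computation, any two models with the same nine labels score 1 regardless of how their activities are ordered or connected.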

4 | Process-annotated service discovery facilitated by an n-gram-based index. In:
- Mahleko, Wombacher, et al.
- 2005
Citation Context ...ents a second approach based on n-grams. These n-grams are defined as sub-traces of length n. For example, possible traces from process variant V0 are 〈1, 2, 3, 4, 5, 6, 7, 8, 9〉 and 〈1, 2, 3, 4, 5, 6, 5, 6, 5, 6, 5, 6, 7, 8, 9〉. In a trace of V0, activities 5 and 6 can be repeated arbitrarily often. A bigram representation of the traces collects all pairs of directly succeeding activities and is {∗1, 12, 23, 34, 45, 56, 65, 67, 78, 89, 9∗} where ∗ symbolises the start and end of a trace respectively. From the example, we can see that even infinite traces introduced by cycles can be represented using a finite set of n-grams [48]. Analogously to the simple approach, the distance between processes is calculated using the string edit distance. But instead of analysing specific traces only their n-gram representation is taken into account. Table 22. Adherence to properties and similarity values of [47] Sets of Traces as n-grams [47] Adherence to property . . . 1–yes 2–no 3–no 3a–no 5–no 6–no 7–yes 8–no Similarity between V0 and . . . V0 V1 V2 V3 V4 V5 V6 V7 1.00 1.00 0.06 0.05 0.33 0.06 0.17 0.14 Calculation Settings and Results: Wombacher and Rozie do not give any information on how to calculate the edit distance. There... |
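The n-gram representation described in this excerpt can be sketched as follows. The function name `trace_ngrams` is hypothetical, and padding each trace with a single boundary symbol on either side is an assumption that matches the bigram example but is not confirmed for larger n. Note that the computed bigram set contains 45, because activities 4 and 5 are adjacent in both example traces.

```python
from typing import Iterable, Sequence, Set


def trace_ngrams(traces: Iterable[Sequence[int]], n: int = 2, boundary: str = "*") -> Set[str]:
    """Collect the n-grams of all traces, marking trace start and end with `boundary`."""
    grams: Set[str] = set()
    for trace in traces:
        # One boundary symbol on each side (assumption, see lead-in).
        padded = [boundary] + [str(step) for step in trace] + [boundary]
        for i in range(len(padded) - n + 1):
            grams.add("".join(padded[i:i + n]))
    return grams


if __name__ == "__main__":
    plain = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    looped = [1, 2, 3, 4, 5, 6, 5, 6, 5, 6, 5, 6, 7, 8, 9]
    print(sorted(trace_ngrams([plain, looped])))
```

Repeating the loop between activities 5 and 6 more often adds no new bigrams, which is why a finite set suffices even when the set of traces is infinite.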

3 |
Metric trees for efficient similarity search in process model repositories. In:
- Kunze, Weske
- 2010
Citation Context ...lity between sets of traces, Property 3 can be substituted by the less strict requirement: Property 3a: dist(M0,M1) = 0⇔ Σ(M0) ≡ Σ(M1). Property 4, the triangle inequality, is not essential for measuring the dissimilarity (distance) between PMs (or for (dis)similarity measures in general, see [11]). Therefore, we will not examine the suggested measures with respect to this property. It is a useful property anyway, because a distance measure that fulfills all four properties given above allows to organise a PM repository using data structures in which the search for similar models is very fast [12]. From an information-theoretic discussion of the concept of similarity (see [11, 13]), one more requirement for a similarity measure can be derived: Such a measure should take into consideration both the commonality between two models and their differences (Property 5). For example, we would not get a good similarity measure by just counting the number of activities that are shared among two models without relating this number to the overall number of activities in the models: If two models with 20 nodes have 15 node names in common, it would be reasonable to say that they are more similar to... |

3 |
Merging business process models. In: On the Move to Meaningful Internet Systems
- Rosa, Dumas, et al.
- 2010
Citation Context ... Since we transform PMs into their graph representation using method 1 from Fig. 4, sim(V0, V1) = 1 and sim(V0, V2) = 1, too. Since the approach of Grigori et al. considers connectors and their types, the values for sim(V0, V1) and sim(V0, V2) are rather low in comparison to the other similarity values. Due to the possibility to split a node into two nodes, sim(V0, V3) is high because we only have to split node 1 into nodes <1, A>, node 5 into nodes <B, 5>, node 6 into nodes <6, C>, and node 9 into nodes <D, 9>. II. Combining Activity Matching and a Graph Edit Distance La Rosa et al. [30] discuss the question of comparing process models stemming from different organisations. Their aim is to create an integrated model in situations like company mergers or restructurings. The approach has three steps: In step 1, a mapping between the activities in M0 and M1 is established, i.e. the mapping function map is defined for activity nodes based on a function corr that uses string-similarity measures. In step 2, a mapping between split and join nodes is found. For this purpose, a measure called context similarity is calculated. A join (or split) node n0 in M0 is regarded as similar to a... |

3 | Detection of semantically equivalent fragments for business process model change management. In:
- Gerth, Luckey, et al.
- 2010
Citation Context ... 1 and sim(V0, V2) = 1, too as can be seen in Tab. 15. By taking connectors into account, the similarity between V0 and V1 and V2 respectively would be slightly lower. Discussion: The approach described in [33] ignores loops in a PM which is a severe limitation. We cannot agree to the statement made in [33] that “cycles are not used in the distance measure because the cycle does not affect the structure of a process”. Additional research would be necessary on extending tree-based similarity measures to the more general case of expressing PMs with loops as trees (preliminaries are discussed in [35, 36]). VI. Edit Distance Between Reduced Models Facilitating queries on process model repositories is in the focus of the approach of Lu and Sadiq in [37]. A query is represented as a partial process model having the desired process structure, e.g. the order of activities. Given a query model M0 = (N0, E0) and a process model M1 = (N1, E1) with the sets A0 ⊆ N0 and A1 ⊆ N1 of activities, the mapping map : A0 → A1 is established by label equivalence. The approach is limited by the assumption that A0 ⊆ A1. Similar processes can be in either of two relations with each other. M1 is equivalent to M0 wh... |

2 |
Process model analysis using related cluster pairs. In:
- Niemann, Siebenhaar, et al.
- 2010
Citation Context ...s have been created in different organisations or if they describe a business process on different levels of detail, this can become a non-trivial task. This first step is, however, not in the focus of our article. We assume that a mapping between corresponding activity nodes in the process models to compare has been established, either by using one of the existing algorithms or based on experts’ judgment. The interested reader can find a discussion of different mapping techniques in [4–6]. More general techniques that map model fragments instead of single nodes to each other are discussed in [7]. Once a mapping between the activities has been established, measures of similarity between the models can be computed in step two. After explaining some basic concepts and symbols in Sect. 2, we will discuss desirable properties that a similarity measure should have in Sect. 3. The methods used in our literature survey and the applications for similarity measures that have been suggested in the literature are presented in Sect. 4. To facilitate comparability of the analysed measures, we calculate similarity between an example model and its variations which we present in Sect. 5. Following, i... |

2 | Hierarchical clustering of business process models.
- Jung, Bae, et al.
- 2009
Citation Context ...clearly limits the applicability of this approach. Another shortcoming is illustrated by model variant V3 (Fig. 3(d)): By adding the activities A, B, C and D into model V0, we destroyed the majority of “direct precedence” relations. Therefore, the dependency graphs of V0 and V3 have only one edge (5,6) in common, and the distance measure is equal to the number of all but one edges in both dependency graphs (i.e. 11+15=26). On the other hand, the distance measure between V0 and V5 (whose dependency graphs have two common edges) is 10+6=16 only, which does not meet the intuitive expectation. In [41], Jung et al. improve the approach with the aim to avoid the shortcomings mentioned above. At first, they calculate the execution probability of each activity. If there is no additional information (for example from process logs), the probability of an activity that follows an XOR-split with $n$ outgoing arcs is assumed to be $\frac{1}{n}$, and the probability of an activity that follows an OR-split is assumed to be $\frac{1}{2}$ regardless of the number of outgoing arcs (which, of course, is disputable). The examples in [41] show only the calculation of execution probabilities for very simple PMs with a nesting lev... |

1 | A behavioral similarity measure between labeled petri nets based on principal transition sequences - (short paper). In:
- Wang, He, et al.
- 2010
Citation Context ... of Gerke et al. in [49]), although it should be noted that disregarding the order between some activities has a great negative influence on the computational complexity. In order to compare two models M0 and M1, both sets of traces Σ(M0) and Σ(M1) have to be calculated, and each σ0 ∈ Σ(M0) has to be compared with each σ1 ∈ Σ(M1). For models with large sets of traces, this would not be feasible, i.e. we have to observe a violation of Property 8. II. Similarity Based on Principal Transition Sequences In order to deal with the problems of infinite traces and infinite sets of traces, Wang et al. [50] limit the (sub)traces to consider in a comparison between PMs as follows: A trace that does not contain any activity more than once is considered as a whole. A trace σ that contains an activity x more than once has the form σ = 〈σprefix, x, σrepeatable, x, . . . 〉, where σprefix and σrepeatable are sub-traces of σ. In such a case, the sub-traces σprefix and σrepeatable are used for the comparison instead of the complete trace σ. In [50] it has been shown that for each PM the number of (sub)traces derived in this way is finite. The similarity between two (sub)traces σI and σII is defined based... |