Results 1-10 of 25
Deep Packet Inspection Using Parallel Bloom Filters, 2004
Cited by 218 (18 self)
this memory core, five random memory locations are readable in a single clock cycle. So performing 35 concurrent memory operations requires seven parallel memory cores, each with one-seventh of the required array size, as Figure 5b illustrates. Because the basic Bloom filter allows any hash function to map to any bit in the vector, it is possible that for some member, more than five hash functions map to the same memory segment, thereby exceeding the lookup capacity of this memory core. We can solve this problem by restricting the range of each hash function to a given memory, preventing memory contention
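A minimal Python sketch of the partitioning idea described above, assuming the excerpt's figures of 35 hash functions spread over seven memory cores with five lookups per core: hash function j may only index the bit array of core j // 5, so no core ever receives more than five lookups per query. The class name `SegmentedBloomFilter`, the SHA-256-based hash family, and the one-byte-per-bit storage are illustrative assumptions, not the paper's hardware design.

```python
import hashlib
from typing import Iterator, Tuple


class SegmentedBloomFilter:
    """Hypothetical Bloom filter whose num_cores * hashes_per_core hash functions
    are partitioned so that hash j only addresses core j // hashes_per_core."""

    def __init__(self, num_cores: int, hashes_per_core: int, bits_per_core: int) -> None:
        self.num_cores = num_cores
        self.hashes_per_core = hashes_per_core
        self.bits_per_core = bits_per_core
        # One independent bit array per memory core (one byte per bit, for clarity).
        self.cores = [bytearray(bits_per_core) for _ in range(num_cores)]

    def _positions(self, item: str) -> Iterator[Tuple[int, int]]:
        """Yield (core, bit) pairs; each core receives exactly hashes_per_core of them."""
        for j in range(self.num_cores * self.hashes_per_core):
            # Assumed hash family: SHA-256 of the hash index and the item.
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            bit = int.from_bytes(digest[:8], "big") % self.bits_per_core
            yield j // self.hashes_per_core, bit

    def add(self, item: str) -> None:
        for core, bit in self._positions(item):
            self.cores[core][bit] = 1

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.cores[core][bit] for core, bit in self._positions(item))
```

Because each hash function's range is fixed to one core, a query never causes contention on any core; the cost is that each core's array is sized independently (one-seventh of the total in the excerpt's configuration).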
Efficient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure
Cited by 50 (15 self)
This paper describes a conservative approximation algorithm that derives close-to-minimal buffer capacities for an application described as a cyclo-static dataflow graph. The resulting buffer capacities satisfy constraints on the maximum buffer capacities as well as end-to-end throughput and latency constraints. Furthermore, we show that the effects of run-time arbitration can be included in the response times of dataflow actors. We show that modelling an MP3 playback application as a cyclo-static dataflow graph instead of a multi-rate dataflow graph reduces buffer capacities by up to 39%. Finally, the algorithm is applied to a real-life car-radio application in which two independent streams are processed.
W.F.: Engineering Parallel Applications with Tunable Architectures
International Conference on Software Engineering, 2010
Cited by 5 (2 self)
Current multicore computers differ in many hardware characteristics. Software developers therefore hand-tune their parallel programs for a specific platform to achieve the best performance; this is tedious and leads to non-portable code. Although the software architecture also requires adaptation to achieve the best performance, it is rarely modified because of the additional implementation effort. The Tunable Architectures approach proposed in this paper automates the architecture adaptation of parallel programs and uses an autotuner to find the best-performing software architecture for a particular machine. We introduce a new architecture description language based on parallel patterns and a framework to express architecture variants in a generic way. Several case studies demonstrate significant performance improvements due to architecture tuning and show the applicability of our approach to industrial applications. Because developers are exposed to less parallel-programming complexity, the approach is attractive to experts and inexperienced parallel programmers alike.
Application-to-Core Mapping Policies to Reduce Interference in On-Chip Networks, 2011
Cited by 4 (1 self)
As the industry moves toward many-core processors, Networks-on-Chip (NoCs) will likely become the communication backbone of future microprocessor designs. The NoC is a critical shared resource, and its effective utilization is essential for improving overall system performance and fairness. In this paper, we propose application-to-core mapping policies that reduce contention in network-on-chip and memory-controller resources and hence improve overall system performance. First, we introduce the notion of clusters: cores are grouped into clusters, and a memory controller is assigned to each cluster. The memory controller assigned to a cluster is primarily responsible for servicing the data requested by the applications mapped to that cluster. We propose and evaluate page allocation and page replacement policies that, with high probability, restrict a core's network traffic to its own cluster. Second, we develop algorithms that distribute applications among clusters. Our inter-cluster mapping algorithm separates interference-sensitive applications from aggressive ones by mapping them to different clusters, improving system performance while maintaining a reasonable network load balance among clusters. Contrary to the conventional wisdom of balancing network/memory load across clusters, we observe that it is also important to ensure that applications that are more sensitive to network latency experience little interference from network-intensive applications. Finally, we develop algorithms to map applications to cores within a cluster. The key idea of intra-cluster mapping is to place the applications that benefit more from proximity to the memory controller on the cores closest to it. We evaluate the proposed application-to-core mapping policies on a 60-core CMP with an 8x8 mesh NoC using a suite of 35 diverse applications. Averaged over 128 randomly generated multiprogrammed workloads, the final proposed policy improves system throughput by 16.7% in terms of weighted speedup over a baseline many-core processor, while also reducing system unfairness by 22.4% and interconnect power consumption by 52.3%.
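The intra-cluster step can be illustrated with a small greedy sketch. It assumes a hypothetical per-application `benefit` score (for example, how memory-intensive the application is) and measures distance to the memory controller as the Manhattan hop count in the mesh, which equals the route length under dimension-ordered (XY) routing. The function name and scoring are illustrative; the paper's actual metric and algorithm may differ.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) position in the mesh


def intracluster_map(benefit: Dict[str, float], cores: List[Coord],
                     controller: Coord) -> Dict[str, Coord]:
    """Hypothetical greedy mapping: cores nearest the memory controller go to
    the applications with the highest benefit score."""
    if len(benefit) > len(cores):
        raise ValueError("more applications than cores in this cluster")

    def hops(core: Coord) -> int:
        # Manhattan distance = hop count under XY routing in a 2D mesh.
        return abs(core[0] - controller[0]) + abs(core[1] - controller[1])

    cores_by_distance = sorted(cores, key=hops)
    apps_by_benefit = sorted(benefit, key=benefit.get, reverse=True)
    # zip stops at the shorter list, so surplus cores are simply left idle.
    return dict(zip(apps_by_benefit, cores_by_distance))
```

Sorting both lists and pairing them is the simplest way to express "closer to the controller for those who benefit more"; a real policy would also weigh the inter-cluster load balance described above.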
A probabilistic constructive approach to optimization problems
in ACM/IEEE ICCAD, 2001
Cited by 2 (2 self)
We propose a new optimization paradigm for solving intractable combinatorial problems. The technique, named Probabilistic Constructive (PC), combines the advantages of both constructive and probabilistic algorithms. The constructive aspect provides relatively short runtime and makes the technique amenable to the inclusion of insights through heuristic rules. The probabilistic nature facilitates a flexible trade-off between runtime and solution quality. In addition to presenting the generic technique, we apply it to the Maximal Independent Set problem. Extensive experimentation indicates that the new approach provides very attractive trade-offs between solution quality and runtime, often outperforming the best previously published approaches.
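A rough illustration of how a probabilistic constructive search can trade runtime for quality on the independent-set problem: each trial builds a maximal independent set greedily, but picks the next vertex at random with a bias toward low-degree vertices (standing in for a heuristic rule), and the best set over all trials is kept. The function name and the 1/(1 + degree) weighting are assumptions made for this sketch; it is not the paper's PC technique.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets


def probabilistic_greedy_mis(graph: Graph, trials: int, seed: int = 0) -> Set[int]:
    """Best of `trials` randomized greedy constructions of a maximal independent set."""
    rng = random.Random(seed)
    best: Set[int] = set()
    for _ in range(trials):
        remaining = {v: set(nbrs) for v, nbrs in graph.items()}  # residual graph
        chosen: Set[int] = set()
        while remaining:
            verts = list(remaining)
            # Assumed heuristic rule: favour vertices of low residual degree.
            weights = [1.0 / (1 + len(remaining[v])) for v in verts]
            v = rng.choices(verts, weights=weights)[0]
            chosen.add(v)
            removed = {v} | remaining[v]  # v and its neighbours leave the graph
            for u in removed:
                remaining.pop(u, None)
            for nbrs in remaining.values():
                nbrs -= removed
        if len(chosen) > len(best):
            best = chosen
    return best
```

Each extra trial costs one more greedy pass but can only improve the best set found, which is the kind of runtime/quality knob the abstract describes.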
The Optimal Path in a Random Network
Cited by 2 (1 self)
We study the optimal distance $\ell_{\mathrm{opt}}$ in random networks in the presence of disorder, implemented by assigning random weights to the links. The optimal distance between two nodes is the length of the path for which the sum of weights along the path ("cost") is a minimum. We study the case of strong disorder, for which the distribution of weights is so broad that its sum along any path is dominated by the largest link weight in the path. We find that in random graphs, $\ell_{\mathrm{opt}}$ scales as $N^{1/3}$, where $N$ is the number of nodes in the network. Thus, $\ell_{\mathrm{opt}}$ increases dramatically compared to the known small-world result for the minimum distance $\ell$, which scales as $\log N$. We also find the functional form for the probability distribution $P(\ell_{\mathrm{opt}})$ of optimal paths. In addition, we show how the problem of strong disorder on a random network can be mapped onto a percolation problem on the Cayley tree and, using this mapping, obtain the probability distribution of the maximal weight on the optimal path.
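Under strong disorder the cost of a path is governed by its largest weight, so the optimal path is the one that minimizes the largest weight, then the next-largest, and so on; with distinct weights this is exactly the path between the two nodes in the minimum spanning tree. The sketch below relies on that fact: it builds the MST with Kruskal's algorithm and reads off the tree path, whose hop count is $\ell_{\mathrm{opt}}$. The function name and the (weight, u, v) edge format are illustrative assumptions.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v); weights assumed distinct


def strong_disorder_optimal_path(n: int, edges: List[Edge], s: int, t: int) -> List[int]:
    """Return the strong-disorder optimal path from s to t (nodes 0..n-1)."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: scan edges by increasing weight, keep those joining two components.
    tree = defaultdict(list)
    for _, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[u].append(v)
            tree[v].append(u)

    # The tree path is unique, so a BFS from s recovers it.
    prev = {s: None}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            break
        for y in tree[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    if t not in prev:
        raise ValueError("s and t are not connected")

    path, node = [], t
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]
```

For example, with links 0-1 (weight 1), 1-2 (2), 2-3 (3), 0-2 (10) and 0-3 (100), the function returns [0, 1, 2, 3]: three hops, even though nodes 0 and 3 are directly linked, which is the small-scale analogue of $\ell_{\mathrm{opt}}$ exceeding $\ell$.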
Lagrangian heuristics for strictly convex quadratic minimum cost network flow problems, 2005
Cited by 1 (1 self)
This thesis presents a study of five different Lagrangian heuristics applied to the strictly convex quadratic minimum cost network flow problem. Tests are conducted on transportation networks generated randomly, with different degrees of sparsity and nonlinearity, according to a system devised by Ohuchi and Kaji [18]. The heuristics are compared in terms of running time and solution quality. The unconstrained dual version of the problem is first solved to near-optimality using the conjugate gradient method with an exact line search. A Lagrangian heuristic is then applied to obtain (near-optimal) primal solutions to the original problem. In the computational study, we show results for two modifications of the Lagrangian heuristic Flowroute, Flowroute-BS and Flowroute-D, and one modification of the Lagrangian heuristic Shortest Path, Shortest Path-L. Flowroute-BS, Flowroute-D and Shortest Path-L are novel Lagrangian heuristics, whereas Flowroute and Shortest Path are constructed according to Marklund [15]. The results demonstrate that although Flowroute-BS is significantly slower than Flowroute and Flowroute-D, it produces results of almost the same quality as Shortest Path and Shortest Path-L, and it is therefore the most promising Lagrangian heuristic.
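To make the dual step concrete, here is a sketch under assumed notation: each arc (i, j) has cost (a/2)x^2 + bx with a > 0 and bounds lo <= x <= hi, and the conservation constraints outflow_i - inflow_i = s_i are relaxed with prices pi_i. For fixed prices the Lagrangian separates by arc, each arc's minimizer is a clipped linear function of pi_i - pi_j, and the conservation residual is the dual gradient. For brevity the sketch maximizes the dual with fixed-step gradient ascent (the step must be small enough for convergence) rather than the conjugate gradient method with exact line search used in the thesis, and it stops at the primal recovery x(pi) instead of implementing any of the Flowroute or Shortest Path heuristics. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical arc format: (i, j, a, b, lo, hi) with cost (a/2)*x**2 + b*x, a > 0.
Arc = Tuple[int, int, float, float, float, float]


def primal_from_prices(arcs: Sequence[Arc], pi: Sequence[float]) -> List[float]:
    """Minimize (a/2)x^2 + (b - pi_i + pi_j)x over [lo, hi] for every arc."""
    return [min(max((pi[i] - pi[j] - b) / a, lo), hi)
            for i, j, a, b, lo, hi in arcs]


def dual_gradient(arcs: Sequence[Arc], pi: Sequence[float],
                  supply: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the dual gradient s_i - outflow_i + inflow_i and the flows x(pi)."""
    x = primal_from_prices(arcs, pi)
    grad = list(supply)
    for (i, j, *_), flow in zip(arcs, x):
        grad[i] -= flow  # flow leaves node i
        grad[j] += flow  # flow enters node j
    return grad, x


def dual_ascent(arcs: Sequence[Arc], supply: Sequence[float],
                step: float, iters: int) -> Tuple[List[float], List[float]]:
    """Fixed-step gradient ascent on the concave dual; returns (prices, flows)."""
    pi = [0.0] * len(supply)
    for _ in range(iters):
        grad, _flows = dual_gradient(arcs, pi, supply)
        pi = [p + step * g for p, g in zip(pi, grad)]
    return pi, primal_from_prices(arcs, pi)
```

For two parallel arcs from node 0 to node 1 with a = 1 and a = 2 (b = 0) and a supply of 3, the iteration converges to the flows (2, 1), where the marginal costs x and 2x are equal.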
Routing of 40 Gb/s Traffic in Heterogeneous Optical Networks, 2003
Cited by 1 (0 self)
In this paper, we introduce the Routing of Multirate Traffic (RMT) problem, which arises in current backbone networks that must carry new 40 Gb/s traffic streams. The RMT problem is informally defined as finding the routing that maximizes the total bandwidth carried in the network for a set of sessions, within a given TDM equipment budget. We propose a two-phase iterative optimization scheme (two-phase RMT). In the first phase, the scheme obtains a basis solution that routes 40 Gb/s traffic only on OC-768-capable links, without using TDM equipment. In the second phase, an iterative routing, rerouting, and resource-allocation step optimizes the total bandwidth carried in the network, allowing 40 Gb/s traffic to be routed on links that are not OC-768-capable by installing TDM multiplexers and demultiplexers at strategic locations in the network. Numerical results demonstrate the performance of the proposed approach on a mesh-type heterogeneous topology.
NEAR-FAR RESISTANT MULTIUSER DETECTOR USING ENERGY CONTOURS
A multiuser detector with scalable complexity is proposed that achieves the maximum likelihood (ML) solution for two users and gives good suboptimal performance for larger numbers of users. The key idea is to construct a lookup table based on the geometric structure of the signal constellation and then to perform fast decoding using the lookup table. The proposed detector is near-far resistant, and its performance is consistently better than that of existing suboptimal detectors when the number of users exceeds the number of dimensions. The detector's robustness against noise can be improved at the expense of higher complexity.
Obtaining an ACL2 specification from an Isabelle/HOL theory
In this work, we present an interoperability framework that enables the translation of specifications (function signatures and lemma statements) among different theorem provers. This translation is based on a new intermediate XML language, called XLL, and is performed almost automatically. As a case study, we focus on porting developments from Isabelle/HOL to ACL2. In particular, we study the transformation to ACL2 of an Isabelle/HOL theory devoted to verifying an algorithm that computes a diagonal form of an integer matrix. Moreover, we provide a formal proof of a fragment of the obtained ACL2 specification, which shows the suitability of our approach for reusing in ACL2 a proof strategy imported from Isabelle/HOL.