Results 1–10 of 13
Deep Packet Inspection Using Parallel Bloom Filters
, 2004
"... this memory core, five randommemory locations are readable in a single clock cycle. So performing 35 concurrent memory operations requires seven parallel memory cores, each with oneseventh of the required array size, as Figure 5b illustrates. Because the basic Bloom filter allows any hash function ..."
Abstract

Cited by 154 (19 self)
In this memory core, five random memory locations are readable in a single clock cycle, so performing 35 concurrent memory operations requires seven parallel memory cores, each with one-seventh of the required array size, as Figure 5b illustrates. Because the basic Bloom filter allows any hash function to map to any bit in the vector, it is possible that for some member more than five hash functions map to the same memory segment, thereby exceeding the lookup capacity of this memory core. We can solve this problem by restricting the range of each hash function to a given memory, preventing memory contention.
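The contention-avoidance idea described above can be sketched in software: give each hash function its own segment of the bit array, so the lookups for one key never touch the same memory bank twice. The hash family (salted SHA-256) and the sizes below are illustrative assumptions, not details from the paper.

```python
import hashlib


class PartitionedBloomFilter:
    """Bloom filter variant in which each of the k hash functions is
    restricted to its own segment of the bit vector, so no two hashes
    for one key can contend for the same memory bank."""

    def __init__(self, num_hashes=5, segment_bits=1024):
        self.k = num_hashes
        self.segment_bits = segment_bits
        # One independent bit segment ("memory core") per hash function.
        self.segments = [0] * num_hashes  # each int used as a bit vector

    def _index(self, item, i):
        # Hypothetical hash family: SHA-256 salted with the segment index.
        h = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.segment_bits

    def add(self, item):
        for i in range(self.k):
            self.segments[i] |= 1 << self._index(item, i)

    def __contains__(self, item):
        # Membership requires a hit in every segment.
        return all((self.segments[i] >> self._index(item, i)) & 1
                   for i in range(self.k))
```

Restricting each hash to one segment slightly changes the false-positive analysis relative to a monolithic Bloom filter, but it guarantees exactly one probe per segment per lookup, which is the property the hardware design exploits.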
Efficient computation of buffer capacities for cyclo-static dataflow graphs
 In Proceedings of the 44th annual Design Automation Conference
, 2007
"... Abstract. A key step in the design of cyclostatic realtime systems is the determination of buffer capacities. In our multiprocessor system, we apply backpressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the deriv ..."
Abstract

Cited by 28 (9 self)
Abstract. A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multiprocessor system we apply back-pressure, which means that tasks wait for space in output buffers. Consequently, buffer capacities affect the throughput. This requires the derivation of buffer capacities that satisfy both the throughput constraint and the constraints on the maximum buffer capacities. Existing exact solutions suffer from the computational complexity associated with the required conversion from a cyclo-static dataflow graph to a single-rate dataflow graph. In this paper we present an algorithm, with linear computational complexity, that does not require this conversion and that strives to obtain close-to-minimal buffer capacities. The algorithm is applied to an MP3 playback application that is mapped onto our multiprocessor system.
Engineering Parallel Applications with Tunable Architectures
 International Conference on Software Engineering
, 2010
"... Current multicore computers differ in many hardware characteristics. Software developers thus handtune their parallel programs for a specific platform to achieve the best performance; this is tedious and leads to nonportable code. Although the software architecture also requires adaptation to achi ..."
Abstract

Cited by 4 (2 self)
Current multicore computers differ in many hardware characteristics. Software developers thus hand-tune their parallel programs for a specific platform to achieve the best performance; this is tedious and leads to non-portable code. Although the software architecture also requires adaptation to achieve the best performance, it is rarely modified because of the additional implementation effort. The Tunable Architectures approach proposed in this paper automates the architecture adaptation of parallel programs and uses an auto-tuner to find the best-performing software architecture for a particular machine. We introduce a new architecture description language based on parallel patterns and a framework to express architecture variants in a generic way. Several case studies demonstrate significant performance improvements due to architecture tuning and show the applicability of our approach to industrial applications. Software developers are exposed to less parallel-programming complexity, making the approach attractive for experts as well as inexperienced parallel programmers.
A probabilistic constructive approach to optimization problems
 in ACM/IEEE ICCAD
, 2001
"... We propose a new optimization paradigm for solving intractable combinatorial problems. The technique, named Probabilistic Constructive (PC), combines the advantages of both constructive and probabilistic algorithms. The constructive aspect provides relatively short runtime and makes the technique am ..."
Abstract

Cited by 2 (2 self)
We propose a new optimization paradigm for solving intractable combinatorial problems. The technique, named Probabilistic Constructive (PC), combines the advantages of both constructive and probabilistic algorithms. The constructive aspect provides relatively short run time and makes the technique amenable to the inclusion of insights through heuristic rules. The probabilistic nature facilitates a flexible trade-off between run time and solution quality. In addition to presenting the generic technique, we apply it to the Maximal Independent Set problem. Extensive experimentation indicates that the new approach provides very attractive trade-offs between solution quality and run time, often outperforming the best previously published approaches.
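As a toy illustration of the constructive-plus-probabilistic idea (a sketch of the paradigm, not the authors' actual PC algorithm), the function below builds a maximal independent set greedily, choosing each next vertex at random with a heuristic bias toward low residual degree, and uses restarts to trade run time for solution quality:

```python
import random


def pc_independent_set(adj, restarts=50, seed=0):
    """Toy probabilistic-constructive search for Maximal Independent Set.
    adj: dict mapping each vertex to the set of its neighbours.
    Each restart constructs one maximal independent set; the random pick
    is biased by a heuristic rule (low residual degree = more likely)."""
    rng = random.Random(seed)
    best = set()
    for _ in range(restarts):
        remaining = set(adj)
        chosen = set()
        while remaining:
            verts = sorted(remaining)
            # Heuristic rule: vertices with fewer surviving neighbours
            # are more likely to be chosen next.
            weights = [1.0 / (1 + len(adj[v] & remaining)) for v in verts]
            v = rng.choices(verts, weights)[0]
            chosen.add(v)              # v joins the independent set
            remaining -= adj[v] | {v}  # v and its neighbours are removed
        if len(chosen) > len(best):    # keep the best set across restarts
            best = chosen
    return best
```

Raising `restarts` is the quality/run-time knob the abstract alludes to: more probabilistic constructions cost more time but give more chances to find a larger set.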
The Optimal Path in a Random Network
"... We study the optimal distance ` opt in random networks in the presence of disorder implemented by assigning random weights to the links. The optimal distance between two nodes is the length of the path for which the sum of weights along the path (\cost") is a minimum. We study the case of strong dis ..."
Abstract

Cited by 1 (0 self)
We study the optimal distance ℓ_opt in random networks in the presence of disorder, implemented by assigning random weights to the links. The optimal distance between two nodes is the length of the path for which the sum of weights along the path (the "cost") is a minimum. We study the case of strong disorder, for which the distribution of weights is so broad that its sum along any path is dominated by the largest link weight in the path. We find that in random graphs ℓ_opt scales as N^{1/3}, where N is the number of nodes in the network. Thus ℓ_opt increases dramatically compared to the known small-world result for the minimum distance ℓ, which scales as log N. We also find the functional form for the probability distribution P(ℓ_opt) of optimal paths. In addition, we show how the problem of strong disorder on a random network can be mapped onto a percolation problem on the Cayley tree and, using this mapping, obtain the probability distribution of the maximal weight on the optimal path.
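The strong-disorder limit described above reduces to a bottleneck ("minimax") shortest-path problem: since a path's cost is dominated by its largest link weight, the optimal path is the one minimising the maximum weight along it. A modified Dijkstra makes this concrete (an illustration of the limit, not code from the paper):

```python
import heapq


def strong_disorder_path(graph, src, dst):
    """Minimax (bottleneck) path: minimise the maximum edge weight along
    the path, which is the strong-disorder limit of minimising the sum.
    graph: dict u -> {v: weight}. Returns (bottleneck_weight, path)."""
    best = {src: 0.0}               # best known bottleneck to each node
    heap = [(0.0, src, [src])]
    while heap:
        bottleneck, u, path = heapq.heappop(heap)
        if u == dst:
            return bottleneck, path
        if bottleneck > best.get(u, float("inf")):
            continue                # stale heap entry
        for v, w in graph[u].items():
            # Path cost is the max over edges, not the sum.
            cand = max(bottleneck, w)
            if cand < best.get(v, float("inf")):
                best[v] = cand
                heapq.heappush(heap, (cand, v, path + [v]))
    return float("inf"), []
```

The only change from textbook Dijkstra is `max(bottleneck, w)` in place of `bottleneck + w`, which is exactly the "sum dominated by the largest link weight" approximation.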
Application-to-Core Mapping Policies to Reduce Interference in On-Chip Networks
, 2011
"... As the industry moves toward manycore processors, NetworkonChips (NoCs) will likely become the communication backbone of future microprocessor designs. The NoC is a critical shared resource and its effective utilization is essential for improving overall system performance and fairness. In this p ..."
Abstract

Cited by 1 (1 self)
As the industry moves toward many-core processors, Networks-on-Chip (NoCs) will likely become the communication backbone of future microprocessor designs. The NoC is a critical shared resource, and its effective utilization is essential for improving overall system performance and fairness. In this paper, we propose application-to-core mapping policies to reduce contention in network-on-chip and memory-controller resources and hence improve overall system performance. First, we introduce the notion of clusters: cores are grouped into clusters, and a memory controller is assigned to each cluster. The memory controller assigned to a cluster is primarily responsible for servicing the data requested by the applications assigned to that cluster. We propose and evaluate page allocation and page replacement policies that ensure that the network traffic of a core is restricted to its cluster with high probability. Second, we develop algorithms that distribute applications between clusters. Our inter-cluster mapping algorithm separates interference-sensitive applications from aggressive ones by mapping them to different clusters to improve system performance, while maintaining a reasonable network load balance among clusters. Contrary to the conventional wisdom of balancing network/memory load across clusters, we observe that it is also important to ensure that applications that are more sensitive to network latency experience little interference from applications that are network-intensive. Finally, we develop algorithms to map applications to cores within a cluster. The key idea of intra-cluster mapping is to map those applications that benefit more from being close to the memory controller closer to the controller. We evaluate the proposed application-to-core mapping policies on a 60-core CMP with an 8x8 mesh NoC using a suite of 35 diverse applications. Averaged over 128 randomly generated multiprogrammed workloads, the final proposed policy improves system throughput by 16.7% in terms of weighted speedup over a baseline many-core processor, while also reducing system unfairness by 22.4% and interconnect power consumption by 52.3%.
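A minimal sketch of the inter-cluster separation idea, under assumptions of my own (a single scalar "network intensity" per application, equal-sized clusters, and a name and signature invented for illustration): rank applications by intensity and fill each cluster with consecutively ranked applications, so aggressive applications share clusters rather than interfering with sensitive ones.

```python
def map_apps_to_clusters(app_intensity, num_clusters):
    """Hypothetical inter-cluster mapping: group similarly aggressive
    applications together so network-sensitive ones are shielded.
    app_intensity: dict app -> network injection intensity (assumed metric).
    Returns dict cluster_id -> list of apps, with balanced cluster sizes."""
    ranked = sorted(app_intensity, key=app_intensity.get, reverse=True)
    cluster_size = -(-len(ranked) // num_clusters)  # ceiling division
    clusters = {c: [] for c in range(num_clusters)}
    for idx, app in enumerate(ranked):
        # Consecutive ranks share a cluster: the most intensive apps fill
        # cluster 0, the most latency-sensitive ones fill the last cluster.
        clusters[idx // cluster_size].append(app)
    return clusters
```

This reflects the abstract's observation that separating sensitive from aggressive applications can matter more than strict load balancing; the real policies also weigh per-cluster load and memory-controller placement.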
A Dynamic Programming Approach to the Study of Protein Sequence Variations
"... Abstract: We propose a dynamic programming method to design efficient algorithms to analyze the genetic variation of gene of interest from different isolates, to search for the pattern and rule of changes in their DNA/protein sequences. In many cases we can achieve linear time (O(n)) time bound for ..."
Abstract
 Add to MetaCart
Abstract: We propose a dynamic programming method to design efficient algorithms that analyze the genetic variation of a gene of interest across different isolates, searching for the patterns and rules of change in their DNA/protein sequences. In many cases we can achieve a linear (O(n)) bound for the worst-case time complexity, instead of the cubic time (O(n³)) of a brute-force approach, where n is the length of the sequence. We apply our algorithms to the analysis of N-linked glycosylation sites of all published gp120 variable regions (V1 to V5) of the envelope glycoproteins of the HIV-1 virus and find that there is a strong positive correlation between the length of the region and the number of glycosylation sites in the V1 and V4 loops. Keywords: bioinformatics, dynamic programming, protein sequences, HIV virus, envelope glycoprotein.
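For the glycosylation-site analysis mentioned above, the standard N-linked sequon N-X-[S/T] with X ≠ P can be counted in a single linear pass over the sequence. The sequon definition is standard biochemistry rather than a detail taken from the paper, and the function name is mine:

```python
def count_nglyc_sites(seq):
    """Count N-linked glycosylation sequons N-X-[S/T] (X != P) in a
    protein sequence given as a string of one-letter amino-acid codes.
    Single pass, so O(n) in the sequence length n."""
    count = 0
    for i in range(len(seq) - 2):
        # Sequon: Asn, then any residue except Pro, then Ser or Thr.
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST":
            count += 1
    return count
```

Applied to each isolate's V1–V5 region, counts like this are the raw data behind the length-versus-site-count correlation the abstract reports.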
NEAR-FAR RESISTANT MULTIUSER DETECTOR USING ENERGY CONTOURS
"... A multiuser detector with scalable complexity that achieves the maximum likelihood (ML) solution for two users and gives good suboptimal performance for a higher number of users is proposed. The key idea is to construct a lookup table based on the geometric structure of the signal constellation, ..."
Abstract
 Add to MetaCart
A multiuser detector with scalable complexity that achieves the maximum-likelihood (ML) solution for two users and gives good suboptimal performance for a higher number of users is proposed. The key idea is to construct a lookup table based on the geometric structure of the signal constellation, and then perform fast decoding based on the lookup table. The proposed detector is near-far resistant, and its performance is consistently better than existing suboptimal detectors when the number of users is greater than the number of dimensions. The robustness of the detector against noise can be controlled at the expense of higher complexity.
Routing of 40Gb/s Traffic in Heterogeneous Optical Networks
, 2003
"... In this paper, we introduce the Routing of Multirate Traffic (RMT) problem that arises in current backbone networks required to carry the new 40Gb/s traffic streams. The RMT problem is informally defined as the process of finding the best routing which maximizes the total bandwidth carried in the n ..."
Abstract
 Add to MetaCart
In this paper, we introduce the Routing of Multirate Traffic (RMT) problem that arises in current backbone networks required to carry new 40-Gb/s traffic streams. The RMT problem is informally defined as the process of finding the best routing that maximizes the total bandwidth carried in the network, for a set of sessions, within a given TDM equipment budget. We propose a two-phase iterative optimization scheme (two-phase RMT). This scheme first obtains a basis solution by routing 40-Gb/s traffic only on OC-768-capable links, without the use of TDM equipment. In the second phase, an iterative routing, re-routing, and resource-allocation step is used to optimize the total bandwidth carried in the network while allowing 40-Gb/s traffic to be routed on OC-768-incapable links through the proper installation of TDM multiplexers and demultiplexers at strategic locations in the network. Numerical results demonstrate the performance of the proposed approach on a mesh-type heterogeneous topology.
Hsiao-Code Check Matrices and Recursively Balanced Matrices
, 2008
"... The key step of generating the wellknown Hsiao code is to construct a {0, 1}checkmatrix in which each column contains the same oddnumber of 1’s and each row contains the same number of 1’s or differs at most by one for the number of 1’s. We also require that no two columns are identical in the m ..."
Abstract
 Add to MetaCart
The key step in generating the well-known Hsiao code is to construct a {0,1} check matrix in which each column contains the same odd number of 1's, and each row contains the same number of 1's or differs by at most one in the number of 1's. We also require that no two columns of the matrix are identical. The author solved this problem in 1986 by introducing a type of recursively balanced matrices. However, since that paper was published in Chinese, the solution to this important problem was not known to international researchers in coding theory. In this note, we focus on how to practically generate the check matrix of Hsiao codes. We have modified the original algorithm to be more efficient and effective, and we have corrected an error in the algorithm analysis presented in the earlier paper. The result shows that the algorithm attains the optimum in the average case when a divide-and-conquer technique must be involved in the algorithm.
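The matrix property itself is easy to illustrate. Below is a greedy sketch that builds n distinct odd-weight columns of length r while keeping the row weights balanced; it is an illustrative heuristic of my own, not the recursively balanced construction from the paper.

```python
from itertools import combinations


def hsiao_check_matrix(r, n):
    """Greedily pick n distinct odd-weight {0,1} columns of length r,
    balancing row weights. Candidate columns are represented as tuples
    of row indices and enumerated lightest first. Requires n <= 2**(r-1),
    the number of odd-weight columns of length r."""
    candidates = []
    for w in range(1, r + 1, 2):               # odd weights only
        candidates.extend(combinations(range(r), w))
    row_sums = [0] * r
    cols = []
    for _ in range(n):
        pool = [c for c in candidates if c not in cols]  # keep columns distinct

        def score(c):
            # Row-weight imbalance if column c were added; ties favour
            # lighter columns (fewer 1's means cheaper XOR trees).
            s = [row_sums[i] + (i in c) for i in range(r)]
            return (max(s) - min(s), len(c))

        best = min(pool, key=score)
        cols.append(best)
        for i in best:
            row_sums[i] += 1
    # Return as an r x n 0/1 matrix.
    return [[1 if i in c else 0 for c in cols] for i in range(r)]
```

For small r this greedy pass already yields odd-weight, distinct columns with row weights differing by at most one; the paper's contribution is doing this efficiently and provably at scale.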