Computers and Electrical Engineering (00457906) 115
In this research, we introduce the Image Approximate Block Compressor (IABC), a fast (single-cycle), simple, and high-performance cache block compressor targeting domain-specific image data. Our work presents a high-quality cache block compression technique that applies approximation to image pixels in selected error-resilient applications. IABC not only works seamlessly alongside mainstream block compression approaches, including zero, frequent, and partial pattern detection, but, by introducing approximation, also improves their performance by increasing the probability of detecting those patterns. Having examined multiple variants of IABC, a block compressor with one-cycle decompression and two-cycle compression latency, we adopted a state-of-the-art algorithm, Base-Delta-Immediate (BΔI), and a modified approximate version that we call Approximate BΔI, as our baselines. The evaluation reveals that IABC achieves an average block compression ratio of 25.7 (up to 106), against an average of 2.69 (up to 45.0) for BΔI and an average of 2.7 (up to 45.2) for Approximate BΔI. The results also show that these compression benefits come at only a 2.73% average error in the output quality of a deep learning object recognition application. In addition, IABC generates high-quality outputs for stand-alone images, with a 39.49 dB average Peak Signal-to-Noise Ratio (PSNR). These qualities come at only 13% storage overhead. © 2024 The Author(s)
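The abstract does not spell out how approximation raises pattern-detection rates. As an illustration only (the bit-dropping rule and the pattern checks below are our assumptions, not IABC's published design), clearing low-order pixel bits can turn near-zero or near-equal bytes into exact zero or frequent-value patterns that a pattern-based compressor then detects:

```python
# Illustrative sketch (not IABC's actual algorithm): clearing the
# low-order bits of 8-bit pixels is a lossy approximation that makes
# zero and frequent-value patterns more likely to match exactly.

def approximate(block, drop_bits=2):
    """Clear the low-order drop_bits of every 8-bit pixel (lossy)."""
    mask = 0xFF & ~((1 << drop_bits) - 1)
    return [p & mask for p in block]

def is_zero_block(block):
    """Zero pattern: the whole block compresses to a single flag."""
    return all(p == 0 for p in block)

def is_frequent_pattern(block):
    """Frequent pattern: all bytes equal, so one value represents all."""
    return len(set(block)) == 1

dark_pixels = [2, 1, 3, 0, 2, 1, 3, 2]          # sensor noise around zero
print(is_zero_block(dark_pixels))               # False: noise defeats detection
print(is_zero_block(approximate(dark_pixels)))  # True: approximation exposes it
```

The trade-off is exactly the one the abstract quantifies: a small, bounded pixel error in exchange for far more blocks matching cheap patterns.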
Neurocomputing (09252312) 515, pp. 107-120
In this paper, a low-cost method for input size reduction without sacrificing accuracy is proposed, which reduces the computation resources required for both training and inference of a deep convolutional neural network (DCNN) in the steering control of self-driving cars. Efficient processing of DCNNs is becoming a prominent challenge because of their huge computation cost and parameter count, together with the inadequate computation resources of power-efficient hardware devices; the proposed method alleviates this problem compared with the state of the art. The method introduces the feature density metric (FDM) as a criterion to mask and filter out regions of the input image that do not contain an adequate amount of features. This filtering prevents the DCNN from performing useless calculations on feature-free regions. Compared to PilotNet, the proposed method accelerates the overall training and inference phases of end-to-end (ETE) deep steering control of self-driving cars by up to 1.3× and 2.0×, respectively. © 2022 Elsevier B.V.
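The abstract defines FDM only at a high level. The sketch below uses edge density as a stand-in criterion (our assumption, not the paper's exact metric) for deciding which input tiles carry enough features to be worth processing:

```python
# Hedged sketch of feature-density-style masking: tiles whose pixel
# contrast is too low are zeroed out so downstream layers skip them.
# The edge-density proxy and thresholds here are illustrative assumptions.

def feature_density(tile):
    """Fraction of horizontally adjacent pixel pairs whose difference
    exceeds a threshold -- a crude proxy for feature content."""
    pairs = [(a, b) for row in tile for a, b in zip(row, row[1:])]
    edges = sum(1 for a, b in pairs if abs(a - b) > 10)
    return edges / max(len(pairs), 1)

def mask_tile(tile, min_density=0.25):
    """Zero out (mask) a tile whose feature density is too low."""
    if feature_density(tile) < min_density:
        return [[0] * len(row) for row in tile]
    return tile

flat = [[100, 101, 100], [100, 100, 101]]  # near-uniform sky/road surface
busy = [[10, 90, 15], [80, 12, 95]]        # high-contrast lane marking
print(mask_tile(flat))                     # masked to zeros
print(mask_tile(busy))                     # kept as-is
```

In this spirit, only the kept tiles would feed the DCNN, shrinking the effective input and hence the training and inference cost.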
Amirkabir Journal of Mechanical Engineering (20086032) 53(4, Special Issue), pp. 577-580
In this paper, a deep neural controller is evaluated in the self-driving car application, one of the most important and critical human-in-the-loop cyber-physical systems. To this aim, the modern controller is compared with two classic controllers, i.e., proportional–integral–derivative and model predictive control, on both quantitative and qualitative parameters. The parameters reflect three main challenges: (i) design-time challenges such as dependency on the model and design parameters, (ii) implementation challenges including ease of implementation and computation workload, and (iii) run-time challenges and parameters covering performance in terms of speed, accuracy, control cost and effort, kinematic energy, and vehicle depreciation. The main objective of our work is to present a comparison and concrete metrics for designers to compare modern and traditional controllers. A framework for design, implementation, and evaluation is presented. An end-to-end controller, consisting of six convolution layers and four fully connected layers, is evaluated as the modern controller. The controller learns human driving behavior and is used to drive the vehicle autonomously. Our results show that, besides its main advantages of being model-free and trainable, the controller exhibits acceptable performance on the important metrics in comparison with the proportional–integral–derivative and model predictive controllers. © 2021, Amirkabir University of Technology. All rights reserved.
Replacement policies continue to draw the attention of researchers. These policies have a direct impact on the cache miss rate and, consequently, on performance and power consumption. Hardware overhead grows as the complexity of cache replacement policies increases, which results in energy penalties. In this article, we introduce IPKB (Improvement Per Kilo-Byte) as a touchstone for replacement policies and then use this new metric to evaluate recent common and credible replacement policies. IPKB captures both the miss rate and the hardware overhead, the latter being a better representative of energy consumption. We show that, under our metric, the policies with the highest miss-rate improvement are not necessarily the best policies, because of their massive hardware overheads. © 2019 IEEE.
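The abstract names IPKB but gives no formula. A natural reading (our assumption, not necessarily the published definition) is percentage miss-rate improvement divided by hardware overhead in kilobytes, which is already enough to show how a large overhead can outweigh a better miss rate:

```python
# Hypothetical IPKB calculation (the exact published formula may differ):
# percentage miss-rate improvement per kilobyte of extra hardware.

def ipkb(base_miss_rate, policy_miss_rate, overhead_kb):
    improvement_pct = 100.0 * (base_miss_rate - policy_miss_rate) / base_miss_rate
    return improvement_pct / overhead_kb

# Hypothetical policies: B removes more misses but costs 8x the storage.
policy_a = ipkb(0.20, 0.18, 2.0)    # 10% improvement over 2 KB
policy_b = ipkb(0.20, 0.16, 16.0)   # 20% improvement over 16 KB
print(policy_a > policy_b)          # True: A wins per kilobyte
```

This is the abstract's point in miniature: the policy with the bigger raw miss-rate gain is not the best once its storage cost is charged against it.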
Computers and Education (03601315) 120, pp. 75-89
The quality of online information is highly variable because anyone can post data on the internet, and not all online sources are equally reliable, valuable, or accurate. Previous studies reveal problems with online information evaluation skills and a lack of ability in using evaluation criteria, including currency, relevance, authority, accuracy, and purpose. The primary purpose of this study is to develop a framework for cooperative and interactive mobile learning to improve students' online information evaluation skills. A mobile learning application is subsequently developed based on the proposed framework. To assess the effectiveness of the developed application, an experiment is conducted on diploma students at a university. A usability questionnaire is administered to an experimental group to identify students' perceptions regarding the usability of the developed mobile application. The experimental results indicate that the application is significantly more effective than traditional learning, with an effect size of 1.91, in improving students' online information evaluation skills. The results contribute to the extant literature in the context of mobile learning by identifying usability evaluation features and providing a framework for developing cooperative and interactive mobile learning. The implications of the present findings for research and instructional practice are discussed. © 2018 Elsevier Ltd
Parsazadeh, N., Ali, R., Rezaei, M., Tehrani, S.Z. Studies in Educational Evaluation (0191491X) 58, pp. 97-111
The advent of mobile technologies in the learning context has increased the need to develop an appropriate usability model aligned with mobile learning applications. Even though mobile learning has been studied from different aspects of the pedagogy environment and technology acceptance, there is little published scientific research on the usability of mobile learning applications. To fill this gap, in this study a usability evaluation model that includes timeliness is developed to assess the usability of mobile learning applications. Timeliness, or response time, is an important feature in mobile learning that influences learning satisfaction and can be used to evaluate peers' and instructors' timely responses. The main objective of this study is to construct and validate a usability evaluation survey for mobile learning environments. The study employed a two-round Delphi method to empirically verify the usability questionnaire by obtaining a consensus from fourteen experts regarding the questionnaire items. Results indicate that over 88% of the experts agreed on all usability items represented in the questionnaire. The usability evaluation survey for mobile learning applications can help improve user satisfaction and reduce training costs. This cost reduction encourages researchers, interface designers, and project managers to employ usability evaluation when designing interfaces for mobile learning applications. © 2018 Elsevier Ltd
E-commerce is growing rapidly, and so is its major part, e-tourism. Although it is important to enhance its technology, there is first a need to understand the behavior of its consumers. This study aims to identify the fundamental factors influencing consumer acceptance of e-tourism websites in Iran. The theoretical background of the study is the Technology Acceptance Model. We investigated the effect of task–technology fit, playfulness, trust, and computer self-efficacy, as additional factors, on user acceptance of e-tourism websites. Our empirical findings from the analysis of survey data indicate that attitude toward use has the strongest effect on intention to use e-tourism. The results of this study enhance our knowledge of the critical success factors that influence e-tourism acceptance. Our findings help tourism operators, web developers, and researchers better understand consumers' behavior in the e-tourism context. © 2014 IEEE.
Journal of Systems Architecture (13837621) 53(12), pp. 927-936
In conventional architectures, the central processing unit (CPU) spends a significant amount of execution time allocating and de-allocating memory. Efforts to improve memory management functions using custom allocators have led to only small improvements in performance. In this work, we test the feasibility of decoupling memory management functions from the main processing element into separate memory management hardware. Such memory management hardware can reside on the same die as the CPU, in a memory controller, or embedded within a DRAM chip. Using SimpleScalar, we simulated our architecture and investigated the execution performance of various benchmarks selected from SPECInt2000, Olden, and other memory-intensive application suites. The hardware allocator reduced the execution time of applications by as much as 50%. In fact, the decoupled hardware results in a performance improvement even when we assume that both the hardware and software memory allocators require the same number of cycles. We attribute much of this improved performance to improved cache behavior, since decoupling memory management functions reduces the cache pollution caused by dynamic memory management software. We anticipate that even higher levels of performance can be achieved by using innovative hardware and software optimizations. We do not show any specific implementation for the memory management hardware. This paper only investigates the potential performance gains that can result from a hardware allocator. © 2007 Elsevier B.V. All rights reserved.
Journal of Systems Architecture (13837621) 52(1), pp. 41-55
In this work, we show that data-intensive and frequently used service functions such as memory allocation and de-allocation entangle with the application's working set and become a major cause of cache misses. We present a technique that transfers execution of the allocation and de-allocation functions from the main CPU to a separate processor residing on-chip with the DRAM (the Intelligent Memory Manager). The results presented in the paper show that 60% of the cache misses caused by the service functions are eliminated when using our technique. We believe that the cache performance of applications in a computer system suffers partly because of interference from these service functions. © 2005 Elsevier B.V. All rights reserved.
In this paper we show that cache memories for embedded applications can be designed to increase performance while reducing area and energy consumption. Previously we showed that separating the data cache into an array cache and a scalar cache can lead to significant performance improvements for scientific benchmarks. In this paper we show that such a split data cache can also benefit embedded applications. To further improve the split cache organization, we augment the scalar cache with a small victim cache and the array cache with a small stream buffer. This "integrated" cache organization can lead to a 43% reduction in the overall cache size, a 37% reduction in access time, and a 63% reduction in power consumption when compared to a unified 2-way set-associative data cache for media benchmarks from the MiBench suite.
In our prior work we explored a cache organization providing architectural support for distinguishing between memory references that exhibit spatial and temporal locality and mapping them to separate caches. That work showed that using separate (data) caches for indexed or stream data and scalar data items could lead to substantial improvements in terms of cache misses. In addition, such a separation allowed for the design of caches tailored to the properties exhibited by different data items. In this paper, we investigate the interaction between three established methods: the split cache, the victim cache, and the stream buffer. Since significant amounts of compulsory and conflict misses are avoided, the size of each cache (i.e., array and scalar), as well as the combined cache capacity, can be reduced. Our results show, on average, a 55% reduction in miss rates over the base configuration.
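The core mechanism behind the split-cache results above can be sketched in a few lines. The cache sizes, line size, and the tagging of references as "array" or "scalar" below are illustrative assumptions, not the papers' evaluated configuration; the point is only that routing streaming array references away from the scalar cache keeps scalar data from being evicted:

```python
# Toy split-cache model: references tagged as array (streaming) or
# scalar go to separate small direct-mapped caches, so an array walk
# cannot evict hot scalar lines. Sizes and tags are assumptions.

class DirectMappedCache:
    def __init__(self, n_lines, line_bytes=16):
        self.n_lines, self.line_bytes = n_lines, line_bytes
        self.tags = [None] * n_lines
        self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        idx, tag = line % self.n_lines, line // self.n_lines
        if self.tags[idx] == tag:
            self.hits += 1
        else:
            self.tags[idx] = tag     # fill on miss
            self.misses += 1

array_cache, scalar_cache = DirectMappedCache(4), DirectMappedCache(4)

# A scalar (loop counter) is re-read between streaming array accesses;
# in a unified cache the stream could conflict with it.
trace = [("scalar", 0x1000), ("array", 0x8000), ("scalar", 0x1000),
         ("array", 0x8040), ("scalar", 0x1000), ("array", 0x8080)]
for kind, addr in trace:
    (array_cache if kind == "array" else scalar_cache).access(addr)
print(scalar_cache.hits, scalar_cache.misses)  # scalar line stays hot
```

In the real designs, the classification comes from architectural support rather than explicit tags, and each side is further augmented (victim cache, stream buffer) to match its reference pattern.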
Data-intensive service functions such as memory allocation/de-allocation, data prefetching, and data relocation can pollute the processor cache in conventional systems, since the same CPU (using the same cache) executes both application code and system services. In this paper we show the improvements in cache performance that can result from eliminating this cache pollution by using separate caches for memory management functions. For the purpose of our study, we simulate the existence of separate hardware units for the application and the memory management services using two Unix processes. One process executes application code (simulating the main CPU) while the other executes memory management code. We collected address traces for the two processes and used the Dinero IV cache simulator to evaluate the expected cache behavior. A second goal of this paper is to examine the cache performance of different memory allocators. We compare two allocators: a very popular segregated-list-based allocator (originally due to Doug Lea) and our own binary-tree-based allocator (called Address-ordered Binary Tree). © PDCS 2003. All rights reserved.
Conference Proceedings - IEEE SOUTHEASTCON (07347502) pp. 332-339
Dynamic memory management is an important and essential part of computer systems design. Efficient memory allocation, garbage collection, and compaction are becoming increasingly critical in parallel, distributed, and real-time applications using object-oriented languages like C++ and Java. In this paper we present a technique that uses a binary tree for the list of available memory blocks and show how this method can manage memory more efficiently and facilitate easy implementation of well-known garbage collection techniques.
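The abstract does not give the tree's layout or allocation policy. As a sketch under our own assumptions (blocks keyed by address, address-ordered first fit with splitting), a binary tree of free blocks might look like:

```python
# Hedged sketch of an address-ordered binary tree of free blocks.
# Keying on address keeps an in-order walk sorted by address, which is
# what makes address-ordered first fit (and coalescing of neighbours,
# not shown here) natural. Details are our assumptions, not the paper's.

class FreeBlock:
    def __init__(self, addr, size):
        self.addr, self.size = addr, size
        self.left = self.right = None

class FreeTree:
    def __init__(self):
        self.root = None

    def insert(self, addr, size):
        def _ins(node):
            if node is None:
                return FreeBlock(addr, size)
            if addr < node.addr:
                node.left = _ins(node.left)
            else:
                node.right = _ins(node.right)
            return node
        self.root = _ins(self.root)

    def alloc(self, size):
        """Address-ordered first fit: in-order walk, take the lowest-
        address block large enough, returning the remainder to the tree."""
        best = None
        def walk(node):
            nonlocal best
            if node is None or best is not None:
                return
            walk(node.left)
            if best is None and node.size >= size:
                best = node
                return
            walk(node.right)
        walk(self.root)
        if best is None:
            return None                      # no block large enough
        addr, blk_size = best.addr, best.size
        self._delete(addr)
        if blk_size > size:                  # split off the remainder
            self.insert(addr + size, blk_size - size)
        return addr

    def _delete(self, addr):
        def _del(node, key):                 # standard BST deletion
            if node is None:
                return None
            if key < node.addr:
                node.left = _del(node.left, key)
            elif key > node.addr:
                node.right = _del(node.right, key)
            else:
                if node.left is None:
                    return node.right
                if node.right is None:
                    return node.left
                succ = node.right            # in-order successor
                while succ.left:
                    succ = succ.left
                node.addr, node.size = succ.addr, succ.size
                node.right = _del(node.right, succ.addr)
            return node
        self.root = _del(self.root, addr)

heap = FreeTree()
heap.insert(0x9000, 64)
heap.insert(0x1000, 32)
heap.insert(0x5000, 128)
print(hex(heap.alloc(16)))   # lowest-address fit: 0x1000
print(hex(heap.alloc(100)))  # 0x5000 (the 0x1010 leftover is too small)
```

Compared with a linear free list, the tree bounds search cost by tree height, and the address ordering keeps neighbouring free blocks adjacent in the in-order sequence, which is what eases compaction and garbage collection.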