Visual Computer (14322315)40(10)pp. 6825-6841
An anomaly is a pattern, behavior, or event that occurs infrequently in an environment. Video anomaly detection has long been a challenging task. Home security, public area monitoring, and quality control in production lines are just a few of its applications. The spatio-temporal nature of videos, the lack of an exact definition of anomalies, and inefficient feature extraction from videos are among the challenges that researchers face in video anomaly detection. To address these challenges, we propose a method that uses parallel deep structures to extract informative features from the videos. The method consists of several units, including an attention unit, frame sampling units, spatial and temporal feature extractors, and thresholding. Using these units, we propose a video anomaly detection method that aggregates the results of four parallel structures. Aggregating the results brings generality and flexibility to the algorithm. The proposed method achieves satisfactory results on four popular video anomaly detection benchmarks. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
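The abstract does not say how the outputs of the four parallel structures are combined before thresholding. As a purely illustrative sketch, assuming each structure emits one anomaly score per frame, one simple option is to min-max normalize each branch, average the branches, and compare the fused score with a fixed threshold. The function names and the 0.5 threshold are hypothetical, not taken from the paper.

```python
def minmax(scores):
    """Rescale one branch's per-frame scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # a flat branch carries no information
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate_and_threshold(branch_scores, threshold=0.5):
    """Fuse per-frame scores from several parallel branches by averaging.

    branch_scores: list of branches, each a list of per-frame anomaly scores.
    Returns (fused scores, per-frame anomaly flags). Hypothetical fusion rule.
    """
    normed = [minmax(s) for s in branch_scores]
    fused = [sum(frame) / len(frame) for frame in zip(*normed)]
    return fused, [f > threshold for f in fused]
```

Averaging after normalization keeps any single branch with a large raw score range from dominating the decision; weighted or max fusion would be equally plausible alternatives.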
Computers and Electrical Engineering (00457906)120
Video anomaly detection is the identification of outliers that deviate from the norm within a series of videos. The spatio-temporal dependencies and unstructured nature of videos make video anomaly detection complicated. Many existing methods cannot detect anomalies accurately because they cannot learn effectively from the training data or capture dependencies between distant frames. To this end, we propose a model that uses a pre-trained vision transformer and an ensemble of deep convolutional auto-encoders to capture dependencies between distant frames. Moreover, AdaBoost training is used to ensure that the model learns every sample in the data properly. To evaluate the method, we conducted experiments on four publicly available video anomaly detection datasets, namely CUHK Avenue, ShanghaiTech, UCSD Ped1, and UCSD Ped2, and achieved AUC scores of 93.4%, 78.8%, 93.5%, and 95.7%, respectively. The experimental results demonstrate the flexibility and generalizability of the proposed method for video anomaly detection, which stem from the robust features extracted by the pre-trained vision transformer and the efficient learning of data representations through the AdaBoost training strategy. © 2024 Elsevier Ltd
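AdaBoost training is only named above. For reference, the textbook discrete AdaBoost step that re-weights samples after each weak learner is sketched below, assuming binary labels in {-1, +1}. How the method adapts this rule to auto-encoder reconstruction errors is not described, so this is the standard classification rule, not the authors' procedure.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost (labels in {-1, +1}).

    Returns the weak learner's vote alpha and the renormalised sample weights.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the log finite
    alpha = 0.5 * math.log((1.0 - err) / err)
    # t * p is -1 for mistakes, so misclassified samples are scaled up by e^alpha
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(new)
    return alpha, [w / norm for w in new]
```

With four equally weighted samples and one mistake, err = 0.25 and the misclassified sample ends up carrying half of the total weight, so the next learner concentrates on it.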
Infrared small target detection is a challenging task. Despite recent advances in small infrared target detection algorithms, a robust detection algorithm with a high probability of detection remains an open problem. To this end, this paper presents a method based on the morphological top-hat transform. The proposed method benefits from the intrinsically high detection probability of the classical top-hat transform while keeping the false alarm rate below an acceptable threshold. A noise-sensitivity analysis also shows that the proposed method is robust to various noise intensities. The proposed method is tested on real infrared images, and the experiments show that it outperforms state-of-the-art methods both quantitatively and qualitatively. © 2023
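For orientation, the classical white top-hat transform on which the method builds subtracts the morphological opening (erosion followed by dilation) from the image, so only bright structures smaller than the structuring element survive. Below is a minimal pure-Python sketch, assuming a flat square structuring element of radius r and windows clipped at the image borders; the paper's own modification is not reproduced.

```python
def _window_op(img, r, op):
    """Apply op (min or max) over a (2r+1)x(2r+1) window clipped at borders."""
    h, w = len(img), len(img[0])
    return [[op(img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def white_top_hat(img, r=1):
    """Classical white top-hat: image minus its morphological opening."""
    eroded = _window_op(img, r, min)     # grey-scale erosion
    opened = _window_op(eroded, r, max)  # dilation of the erosion = opening
    return [[img[i][j] - opened[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]
```

For example, a single bright pixel on a flat background keeps its full contrast in the output, while a 3×3 bright patch with r = 1 is removed entirely, which is why the element size is usually matched to the expected target size.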
IEEE Sensors Journal (1530437X)22(13)pp. 13144-13152
Due to the limited resolution of depth maps captured by RGB-D sensors, depth map Super-Resolution (SR) techniques have received much attention. Intensity-guided depth map SR methods based on the bilateral filter or the guided image filter are commonly used for depth upsampling. Although these methods report promising edge-preserving results, they cannot easily address texture-copying artifacts caused by the structural discrepancy between the depth map and the associated intensity image. In this paper, we aim to balance the trade-off between preserving structure and suppressing texture defects. To this end, we present a structure-preserving guided filter that not only retains the advantages of the aforementioned methods but also overcomes texture-copying artifacts. Unlike conventional guided-filtering-based methods, which rely on a single guidance image, we use both intensity and depth information as guidance to alleviate the deficiencies of existing works. We replace the mean filtering scheme in guided filters with a weighted averaging strategy, where the weights are given by a local depth kernel that depends on the input depth map. This enables our method to considerably reduce texture-copying artifacts while preserving 3D structural details. Visual evaluation shows that the algorithm also avoids the halo artifacts near edges from which traditional guided filters suffer. Quantitative results of comprehensive experiments demonstrate the advantage of our approach over prior depth map SR works. © 2001-2012 IEEE.
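The weighted-average idea described above can be sketched in one dimension: a guided filter whose box means are replaced by weighted means, with weights computed from the input depth. The Gaussian depth-range kernel, its width `sigma_d`, and the reuse of the same weights in the final averaging step are assumptions made for illustration, not the paper's exact kernel.

```python
import math


def weighted_guided_filter_1d(guide, depth, radius=2, eps=1e-2, sigma_d=1.0):
    """1D guided filter in which every box mean is replaced by a weighted
    mean whose weights come from a Gaussian range kernel on the input depth
    (hypothetical kernel choice)."""
    n = len(depth)

    def wmean(center, values):
        # Weights depend on depth similarity to the window centre.
        lo, hi = max(0, center - radius), min(n, center + radius + 1)
        w = [math.exp(-(depth[j] - depth[center]) ** 2 / (2 * sigma_d ** 2))
             for j in range(lo, hi)]
        return sum(wj * values[j] for wj, j in zip(w, range(lo, hi))) / sum(w)

    gp = [g * d for g, d in zip(guide, depth)]
    gg = [g * g for g in guide]
    a, b = [], []
    for k in range(n):
        m_i, m_p = wmean(k, guide), wmean(k, depth)
        var = wmean(k, gg) - m_i ** 2
        cov = wmean(k, gp) - m_i * m_p
        a.append(cov / (var + eps))   # local linear coefficient
        b.append(m_p - a[-1] * m_i)   # local offset
    # Output q_i = mean(a)_i * I_i + mean(b)_i, again with weighted means.
    return [wmean(i, a) * guide[i] + wmean(i, b) for i in range(n)]
```

Because the weights fall off sharply across depth discontinuities, samples on one side of a depth edge barely influence the other side, which keeps a sharp depth step intact in this sketch.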
Conference Record - IEEE Instrumentation and Measurement Technology Conference (10915281)
Human Activity Recognition (HAR) is useful in various applications such as health monitoring, security and surveillance, and smart environments. However, most existing HAR methods fail to recognize more than one subject in the environment. Moreover, recognition usually relies on a machine learning algorithm, which requires a suitable training dataset and sufficient processing power. In this paper, a non-learning approach for recognizing human activities in multi-subject environments is proposed. For this purpose, a microwave Frequency-Modulated Continuous Wave (FMCW) radar is used, which works unobtrusively and requires no adjustment for different environments. We propose mathematical and morphological operations on the range-Doppler map that enable the system to recognize activities in real time on inexpensive, low-power processors. In addition to their activities, our system also measures the distance of the subjects. Performance results show that InARMS reaches 89.1% and 75.1% accuracy in environments with one and two subjects, respectively, outperforming representative existing methods by as much as 6.4%. © 2022 IEEE.
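The distance measurement rests on the standard FMCW relations. The beat frequency of a target at range R is f_b = 2RS/c for chirp slope S = B/T_c, the Doppler shift gives the radial velocity v = λf_d/2, and the range resolution is c/(2B). These relations fix the range and velocity axes of a range-Doppler map. The sketch below uses hypothetical example parameters (4 GHz bandwidth, 40 µs chirp, 77 GHz carrier), not those of the InARMS radar.

```python
C = 299_792_458.0  # speed of light in m/s


def chirp_slope(bandwidth_hz, chirp_duration_s):
    """Chirp slope S = B / T_c in Hz/s."""
    return bandwidth_hz / chirp_duration_s


def range_from_beat(beat_hz, bandwidth_hz, chirp_duration_s):
    # Round-trip delay 2R/c times the slope S gives the beat frequency,
    # so R = c * f_b / (2 * S).
    return C * beat_hz / (2.0 * chirp_slope(bandwidth_hz, chirp_duration_s))


def velocity_from_doppler(doppler_hz, carrier_hz):
    # Radial velocity v = lambda * f_d / 2 with wavelength lambda = c / f_c.
    return (C / carrier_hz) * doppler_hz / 2.0


def range_resolution(bandwidth_hz):
    """Smallest separable range difference, c / (2B)."""
    return C / (2.0 * bandwidth_hz)
```

With these example values the slope is 1e14 Hz/s, so a target at 3 m produces a beat frequency of about 2 MHz, and the range resolution is about 3.7 cm.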
Multimedia Tools and Applications (13807501)81(8)pp. 11461-11478
Depth images captured by conventional RGB-D sensors such as ToF cameras have limited resolution. Despite recent advances in depth camera technology, there is still a significant gap between the resolutions of depth and color images. Therefore, depth map Super-Resolution (SR) techniques have received attention. In particular, achieving an algorithm that performs well at large scaling factors is both important and challenging. In most existing methods, the low-resolution depth image is first up-sampled to the desired size by interpolation, and quality improvement filters are applied afterwards. Because of the different nature of depth images and their sparsity, magnifying the images in a single step introduces heavy artifacts, especially at large up-sampling factors (e.g., 16). To tackle this problem, we propose a progressive multi-step depth map SR method in which interpolation and modified enhancement processes are applied iteratively. This greatly improves the quality of the output depth image. Moreover, considering the importance of edges and discontinuities in depth images, an edge-directed kernel is applied instead of a conventional symmetric kernel, which effectively avoids blurring. In addition, texture-copying and depth-bleeding artifacts are reduced by employing a depth range filter. Quantitative and qualitative results of comprehensive experiments on Middlebury and real-world datasets demonstrate the effectiveness of our approach over prior depth SR works, especially for large scaling factors of 16, 32, and even 64. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
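The progressive idea can be sketched in one dimension: instead of interpolating once by the full factor, the signal is doubled repeatedly and an enhancement step runs after every doubling. Here linear interpolation stands in for the interpolation stage, and `enhance` is a hypothetical placeholder (identity by default) for the paper's edge-directed and depth-range filtering.

```python
import math


def upsample_x2(row):
    """Insert the linear midpoint between neighbours (length n -> 2n - 1)."""
    out = []
    for a, b in zip(row, row[1:]):
        out += [a, (a + b) / 2]
    out.append(row[-1])
    return out


def progressive_upsample(row, factor, enhance=lambda r: r):
    """Reach a power-of-two factor by repeated x2 steps, enhancing after each."""
    steps = int(round(math.log2(factor)))
    if 2 ** steps != factor:
        raise ValueError("factor must be a power of two")
    for _ in range(steps):
        row = enhance(upsample_x2(row))  # hypothetical per-stage filter
    return row
```

For instance, `progressive_upsample([0, 16], 16)` performs four doublings and yields the values 0 through 16, with `enhance` called once per stage, so each filtering pass only has to correct a 2x interpolation error.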
Multimedia Tools and Applications (13807501)80(8)pp. 12685-12730
Despite recent progress in reliable, high-bandwidth communication, packet loss is still likely and needs special attention in real-time video streaming applications. Congestion and bit errors that sometimes exceed the protection capability of channel codes are the sources of packet loss in video communication. One common approach to dealing with video packet loss is error concealment, which estimates the missing data as closely as possible to the actual data. This article reviews the temporal video error concealment methods developed over the past 30 years. The techniques are categorized into 8 groups, and the methods are covered in sufficient detail. The strengths and weaknesses of the 8 groups are also tabulated, and some suggestions for future work and open research areas are provided. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature.
IEEE Instrumentation and Measurement Magazine (10946969)24(6)pp. 46-57
Human Activity Recognition (HAR) has attracted much attention in the last two decades, with applications such as remote health monitoring, security and surveillance, and smart environments. Specifically, for well-being assessment, HAR systems make it possible to recognize important physical activities in a patient's daily life. For instance, using motion sensors to monitor and record the physical condition and posture of patients with chronic conditions that limit mobility, such as arthritis and cardiovascular disease, can be useful in behavior assessment [1]. These physical records, especially for people with disabilities or elderly people, provide caregivers with useful information for treatment. Another example is fall detection, which can notify caregivers instantly when a person falls. Today, many types of sensors are used for human activity recognition, including vision-based, wearable, object-tagged, and device-free sensors. In this article, we focus on device-free sensors and give an overview of their applications in HAR for well-being assessment in smart homes, including examples from the existing literature and our own test results for simple activity recognition with some of these sensors. We will see that device-free sensors are used predominantly for fall detection, although other well-being applications such as cognitive assessment, respiration monitoring, and dementia detection have emerged as well. Let us begin by looking at the various types of HAR sensors and identifying the characteristics of each. © 1998-2012 IEEE.
Signal, Image and Video Processing (18631711)15(1)pp. 165-173
Packet loss is inevitable when video is transmitted over lossy channels. Because of its relatively large coding units and high compression ratio, HEVC is highly sensitive to data loss. Inter- and intra-coded blocks/frames respond differently to data loss. With intra coding, the inter-frame dependency is removed, and hence error propagation is mitigated. However, since intra-coded blocks carry no motion vectors, error concealment is more challenging than with inter coding. Another difference between inter and intra coding is the compression ratio, which is usually much higher with inter prediction. In this paper, taking these trade-offs into account, a series of experiments is conducted to determine when intra or inter coding is preferable for HEVC video transmission over lossy packet networks. The results show that if the ratio of the I-frame bitrate to the P-frame bitrate is smaller than a threshold, fully intra coding provides higher PSNR than some more complicated methods, with improvements sometimes as large as 2.5 dB. As the experimental results show, the proposed method does not always achieve the best performance, but in addition to its acceptable rate-quality behavior, it provides some side advantages. © 2020, Springer-Verlag London Ltd., part of Springer Nature.
Multimedia Tools and Applications (13807501)80(18)pp. 27385-27405
For error concealment of relatively large corrupted areas in High Efficiency Video Coding (HEVC), the available spatial information is far from the corrupted region and cannot be directly exploited. In this paper, a method is proposed that uses the spatial information in a new manner to refine already recovered Motion Vectors (MVs). The refinement method is based on boundary matching and adaptive selection among three approaches for fine-tuning the temporal MVs. The experiments show that the refinement leads to a significant PSNR improvement of 2–7 dB for some frames, and the highest MS-SSIM compared with state-of-the-art methods. Another important feature of the proposed method is its generality: it can be added on top of other MV recovery methods. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
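The refinement is described only as boundary matching plus adaptive selection among three approaches. As a generic illustration of boundary matching, the sketch below scores each candidate MV by the sum of absolute differences between the one-pixel ring surrounding the lost block in the current frame and the ring around the displaced block in the reference frame (an outer-boundary variant), then keeps the best candidate. The candidate set, the MV sign convention, and the function names are assumptions; the paper's three approaches are not reproduced.

```python
def outer_boundary_error(ref, cur, top, left, size, mv):
    """SAD between the one-pixel ring around the lost block in `cur` and the
    ring around the candidate block displaced by mv = (dy, dx) in `ref`.
    Convention (assumed): pixel (y, x) is predicted from ref[y + dy][x + dx]."""
    dy, dx = mv
    ring = [(top - 1, x) for x in range(left - 1, left + size + 1)]
    ring += [(top + size, x) for x in range(left - 1, left + size + 1)]
    ring += [(y, left - 1) for y in range(top, top + size)]
    ring += [(y, left + size) for y in range(top, top + size)]
    return sum(abs(ref[y + dy][x + dx] - cur[y][x]) for y, x in ring)


def select_mv(ref, cur, top, left, size, candidates):
    """Pick the candidate MV with the smallest outer boundary error."""
    return min(candidates,
               key=lambda mv: outer_boundary_error(ref, cur, top, left, size, mv))
```

Under pure translation, the true MV gives zero ring error, so the selection reduces to finding the displacement whose surroundings best match the correctly received neighbourhood; candidates would typically be the MVs of neighbouring blocks and their small perturbations.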
Digital Signal Processing: A Review Journal (10954333)117
Small infrared target localization and tracking are of great importance in early-warning systems. In order to accurately localize the target, a high-performance target detection algorithm is required. In this paper, a new detection algorithm is proposed, which effectively enhances the target area and eliminates noise and background clutter. The algorithm is inspired by the minimum variation directions interpolation. The detection performance of the method is investigated comprehensively in different situations. Also, to exclude the effect of thresholding on the detector's performance, a measure based on constant false alarm rate (CFAR) is employed. Experiments on multiple real-world infrared sequences demonstrate the effectiveness of the proposed method. © 2021 Elsevier Inc.
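The CFAR-based measure is not specified further in the abstract. As background, the textbook one-dimensional cell-averaging CFAR (CA-CFAR) detector sets the threshold for each cell to alpha times the mean of the surrounding training cells, with alpha = N(Pfa^(-1/N) - 1) for N training cells; this scaling holds for exponentially distributed (square-law detected) noise. Extending it to 2D infrared detection maps, and the guard and training sizes used here, are assumptions.

```python
def ca_cfar(signal, guard, train, pfa):
    """1D cell-averaging CFAR. Returns indices whose value exceeds the
    adaptive threshold; cells too close to the ends are skipped."""
    n_train = 2 * train
    # Scaling factor for exponentially distributed (square-law) noise
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)
    hits = []
    for i in range(guard + train, len(signal) - guard - train):
        leading = signal[i - guard - train:i - guard]
        lagging = signal[i + guard + 1:i + guard + train + 1]
        noise = (sum(leading) + sum(lagging)) / n_train
        if signal[i] > alpha * noise:
            hits.append(i)
    return hits
```

Because the threshold tracks the local noise estimate, sweeping `pfa` gives detectors compared at equal false-alarm rates rather than at an arbitrary fixed threshold, which is the point of using CFAR in the evaluation.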
Multimedia Tools and Applications (13807501)79(11-12)pp. 7449-7469
In this paper, we propose a video error concealment algorithm using Motion Vector (MV) recovery for parallelogram partitions of the lost area. Error concealment is necessary when some video packets are lost during transmission and correction or retransmission is not feasible. In conventional methods, MVs are recovered for square blocks, which are then used for motion-compensated temporal replacement. In our proposed method, by contrast, the lost area is partitioned into parallelograms, so MVs are found for more generally shaped blocks. Parallelograms of various sizes and angles are examined, and the best combination (size and angle) is selected with the aid of a border matching algorithm and a blind quality assessment method. Experimental results show that our method outperforms other error concealment algorithms both subjectively and objectively. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
IEEE Transactions on Multimedia (15209210)22(9)pp. 2193-2206
One challenge in video transmission is dealing with packet loss. Since compressed video streams are sensitive to data loss, the error resiliency of the encoded video becomes important. When video data is lost and retransmission is not possible, the missing data must be concealed. However, concealment leaves distortion in the lossy frame, which also propagates into subsequent frames even if their data are received correctly. One promising solution to mitigate this error propagation is intra coding. There are three approaches to intra coding: intra coding of a number of blocks selected randomly or regularly, intra coding of specific blocks selected by an appropriate cost function, or intra coding of a whole frame. However, intra coding reduces the compression ratio; therefore, there is a trade-off between bitrate and the error resiliency achieved by intra coding. In this paper, we determine the strategy that yields the best rate-distortion performance. Considering error propagation, an objective function is formulated and, with some approximations, simplified and solved. The solution demonstrates that periodic I-frame coding is preferable to coding only a number of blocks in intra mode in P-frames. Through examination of various test sequences, it is shown that the best intra frame period depends on the coding bitrate as well as the packet loss rate. We then propose a scheme to estimate this period by curve fitting the experimental results, and show that our proposed scheme outperforms other intra coding methods, especially at higher loss rates and coding bitrates. © 1999-2012 IEEE.
IEEE Transactions on Image Processing (10577149)29pp. 5937-5952
In highly interactive video streaming applications such as video conferencing, tele-presence, or tele-operation, retransmission is typically not used because of the tight deadlines of the application. In such cases, the lost or erroneous data must be concealed. While various error concealment techniques exist, there is no established rule for comparing their perceived quality. In this paper, we study the performance of 16 existing image and video quality metrics (PSNR, SSIM, VQM, etc.) in evaluating error-concealed video quality. The encoded video is subjected to packet loss, and the loss is concealed using various error concealment techniques. We show that the subjective quality of the video cannot necessarily be predicted from the visual quality of the error-concealed frame alone. We then apply the metrics to the error-concealed images/videos and evaluate how well they predict the scores reported by human subjects. The error-concealed videos are judged by image quality metrics applied to the lossy frame, or by video quality metrics applied to the video clip containing that lossy frame; in the latter case, the impact of error propagation is also captured by the objective metrics. The measurements show that, mostly though not always, measuring the objective quality of the video is a better way to judge error concealment performance. Moreover, our experiments show that when objective quality metrics are used to assess the performance of an error concealment technique, they do not behave as they would for general quality assessment. In fact, some newly developed metrics make the correct decision only about 60% of the time, an unacceptable error rate of as much as 40%. Our analysis shows which specific quality metrics are relatively more suitable for error-concealed videos. © 1992-2012 IEEE.
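PSNR, the first metric listed, is simple enough to state directly: PSNR = 10 log10(MAX² / MSE), where MAX is the peak pixel value (255 for 8-bit video) and MSE is the mean squared error between the reference and the error-concealed frame. A minimal sketch for frames stored as nested lists:

```python
import math


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized frames given as nested lists."""
    sq_err = [(a - b) ** 2
              for row_a, row_b in zip(reference, distorted)
              for a, b in zip(row_a, row_b)]
    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```

A uniform error of 10 grey levels gives an MSE of 100 and a PSNR of about 28.1 dB; since PSNR ignores where the errors fall, it cannot distinguish a visually plausible concealment from an implausible one with the same MSE.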
IEEE Transactions on Multimedia (15209210)20(4)pp. 781-795
Multiple description coding (MDC) is a technique for video transmission over error-prone networks in which the descriptions are routed over multiple paths. Like MDC, intra coding provides error resiliency, but its use must be decided with care since it degrades the compression ratio. In this paper, we present the results of our investigation of a new intra coding approach in MDC. We found that, in MDC streams, the best policy is to encode selected frames as I-frames instead of coding some macroblocks of frames in intra mode. To find the most suitable I-frame positions within a given video stream, we developed a cost function based on which the intra/inter frame type is decided. The MDC scheme with the proposed intra coding criterion, with and without redundancy optimization, is implemented in the H.264/AVC reference software, JM16.0. Based on experimental performance evaluation, we show that our method achieves higher average PSNR than the other optimized MDCs found in the literature. © 2017 IEEE.
IEEE Transactions on Multimedia (15209210)19(1)pp. 54-66
Multiple description coding (MDC) is a robust coding technique for video transmission over error-prone networks, whereby the video is encoded into multiple descriptions with some redundancy between them. This redundancy provides error resiliency in case of packet loss during network transport. However, the amount of redundancy plays a critical role in MDC performance. Therefore, a crucial problem in MDC is to determine the optimal redundancy budget and then how to allocate it optimally to the frames. To solve this problem, we propose a scheme in which the redundancy budget is allocated to the frames based on weighted mismatch-rate slopes, so that this additional bitrate attains the maximum distortion reduction. The redundancy is added gradually so that the utilized bitrate can be fine-tuned. We have verified our proposed scheme by implementing it in the H.264/AVC reference software JM16.0 and running experiments against two representative reference methods. Our experiments show that our scheme not only minimizes the end-to-end distortion, with rate-distortion performance better than that of the reference methods, especially at high packet loss ratios (PLRs), but also, unlike the reference methods, fully uses the available bandwidth. © 1999-2012 IEEE.
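A greedy sketch of slope-based allocation, under simplifying assumptions: redundancy is added in equal-sized units, each frame's expected distortion for 0, 1, 2, ... units is known in advance (a hypothetical `curves` input), and each frame has an importance weight (for example, larger for reference frames). Each unit goes to the frame with the largest weighted distortion reduction, mirroring the gradual addition described above; the paper's mismatch-rate model itself is not reproduced.

```python
def allocate_redundancy(curves, weights, budget):
    """Greedy marginal allocation of `budget` equal redundancy units.

    curves[f][k]: expected distortion of frame f with k units (hypothetical,
    assumed decreasing). weights[f]: importance of frame f.
    Returns the number of units given to each frame.
    """
    alloc = [0] * len(curves)
    for _ in range(budget):
        best, best_gain = None, 0.0
        for f, curve in enumerate(curves):
            k = alloc[f]
            if k + 1 < len(curve):
                # Weighted distortion reduction of one more unit (the "slope")
                gain = weights[f] * (curve[k] - curve[k + 1])
                if gain > best_gain:
                    best, best_gain = f, gain
        if best is None:
            break  # no frame can still reduce distortion
        alloc[best] += 1
    return alloc
```

Since every unit has the same size, the largest distortion reduction per unit is the steepest distortion-rate slope; for convex curves this greedy rule is the standard way to approach the optimal allocation one increment at a time.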
Multiple Description Coding (MDC) is a technique for video transmission over error-prone networks in which the descriptions are routed over multiple paths. Intra coding and MDC both provide error resiliency for real-time video transport over unreliable networks. In this paper, we present the results of our investigation of a new intra coding approach for MDC in which selected frames are fully encoded in intra mode instead of coding selected macroblocks in intra mode in many frames. We implemented our scheme in the H.264/AVC reference software, JM16.0. Based on experimental performance evaluation, we show that our method achieves higher average PSNR than other optimized MDC schemes available in the literature. © 2016 IEEE.
Signal Processing: Image Communication (09235965)36pp. 95-105
Multiple Description Coding (MDC) is a technique in which multiple streams are generated from a video source, each individually decodable and mutually refinable. MDC is a promising solution to packet loss in video transmission over noisy channels, particularly for real-time applications in which retransmission of lost information is not practical. The error resiliency of MDC comes at the cost of redundancy, and the required amount of redundancy for each frame depends on the packet loss ratio and on the importance of the frame in the sequence. Due to error propagation in video transmission over lossy channels, the reference frames of a Group of Pictures (GOP) are more important for video reconstruction and hence need more redundancy to increase their chance of being received correctly. Therefore, a channel-adaptive optimization of frame-wise redundancy allocation is needed. In this paper, the receiver-side distortion is formulated based on the difference between the side and central decoder outputs and is then used to optimize an MDC scheme. The performance of the optimizer is verified by experimental results measured with JM 16.0, the H.264/AVC reference software. © 2015 Elsevier B.V.
Multimedia Systems (14321882)20(3)pp. 283-309
Multiple description coding (MDC) is one of the promising solutions for live video delivery over lossy networks. In this paper, we present a review of MDC techniques organized by application domain and explain their functionality, with the objective of giving designers enough insight to decide which MDC scheme best suits their specific application based on requirements such as standard compatibility, redundancy tunability, complexity, and extendibility to n-description coding. The focus is mainly on video sources, but image-based algorithms applicable to video are considered as well. We also cover the well-known and important problem of drift and solutions to avoid it. © Springer-Verlag Berlin Heidelberg 2013.
IEEE Transactions on Circuits and Systems for Video Technology (10518215)22(2)pp. 202-215
Multiple description coding (MDC) is a technique in which multiple streams are generated from a source video, each individually decodable and mutually refinable. MDC is a promising solution to overcome packet loss in video transmission over noisy channels, particularly for real-time applications in which retransmission of lost information is not practical. A problem with conventional MDC is that the achieved side distortion quality is considerably lower than single description coding (SDC) quality, except at high redundancies, which in turn degrade the central quality. In this paper, a new mixed layer MDC scheme is presented that causes no degradation in central quality and provides better side quality (approximately equal to that of SDC) than conventional methods. This property also directly leads to higher average quality when delivering the video over lossy networks. For each discrete cosine transform coefficient, we generate two coefficients, a base coefficient (BC) and an enhancement coefficient, which are combined. When all descriptions are available, they are decomposed and decoded to achieve high-quality video. When one description is not available, we use estimation to extract as much of the BC as possible from the received description. Simulation results show that the proposed scheme achieves improved redundancy-rate-distortion performance compared to conventional methods. The algorithm is implemented in JM16.0, and its performance for two-description and four-description coding is verified by experiments. © 2011 IEEE.
Proceedings - IEEE International Conference on Multimedia and Expo (1945788X)
Multiple Description Coding (MDC) is a technique in which multiple streams are generated from a source, each individually decodable and mutually refinable. In this paper, a new Mixed Layer MDC (MLMDC) scheme is presented that achieves higher side quality than conventional MDCs. The improved side performance leads to higher average video quality at the receiver in lossy networks. For each DCT coefficient, we generate two coefficients, a Base Coefficient (BC) and an Enhancement Coefficient (EC), which are combined. When all descriptions are available, they are decomposed and decoded to achieve high-quality video. When one description is not available, we use estimation to extract as much of the BC as possible from the received description. The algorithm is implemented in JM16.0, and its performance for two-description and four-description coding is verified by experiments. © 2011 IEEE.
The cooperative relay cognitive interference channel (RCIC) is a four-node network with two source nodes (primary source and cognitive source) and two destination nodes, in which the sources try to communicate at certain rates with their corresponding destinations simultaneously through a common medium, and each destination can act as a relay to assist the other one. For the partially cooperative RCIC (PC-RCIC), in which only one of the destinations (the one corresponding to the cognitive source) acts as a relay, we derive an achievable rate region based on rate splitting and superposition coding at the cognitive source and a decode-and-forward scheme at the relay. For the degraded PC-RCIC, we characterize the capacity region. We also investigate the Gaussian PC-RCIC in detail. In this case, we present the achievability and converse arguments for a class of degraded Gaussian PC-RCICs and determine the capacity region of this class. The results identify cooperative relaying as an effective strategy for improving the capacity region of cognitive interference channels (CICs). © 2010 IEEE.
The cooperative relay cognitive interference channel (RCIC) is a four-node network with two source nodes (primary source and cognitive source) and two destination nodes, in which the sources try to communicate at certain rates with their corresponding destinations simultaneously through a common medium, and each destination can act as a relay to assist the other one. In this paper, we study partially and fully cooperative state-dependent relay cognitive interference channels (RCICs) with perfect causal channel state information (CSI). For each of these channels, we investigate three cases: in the first, perfect causal CSI is available at both the source and relay nodes; in the second, it is known only to the relay nodes; and in the third, it is available only at the cognitive source. For a degraded version of the channel, we obtain the capacity region of each case. Our results include the previously obtained results for the degraded relay, broadcast, and relay broadcast channels with perfect causal CSI as special cases. © 2010 IEEE.