Machine Learning (15730565) 114(3)
Many machine learning algorithms use Euclidean distance as a common metric to calculate similarities between data. However, Euclidean distance is not valid when data lie on a manifold with non-zero curvature. Therefore, we propose a new non-parametric approach that uses curvature to calculate distances. Curvature is an appealing feature for this purpose since it is not altered by isometries. In this paper, we propose two formulas for measuring distances on a manifold with constant curvature, and their validity is proven using theorems of differential geometry. Utilizing these formulas, an algorithm is developed to measure the distance between a point and the center of a class. In the proposed algorithm, geodesics are divided into equal linear segments, assuming that the curvature remains constant within each segment. This assumption is shown experimentally to be valid in many data spaces. Observed data near each segment are used to estimate curvatures and calculate distances within each segment. Finally, the total distance is computed by summing up the non-Euclidean lengths of all segments. The proposed method is a supervised version of k-means, named non-Euclidean centers. The correctness of the proposed method is validated using the Riemann tensor and its related theorems in differential geometry. Furthermore, experimental results show that our method performs well in real-world data classification applications. The space of symmetric positive definite matrices, which is often endowed with non-Euclidean metrics that induce some curvature, is used for input data representations. © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2025.
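A minimal sketch of the segment-summing idea, assuming the standard chord-to-geodesic correction on a constant-curvature space; the function names and the curvature estimator are illustrative placeholders, not the paper's actual formulas.

```python
import numpy as np

def segment_geodesic_length(chord, kappa):
    """Geodesic length of a segment with Euclidean (chord) length `chord`
    on a space of constant curvature `kappa` (standard spherical/hyperbolic
    correction; the paper's own formulas may differ)."""
    if kappa > 0:                      # spherical-like region
        r = 1.0 / np.sqrt(kappa)
        return 2.0 * r * np.arcsin(min(chord / (2.0 * r), 1.0))
    if kappa < 0:                      # hyperbolic-like region
        r = 1.0 / np.sqrt(-kappa)
        return 2.0 * r * np.arcsinh(chord / (2.0 * r))
    return chord                       # flat region: plain Euclidean length

def non_euclidean_distance(point, center, curvature_of, n_segments=10):
    """Split the straight line from `point` to a class `center` into equal
    segments, assume constant curvature inside each, and sum the corrected
    lengths. `curvature_of(midpoint)` stands in for the data-driven
    curvature estimate described in the abstract."""
    waypoints = np.linspace(point, center, n_segments + 1)
    total = 0.0
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        chord = np.linalg.norm(b - a)
        kappa = curvature_of((a + b) / 2.0)    # estimated from nearby data
        total += segment_geodesic_length(chord, kappa)
    return total

# Toy usage: a constant estimator plays the role of the per-segment estimate.
d = non_euclidean_distance(np.zeros(3), np.ones(3), curvature_of=lambda m: 0.25)
```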
IEEE Access (21693536) 13, pp. 71323-71334
This paper presents a novel computer vision system that enables real-time pathfinding for individuals with visual impairments. The navigation experience for visually impaired individuals has improved significantly with both traditional segmentation methods and deep learning techniques. Traditional methods usually focus on detecting specific patterns or objects, requiring a custom algorithm for each object of interest. In contrast, deep learning models such as instance segmentation and semantic segmentation allow independent recognition of the different elements within a scene. In this research, deep convolutional neural networks are employed to perform semantic segmentation of camera images, thereby facilitating the identification of patterns across the image's feature space. Motivated by a two-branch core architecture, we propose utilizing semantic segmentation to support navigation for visually impaired individuals. The 'demarcation path' captures spatial details with wide channels and shallow layers, while the 'path with rich features' extracts categorical semantics using deep layers. By providing awareness of both 'obstacles' and 'paths' in the surrounding vicinity, this method enhances the perceptual understanding of visually impaired individuals. We prioritize real-time performance and low computational overhead to ensure timely and responsive assistance. With a wearable assistive system, we demonstrate that semantic segmentation provides a comprehensive understanding of the surroundings to those with visual impairments. The experimental results show an accuracy of 72.6% in detecting paths, path objects, and path boundaries. © 2025 The Authors.
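A compact PyTorch sketch of the two-branch idea (in the spirit of BiSeNet-style designs): a shallow, wide branch for spatial detail and a deep, narrow branch for semantics, fused for per-pixel classification. Layer counts, channel widths, and the fusion step are assumptions for illustration, not the authors' exact network.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class TwoBranchSegNet(nn.Module):
    """'Demarcation path': shallow and wide, keeps spatial detail at 1/8 scale.
    'Path with rich features': deep and narrow, extracts categorical semantics."""
    def __init__(self, n_classes):
        super().__init__()
        self.spatial = nn.Sequential(          # few layers, wide channels
            conv_bn_relu(3, 64, 2), conv_bn_relu(64, 64, 2),
            conv_bn_relu(64, 128, 2))
        self.context = nn.Sequential(          # deeper stack, stronger stride
            conv_bn_relu(3, 16, 2), conv_bn_relu(16, 32, 2),
            conv_bn_relu(32, 64, 2), conv_bn_relu(64, 128, 2),
            conv_bn_relu(128, 128, 2))
        self.head = nn.Conv2d(256, n_classes, 1)  # fuse and classify per pixel

    def forward(self, x):
        detail = self.spatial(x)                       # 1/8 resolution
        semantics = self.context(x)                    # 1/32 resolution
        semantics = nn.functional.interpolate(
            semantics, size=detail.shape[2:], mode='bilinear',
            align_corners=False)
        logits = self.head(torch.cat([detail, semantics], dim=1))
        return nn.functional.interpolate(              # back to input size
            logits, size=x.shape[2:], mode='bilinear', align_corners=False)

# masks = TwoBranchSegNet(n_classes=3)(torch.randn(1, 3, 256, 512)).argmax(1)
```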
International Journal of Intelligent Systems Technologies and Applications (17408865) 22(2), pp. 151-172
A group formation problem is defined as simulating groups of agents that move without collision while forming a specific shape. This type of problem is usually solved using velocity-based or deep reinforcement learning methods. Velocity-based methods make it possible to create complex environments with more realistic agent behaviours; however, computational complexity and inflexibility in changing the formation are among their leading challenges. By combining velocity-based and deep reinforcement learning techniques, agents learn collision-free motion in the desired formations. The proposed algorithm, which we call 'DGB DRL', takes advantage of a hybrid method that combines the two approaches into a formation control algorithm. Evaluation results show that the proposed method reduces computational complexity and increases flexibility in complex environments. Copyright © 2024 Inderscience Enterprises Ltd.
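A hedged sketch of the kind of reward signal a hybrid formation controller of this sort might optimize: a formation-error term plus a collision penalty. The terms and weights are illustrative assumptions, not the DGB DRL reward.

```python
import numpy as np

def formation_reward(positions, targets, min_gap=0.5,
                     w_form=1.0, w_collide=10.0):
    """Reward = negative formation error minus a collision penalty.
    `positions`/`targets`: (n_agents, 2) current and desired slots."""
    # Formation term: mean distance of each agent from its assigned slot.
    form_err = np.linalg.norm(positions - targets, axis=1).mean()
    # Collision term: number of agent pairs closer than the safety gap.
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)           # ignore self-distances
    collisions = (dists < min_gap).sum() / 2  # each pair counted twice
    return -w_form * form_err - w_collide * collisions
```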
Computers and Electrical Engineering (00457906) 110
In this paper, an efficient feature extraction method using validated statistical approaches is proposed, along with a robust classifier of Grammatical Facial Expressions (GFEs) for facial expression recognition systems. Accordingly, a new dataset was collected from 70 participants (33 males and 37 females) ranging in age from 18 to 46; a total of 765 video clips were collected. The features extracted in this study consist of 17 features associated with three categories of non-manual features: facial expression, head movement, and eye gaze. Automatic recognition of nine classes of grammatical facial expressions in two languages (Arabic and Persian) is performed using a linear Support Vector Machine (SVM) classifier. The proposed system was also validated by testing it on the American Sign Language (ASL) dataset. In comparison to previous works on the ASL dataset, the results showed a higher accuracy rate of 95%. © 2023 Elsevier Ltd
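A minimal scikit-learn sketch of the classification stage as described (a linear SVM over the 17 extracted features, nine classes); the feature matrix below is a random placeholder standing in for the paper's extraction pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder data: 765 clips, 17 features each (facial expression,
# head movement, eye gaze), 9 grammatical-expression classes.
X = np.random.rand(765, 17)
y = np.random.randint(0, 9, size=765)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), LinearSVC(dual=True, max_iter=10000))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
```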
IEEE Access (21693536) 11, pp. 80020-80029
Argus II is the most advanced retinal implant approved by the US FDA, and almost 350 visually impaired people are using it. This implant uses 60 microelectrodes implanted in the retina, with the goal of improving the mobility and quality of life of its users. However, user satisfaction is not very high due to the very low resolution of the phosphene images and features created by this device. This article proposes a system to improve the artificial vision created by visual implants. The proposed method uses image processing and machine vision algorithms to extract information about the people around the visually impaired person: the number of people in the scene, whether they are known or unknown, their gender, estimated ages, facial emotions, and approximate distance from the user. This information is extracted from the frames received by a camera mounted on the user's glasses and converted into signals that are fed into a visual stimulator, presenting the user with a schematic vision built from pre-trained phosphene patterns. The proposed system is validated with a simulated prosthetic vision comprising 150 microelectrodes that is compatible with retina and visual cortex implants. A low-cost and energy-efficient implementation of the proposed method executing on a Raspberry Pi 4 B at a frame rate of 4.5 frames/second shows the feasibility of using it in portable systems. © 2013 IEEE.
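A small sketch of the kind of simulated prosthetic vision used for validation: an electrode activation pattern rendered as Gaussian phosphene spots. The grid geometry, spot parameters, and example glyph are assumptions for illustration, not the paper's pre-trained patterns.

```python
import numpy as np

def render_phosphenes(pattern, out_size=240, sigma=6.0):
    """Render a binary electrode activation `pattern` (e.g. 10x15 for a
    150-electrode array) as an image of Gaussian phosphene spots."""
    rows, cols = pattern.shape
    img = np.zeros((out_size, out_size))
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    for r in range(rows):
        for c in range(cols):
            if not pattern[r, c]:
                continue
            cy = (r + 0.5) * out_size / rows   # electrode center in pixels
            cx = (c + 0.5) * out_size / cols
            img += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma**2))
    return np.clip(img, 0.0, 1.0)

# Hypothetical glyph (e.g. "one known person nearby") on a 10x15 array.
glyph = np.zeros((10, 15), dtype=bool)
glyph[2:8, 7] = True                  # a simple vertical-bar pattern
image = render_phosphenes(glyph)
```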
Engineering Applications of Artificial Intelligence (09521976) 125
Image captioning generates a human-like description for a query image and has attracted considerable attention recently. The most broadly utilized model for image description is an encoder–decoder structure, where the encoder extracts the visual information of the image and the decoder generates textual descriptions of it. Transformers have significantly enhanced the performance of image description models. However, a single attention structure in transformers cannot consider more complex relationships between key and query vectors. Furthermore, attention weights are assigned to all candidate vectors based on the assumption that all vectors are relevant. In this paper, a new double-attention framework is presented, which improves the encoder–decoder structure for image captioning. A local generator module and a global generator module are designed to predict textual descriptions collaboratively. The proposed approach improves Self-Attention (SA) in two respects. First, a Masked Self-Attention module is presented to attend to only the most relevant information. Second, to avoid a single shallow attention distribution and capture deeper internal relations, a Hybrid Weight Distribution (HWD) module is proposed that extends SA to use the relations between key and query vectors efficiently. Experiments on the Flickr30k and MS-COCO datasets show that the proposed approach achieves desirable performance on different evaluation measures compared to state-of-the-art frameworks. © 2023
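A hedged PyTorch sketch of the masked self-attention idea as we read it: keep only the top-k highest-scoring keys per query and mask the rest out before the softmax, so attention lands only on the most relevant candidates. The top-k rule and sizes are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def masked_self_attention(q, k, v, keep=8):
    """Scaled dot-product attention that attends only to the `keep`
    highest-scoring keys per query, masking the rest to -inf."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (batch, n_q, n_k)
    topk = scores.topk(min(keep, scores.size(-1)), dim=-1).values
    threshold = topk[..., -1:]                       # k-th best score per query
    masked = scores.masked_fill(scores < threshold, float('-inf'))
    return F.softmax(masked, dim=-1) @ v

x = torch.randn(2, 36, 64)     # e.g. 36 region features per image
out = masked_self_attention(x, x, x, keep=8)
```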
Expert Systems with Applications (09574174) 223
Image captioning, in which huge numbers of images are compressed into descriptive language, is a difficult problem for machine learning algorithms. Recurrent models are popularly used as the decoder and extract captions with significant performance, but they are complicated and inherently sequential over time. Transformers, by contrast, model long dependencies and support parallel processing of sequences. However, recent transformer-based models assign attention weights to all candidate vectors based on the assumption that all vectors are relevant, and they ignore intra-object relationships. Besides, the complex relationships between key and query vectors cannot be captured by a single attention mechanism. In this paper, a new transformer-based image captioning structure without recurrence or convolution is proposed to address these issues. To this end, a generator network and a selector network are designed to generate textual descriptions collaboratively. Our work contains three main steps: (1) design a transformer-based generator network as word-level guidance to generate the next word based on the current state; (2) train a latent space that maps captions and images into the same embedding space to learn the text-image relation; (3) design a selector network as sentence-level guidance to evaluate next words by assigning fitness scores to partial captions through the embedding space. Compared with the architectures of existing methods, the proposed approach contains an attention mechanism without time dependencies: at each state it selects the next best word using local–global guidance. In addition, the proposed model maintains dependencies between sequences and can be trained in parallel. Several experiments on the COCO and Flickr datasets demonstrate that the proposed approach outperforms various state-of-the-art models on well-known evaluation measures. © 2023 Elsevier Ltd
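A hedged sketch of step (3), sentence-level guidance: extend the partial caption with each candidate word, embed the result into the shared image-text space, and rank candidates by similarity to the image embedding as a fitness score. The encoder below is a toy placeholder, not the paper's selector network.

```python
import torch
import torch.nn.functional as F

def select_next_word(image_vec, partial_ids, candidate_ids, text_encoder):
    """Sentence-level guidance: score each candidate next word by the
    cosine similarity between the image embedding and the embedding of
    the extended partial caption, then pick the best-scoring word."""
    scores = []
    for word_id in candidate_ids:
        extended = torch.cat([partial_ids, word_id.view(1)])
        caption_vec = text_encoder(extended)          # placeholder encoder
        scores.append(F.cosine_similarity(image_vec, caption_vec, dim=0))
    scores = torch.stack(scores)
    return candidate_ids[scores.argmax()], scores

# Toy placeholder encoder: mean of learned word embeddings.
emb = torch.nn.Embedding(1000, 64)
encoder = lambda ids: emb(ids).mean(dim=0)
best, fitness = select_next_word(torch.randn(64), torch.tensor([5, 17]),
                                 torch.tensor([3, 42, 99]), encoder)
```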
IET Computer Vision (17519640) 14(5), pp. 241-247
Polyps are groups of cells growing on the inner surface of the colon. Over time, some polyps can lead to colon cancer, which is often fatal if found in its later stages. Colon cancer can be prevented if the polyps are identified and removed in their early stages. Colonoscopy is a very effective screening method for removing polyps, and it largely prevents colon cancer. However, some polyps may not be detected during a colonoscopy due to human error. Over the past two decades, many studies have been conducted on computer-aided detection to reduce the polyp miss rate. This study consists of two distinct parts: the detection of frames containing polyps, and polyp segmentation. In the first part, a new convolutional neural network based on the VGG network is proposed; it achieves an accuracy of 86% on a newly collected dataset. In the polyp segmentation part, a fully convolutional network and an effective post-processing algorithm are presented. An evaluation of the proposed polyp segmentation system on the ETIS-LARIB database achieves an overall 82.00% F2 score, which outperforms the methods that participated in the MICCAI sub-challenge. © The Institution of Engineering and Technology 2020
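A minimal torchvision sketch of the general shape of the frame-classification stage: a VGG backbone with a two-class head (frame contains a polyp vs. not). The paper modifies VGG in its own way; this is only an assumption-level illustration of the fine-tuning setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a VGG16 backbone and replace the classifier head with a
# binary output; pass VGG16_Weights.IMAGENET1K_V1 to start from ImageNet.
model = models.vgg16(weights=None)
model.classifier[6] = nn.Linear(4096, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

frames = torch.randn(4, 3, 224, 224)     # placeholder colonoscopy frames
labels = torch.tensor([0, 1, 1, 0])      # placeholder ground truth
logits = model(frames)                   # one polyp/no-polyp score pair each
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```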
Journal of Applied Security Research (19361629) 14(2), pp. 169-190
Binary feature descriptors require a considerable amount of information to handle wide appearance variations, which conflicts with the single sample per person (SSPP) problem. To address this challenge, a novel binary feature learning method called discriminative binary feature mapping is presented. Based on a number of precisely selected objectives, a feature mapping is learned that projects all of the extracted vectors to a lower-dimensional feature space. The resulting feature vectors are then used to obtain a holistic face representation based on dictionary learning. Extensive experimental results show that the proposed method obtains superior performance. © 2019, © 2019 Taylor & Francis Group, LLC.
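A hedged sketch of the general mechanism: a learned projection to a lower-dimensional space followed by sign binarization, yielding compact binary codes that can be pooled into a holistic representation. The paper learns the projection from its selected objectives; a random matrix stands in for that learned mapping here.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_feature_map(descriptors, projection):
    """Project raw local descriptors to a low-dimensional space and
    binarize with the sign function, yielding compact binary codes."""
    low_dim = descriptors @ projection          # (n_patches, n_bits)
    return (low_dim > 0).astype(np.uint8)       # 0/1 codes

# Placeholder: 500 local descriptors of dim 59, mapped to 16-bit codes.
# In the paper, W is optimized for the selected objectives; a random W
# stands in for the learned discriminative mapping here.
X = rng.standard_normal((500, 59))
W = rng.standard_normal((59, 16))
codes = binary_feature_map(X, W)

# Pool the per-patch codes into a holistic code histogram for the image.
code_ids = codes @ (1 << np.arange(16))
hist, _ = np.histogram(code_ids, bins=np.arange(2 ** 16 + 1))
```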
Obstacle detection is an important part of systems such as navigation systems and self-driving cars. Most of the proposed approaches for obstacle detection are based on special sensors, which are expensive and/or hard to use. In this article, a new method is introduced that is based on Deep Neural Networks (DNNs) and detects obstacles using a single camera. The method consists of an unsupervised DNN that extracts global features of the image and a supervised one that extracts local features of each image block. The proposed method uses neighborhood coefficients to account for the impact of neighboring blocks during local feature extraction (performed by the supervised CNN). The focus of this article is on obstacle detection, although the approach could be used for depth inference as well. © 2018 IEEE.
IET Computer Vision (17519640) 11(2), pp. 145-152
An enhanced version of a segmentation algorithm for X-ray images using a prior shape and a straightened boundary image (SBI) is proposed. In the SBI method, the boundary of the target object is extracted with a constant width along the prior shape and transformed into a rectangular image in which the edges are straightened. A new minimal path algorithm is proposed and applied to the SBI, minimising a cost function to select the best path corresponding to the edges of the target object. The cost function is calculated based on all possible paths from each pixel to the beginning of the image while keeping the computational complexity low. Compared with previous methods, the proposed method removes artefacts and provides clearer and smoother edges even when the prior shape is far from the target object. The method is also less sensitive to the initial positioning of the prior shape model. © The Institution of Engineering and Technology.
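A minimal dynamic-programming sketch of a minimal-path search over a straightened boundary image: each column holds edge costs across the band width, and the cheapest 8-connected left-to-right path is accumulated and traced back. The cost image and step constraint are placeholders for the paper's cost function, not its exact algorithm.

```python
import numpy as np

def minimal_path(cost):
    """Find the cheapest left-to-right path through a straightened
    boundary image `cost` (rows = positions across the band, columns =
    positions along the prior shape), moving at most one row per step."""
    h, w = cost.shape
    acc = cost.astype(float).copy()
    for x in range(1, w):
        prev = acc[:, x - 1]
        best_prev = np.minimum(prev,
                    np.minimum(np.roll(prev, 1), np.roll(prev, -1)))
        best_prev[0] = min(prev[0], prev[1])        # undo roll wrap-around
        best_prev[-1] = min(prev[-1], prev[-2])
        acc[:, x] += best_prev                      # accumulate path costs
    # Backtrack from the cheapest endpoint in the last column.
    path = [int(acc[:, -1].argmin())]
    for x in range(w - 1, 0, -1):
        r = path[-1]
        lo, hi = max(r - 1, 0), min(r + 1, h - 1)
        path.append(lo + int(acc[lo:hi + 1, x - 1].argmin()))
    return path[::-1]           # one row index (edge position) per column

# e.g. cost = 1.0 - edge_strength, so strong gradients are cheap to follow
path = minimal_path(np.random.rand(21, 200))
```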
Image representation is a long-standing problem in computer vision. The rich context and large amount of information in images make image recognition hard, so image features must be extracted and learned correctly; obtaining good image descriptors is greatly challenging. In recent years, learning binary features has been applied to many image representation tasks, but it has been shown to be efficient and effective only on face images. Therefore, designing a method that can simultaneously succeed in representing texture and face images, as well as other types of images, is very important. Moreover, advanced binary feature methods need strong prior knowledge, as they are hand-crafted. To address these problems, a method is proposed here that applies a pattern called the Multi Cross Pattern (MCP) to extract image features, calculating the difference between all the pattern neighbor pixels and the pattern center pixel in a local square. In addition, a Multi-Objective Binary Feature method, MOBF for short, is presented to address the aforementioned problems through four objectives: (1) maximize the variance of the learned codes, (2) increase the information capacity of the binary codes, (3) prevent overfitting, and (4) decrease the difference between binary codes of neighboring pixels. Experimental results on standard datasets such as FERET, CMU-PIE, and KTH-TIPS show the superiority of the MOBF descriptor on texture images as well as face images compared with other descriptors developed in the literature for image representation. © 2017 IEEE.
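A hedged NumPy sketch of the MCP extraction step as described: for each pixel, the vector of differences between every neighbor in a local square and the square's center. The square size and the absence of any normalization are illustrative assumptions.

```python
import numpy as np

def multi_cross_pattern(image, half=2):
    """For each interior pixel, return the vector of differences between
    all (2*half+1)^2 - 1 neighbors in the local square and the center
    pixel -- the raw MCP-style pixel-difference vector."""
    img = image.astype(float)
    h, w = img.shape
    center = img[half:h - half, half:w - half]
    feats = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if dy == 0 and dx == 0:
                continue                     # skip the center itself
            shifted = img[half + dy:h - half + dy, half + dx:w - half + dx]
            feats.append(shifted - center)   # neighbor minus center
    return np.stack(feats, axis=-1)          # (h-2*half, w-2*half, 24) here

pdvs = multi_cross_pattern(np.random.rand(64, 64))   # per-pixel vectors
```

In the full MOBF method these pixel-difference vectors would then be mapped to binary codes under the four listed objectives; the sketch covers only the pattern extraction.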
Artificial Organs (15251594) 36(7), pp. 616-628
This article presents an image processing approach dedicated to a blind mobility aid based on visual intracortical electrical stimulation. The method examines a display framework based on the distances of objects in a scene. The distances of objects from the walker are measured using a size-perspective method that uses only one camera and is not affected by occlusion. The method extracts information about the closest object to the camera and conveys a sense of distance to a blind walker. The proposed image processing method can estimate the distances of objects within 7.5 m of the walker and alert the person to the presence of the closest object. This new method offers the advantages of information reduction and scene understanding suitable for visual prostheses. © 2012, the Authors. Artificial Organs © 2012, International Center for Artificial Organs and Transplantation and Wiley Periodicals, Inc.
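A minimal sketch of size-perspective ranging with the pinhole camera model: distance is recovered from a known real-world object size and its apparent size in pixels. The focal length and object sizes below are illustrative assumptions.

```python
def size_perspective_distance(real_height_m, pixel_height,
                              focal_length_px=800.0):
    """Pinhole-model range estimate: Z = f * H / h, with f the focal
    length in pixels, H the known object height (m), h its image height."""
    return focal_length_px * real_height_m / pixel_height

# A person (~1.7 m) spanning 200 px would be ~6.8 m away with f = 800 px,
# inside the 7.5 m alerting range; the closest estimate is then reported.
objects = {"person": (1.70, 200), "chair": (0.90, 300)}
distances = {name: size_perspective_distance(H, h)
             for name, (H, h) in objects.items()}
closest = min(distances, key=distances.get)
```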
IET Image Processing (17519667) 6(8), pp. 1041-1048
This study proposes an enhanced version of the five-field motion-compensated deinterlacing algorithm. The proposed method applies bi-directional motion estimation using two previous and two subsequent fields. It uses an array of flags to determine whether the missing pixels of the two previous frames were calculated from original pixels or from pre-filtered data; pixels calculated from pre-filtered data are not used for deinterlacing, in order to decrease error propagation. A new method is also proposed to recognise the presence of fast and non-uniform motions and to prevent the artefacts they can cause: two motion vectors are calculated within same-parity fields and their directions are checked to determine whether they are associated with a uniform motion. Motion compensation is refined so that pre-filtered data of subsequent fields are never used for missing-line calculation. Experimental results based on objective and subjective criteria show that the proposed algorithm improves vertical resolution, prevents artefacts caused by fast and non-uniform motions, and achieves better overall image quality than previously reported methods. © The Institution of Engineering and Technology 2012.
IET Image Processing (17519667) 5(7), pp. 611-618
This study proposes a new hybrid video deinterlacing algorithm featuring a novel approach to qualifying the reliability of motion vectors. The algorithm switches between motion-compensated and enhanced edge-based line averaging (ELA) methods based on motion vector reliability. When the motion vectors are calculated, reverse motion estimation (RME) is applied to the optimal matching block; a motion vector is assumed reliable if the result of the RME refers to the original block or to a block in its vicinity. Motion compensation is used when motion vectors are reliable, to improve the vertical resolution, and enhanced ELA is used when they are not, to prevent artefacts. Experimental results show that RME performs better than previous approaches, based on objective and subjective criteria. The computational complexity of the proposed method is up to two orders of magnitude less than that of previous methods, while the quality of the output compares well with the best previously reported methods. © 2011 The Institution of Engineering and Technology.
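A hedged NumPy sketch of the reverse-motion-estimation check: find a block's best match in the reference field, then search back from that match; the motion vector counts as reliable only if the reverse search lands at or near the original block. Block size, search range, and tolerance are illustrative, not the paper's parameters.

```python
import numpy as np

def best_match(block, frame, y, x, search=8):
    """Exhaustive block matching: return the motion vector minimizing SAD
    within +/- `search` pixels of (y, x)."""
    bh, bw = block.shape
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bh > frame.shape[0] \
                    or xx + bw > frame.shape[1]:
                continue                      # skip out-of-frame candidates
            sad = np.abs(frame[yy:yy + bh, xx:xx + bw] - block).sum()
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

def is_reliable(cur, ref, y, x, bsize=16, tol=1):
    """Forward MV from `cur` to `ref`, then reverse MV from the matched
    block back to `cur`; reliable if the round trip returns near (y, x)."""
    block = cur[y:y + bsize, x:x + bsize]
    fy, fx = best_match(block, ref, y, x)
    matched = ref[y + fy:y + fy + bsize, x + fx:x + fx + bsize]
    ry, rx = best_match(matched, cur, y + fy, x + fx)
    return abs(fy + ry) <= tol and abs(fx + rx) <= tol

cur = np.random.rand(64, 64)
ref = np.roll(cur, 3, axis=1)        # a known pure horizontal shift
print(is_reliable(cur, ref, 16, 16)) # round trip cancels -> True
```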
We present techniques used to create a high-performance application-specific instruction-set processor (ASIP) implementation of the Pattern-Based Directional Interpolation (PBDI) intra-field deinterlacing algorithm. The proposed techniques focus primarily on efficient utilization of the available memory bandwidth. They include the use of Very Long Instruction Words (VLIW) and an appropriate choice of custom instructions and application-specific registers to form a processing pipeline. We report a speedup factor of 1351 in comparison with a software-only implementation of the algorithm running on a general-purpose 32-bit RISC processor.
IEEE Transactions on Consumer Electronics (00983063) 53(3), pp. 1117-1124
A new motion-compensated deinterlacing method using forward and backward motion estimation is proposed in this paper. Bi-directional motion estimation is performed using two previous and two subsequent fields. The motion estimator applies pre-filtering to the current and the subsequent two fields prior to motion estimation, and finds a single optimal matching block in the same- or opposite-parity reference fields. Motion compensation is performed according to the amount of vertical motion within the reference fields to achieve the highest improvement in vertical resolution. A novel technique to prevent visual artifacts in the presence of fast-moving objects is proposed. Experimental results show that the proposed method performs better than conventional deinterlacing methods, based on objective and subjective criteria. © 2007 IEEE.
In this paper we propose a new deinterlacing algorithm using motion compensation and directional interpolation. To limit the error propagation that is a major drawback of conventional motion-compensated methods, motion estimation is performed using original lines only, for same- and opposite-parity fields. In addition, a threshold value is used during the search to recognize situations where the motion estimator fails to find an optimal matching block; enhanced edge-based line averaging with median filtering is used in these situations. Experimental results show that the proposed method performs better than the traditional motion-compensated method, based on objective and subjective criteria. © 2006 IEEE.
This paper proposes optimization techniques to accelerate the enhanced edge-based line average (ELA) deinterlacing method. ELA is based on edge detection and directional interpolation as well as median filtering. The techniques first apply low-level software optimizations to accelerate loops and arithmetic operations. Specialized hardware structures and corresponding new instructions are then defined for the Xtensa reconfigurable processor to accelerate ELA-specific operations. The combined software and hardware techniques result in a speed-up of 67× compared to a base case, taking the processing time from 25 times slower than real time to 2.7 times faster than an NTSC frame rate. A parallel processing version of ELA is also discussed. © 2006 IEEE.
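A minimal sketch of the core ELA rule being accelerated: for each missing pixel, compare pixel differences along a few candidate directions between the lines above and below, and interpolate along the direction with the smallest difference. The three-direction window is the classic baseline form, shown here only to make the accelerated operation concrete.

```python
import numpy as np

def ela_interpolate_line(above, below):
    """Edge-based line averaging for one missing line: for each pixel,
    pick the direction (left-diagonal, vertical, right-diagonal) with the
    smallest |above - below| difference and average along that edge."""
    w = len(above)
    out = np.empty(w)
    for x in range(w):
        candidates = []
        for d in (-1, 0, 1):                     # three candidate edges
            xa, xb = x + d, x - d
            if 0 <= xa < w and 0 <= xb < w:
                diff = abs(above[xa] - below[xb])
                candidates.append((diff, (above[xa] + below[xb]) / 2.0))
        out[x] = min(candidates)[1]              # interpolate along best edge
    return out

field = np.random.rand(10, 16)                   # lines of one field
missing = ela_interpolate_line(field[4], field[5])
```

The per-pixel direction search and averaging in the inner loop is exactly the kind of short, regular computation that the paper's custom instructions and hardware structures target.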