Multimodal Image Classification Based on Convolutional Network and Attention-Based Hidden Markov Random Field
Abstract
In this article, a multimodal deep architecture for joint classification of light detection and ranging (LiDAR) data and hyperspectral images (HSI) is proposed, which learns both modality-specific information and the complementary information shared between the two modalities. The proposed model consists of two main steps. First, to improve the performance of a 2-D convolutional neural network (2DCNN), low-frequency maximum autocorrelation factor features of the HSI are injected into the 2DCNN; these are referred to as the multiscale features of the 2DCNN. Second, to further improve the accuracy of the 2DCNN and extract smooth, semantically meaningful information, the posterior energy of a hidden Markov random field (HMRF) is modified using Gaussian attention and albedo-recovery attention mechanisms together with the LiDAR and HMRF energies. These features are then fused by a further attention mechanism, yielding the attention-based HMRF, and this HMRF model also performs the fusion of HSI and LiDAR. The proposed model is evaluated on the Houston 2013, Trento, and MUUFL datasets and compared with several state-of-the-art methods. The resulting classification accuracies and the ablation study demonstrate the superior performance of the proposed method. © 1980-2012 IEEE.
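To make the fusion step concrete, the sketch below illustrates one plausible reading of an attention-weighted HMRF posterior energy: per-pixel attention maps weight the unary (data) energies from the HSI and LiDAR branches, and a Potts smoothness term encourages spatially smooth labels. This is a minimal toy illustration, not the paper's actual formulation; the function names, the 4-neighbour Potts prior, and the greedy iterated-conditional-modes (ICM) update are all assumptions chosen for brevity.

```python
import numpy as np

def attention_hmrf_energy(labels, hsi_unary, lidar_unary, alpha, beta, smooth_w):
    """Toy attention-weighted HMRF posterior energy (illustrative sketch).

    labels:      (H, W) integer label map
    hsi_unary:   (H, W, K) per-class unary energies from the HSI branch
    lidar_unary: (H, W, K) per-class unary energies from the LiDAR branch
    alpha, beta: (H, W) attention maps weighting the two modalities
    smooth_w:    scalar weight on the 4-neighbour Potts smoothness term
    """
    H, W = labels.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # attention-weighted data energy pooled from both modalities
    unary = (alpha * hsi_unary[ii, jj, labels]
             + beta * lidar_unary[ii, jj, labels]).sum()
    # Potts pairwise energy: count disagreeing horizontal/vertical neighbours
    pair = (labels[:, 1:] != labels[:, :-1]).sum() \
         + (labels[1:, :] != labels[:-1, :]).sum()
    return unary + smooth_w * pair

def icm_sweep(labels, hsi_unary, lidar_unary, alpha, beta, smooth_w):
    """One ICM sweep: greedily relabel each pixel to minimise its local energy."""
    H, W, K = hsi_unary.shape
    out = labels.copy()
    for i in range(H):
        for j in range(W):
            best_k, best_e = out[i, j], np.inf
            for k in range(K):
                e = alpha[i, j] * hsi_unary[i, j, k] \
                  + beta[i, j] * lidar_unary[i, j, k]
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        e += smooth_w * (out[ni, nj] != k)
                if e < best_e:
                    best_k, best_e = k, e
            out[i, j] = best_k
    return out
```

Because each pixel update minimises the conditional energy given its current neighbours, every ICM sweep leaves the total posterior energy no higher than before, which is what makes this a usable (if simple) inference step for an HMRF of this form.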