Understanding and predicting human personality traits has become increasingly important in the digital age. This study addresses the prediction of the Big Five personality traits from textual data using advanced natural language processing models. The dataset is the ChaLearn First Impressions V2 corpus, which pairs human-generated text with Big Five personality trait labels. A diverse set of models is evaluated, ranging from established deep learning architectures such as the Deep Pyramid Convolutional Neural Network (DPCNN) and the Hierarchical Attention Network (HAN) to transformer-based architectures such as BERT and FLAN-T5. The models are assessed under several training scenarios: fine-tuning all layers, freezing only the embedding layer, and freezing all layers, with the last scenario applied exclusively to the Transformer models. DPCNN and HAN achieve notably high accuracy, attributable to their hierarchical feature extraction, while Transformer models such as ELECTRA perform best when their layers remain frozen, reflecting their strong contextual representations. The study also uses word clouds to visualize the vocabulary most associated with each Big Five trait, revealing relationships between specific words and these traits. The findings highlight the interplay among model architecture, training methodology, and layer freezing, and offer practical guidance on configurations that yield optimal performance in personality trait prediction. ©2024 IEEE.
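The three training scenarios described above (fine-tuning all layers, freezing only the embedding layer, and freezing every encoder layer) amount to toggling gradient updates on parameter groups. The following minimal sketch illustrates this, assuming the Hugging Face transformers library and a BERT-style encoder that exposes its embedding module as model.embeddings; the 5-output head and learning rate are illustrative choices, not details taken from the paper.

```python
# Sketch of the three layer-freezing scenarios, assuming a BERT-style
# encoder from Hugging Face transformers. Hyperparameters are illustrative.
import torch
import torch.nn as nn
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
head = nn.Linear(model.config.hidden_size, 5)  # one output per Big Five trait

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Enable or disable gradient updates for every parameter in `module`."""
    for p in module.parameters():
        p.requires_grad = trainable

# Scenario 1: fine-tune all layers (the default behaviour).
set_trainable(model, True)

# Scenario 2: freeze only the embedding layer.
set_trainable(model, True)
set_trainable(model.embeddings, False)

# Scenario 3: freeze the entire encoder; only the task head is trained.
set_trainable(model, False)

# Only parameters with requires_grad=True are handed to the optimizer.
trainable_params = [p for p in list(model.parameters()) + list(head.parameters())
                    if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=2e-5)
```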
Knowledge-Based Systems (ISSN 0950-7051), Vol. 301
Self-supervised learning (SSL) aims to create semantically enriched representations from unannotated data. A prevalent strategy in this field trains a unified representation space that is invariant to various combinations of transformations. However, building a single representation that is invariant to multiple transformations poses several challenges: the efficacy of such a representation space depends on the intensity, sequence, and combination of the transformations, so features produced in a single representation space may exhibit limited adaptability to downstream tasks. In contrast to this conventional SSL training approach, we introduce a novel method that constructs multiple atomic transformation-invariant representation subspaces. Each subspace is invariant to one specific atomic transformation drawn from a predefined reference set. This design offers increased flexibility, because a downstream task can weight each atomic transformation-invariant subspace according to the feature space it requires. We conducted a series of experiments comparing our approach to traditional self-supervised learning methods across diverse data regimes, datasets, evaluation protocols, and source-destination data distributions. The results highlight the superiority of our method over training strategies based on a single transformation-invariant representation space. In addition, the proposed method outperformed several recent supervised and self-supervised approaches at reducing false positives in pulmonary nodule detection. © 2024 Elsevier B.V.
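The abstract does not give implementation details, but the core idea (one invariant subspace per atomic transformation, with the downstream task weighting the subspaces) can be sketched as below. All module and function names, the cosine-based invariance loss, and the weighted-concatenation readout are assumptions made for illustration, not the authors' actual method.

```python
# Hedged sketch of multiple atomic transformation-invariant subspaces:
# a shared encoder feeds one projection head per atomic transformation,
# and each head's subspace is trained to be invariant to that
# transformation only. All names here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSubspaceModel(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int,
                 transform_names: list[str], sub_dim: int = 128):
        super().__init__()
        self.encoder = encoder  # assumed to map inputs to (batch, feat_dim)
        # One projection head per atomic transformation in the reference set.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, sub_dim) for name in transform_names
        })

    def forward(self, x: torch.Tensor) -> dict[str, torch.Tensor]:
        h = self.encoder(x)
        return {name: F.normalize(head(h), dim=-1)
                for name, head in self.heads.items()}

def invariance_loss(model: MultiSubspaceModel, x: torch.Tensor,
                    transforms: dict) -> torch.Tensor:
    """For each atomic transformation t, pull the subspace-t embeddings of
    x and t(x) together; the other subspaces are left unconstrained by t."""
    z_clean = model(x)
    loss = torch.zeros((), device=x.device)
    for name, t in transforms.items():
        z_aug = model(t(x))
        loss = loss + (1 - F.cosine_similarity(z_clean[name],
                                               z_aug[name], dim=-1)).mean()
    return loss / len(transforms)

def weighted_features(z: dict, weights: dict) -> torch.Tensor:
    """A downstream task weights each subspace before concatenation."""
    return torch.cat([weights[name] * z[name] for name in sorted(z)], dim=-1)
```

Under this reading, a downstream task that benefits from, say, rotation invariance but not color invariance would simply up-weight the rotation subspace in weighted_features, which is the flexibility the abstract attributes to the method.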