Research Output
Publication Date: 2025
Journal of the Acoustical Society of America (15208524), 158(6), pp. 4294-4307
Fricatives vary acoustically across languages and individuals, with speaker variability shaped by both phonetic and non-phonetic factors. This study examined between- and within-speaker variability in Persian voiceless fricatives (/f/, /s/, /ʃ/, /x/) and how linguistic environments, such as syllable position and lexical stress, affect this variability. A gender-balanced sample of 24 Persian speakers was recorded in two sessions, 1-2 weeks apart. Acoustic analysis targeted the first four spectral moments and duration. Results showed that center of gravity captured the greatest between-speaker variability, followed by standard deviation, skewness, duration, and kurtosis. Across segments, the alveolar /s/ exhibited the highest speaker-specificity, followed by /ʃ/, /f/, and /x/. Gender-based patterns emerged: for males, the center of gravity and skewness of /s/ were most discriminative, whereas for females, the center of gravity and standard deviation of /ʃ/ were most effective. The labiodental /f/ showed some speaker-specific characteristics only in the male group. Voiceless fricatives in syllable-initial positions demonstrated more speaker-specificity, while lexical stress did not impact between-speaker variability. Results also highlight cross-linguistic differences in the acoustic cues most effective for speaker differentiation and demonstrate that optimal features can vary across speaker populations. Adaptive algorithms are therefore crucial for improving forensic speaker comparison. © 2025 Acoustical Society of America.
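The four spectral moments reported above can be illustrated with a short sketch (not the study's analysis code, which would typically use Praat): treating a segment's power spectrum as a probability distribution over frequency, the moments fall out of the standard formulas. The signal and sampling rate below are synthetic placeholders.

```python
import numpy as np

def spectral_moments(signal, sr):
    """First four spectral moments of a signal, treating the power
    spectrum as a probability distribution over frequency."""
    power = np.abs(np.fft.rfft(signal)) ** 2             # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)     # frequency axis (Hz)
    p = power / power.sum()                              # normalize weights
    cog = np.sum(freqs * p)                              # 1st: center of gravity
    sd = np.sqrt(np.sum((freqs - cog) ** 2 * p))         # 2nd: standard deviation
    skew = np.sum((freqs - cog) ** 3 * p) / sd ** 3      # 3rd: skewness
    kurt = np.sum((freqs - cog) ** 4 * p) / sd ** 4 - 3  # 4th: excess kurtosis
    return cog, sd, skew, kurt

# White noise spreads energy evenly across 0..sr/2, so its center of
# gravity sits near the middle of the band (about sr/4).
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
cog, sd, skew, kurt = spectral_moments(noise, sr=16000)
```

A fricative with a more anterior constriction (e.g., /s/) concentrates energy at higher frequencies and thus yields a higher center of gravity than a more posterior one (e.g., /x/).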
Publication Date: 2025
Journal of the Acoustical Society of America (15208524), 158(4), pp. 3260-3279
Auditory discrimination of bilingual voices has proven to be challenging for listeners. This can be attributed to the structure of acoustic voice dimensions, which depends not only on speaker-specific acoustic features but also on language-dependent characteristics. This study investigates how acoustic voice dimensions vary within and between Persian-English bilingual speakers and how constellations of voice quality parameters operate within different languages across speakers. Acoustic voice quality indices were computed over voiced segments from read speech samples of 40 gender-balanced Persian-English bilingual speakers. Using a psychoacoustic model developed by Kreiman, Gerratt, Garellek, Samlan, and Zhang [Loquens 1(1), e009 (2014)] and principal component analyses, we found that only a few acoustic voice dimensions are shared within and between speakers. However, most acoustic variability within and between speakers remains idiosyncratic, suggesting that “individual” and “general” voice spaces are similarly structured within and between speakers in each language context, i.e., Persian and English. Comparing the underlying structures of Persian and English, we found that speakers follow similar acoustic patterns in their two languages. However, some divergences exist between Persian and English acoustic structures, especially for female speakers, which could have implications for bilingual voice discrimination. © 2025 Acoustical Society of America.
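The dimensionality-reduction step can be sketched as follows; the feature matrix here is a synthetic placeholder (random data with one induced shared dimension), not the study's voice quality indices. Each measure is standardized and principal components are obtained via SVD of the centered matrix.

```python
import numpy as np

# Hypothetical matrix: rows = voiced tokens, columns = acoustic voice
# quality measures; values are illustrative random data.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]    # induce one shared dimension

# Standardize each measure, then PCA via SVD of the data matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)        # variance explained per component
```

The rows of `Vt` are the principal components (the "voice dimensions"); comparing such loadings across individual speakers and across the pooled sample is one way to ask whether individual and general voice spaces share structure.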
Hosseini-Kivanani, N.,
Asadi, H.,
Schommer, C.
Publication Date: 2025
International Conference on Pattern Recognition Applications and Methods (21844313), 1, pp. 665-672
This paper investigates the impact of speaking rate variation on speaker verification using a hybrid feature approach that combines Mel-Frequency Cepstral Coefficients (MFCCs), their dynamic derivatives (delta and delta-delta), and vowel formants. To enhance system robustness, we also applied data augmentation techniques such as time-stretching, pitch-shifting, and noise addition. The dataset comprises recordings of Persian speakers at three distinct speaking rates: slow, normal, and fast. Our results show that the combined model integrating MFCCs, delta-delta features, and formant frequencies significantly outperforms individual feature sets, achieving an accuracy of 75% with augmentation, compared to 70% without augmentation. This highlights the benefit of leveraging both spectral and temporal features for speaker verification under varying speaking conditions. Furthermore, data augmentation improved the generalization of all models, particularly for the combined feature set, where precision, recall, and F1-score metrics showed substantial gains. These findings underscore the importance of feature fusion and augmentation in developing robust speaker verification systems. Our study contributes to advancing speaker identification methodologies, particularly in real-world applications where variability in speaking rate and environmental conditions presents a challenge. © 2025 by SCITEPRESS – Science and Technology Publications, Lda.
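The dynamic derivatives mentioned above (delta and delta-delta) can be computed from any static MFCC matrix with the standard regression formula over a ±N frame window. The sketch below is a minimal stand-in, not the paper's front end; the MFCCs themselves are assumed to come from an existing extractor.

```python
import numpy as np

def delta(features, N=2):
    """Delta (dynamic) coefficients via the standard regression formula
    over a +/-N frame window; features is (num_frames, num_coeffs)."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")  # replicate edges
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return out

# Sanity check: a linearly increasing coefficient has a constant delta
# equal to its slope, and a (near-)zero delta-delta in the interior.
ramp = np.arange(10, dtype=float).reshape(-1, 1)  # slope 1 per frame
d = delta(ramp)     # delta features
dd = delta(d)       # delta-delta (acceleration) features
```

Applying the same function twice yields the delta-delta stream; concatenating static, delta, and delta-delta columns gives the fused feature matrix the model consumes.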
Publication Date: 2025
Language and Communication (02715309), 104, pp. 29-49
This study examines the phono-pragmatic properties of bebin (‘look’) in Persian within the framework of Interactive Grammar (IG), focusing on its prosodic characteristics and various pragmatic functions in interaction. Specifically, it explores how variations in prosodic features (duration, f0, and intensity) correlate with the four primary functions of bebin: directive, attention signal, discourse marker, and interjection. The findings highlight the dynamic interplay between prosody and pragmatics, demonstrating how prosodic cues facilitate pragmatic interpretation and how pragmatic functions influence prosodic realization. Moreover, the study provides evidence of a systematic relationship between phonetic reduction and grammaticalization, with increased phonetic reduction observed in more grammaticalized uses of bebin. This contributes to broader discussions on the role of prosody in linguistic change. The study also addresses challenges related to functional overlap and polysemy, offering insights into the complexities of interactive discourse. Finally, the quantifiable nature of the analysis makes it highly applicable to computational linguistics, particularly in training language models for corpus annotation and enhancing pragmatic understanding in natural language processing. © 2025 Elsevier Ltd
Publication Date: 2024
pp. 188-193
In automatic speech recognition, any factor that alters the acoustic properties of speech can pose a challenge to the system’s performance. This paper presents a novel approach for automatic whispered speech recognition in the Irish dialect using the self-supervised WavLM model. Conventional automatic speech recognition systems often fail to accurately recognise whispered speech due to its distinct acoustic properties and the scarcity of relevant training data. To address this challenge, we utilized a pre-trained WavLM model, fine-tuned with a combination of whispered and normal speech data from the wTIMIT and CHAINS datasets, which include the English language in Singaporean and Irish dialects, respectively. Our baseline evaluation with the OpenAI Whisper model highlighted its limitations, achieving a Word Error Rate (WER) of 18.8% and a Character Error Rate (CER) of 4.24% on whispered speech. In contrast, the proposed WavLM-based system significantly improved performance, achieving a WER of 9.22% and a CER of 2.59%. These results demonstrate the efficacy of our approach in recognising whispered speech and underscore the importance of tailored acoustic modeling for robust automatic speech recognition systems. This study provides valuable insights into developing effective automatic speech recognition solutions for challenging speech affected by whisper and dialect. The source code for this paper is freely available. © 2024 IEEE.
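Word Error Rate, the headline metric above, is the Levenshtein edit distance between the reference and hypothesis word sequences, normalized by the reference length (CER is the same computation over characters). A minimal sketch, not the paper's evaluation code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a standard Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[-1][-1] / len(ref)

# One substitution ("the" -> "a") against six reference words: WER = 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Because insertions count against the hypothesis, WER can exceed 100% when the recognizer emits many spurious words, which is why it is reported alongside CER for whispered speech.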
Ibrahim, O.,
Asadi, H.,
Kassem, E.,
Dellwo, V.
Publication Date: 2020
pp. 5337-5342
Databases for studying speech rhythm and tempo exist for numerous languages. The present corpus was built to allow comparisons between Arabic speech rhythm and other languages. Ten Egyptian speakers (gender-balanced) produced speech in two different speaking styles (read and spontaneous). The design of the reading task replicates the methodology used in the creation of the BonnTempo corpus (BTC). During the spontaneous task, speakers talked freely for more than one minute about their daily life and/or their studies, then described the directions to the university from a well-known nearby location using a map as a visual stimulus. For corpus annotation, the database has been manually and automatically time-labeled, which makes it feasible to perform a quantitative analysis of the rhythm of Arabic in both Modern Standard Arabic (MSA) and the Egyptian dialect variety. The database serves as a phonetic resource that allows researchers to examine various aspects of Arabic supra-segmental features, and it can be used for forensic phonetic research, comparison of different speakers, analysis of variability across speaking styles, and automatic speech and speaker recognition. © European Language Resources Association (ELRA), licensed under CC-BY-NC
Publication Date: 2019
Language Related Research (23223081), 10(1), pp. 129-147
Introduction: Fricatives not only differ in their acoustic structures from one language to another, but also vary considerably from individual to individual. Acoustic correlates of fricatives are sensitive to the shape and size of the resonance cavity in front of the oral constriction. It is therefore conceivable that any physical change in the length and place of constriction during production of fricatives may alter the resultant acoustic signals. This research attempts to explore potential speaker-specific acoustic parameters of voiceless fricatives in Persian based on experimental phonetics. Accordingly, the acoustic parameters of center of gravity and fricative duration are investigated for each voiceless fricative in Persian, with the aim of discovering whether these fricatives and the selected acoustic parameters can discriminate between Persian speakers. The following questions are addressed in this paper: Do the selected acoustic parameters (center of gravity and duration) of voiceless fricatives have the capacity to differentiate speakers in Persian? Which acoustic parameters and which voiceless fricatives discriminate Persian speakers best? Furthermore, we compare the results of the present study to the findings of previous studies to see in what ways Persian is similar to or different from other investigated languages. Methodology: In order to analyze between- and within-speaker variability of voiceless fricatives, 24 Persian speakers (12 male, 12 female) were recorded on two separate occasions in the soundproof booth at the phonetics laboratory of Alzahra University. Non-contemporaneous recording of the speech material allows us to measure the degree of within-speaker variability for each speaker.
The speech material consists of a read passage of 54 Persian sentences containing the relevant voiceless fricatives. Speech tokens were acoustically measured with PRAAT version 5.2.34, and statistical analyses were carried out with SPSS version 21 and R version 3.3.3. Results and conclusions: Results of this study indicated that for female speakers, the centers of gravity of /s/ and /ʃ/ have the best performance in showing between-speaker variability. For male speakers, the center of gravity of /s/ is the most discriminant acoustic parameter across speakers. Moreover, fricative duration was not found to be a promising acoustic parameter. Center of gravity is directly linked to the size and length of the vocal tract: the longer the vocal tract, the higher the center of gravity, and vice versa. This indicates that anatomical differences between speakers’ vocal tracts influence the acoustic properties of fricatives and ultimately make them distinctive. In future studies, additional potential speaker-specific parameters will be examined in order to determine a set of well-established discriminant parameters for voiceless fricatives in Persian. © 2019, Tarbiat Modares University. All rights reserved.