Background
Type: Conference Paper

A Weighted TF-IDF-based Approach for Authorship Attribution

Journal: ()
Year: 2021/01/01
Volume:
Issue:
Pages: 188 - 193
Abedzadeh A., Fatemi A.
DOI: 10.1109/ICCKE54056.2021.9721474
Language: English

Abstract

Authorship Attribution (AA) is the task of automatically assigning a disputed text to one author chosen from a list of candidate authors. To this end, a model is trained on a dataset of textual documents with known authors, which makes AA a multi-class, single-label classification task. In this paper, we approach the task differently by extending information retrieval techniques to train an AA model. The approach weights the AARR technique, presented in our previous study, to relax the value of term frequency. The proposed solution was evaluated in several experiments on six datasets. The results show that it improves accuracy on the IMDB, Gutenberg books, Poetry, Blogs, PAN2011, and Twitter datasets by 33%, 31%, 31%, 19%, 6%, and 1%, respectively, for an average improvement of 19.94% over all datasets. The best accuracy on these datasets is 88%, 82%, 67%, 90%, 65%, and 81%, respectively. In addition, employing a dictionary-based indexing technique makes the proposed solution significantly faster (21.44X) than the baseline system. © 2021 IEEE.
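The retrieval-style idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the AARR weighting or the indexing scheme, so everything below is an assumption for illustration only: the function names (`tokenize`, `build_index`, `attribute`) are hypothetical, a log-damped term frequency stands in for "relaxing" term frequency, and a plain Python dictionary (term to per-author weights) stands in for the dictionary-based index. It is not the authors' method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase word tokens; the paper's preprocessing is not given.
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs_by_author):
    """Build a dictionary-based index: term -> {author: TF-IDF weight}.

    Each author's training texts are merged into one profile, so the
    document frequency of a term is the number of authors who use it.
    """
    term_counts = {
        author: Counter(tok for doc in docs for tok in tokenize(doc))
        for author, docs in docs_by_author.items()
    }
    n_authors = len(term_counts)
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    index = defaultdict(dict)
    norms = {}
    for author, counts in term_counts.items():
        squared = 0.0
        for term, tf in counts.items():
            # Smoothed IDF; log-damped TF is an assumed stand-in for the
            # relaxed term frequency mentioned in the abstract.
            idf = math.log((1 + n_authors) / (1 + df[term])) + 1.0
            weight = (1.0 + math.log(tf)) * idf
            index[term][author] = weight
            squared += weight * weight
        norms[author] = math.sqrt(squared)
    return index, norms


def attribute(text, index, norms):
    """Return the best-matching author for a disputed text, or None."""
    scores = defaultdict(float)
    for term, tf in Counter(tokenize(text)).items():
        query_weight = 1.0 + math.log(tf)
        # Only authors that actually use the term are visited.
        for author, weight in index.get(term, {}).items():
            scores[author] += query_weight * weight
    if not scores:
        return None
    # Normalize by profile length so long profiles are not favored.
    return max(scores, key=lambda a: scores[a] / norms[a])
```

Because scoring only touches the index entries for terms that occur in the disputed text, lookup cost depends on the query's vocabulary instead of the full training corpus, which is one plausible reason a dictionary-based index would reduce computation time.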


Author Keywords

Author Identification; Authorship Attribution; Information Retrieval; Term Frequency; TF-IDF; Classification (of information); Information retrieval systems

Other Keywords

Classification (of information); Information retrieval systems; Author identification; Authorship attribution; Baseline systems; Classification tasks; Computation time; Indexing techniques; Retrieval techniques; Term Frequency; Textual documents; TF-IDF; Information retrieval