Improving Text Classification by Fusing Linguistic and Semantic Features

By:

Sarang Shaikh
Ehtesham Hashmi
Sule Yildirim Yayilgan
Mohamed Abomhara
Rajendra Akerkar

Technology and Society

Big Data and New Technologies

Article

Id:

March 2025

Publisher:

IEEE Xplore

Text classification in NLP often underperforms when it relies on linguistic features or semantic embeddings alone. This study proposes a feature fusion approach that combines traditional linguistic features (part-of-speech tags, bag-of-words, TF-IDF, and n-grams) with semantic embeddings from Word2Vec and Doc2Vec, so that the classifier sees both syntactic and semantic information. The method is evaluated on five datasets covering three tasks: fake news detection, Bloom’s taxonomy classification, and hate speech detection. Performance is measured with accuracy, precision, recall, and F1-score. The fused features consistently outperform single-feature approaches, reaching up to 79% and 67% accuracy on the two fake news datasets, 38% and 64% on the two Bloom’s taxonomy datasets, and 70% on the hate speech dataset. These findings indicate that integrating linguistic and semantic features improves text classification across diverse NLP tasks.
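As a rough illustration of the idea, the sketch below fuses features by concatenating a TF-IDF vector with an averaged word-embedding vector for each document. The abstract does not say how the fusion is performed, so concatenation is an assumption here. The corpus, the two-dimensional `toy_embeddings` table (a stand-in for trained Word2Vec vectors), and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's datasets are not reproduced here.
docs = [
    "breaking news shocking claim goes viral",
    "officials confirm report after review",
    "shocking viral claim about officials",
]

# Hypothetical 2-d vectors standing in for trained Word2Vec embeddings.
toy_embeddings = {
    "shocking": [0.9, 0.1],
    "viral": [0.8, 0.2],
    "claim": [0.6, 0.3],
    "officials": [0.1, 0.9],
    "confirm": [0.2, 0.8],
    "report": [0.3, 0.7],
}
EMB_DIM = 2


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vector(tokens, vocab, doc_freq, n_docs):
    """Linguistic view: one smoothed TF-IDF weight per vocabulary term."""
    counts = Counter(tokens)
    length = max(len(tokens), 1)  # guard against empty documents
    vec = []
    for term in vocab:
        tf = counts[term] / length
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        vec.append(tf * idf)
    return vec


def mean_embedding(tokens, embeddings, dim):
    """Semantic view: average the vectors of all tokens found in the table."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def fuse(documents):
    """Concatenate the linguistic and semantic vectors of each document."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    fused = []
    for tokens in tokenized:
        linguistic = tfidf_vector(tokens, vocab, doc_freq, len(documents))
        semantic = mean_embedding(tokens, toy_embeddings, EMB_DIM)
        fused.append(linguistic + semantic)  # list concatenation = fusion
    return vocab, fused


vocab, fused_features = fuse(docs)
print(len(vocab), len(fused_features[0]))
```

Each fused row has one entry per vocabulary term followed by the embedding dimensions, and could be passed to any standard classifier. In practice the two blocks often need scaling to comparable ranges before concatenation; whether the study does so is not stated in the abstract.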

Link:

Conference article: Improving Text Classification by Fusing Linguistic and Sema…

Prevention of Violence-Inducing Behaviour in Social Cyberspace in Local Communities (SOCYTI)