Improving Text Classification by Fusing Linguistic and Semantic Features
Text classification remains a fundamental challenge in natural language processing (NLP), with performance often limited by reliance on either traditional linguistic features or semantic embedding techniques in isolation. This study addresses that limitation by proposing a feature fusion method that integrates traditional linguistic features (part-of-speech tags, bag-of-words, TF-IDF, and n-grams) with semantic embedding techniques such as Word2Vec and Doc2Vec. The proposed approach aims to capture both syntactic and semantic nuances, improving the robustness and accuracy of text classification. To evaluate its effectiveness, the method was applied to five datasets across three critical domains: fake news detection, Bloom's taxonomy classification, and hate speech detection.
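To make the idea of feature fusion concrete, the sketch below joins a TF-IDF vector with an averaged word-embedding vector for each document. It is a minimal illustration, not the paper's implementation: fusion by simple concatenation is an assumption, the smoothed IDF formula is one common variant, and the function names (`tfidf_vectors`, `mean_embedding`, `fuse_features`) and the two-dimensional `TOY_EMBEDDINGS` table are hypothetical stand-ins for a trained Word2Vec or Doc2Vec model.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[tok] for tok in doc if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def fuse_features(docs, embeddings, dim):
    """Concatenate linguistic (TF-IDF) and semantic (embedding) features."""
    _, sparse = tfidf_vectors(docs)
    return [
        tfidf + mean_embedding(doc, embeddings, dim)
        for doc, tfidf in zip(docs, sparse)
    ]


# Hypothetical 2-dimensional embeddings standing in for a trained model.
TOY_EMBEDDINGS = {
    "fake": [1.0, 0.0],
    "news": [0.0, 1.0],
    "report": [0.5, 0.5],
}

docs = [["fake", "news", "spreads"], ["news", "report"]]
fused = fuse_features(docs, TOY_EMBEDDINGS, dim=2)
print(len(fused), len(fused[0]))
```

Each fused vector has one entry per vocabulary term followed by the embedding dimensions, so it can be passed to any standard classifier. In practice the two feature groups often differ in scale, so normalizing each group before concatenation may be worth considering.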
This paper received the Best Paper Award at the 6th International Conference on Advancements in Computational Sciences (ICACS25), held in Lahore, Pakistan, in February 2025.