Improving Text Classification by Fusing Linguistic and Semantic Features
Text classification often underperforms when it relies on linguistic features alone or on semantic embeddings alone. This study proposes a feature fusion approach that combines traditional linguistic features (part-of-speech tags, bag-of-words, TF-IDF, and n-grams) with semantic embeddings from Word2Vec and Doc2Vec, so that the classifier has access to both syntactic and semantic information. The method is evaluated on five datasets spanning three tasks (fake news detection, Bloom’s taxonomy classification, and hate speech detection) using accuracy, precision, recall, and F1-score. The fused features consistently outperform single-feature approaches, reaching accuracies of up to 79% and 67% on the two fake news datasets, 38% and 64% on the two Bloom’s taxonomy datasets, and 70% on the hate speech dataset. These findings indicate that integrating linguistic and semantic features is an effective way to improve text classification across diverse NLP tasks.
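The abstract does not state how the two feature families are combined, so the following is only a minimal sketch under one common assumption: fusion by concatenating a TF-IDF vector with an averaged word-embedding vector for each document. The toy corpus, the 3-dimensional `toy_vectors` table (a stand-in for trained Word2Vec embeddings), and the helper names `tfidf_features`, `mean_embedding`, and `fuse` are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter

# Toy corpus; purely illustrative, not drawn from the study's datasets.
docs = [
    "the senator denied the fake report",
    "students analyze and evaluate the report",
]

# Hypothetical 3-dimensional word vectors standing in for trained Word2Vec
# embeddings; the values are made up for illustration.
toy_vectors = {
    "senator": [0.9, 0.1, 0.0],
    "fake": [0.8, 0.2, 0.1],
    "report": [0.5, 0.5, 0.0],
    "students": [0.1, 0.9, 0.2],
    "analyze": [0.0, 0.8, 0.6],
    "evaluate": [0.1, 0.7, 0.7],
}
EMB_DIM = 3


def tfidf_features(tokenized_docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vec = []
        for term in vocab:
            tf = counts[term] / len(doc)  # length-normalized term frequency
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


def mean_embedding(tokens):
    """Average the vectors of known tokens; all zeros if none are known."""
    known = [toy_vectors[t] for t in tokens if t in toy_vectors]
    if not known:
        return [0.0] * EMB_DIM
    return [sum(dim) / len(known) for dim in zip(*known)]


def fuse(tokenized_docs):
    """Concatenate TF-IDF and mean-embedding features for each document."""
    vocab, tfidf = tfidf_features(tokenized_docs)
    fused = [t + mean_embedding(doc) for t, doc in zip(tfidf, tokenized_docs)]
    return vocab, fused


tokenized = [d.split() for d in docs]
vocab, fused = fuse(tokenized)
print(len(vocab), len(fused[0]))  # vocabulary size, fused vector length
```

Each fused vector has one TF-IDF entry per vocabulary term followed by `EMB_DIM` embedding entries, and could then be passed to any standard classifier. Concatenation is the simplest fusion choice; in practice the two blocks often differ in scale, so normalizing each block before joining them may be worth considering.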