A Novel Hybrid Ensemble Framework for Thyroid Disease Diagnosis with Optimized Feature Selection
Abstract
Accurate and early diagnosis of thyroid disease is critical for effective treatment and patient management. With the growing availability of medical data, machine learning (ML) techniques have become powerful tools for automated disease diagnosis. This study aims to improve prediction accuracy for thyroid disease while addressing two challenges: data imbalance and limited model generalizability. A thyroid disease dataset containing clinical and diagnostic patient features was used. To reduce dimensionality, optimized feature selection techniques were applied so that only the most relevant features contribute to the final prediction. Five machine learning classifiers were first evaluated individually: Artificial Neural Network (ANN), Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Tree (DT), and Support Vector Machine (SVM), each with its own strengths in learning patterns and relationships within the data. Because single classifiers handle imbalanced data poorly, ensemble learning techniques were then integrated to improve performance and robustness. Two ensemble strategies were explored: stacking and voting. The stacking ensemble used SVM, DT, KNN, LR, and ANN as base learners, with LightGBM as the meta-classifier to capture complex non-linear relationships. In parallel, a voting classifier was constructed by combining Boosted Decision Tree and Extra Tree algorithms. Both ensemble methods were effective in addressing data imbalance while improving classification accuracy. Of all the models evaluated, the voting classifier performed best, achieving a classification accuracy of 98%.
These results highlight the value of combining multiple classifiers with optimized feature selection to build a robust and efficient system for thyroid disease diagnosis.
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.