AI for Early CVD Diagnosis and Personalized Care

Abstract
Cardiovascular diseases (CVD) remain one of the leading causes of morbidity and mortality worldwide, so early identification and treatment are essential for improving patient outcomes. This work presents the development and validation of a machine learning (ML) model for the early prediction and diagnosis of CVD. Using a comprehensive dataset from the National Health and Nutrition Examination Survey (NHANES) that included patient demographics, clinical history, lifestyle factors, and medical records, we employed several machine learning techniques, including neural networks, random forests, gradient boosting, and logistic regression. Extensive data preprocessing was performed, including handling missing data, encoding categorical variables, and normalizing continuous variables. Feature selection was carried out using Recursive Feature Elimination (RFE) and feature importance scores from tree-based models. The models were trained and validated using stratified k-fold cross-validation. The best-performing model, a Gradient Boosting Classifier, achieved an AUC-ROC of 0.95, accuracy of 92%, precision of 90%, recall of 91%, and an F1-score of 90%. The most important predictive factors included age, blood pressure, cholesterol, smoking status, and family history of CVD. These results underscore the model’s efficacy in predicting and diagnosing CVD early, allowing for prompt intervention and customized treatment regimens. Future research will focus on clinical integration and on extending the model to various CVD subtypes.
Keywords: Cardiovascular Diseases, Data Pre-processing, Diagnosis, Early Prediction, Feature Selection, Gradient Boosting Classifier, Machine Learning, Patient Outcomes.
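
The abstract names stratified k-fold cross-validation and reports accuracy, precision, recall, and F1-score. As a minimal illustration of what those terms mean, the sketch below assigns samples to folds so that each fold keeps a similar class ratio, and computes the four metrics from binary labels. It is not the authors' code: the function names, the toy labels, and the round-robin assignment without shuffling are all assumptions made for illustration. In practice a library routine such as scikit-learn's StratifiedKFold would typically be used instead.

```python
def stratified_k_fold(labels, k):
    """Split sample indices into k folds with similar class ratios.

    Hypothetical helper: deals each class's indices round-robin across
    the folds. No shuffling, so the result is deterministic.
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Toy labels (made up for illustration, not NHANES data):
# 4 positive and 8 negative samples split into 4 folds.
toy_labels = [1] * 4 + [0] * 8
toy_folds = stratified_k_fold(toy_labels, k=4)

# Toy predictions with one false negative and one false positive.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = binary_metrics(toy_true, toy_pred)

# Consistency check on the reported figures: precision 0.90, recall 0.91.
reported_f1 = f1_from(0.90, 0.91)
```

With the reported precision of 90% and recall of 91%, the harmonic mean is about 0.905, which is consistent with the reported F1-score of 90%.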

Author(s): BN Surya*, BN Venkatesh, S Vijayalakshmi, A Hari Narayanan, Rehana Syed
Volume: 1 Issue: 2 Pages: 9-18
DOI: https://doi.org/10.47857/irjmeds.2024.v01i02.008