Hybrid pre-processing models for heart disease prediction based on socioeconomic status and major risk factors: Thai heart study

Authors

  • Chalinee PARTANAPAT, Division of Management of Information Technology, School of Informatics, Walailak University, Nakhon Si Thammarat 80161
  • Chuleerat JARUSKULCHAI, Division of Management of Information Technology, School of Informatics, Walailak University, Nakhon Si Thammarat 80161
  • Chanankorn JANDAENG, Division of Management of Information Technology, School of Informatics, Walailak University, Nakhon Si Thammarat 80161

Abstract

Heart disease has been the leading cause of death worldwide over the past ten years. The ability to identify the risk factors that support an effective diagnosis is very important for improving the accuracy of heart disease prediction. Major risk factors and indicators such as ECG, angiography (an imaging modality for blood vessels), hypertension, and diabetes are currently the most accurate basis for diagnosis. However, physical diagnosis based only on biological risk factors is sometimes reported to lead to wrong diagnosis and treatment, which prompted this study to investigate alternative solutions. This paper aims to enhance the accuracy of predicting the presence of heart disease by relating SES (socioeconomic status) to biological risk factors while using a reduced number of attributes. We examined whether individual SES measures, such as income or education, addressed this bias, and derived an approach that relates them to traditional risk factors by employing discretization and hybrid feature selection methods. Originally, thirteen biological risk factors are used to predict heart disease. In our work, we reduce the number of biological risk factors through hybrid feature selection methods and discretization of some of the continuous and numerical risk factors, and add five SES factors to the prediction. Seven feature selection methods, with four hybrid ones, and equal-depth discretization are applied to reduce the number of attributes and achieve more accurate prediction. Four classifiers are employed to predict the diagnosis. The results show that, after discretizing the numerical risk factors and applying the Relief Attribute Eval algorithm combined with Bayes, our proposed method gives the highest accuracy, 94.01%, when classified by SVM. The thirteen biological attributes are reduced to six, and income is included as an SES attribute. The experiment concludes that low income can lead to a high risk of heart disease.
Discretization of continuous and numerical risk factors can improve prediction accuracy. When classification accuracy based on unsupervised discretization is compared across classifiers, SVM proves to be the best for this study. The novel hybrid combination models and discretization methods are shown to improve the classification of heart disease. Equal-depth discretization with feature selection by Relief Attribute Evaluation and Bayes gives better accuracy than using no discretization and no feature selection.
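
As an illustration of the equal-depth (equal-frequency) discretization mentioned in the abstract, the following is a minimal Python sketch and not the authors' implementation. The function name `equal_depth_bins` and the sample blood pressure readings are hypothetical and chosen only for illustration. The sketch assigns bins by rank, so tied values may land in different bins; a production tool would typically derive cut points so that equal values share a bin.

```python
def equal_depth_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    n = len(values)
    # Rank positions by value; ties keep their original order (stable sort).
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # The value with rank r (0-based) out of n goes to bin floor(r * n_bins / n).
        labels[idx] = rank * n_bins // n
    return labels


# Hypothetical resting blood pressure readings (mmHg), for illustration only.
blood_pressure = [120, 135, 110, 160, 145, 128, 150, 118, 172]
print(equal_depth_bins(blood_pressure, 3))
```

With nine readings and three bins, each bin receives three readings: the three lowest values get label 0, the middle three get label 1, and the three highest get label 2.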

Published

2019-03-01