Correcting Misreported Multinomial Outcome Data Based on Logistic Regression Model with Application to Stroke Mortality in Thailand



Causes of death in Thailand are misreported; about 40 % of deaths have been recorded as “ill-defined”. This study describes statistical methods for correcting a misreported multinomial outcome using verbal autopsy (VA) data. Since the outcome is a nominal variable with 21 levels, the appropriate model for systematic analysis of deaths by ICD-10 code is multinomial regression. However, it is simpler and more informative to fit a separate logistic regression model to each of the 21 outcome cause groups, and then rescale the results so that the total number of estimated deaths for each group matches that reported in the corresponding population. This method also gives confidence intervals for the percentages of deaths in cause groups at each level of a risk factor, adjusted for the other risk factors. These confidence intervals are compared with bar charts of sample percentages to assess evidence of confounding bias. The methods are illustrated using stroke deaths, and area plots are used to show results by gender, age group, and year. Stroke deaths were most often misclassified as ill-defined, other cardiovascular disease, or mental and nervous system disorders (outside hospital), and as septicemia or respiratory disease (in hospital).
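The rescaling step described in the abstract can be illustrated with a minimal sketch. It assumes that each cause group already has a fitted probability from its own logistic regression model for one gender, age group, and year cell. The function names, the three cause groups (the study uses 21), the probability values, and the death count are hypothetical and chosen only for illustration. The exact rescaling rule used in the study is not stated in the abstract, so simple proportional normalisation is assumed here.

```python
def rescale_cause_probabilities(probs):
    """Normalise separately fitted per-cause probabilities so they sum to 1.

    Separate binary models are not constrained to agree with each other,
    so their fitted probabilities for one cell need not sum to exactly 1.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("fitted probabilities must have a positive sum")
    return {cause: p / total for cause, p in probs.items()}


def allocate_deaths(probs, reported_deaths):
    """Distribute the reported deaths in one cell across cause groups."""
    scaled = rescale_cause_probabilities(probs)
    return {cause: reported_deaths * p for cause, p in scaled.items()}


# Hypothetical fitted probabilities for one cell (assumed values, not study results).
fitted = {"stroke": 0.12, "ill_defined": 0.05, "other": 0.88}

# Hypothetical number of reported deaths in the same cell.
estimated = allocate_deaths(fitted, reported_deaths=1000)
```

After rescaling, the estimated deaths across cause groups add up to the reported total for the cell, which is the property the abstract requires of the corrected estimates.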


Logistic regression, misreported deaths, multinomial regression, stroke deaths, Thailand







Online ISSN: 2228-835X

Last updated: 13 February 2019