Predictive Analytics of COVID-19 Pandemic: Statistical Modelling Perspective


  • S Lokesh KUMAR Vellore Institute of Technology, Chennai, India
  • Vergin Raja SAROBIN M Vellore Institute of Technology, Chennai, India
  • Jani ANBARASI L Vellore Institute of Technology, Chennai, India



COVID-19 forecasting, Machine learning, Regression, Time series prediction, Deep learning, Statistical modelling


The novel coronavirus disease 2019 (COVID-19) is an infectious disease that can cause serious lung injury and has killed numerous people around the world. The World Health Organization (WHO) declared it a pandemic, and countries have attempted to monitor and control it through lockdowns. The illness causes influenza-like respiratory problems, with symptoms such as cold, cough, and fever, and difficulty breathing in extremely severe cases. Because COVID-19 is a global pandemic, several analyses using multiple computational methods are being performed to predict its possible development. These numerical models project future trends based on various conditions and inquiries, and multiple techniques have been proposed that could be helpful in forecasting the spread of COVID-19. Through statistical modelling of COVID-19 data, we applied linear regression, random forest, ARIMA, and LSTMs to estimate the empirical indication of COVID-19 illness and intensity in 4 countries (USA, India, Brazil, and Russia), in order to obtain a better validation.
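The abstract describes fitting forecasting models and judging them on held-out data. A minimal sketch of that evaluation loop follows, under several assumptions that are not from the paper: the series is synthetic, all function names are hypothetical, and the random walk with drift, which is ARIMA(0,1,0) with a constant, stands in as the simplest ARIMA member rather than the configuration the authors fitted.

```python
import math


def fit_linear_trend(y):
    """Ordinary least squares fit of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_linear(train, horizon):
    """Extend the fitted trend line over the next `horizon` time steps."""
    intercept, slope = fit_linear_trend(train)
    n = len(train)
    return [intercept + slope * (n + h) for h in range(horizon)]


def forecast_drift(train, horizon):
    """ARIMA(0,1,0) with drift: last value plus h times the mean first difference."""
    drift = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + drift * h for h in range(1, horizon + 1)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical cumulative case counts (synthetic, NOT Our World in Data records).
series = [100 * 1.05 ** day for day in range(60)]
train, test = series[:50], series[50:]  # chronological split, no shuffling

rmse_linear = rmse(test, forecast_linear(train, len(test)))
rmse_drift = rmse(test, forecast_drift(train, len(test)))
print(f"linear trend RMSE: {rmse_linear:.1f}")
print(f"random walk with drift RMSE: {rmse_drift:.1f}")
```

The split is chronological because shuffling a time series would leak future observations into training. The two RMSE values printed here describe only the synthetic curve and say nothing about the paper's results.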


  • Provides a comparative analysis using the Our World in Data dataset for 4 countries (Brazil, India, Russia, and USA), giving test cases and validation across multiple trend patterns for 2 forecasting tasks: case forecasting and death forecasting
  • Explores three statistical modelling approaches (linear regression, random forest, and ARIMA) and 1 deep learning approach (LSTMs)
  • Uses performance metrics and diagnostic tools such as residuals, correlograms, RMSE, AIC, and BIC to monitor the models’ accuracy
  • ARIMA outperformed linear regression and random forest in prediction accuracy on the test data
  • ARIMA and LSTMs were then compared on the death forecasting task, in which LSTMs provided very high accuracy in comparison
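For reference, the three numerical criteria named in the highlights have the following standard textbook definitions. The symbols are assumptions introduced here, not notation taken from the paper: $y_t$ is the observed value, $\hat{y}_t$ the model prediction, $n$ the number of observations, $k$ the number of estimated parameters, and $\hat{L}$ the maximized likelihood of the fitted model.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}
\qquad
\mathrm{AIC} = 2k - 2 \ln \hat{L}
\qquad
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```

Lower values are better for all three. RMSE measures prediction error directly, while AIC and BIC trade goodness of fit against model size, with BIC penalizing extra parameters more heavily once $\ln n > 2$.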









How to Cite

KUMAR, S. L., SAROBIN M, V. R., & ANBARASI L, J. (2021). Predictive Analytics of COVID-19 Pandemic: Statistical Modelling Perspective. Walailak Journal of Science and Technology (WJST), 18(16), Article 15583 (14 pages).