Unify Framework for Crime Data Summarization using RSS Feed Service

Tichakorn NETSUWAN, Kraisak KESORN


This research presents an approach to analyzing online crime news using text mining, a natural language processing framework (General Architecture for Text Engineering: GATE), and data warehouse (DW) technologies. The proposed framework extracts key features of crime data published on newspaper websites and classifies them into crime categories, which are then transformed into a star schema for fast retrieval and online analytical processing (OLAP). The system presents the data in a multidimensional structure, enabling data analytics that help police officers set security policies to protect locals and tourists in at-risk areas. The main novelty of the framework is its demonstration that information available through RSS feed services can be used to generate reports that support decision making. The experimental results show that the data extracted from the Internet represent the actual crime data for the study areas effectively (low error rate) and allow data analysts to gain insight into the information through OLAP.
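The pipeline described above (RSS items, feature extraction, classification, star schema, OLAP-style aggregation) can be illustrated with a minimal sketch. It is not the authors' implementation: plain keyword matching stands in for the GATE processing, an in-memory SQLite database stands in for the data warehouse, and the feed content, keyword lists, province list, table names, and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload standing in for a newspaper crime feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example News</title>
<item><title>Tourist robbed near night market</title>
<description>Police in Phuket said a tourist was robbed at knifepoint.</description></item>
<item><title>Two arrested over drug trafficking</title>
<description>Officers in Chiang Mai seized drugs from a pickup truck.</description></item>
<item><title>Burglary reported at guesthouse</title>
<description>A burglary in Phuket left a guesthouse without its safe.</description></item>
</channel></rss>"""

# Illustrative vocabularies (assumptions, not the paper's word lists).
CATEGORY_KEYWORDS = {
    "theft": ("robbed", "robbery", "burglary", "theft", "stolen"),
    "drugs": ("drug", "trafficking", "narcotics"),
}
KNOWN_PROVINCES = ("Phuket", "Chiang Mai", "Bangkok")


def parse_items(rss_text):
    """Return the title and description of each RSS item as one string."""
    root = ET.fromstring(rss_text)
    return [
        f"{item.findtext('title', '')} {item.findtext('description', '')}"
        for item in root.iter("item")
    ]


def classify(text):
    """Assign the first category whose keyword occurs in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def find_province(text):
    """Extract the first known province name mentioned in the text."""
    return next((p for p in KNOWN_PROVINCES if p in text), "unknown")


def build_warehouse(rss_text):
    """Load classified news items into a small star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, province TEXT UNIQUE);
        CREATE TABLE fact_crime (
            fact_id INTEGER PRIMARY KEY,
            category_id INTEGER REFERENCES dim_category(category_id),
            location_id INTEGER REFERENCES dim_location(location_id)
        );
    """)
    for text in parse_items(rss_text):
        category, province = classify(text), find_province(text)
        conn.execute("INSERT OR IGNORE INTO dim_category (name) VALUES (?)", (category,))
        conn.execute("INSERT OR IGNORE INTO dim_location (province) VALUES (?)", (province,))
        # One fact row per news item, keyed to its two dimensions.
        conn.execute(
            """INSERT INTO fact_crime (category_id, location_id)
               SELECT c.category_id, l.location_id
               FROM dim_category c, dim_location l
               WHERE c.name = ? AND l.province = ?""",
            (category, province),
        )
    return conn


def crimes_by_province_and_category(conn):
    """Aggregate the fact table along both dimensions (an OLAP-style roll-up)."""
    return conn.execute(
        """SELECT l.province, c.name, COUNT(*)
           FROM fact_crime f
           JOIN dim_category c USING (category_id)
           JOIN dim_location l USING (location_id)
           GROUP BY l.province, c.name
           ORDER BY l.province, c.name"""
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse(SAMPLE_RSS)
    for row in crimes_by_province_and_category(warehouse):
        print(row)
```

Keeping categories and locations in separate dimension tables means a new grouping (for example, a date dimension) could be added without changing the fact rows already loaded, which is the property that makes a star schema convenient for OLAP queries.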


Crime news, data analytics, OLAP, text mining, data warehouse








Online ISSN: 2228-835X


Last updated: 4 July 2018