Improving Answer Retrieval from Web Forums with Topic Model and Ontology

Authors

  • Kanda Runapongsa SAIKAEW Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002
  • Seksan POLTREE Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002
  • Kornchawal CHAIPAH Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002
  • Choochart HARUECHAIYASAK National Electronics and Computer Technology Center, Pathum Thani 12120

Keywords:

Latent Dirichlet allocation, ontology construction, question answering system, natural language processing

Abstract

Searching for information online has become an essential part of modern life. For general topics, a search engine is usually sufficient, but for domain-specific questions other sources, such as web forums, are often preferable. Common problems with web forums are duplicated posts and poor search results for long query strings, such as full sentences. To address these issues, we propose a system based on a language model and an ontology. Given a question, the system applies natural language processing to analyze the syntactic structure of the sentence and to determine its category. The question is classified using regular expressions, keyword matching, and query templates. Using knowledge constructed from the existing content of the web forum, the system then retrieves suggested links to answers for the given question. To the best of our knowledge, this is the first work that attempts to understand questions and suggest existing sources of answers in a Thai web forum. We compared our two proposed subsystems, one language-model-based and one ontology-based, with the Google Custom Search engine. The systems were evaluated on a Thai breastfeeding web forum containing 6,823 threads with 75,906 messages. The results show that the proposed systems produce fewer duplicated suggestions than the Google search engine. Moreover, for input containing ambiguous keywords, the language-model-based subsystem outperforms the Google search engine because it is better at finding related terms. For questions with no frequently occurring keywords, the ontology-based subsystem suggests more appropriate answers than the Google search engine because the ontology contains related knowledge defined by experts.
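The abstract states that a question is classified using regular expressions, keyword matching, and query templates. The Python sketch below illustrates one way those three steps could fit together; it is not the authors' implementation. The category names, Thai question words, domain keywords, and query templates (`QUESTION_PATTERNS`, `DOMAIN_KEYWORDS`, `QUERY_TEMPLATES`, `classify_question`) are all hypothetical examples chosen for illustration. Because Thai is written without spaces between words, the sketch matches substrings directly instead of first segmenting the text into words.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set for illustration only; the paper's actual patterns,
# categories, and templates are not given in the abstract.
# Thai has no spaces between words, so patterns match substrings directly.
QUESTION_PATTERNS = [
    ("why", re.compile(r"ทำไม")),             # "why"
    ("how", re.compile(r"อย่างไร|ยังไง")),      # "how" (formal | colloquial)
    ("when", re.compile(r"เมื่อไร|เมื่อไหร่")),   # "when" (formal | colloquial)
    ("what", re.compile(r"อะไร")),             # "what"
]

# Hypothetical domain keywords for a breastfeeding forum:
# breast milk, nipple, pump.
DOMAIN_KEYWORDS = ["น้ำนม", "หัวนม", "ปั๊ม"]

# Hypothetical query templates keyed by question category.
QUERY_TEMPLATES = {
    "why": "สาเหตุ {kw}",     # "cause of ..."
    "how": "วิธี {kw}",       # "method for ..."
    "when": "ช่วงเวลา {kw}",  # "time period for ..."
    "what": "{kw} คืออะไร",   # "what is ..."
    "other": "{kw}",          # no question word found: keywords only
}


@dataclass
class ParsedQuestion:
    category: str
    keywords: list
    query: str


def classify_question(text: str) -> ParsedQuestion:
    # Step 1, regular expressions: the first matching pattern sets the category.
    category = next(
        (name for name, pattern in QUESTION_PATTERNS if pattern.search(text)),
        "other",
    )
    # Step 2, keyword matching: keep domain keywords that occur in the question.
    keywords = [kw for kw in DOMAIN_KEYWORDS if kw in text]
    # Step 3, query template: fill the category's template with the keywords.
    query = QUERY_TEMPLATES[category].format(kw=" ".join(keywords)).strip()
    return ParsedQuestion(category, keywords, query)


if __name__ == "__main__":
    parsed = classify_question("ทำไมน้ำนมไม่ไหล")  # "Why is breast milk not flowing?"
    print(parsed.category, parsed.keywords, parsed.query)
```

Checking the patterns in a fixed order keeps the result deterministic when a question contains more than one question word. A real system would more likely segment the text into words first and use a much larger, expert-curated keyword list.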

Author Biography

Kanda Runapongsa SAIKAEW, Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002

Kanda Runapongsa Saikaew was born in Chiang Rai, Thailand, in 1975. She received the B.S. degree in electrical and computer engineering from Carnegie Mellon University, Pennsylvania, USA, in 1997, and the M.S. and Ph.D. degrees in computer science and engineering from the University of Michigan at Ann Arbor, in 1999 and 2003, respectively.

In 2003, she joined the Department of Computer Engineering, Khon Kaen University, as a Lecturer, and became an Assistant Professor in 2006. Her current research interests include social network analysis, biomedical engineering, and mobile web information systems.

Published

2015-03-24

How to Cite

SAIKAEW, K. R., POLTREE, S., CHAIPAH, K., & HARUECHAIYASAK, C. (2015). Improving Answer Retrieval from Web Forums with Topic Model and Ontology. Walailak Journal of Science and Technology (WJST), 13(6), 451–463. Retrieved from https://wjst.wu.ac.th/index.php/wjst/article/view/1417

Section

Research Article