Improving Answer Retrieval from Web Forums with Topic Model and Ontology
Keywords:
Latent Dirichlet allocation, ontology construction, question answering system, natural language processing
Abstract
Searching for information online has become an essential part of modern life. For general-domain questions, a search engine is usually sufficient; for domain-specific questions, other sources, such as web forums, are often preferable. Common problems with web forums are duplicated posts and poor search results for long query strings, such as full sentences. To overcome these issues, we propose a system based on a language model and an ontology. Given a question, the system performs natural language processing to analyze the syntactic structure of the sentence and to determine its category. The question sentence is classified using regular expressions, keyword matching, and query templates. Using knowledge constructed from the existing content of the web forum, the system then retrieves suggested links to answers for the given question. To the best of our knowledge, this is the first work that attempts to understand questions and suggest existing sources of answers in a Thai web forum. We compared our two proposed subsystems, one based on a language model and one based on an ontology, with the Google Custom Search engine, using a Thai breastfeeding web forum containing 6,823 threads with 75,906 messages. The evaluation results show that both proposed subsystems return fewer duplicated suggestions than the Google search engine. Moreover, for inputs containing ambiguous keywords, the language-model-based subsystem outperforms the Google search engine because it is better at finding related terms. For questions without frequently occurring keywords, the ontology-based subsystem suggests more appropriate answers than the Google search engine because it contains related knowledge defined by experts.
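The abstract states that question sentences are classified with regular expressions, keyword matching, and query templates. A minimal sketch of how those three pieces might be combined is shown below. It assumes English stand-in questions (real Thai input would first need a word segmenter) and uses entirely hypothetical categories, patterns, keyword sets, templates, and scoring weights; `Rule`, `RULES`, `tokenize`, and `classify` are illustrative names, not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """One hypothetical question category (not taken from the paper)."""
    category: str
    pattern: "re.Pattern"   # regex cue for the question form
    keywords: frozenset     # domain keywords that support this category
    template: str           # query template filled with matched keywords


# Hypothetical rules; the paper's real categories and templates are not given.
RULES: List[Rule] = [
    Rule("how_to", re.compile(r"\bhow (do|can|should) i\b"),
         frozenset({"pump", "latch", "store", "feed"}), "METHOD OF {kw}"),
    Rule("symptom", re.compile(r"\b(is it normal|why does|why is)\b"),
         frozenset({"fever", "pain", "cracked", "rash"}), "CAUSE OF {kw}"),
    Rule("quantity", re.compile(r"\bhow (much|many|long|often)\b"),
         frozenset({"milk", "feed", "hours", "ml"}), "AMOUNT OF {kw}"),
]


def tokenize(text: str) -> List[str]:
    # Latin-letter tokenizer for the English stand-in text; Thai input
    # would need a word segmenter first, which this sketch omits.
    return re.findall(r"[a-z]+", text.lower())


def classify(question: str) -> Tuple[str, Optional[str]]:
    """Return (category, filled query template) or ("unknown", None)."""
    text = question.lower()
    tokens = set(tokenize(text))
    best = None  # (score, rule, matched keywords)
    for rule in RULES:
        regex_hit = 1 if rule.pattern.search(text) else 0
        matched = sorted(tokens & rule.keywords)
        # Assumed weighting: a regex match counts twice as much as one keyword.
        score = 2 * regex_hit + len(matched)
        # Strict ">" keeps the earlier rule when scores tie.
        if score > 0 and (best is None or score > best[0]):
            best = (score, rule, matched)
    if best is None:
        return "unknown", None
    _, rule, matched = best
    # "*" stands in for "any term" when no keyword matched.
    query = rule.template.format(kw=" ".join(matched) if matched else "*")
    return rule.category, query
```

For example, `classify("How much milk should I pump?")` returns `("quantity", "AMOUNT OF milk")`: the word "pump" gives the hypothetical how-to rule one point, but the quantity rule scores higher because its regex matches "how much" and its keyword set contains "milk". Weighting a regex hit above a single keyword is an arbitrary choice here; it lets the question form decide the category when keywords from several categories co-occur.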
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.