Acquiring Sentiment from Twitter using Supervised Learning and Lexicon-based Techniques
Keywords: Twitter, sentiment analysis, social media content, opinion mining, social media mining
The emergence of Twitter in Thailand has given millions of users a platform to express and share their opinions about products, services, and other subjects. Twitter is therefore a rich source of information for companies that want to understand their customers by extracting and analyzing the sentiment of Tweets, and it offers a fast and effective way to monitor public opinion of their brands, products, and services. However, sentiment analysis of Thai Tweets faces language-related challenges, such as the differences between the Thai and English writing systems, short message length, slang, and variation in word usage. This research paper focuses on Tweet classification and on addressing data sparsity. We propose a mixed method that combines supervised learning techniques with lexicon-based techniques to filter Thai opinions and then classify them as positive, negative, or neutral. The proposed method applies a number of pre-processing steps before the text is fed to the classifier. Experimental results showed that the proposed method overcame limitations reported in previous studies and was effective in most cases, achieving an average accuracy of 84.80 %, with 82.42 % precision, 83.88 % recall, and an 82.97 % F-measure.
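The abstract does not spell out the pipeline, so the following is only a minimal Python sketch of one way a hybrid lexicon-plus-supervised classifier of this kind could be assembled. It assumes that (a) Tweets are cleaned of URLs, @mentions and '#' symbols, and elongated words such as "มากกกก" are collapsed; (b) Thai text, which has no spaces between words, is segmented by greedy longest matching against a dictionary, standing in for a real Thai tokenizer; (c) lexicon hits are appended as shared pseudo-tokens so that words never seen in training can still inform the classifier, which is one way to ease data sparsity; and (d) a multinomial Naive Bayes model makes the three-way decision. The word lists, the `TRAIN` data, the `LEX_POS`/`LEX_NEG`/`LEX_NONE` tokens, the helper names (`preprocess`, `segment`, `features`, `NaiveBayes`, `evaluate`), the choice of Naive Bayes and the macro averaging in `evaluate` are all hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources standing in for a real Thai dictionary,
# polarity lexicon and labeled corpus (not the paper's data).
POS_WORDS = {"ดี", "ชอบ", "อร่อย"}        # good, like, delicious
NEG_WORDS = {"แย่", "เกลียด", "ช้า"}       # bad, hate, slow
DICTIONARY = POS_WORDS | NEG_WORDS | {"ร้าน", "นี้", "บริการ", "มาก", "อาหาร", "อยู่", "ไหน"}
MAX_LEN = max(len(w) for w in DICTIONARY)
TRAIN = [("อาหารอร่อยมาก", "positive"), ("ชอบร้านนี้", "positive"),
         ("บริการแย่มาก", "negative"), ("เกลียดร้านนี้", "negative"),
         ("ร้านนี้อยู่ไหน", "neutral"), ("อาหารอยู่ไหน", "neutral")]

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated 3+ times


def preprocess(text: str) -> str:
    """Remove URLs, @mentions and '#', and collapse elongated characters."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", text)).replace("#", " ")
    return REPEAT_RE.sub(r"\1", text).lower()  # "มากกกก" -> "มาก"


def segment(text: str) -> list[str]:
    """Greedy longest-matching segmentation; Thai is written without spaces."""
    tokens = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            for j in range(min(len(chunk), i + MAX_LEN), i, -1):
                if chunk[i:j] in DICTIONARY:
                    tokens.append(chunk[i:j])
                    i = j
                    break
            else:  # no dictionary word starts at i: emit one character
                tokens.append(chunk[i])
                i += 1
    return tokens


def features(text: str) -> list[str]:
    """Word tokens plus lexicon pseudo-tokens shared across many words."""
    tokens = segment(preprocess(text))
    lex = ["LEX_POS" for t in tokens if t in POS_WORDS]
    lex += ["LEX_NEG" for t in tokens if t in NEG_WORDS]
    return tokens + (lex or ["LEX_NONE"])


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed learner)."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens: list[str]) -> str:
        v = len(self.vocab)

        def score(c: str) -> float:  # log P(c) + sum of log P(token | c)
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens if t in self.vocab)  # unseen tokens are skipped

        return max(self.classes, key=score)


def evaluate(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F-measure."""
    labels = sorted(set(gold) | set(pred))
    p = r = f = 0.0
    for c in labels:
        tp = sum(g == c and q == c for g, q in zip(gold, pred))
        prec, rec = tp / max(1, pred.count(c)), tp / max(1, gold.count(c))
        p, r = p + prec, r + rec
        f += 2 * prec * rec / (prec + rec) if tp else 0.0
    acc = sum(g == q for g, q in zip(gold, pred)) / len(gold)
    n = len(labels)
    return {"accuracy": acc, "precision": p / n, "recall": r / n, "f_measure": f / n}


if __name__ == "__main__":
    model = NaiveBayes().fit([features(t) for t, _ in TRAIN], [y for _, y in TRAIN])
    print(model.predict(features("บริการดีมาก")))                # -> positive
    print(model.predict(features("เกลียดบริการ @shop")))         # -> negative
    print(model.predict(features("ร้านอยู่ไหน http://t.co/x")))  # -> neutral
```

In the first example, "ดี" (good) never appears in the toy training Tweets, so the classifier ignores the word itself; the shared `LEX_POS` pseudo-token is what tips the decision to positive. A real system would replace the toy lists with a full Thai dictionary, tokenizer and polarity lexicon, and a properly sized labeled corpus.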
Copyright (c) 2016 Walailak Journal of Science and Technology (WJST)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.