Hierarchical Text Categorization Using Level Based Neural Networks of Word Embedding Sequences with Sharing Layer Information

Mongkud KLUNGPORNKUN; Peerapon VATEEKUL

doi:10.48048/wjst.2019.4145

Authors

Mongkud KLUNGPORNKUN Department of Computer Engineering, Chulalongkorn University, Bangkok 10330
Peerapon VATEEKUL Department of Computer Engineering, Chulalongkorn University, Bangkok 10330

Keywords:

Text categorization, hierarchical multi-label classification, deep learning

Abstract

In text corpora, documents are commonly categorized into a predefined class hierarchy, which is usually a tree. One of the most widely used approaches is a level-based strategy that induces a multiclass classifier for each class level independently. However, prior attempts neither utilized information from the parent level nor considered the sequence of words, employing a bag of words instead. In this paper, we present a novel level-based hierarchical text categorization method with a strategy called “sharing layer information.” For each class level, a neural network is constructed whose input is a sequence of word embedding vectors generated from Convolutional Neural Networks (CNN). In addition, a training strategy called “balanced resampling with mini-batch training” is proposed to avoid imbalance issues. Furthermore, a label correction strategy is proposed to reconcile the predicted results from the networks at different class levels. Experiments were conducted on two standard benchmarks, WIPO and Wiki, with comparison against a top-down SVM framework with TF-IDF inputs called “HR-SVM.” The results show that the proposed model achieves the highest accuracy in terms of micro F1 and outperforms the baseline at the top levels in terms of macro F1.
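The abstract reports results in both micro F1 and macro F1, which can diverge sharply when classes are imbalanced. The following is a minimal sketch of the two standard definitions, not the evaluation code used in the paper. The function names, the class labels "A" and "B", and the toy predictions are all hypothetical, and the paper may compute these scores per class level rather than over all classes at once.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true and y_pred are lists of label sets, one set per document."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        for c in true & pred:   # predicted and correct
            tp[c] += 1
        for c in pred - true:   # predicted but wrong
            fp[c] += 1
        for c in true - pred:   # missed
            fn[c] += 1
    classes = set(tp) | set(fp) | set(fn)
    # Micro F1 pools the counts of all classes before computing F1,
    # so frequent classes dominate the score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro F1 averages per-class F1, so every class weighs the same.
    macro = (sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
             if classes else 0.0)
    return micro, macro


# Toy data (made up): class "A" is frequent, class "B" is rare and always missed.
y_true = [{"A"}, {"A"}, {"A"}, {"A"}, {"B"}]
y_pred = [{"A"}, {"A"}, {"A"}, {"A"}, {"A"}]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
# prints: micro F1 = 0.800, macro F1 = 0.444
```

In this toy case the classifier ignores the rare class entirely, yet micro F1 stays high because the frequent class dominates the pooled counts. Macro F1 drops because the missed class contributes a per-class score of zero. This is one reason a model can lead on micro F1 while its macro F1 advantage is limited to certain levels.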
