Theses and Capstone Projects Plagiarism Checker using Kolmogorov Complexity Algorithm

  • Marco Jr. DEL ROSARIO Laguna State Polytechnic University, San Pablo City, Laguna, Philippines
  • Julius SARENO Technological University of the Philippines, Manila, Philippines
Keywords: Plagiarism checker, Boyer-Moore algorithm, Kolmogorov complexity algorithm, Normalized compression distance


In education, students sometimes copy previous works and rely on prepared solutions available on the Internet to meet their requirements. This practice constitutes plagiarism, and reducing such academic dishonesty is a growing concern for educational institutions. To address this issue, this study aims to design and develop a plagiarism checker capable of registering documents, granting access to users, and calculating the similarity between documents. The software was constructed using HTML, PHP, JavaScript, CSS, and MySQL. The developed system is composed of three main modules: Document Search, which enables users to browse documents; Document Registration, which enables the administrator to add and manage the stored documents; and Document Comparison, which serves as the system's plagiarism detection mechanism. The Normalized Compression Distance algorithm was used to measure similarity, and the Boyer-Moore algorithm was used to highlight the suspected plagiarized text. Tests were conducted to determine whether the system functions as expected and to measure the accuracy of its output. The developed system was evaluated using the ISO 25010 software quality model in terms of Product Quality and was rated by one hundred respondents. The system obtained a mean rating of 4.70, which is equivalent to “excellent” in descriptive terms. This indicates that the objectives of the study were met and that the system was developed according to its desired functions and requirements.
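The two techniques named in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity, not in the PHP and JavaScript stack the system was built with, and it is not the authors' implementation. Normalized Compression Distance is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size of its input. The sketch assumes zlib as the compressor, and the search routine is a simplified Boyer-Moore that applies only the bad-character rule. The function names `compressed_size`, `ncd`, and `boyer_moore_find_all` are hypothetical.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance: lower values mean more similar texts."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def boyer_moore_find_all(text: str, pattern: str) -> List[int]:
    """Start index of every occurrence of pattern (bad-character rule only)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Rightmost position of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    hits = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += 1  # conservative shift so overlapping matches are kept
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern, moving forward by at least one.
            s += max(1, j - last.get(text[s + j], -1))
    return hits


if __name__ == "__main__":
    registered = "The system measures similarity between registered documents. " * 20
    unrelated = "Quarterly rainfall varied widely across the northern provinces. " * 20
    print(ncd(registered, registered))  # small: identical texts
    print(ncd(registered, unrelated))   # larger: unrelated texts
    print(boyer_moore_find_all("abracadabra", "abra"))  # [0, 7]
```

In a checker of this kind, the distance would typically rank registered documents by similarity to a submission, while the match positions would mark which passages to highlight. Any threshold separating "similar" from "not similar" is a design choice that this sketch does not fix.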



Author Biography

Marco Jr. DEL ROSARIO, Laguna State Polytechnic University, San Pablo City, Laguna, Philippines

College of Computer Studies

Instructor I



How to Cite
DEL ROSARIO, M. J., & SARENO, J. (2020). Theses and Capstone Projects Plagiarism Checker using Kolmogorov Complexity Algorithm. Walailak Journal of Science and Technology (WJST), 17(7), 726-744. Retrieved from