Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

Authors

  • Bijanto Bijanto, Sekolah Tinggi Teknik Pati
  • Ryan Yunus, Technical College of Pati

DOI:

https://doi.org/10.33633/jais.v6i2.5288

Abstract

Missing data can seriously affect the research process and classification results, leading to biased parameter estimates, loss of statistical information, decreased quality, increased standard errors, and weak generalization of the findings. In this paper, we discuss a limitation of one algorithm, the Naive Bayes Kernel algorithm, which cannot process data that contains missing values. To handle such data, we propose using the mean imputation method. The data we use is a public dataset from UCI, the HCV (Hepatitis C Virus) dataset. The imputation method fills each missing entry with the average of the existing values. Before mean imputation, the dataset is first resampled using bootstrapping. The imputed data is then processed using the Naive Bayes Kernel algorithm. In our tests, this approach obtained an accuracy of 96.05% with a computation time of 1 second.
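
The pipeline described in the abstract (bootstrap resampling, mean imputation, then a kernel-based Naive Bayes classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian kernel, the fixed bandwidth of 1.0, and the toy data are all hypothetical, and the HCV dataset itself is not loaded here.

```python
import math
import random


def bootstrap_sample(rows, labels, seed=0):
    """Draw len(rows) examples with replacement (a plain bootstrap resample)."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def gaussian_kde(x, samples, bandwidth):
    """Kernel density estimate: average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return norm * total / len(samples)


def fit_kernel_nb(rows, labels):
    """Group the training rows by class; the density estimates are computed lazily."""
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    return by_class, len(rows)


def predict_kernel_nb(model, x, bandwidth=1.0):
    """Pick the class maximising log prior + sum of per-feature log densities."""
    by_class, n = model
    best_label, best_score = None, -math.inf
    for y, members in by_class.items():
        score = math.log(len(members) / n)
        for j, xj in enumerate(x):
            density = gaussian_kde(xj, [m[j] for m in members], bandwidth)
            score += math.log(density + 1e-300)  # tiny offset avoids log(0)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy data with two missing entries (None), standing in for real records.
rows = [[1.0, 10.0], [1.2, None], [0.8, 9.0], [5.0, 20.0], [None, 21.0], [5.2, 19.0]]
labels = [0, 0, 0, 1, 1, 1]

filled = mean_impute(rows)
model = fit_kernel_nb(filled, labels)
print(predict_kernel_nb(model, [1.0, 9.5]))
```

To follow the order given in the abstract, `bootstrap_sample` would be called on the raw rows before `mean_impute`. The demo above skips that step so the result does not depend on the random draw. In practice the bandwidth would be tuned rather than fixed.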

Published

2021-12-06