Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm
DOI:
https://doi.org/10.33633/jais.v6i2.5288
Abstract
Missing data can seriously affect the research process and classification results, leading to biased parameter estimates, loss of statistical information, decreased quality, increased standard errors, and weak generalization of the findings. In this paper, we discuss a problem in one classification algorithm, the Naive Bayes Kernel algorithm, which has the disadvantage of not being able to process data with missing values. To process such data, we propose the mean imputation method. The data we use is a public dataset from UCI, the HCV (Hepatitis C Virus) dataset. Mean imputation corrects the missing data by filling each missing entry with the average of the existing values. Before mean imputation, the dataset is first resampled using the bootstrap. The data corrected by mean imputation is then processed with the Naive Bayes Kernel algorithm. In the tests carried out, this approach obtained an accuracy of 96.05% with a computation time of 1 second.
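The pipeline described in the abstract (bootstrap resampling, then mean imputation, then a kernel-based Naive Bayes classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidth of 1.0, the use of `None` to mark missing values, and all function and class names are assumptions made for illustration only.

```python
import math
import random


def bootstrap(X, y, rng):
    """Resample rows with replacement, keeping the original sample size."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def mean_impute(X):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(X[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in X if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in X]


def gaussian_kernel_density(x, samples, bandwidth):
    """Average of Gaussian kernels centred on each training sample."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / (len(samples) * norm)


class KernelNaiveBayes:
    """Naive Bayes with a per-feature kernel density estimate per class."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth  # assumed value, not taken from the paper

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = {c: y.count(c) / len(y) for c in self.classes}
        # For every class, keep the training values of each feature column.
        self.columns = {
            c: list(zip(*[x for x, label in zip(X, y) if label == c]))
            for c in self.classes
        }
        return self

    def predict(self, x):
        def log_posterior(c):
            score = math.log(self.priors[c])
            for value, samples in zip(x, self.columns[c]):
                density = gaussian_kernel_density(value, samples, self.bandwidth)
                score += math.log(density + 1e-300)  # guard against log(0)
            return score
        return max(self.classes, key=log_posterior)


def run_pipeline(X, y, seed=0, bandwidth=1.0):
    """Bootstrap first, then impute, then fit, in the order the abstract gives."""
    X_boot, y_boot = bootstrap(X, y, random.Random(seed))
    X_filled = mean_impute(X_boot)
    return KernelNaiveBayes(bandwidth).fit(X_filled, y_boot)
```

In this sketch, a call such as `run_pipeline(X, y).predict(sample)` would return the class with the highest log-posterior for `sample`. In practice the bandwidth would normally be tuned rather than fixed, and the accuracy and timing reported in the abstract should not be expected from this toy version.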