Classification of Arabica Coffee Green Beans Using Digital Image Processing with the K-Nearest Neighbor Method

Arabica coffee is the largest commodity produced by farmers in Pagergunung Village, Bulu District, Temanggung Regency. Coffee production has risen rapidly in recent years, by 80%, as coffee has become part of the Indonesian lifestyle, which is reflected in the number of coffee shops opening in various regions. Demand for Arabica coffee has risen accordingly, so the quality of the coffee itself must improve. However, errors often occur when determining and classifying the quality of coffee beans because farmers lack understanding of coffee processing. The purpose of this research is therefore to classify coffee beans by grade using the K-Nearest Neighbor method, with the average Red-Green-Blue (RGB) color value as the extracted feature, so that the beans can fetch a high selling price. Using 150 training images and 150 testing images, the classification accuracy is 80% with k=1.


INTRODUCTION
Arabica coffee is one of two types of coffee that are widely available in the market. It has a higher taste quality and lower caffeine content than other coffees. Because of this, demand for Arabica coffee has increased, so processing must improve the quality of Arabica coffee beans in order to fetch a higher price. However, errors often occur in processing, determining, and classifying the quality of coffee beans because of a lack of understanding of coffee processing (Pamuji, 2019). This study was therefore conducted to classify Arabica coffee green beans by grade using technology, making the process more efficient and accurate.
Nelly Oktavia Adiwijaya, Hammam Iqomatuddin Romadhon, Januar Adi Putra, and Dewangga Putra Kuswanto, in a study titled "The Quality of Coffee Bean Classification System Based on Color by Using K-Nearest Neighbor Method", used image processing with the K-Nearest Neighbor method to classify the quality of coffee beans by class, using 90 training data from 3 classes and 30 test data from each class. The accuracy was the same, 83%, for k = 3, k = 5, and k = 7 (Adiwijaya et al., 2022).
Siti Raysyah, Veri Arinal, and Dadang Iskandar Mulyana, in a study titled "Klasifikasi Tingkat Kematangan Buah Kopi Berdasarkan Deteksi Warna Menggunakan Metode KNN dan PCA", classified coffee cherries by maturity level, namely raw, moderately ripe, and ripe. The study used RGB and HSV color features with the K-Nearest Neighbor method. The dataset of 135 images was divided into 90 training images and 45 test images, and the accuracy was 97.7% with k=3 (Raysyah, Veri Arinal and Dadang Iskandar Mulyana, 2021).
Ariska Restu Ginanjar and Enny Itje Sela, in a study titled "Sistem Deteksi Jenis Cacat Biji Kopi dengan Algoritma K-Nearest Neighbor", built a defect detection system for coffee beans using RGB (Red, Green, Blue) and HSV (Hue, Saturation, Value) color model features. The researchers used 60 coffee beans as training data and 40 coffee beans as test data. With k=3 and the RGB color model feature, the accuracy was 95% (Ginanjar, 2019).
Cinantya Paramita, Eko Hari Rachmawanto, Christy Atika Sari, and De Rosal Ignatius Moses Setiadi, in a study titled "Klasifikasi Jeruk Nipis Terhadap Tingkat Kematangan Buah Berdasarkan Fitur Warna Menggunakan K-Nearest Neighbor", classified limes using RGB (Red, Green, Blue) color features based on the maturity level of the skin color, with 5 categories: raw, slightly ripe, ripe, perfectly ripe, and rotten. The 75 data were divided into 50 training data and 25 testing data. The classification accuracy with k = 3 was 92% (Paramita et al., 2019).
Andhika Ryan Pratama, Muhammad Mustajib, and Aryo Nugroho, in a study titled "Deteksi Citra Uang Kertas dengan Fitur RGB Menggunakan K-Nearest Neighbor", detected banknotes using RGB (Red, Green, Blue) color feature extraction. The dataset consists of 40 images of old and new Rp. 2000 banknotes and old and new Rp. 5000 banknotes. Of the 16 test data, 15 were detected correctly, giving an accuracy of 93.7% with k = 5 (Pratama, Mustajib and Nugroho, 2020).
Based on the studies above, this research classifies Arabica coffee green beans by grade using the K-Nearest Neighbor method. It is hoped that this research will help farmers understand how to process Arabica coffee green beans according to their grade and thereby increase the price of Arabica coffee.

Green Bean Arabica Coffee
Green coffee beans are raw coffee beans that have been peeled from their skin but not yet roasted, and they are green in color. Most coffee farmers and experts judge coffee quality by the color, shape, and texture of the green (raw) beans (Nugroho and Sebatubun, 2020).

Grade Arabica Coffee
Grade is the level of coffee quality. Arabica coffee is classified into several quality grades, which are defined in the Indonesian National Standard (SNI) number 01-2907-2008 so that standards align with those of other coffee-producing countries. Arabica coffee has 6 quality grades: grades 1, 2, 3, 4, 5, and 6 (Rizal, 2019).

Image Processing
Image processing is a method of processing images to extract data from them. There are many ways to process an image, one of which is the K-Nearest Neighbor method with RGB (Red, Green, Blue) color feature extraction. The RGB color model has 3 basic color components: red (R), green (G), and blue (B). Color feature extraction is done by calculating the average of each RGB channel over an image. The resulting average values are then used as input for further processing.
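The mean-RGB feature extraction described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the nested-list image layout, the function name `mean_rgb`, and the sample pixel values are assumptions made for the example, and the source does not state whether background pixels are excluded from the average.

```python
from statistics import fmean


def mean_rgb(image):
    """Return the (R, G, B) channel means of an image.

    `image` is a list of rows; each row is a list of (r, g, b) tuples
    with values in 0..255. This layout is an assumption for illustration.
    """
    pixels = [px for row in image for px in row]
    if not pixels:
        raise ValueError("image has no pixels")
    # Average each of the three channels over all pixels.
    return tuple(fmean(px[c] for px in pixels) for c in range(3))


# Tiny 2x2 example image (hypothetical values).
sample = [
    [(120, 110, 90), (130, 120, 100)],
    [(110, 100, 80), (140, 130, 110)],
]
features = mean_rgb(sample)
print(features)  # (125.0, 115.0, 95.0)
```

Each image is thus reduced to a three-value feature vector, which is what the classifier compares.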

K-Nearest Neighbor
K-Nearest Neighbor is a supervised learning algorithm: it processes training data based on the given input and output information, and the system learns the pattern in the data, which then serves as a reference for determining information about other data (Abijono, Santoso and Anggreini, 2021). The K-Nearest Neighbor method classifies data by calculating the distance between each testing sample and the training samples using the Euclidean distance formula, then assigning the class that is most common among the k nearest neighbors, so the result depends on the value of k (Paramita et al., 2019).
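A minimal sketch of K-Nearest Neighbor classification with Euclidean distance is given below. It is not the study's implementation: the function name `knn_predict`, the tie-breaking rule, and the feature values and labels in the example are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=1):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training indices by Euclidean distance to the query.
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(train_features[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    # Majority vote; on a tie, the label met first (the closest) wins.
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical normalized mean-RGB features for two grades.
train_x = [
    (0.120, 0.110, 0.090),
    (0.125, 0.112, 0.094),
    (0.160, 0.150, 0.130),
    (0.158, 0.149, 0.128),
]
train_y = ["grade 1", "grade 1", "grade 2", "grade 2"]
query = (0.157, 0.148, 0.127)

print(knn_predict(train_x, train_y, query, k=1))  # grade 2
print(knn_predict(train_x, train_y, query, k=3))  # grade 2
```

With k = 1 the prediction is simply the label of the single closest training sample; larger k values smooth the decision by voting.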

Confusion Matrix
To measure the prediction accuracy (performance) of a classifier with more than two target classes (multiclass), a confusion matrix is calculated and used to derive the precision, recall, F1-score, and accuracy.
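As an illustration of how these metrics follow from a multiclass confusion matrix, here is a small sketch. The function names and the example labels are hypothetical; rows are taken to be the true classes and columns the predicted classes, which is a convention assumed here rather than stated in the source.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix):
    """Return a (precision, recall, f1) tuple for each class."""
    n = len(matrix)
    metrics = []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column total
        actual = sum(matrix[c])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical true and predicted grades for six test images.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_metrics(cm)[0])  # (0.5, 0.5, 0.5)
```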

Research Object
The images of Arabica coffee green beans were taken with the camera of an OPPO Reno4F cellphone, which has a 48-megapixel camera, mounted on a monopod 10 cm above the object and set to 5x zoom. The background is white HVS paper because it strengthens the color of the object. Each coffee bean was photographed on both sides, front and back, because the two sides have different shapes. After image capture, preprocessing removes the background and resizes the image from the original 3000x3000 pixels to 500x500 pixels.
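The source does not say how the background was removed or which resizing method was used. The sketch below shows one plausible approach under stated assumptions: near-white pixels are treated as background (the threshold of 200 is hypothetical), and the image is shrunk by averaging square blocks of pixels. In practice an image library would normally handle both steps; plain Python lists are used here only to keep the example self-contained.

```python
def remove_background(image, threshold=200, fill=(0, 0, 0)):
    """Replace near-white pixels with `fill`.

    A pixel counts as background when all three channels are >= threshold.
    Both the rule and the threshold are assumptions for illustration.
    """
    return [[fill if min(px) >= threshold else px for px in row]
            for row in image]


def downsample(image, factor):
    """Shrink an image by averaging non-overlapping factor x factor blocks."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            # Integer average of each channel over the block.
            row.append(tuple(sum(px[c] for px in block) // len(block)
                             for c in range(3)))
        result.append(row)
    return result


# Hypothetical 4x4 image: a 2x2 "bean" on white paper.
WHITE, BEAN = (255, 255, 255), (100, 90, 60)
image = [
    [BEAN, BEAN, WHITE, WHITE],
    [BEAN, BEAN, WHITE, WHITE],
    [WHITE, WHITE, WHITE, WHITE],
    [WHITE, WHITE, WHITE, WHITE],
]
small = downsample(remove_background(image), factor=2)
print(small)  # [[(100, 90, 60), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
```

Reducing 3000x3000 pixels to 500x500 pixels corresponds to `factor=6` in this scheme.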
Researchers took 50 front-and-back images for each grade, for a total of 300 images in the training dataset: 50 grade 1 images, 50 grade 2 images, 50 grade 3 images, 50 grade 4 images, 50 grade 5 images, and 50 grade 6 images. For testing, there are likewise 50 front-and-back images for each grade, for a total of 300 images in the testing dataset: 50 grade 1 images, 50 grade 2 images, 50 grade 3 images, 50 grade 4 images, 50 grade 5 images, and 50 grade 6 images.
Based on interviews with middlemen in Pagergunung Village, each grade is distinguished by its defective coffee beans:

Research Stages
In this research, Arabica coffee green bean images are first acquired and divided into 2 parts, namely training and testing images. Preprocessing then removes the background. Next, feature extraction takes the average RGB (Red, Green, Blue) color value. The features are then normalized using decimal scaling, and finally the images are classified using the K-Nearest Neighbor method and evaluated using accuracy, precision, and recall.
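Decimal scaling normalization divides every value by a power of ten chosen so that the largest absolute value falls below 1. The sketch below applies it to one list of values; the function name and sample values are hypothetical, and the source does not state whether scaling was applied per channel or over all features together.

```python
def decimal_scaling(values):
    """Divide values by 10**j, with j the smallest integer that makes
    the largest absolute scaled value fall below 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


# Hypothetical mean-RGB values on the 0..255 scale.
scaled, j = decimal_scaling([125.0, 115.0, 95.0])
print(scaled, j)  # [0.125, 0.115, 0.095] 3
```

Because 8-bit channel means never exceed 255, any set of values whose maximum is at least 100 is divided by 1000 under this rule.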

RESULTS AND DISCUSSION
This study uses a Google Colaboratory notebook with the Python programming language to create a system for classifying Arabica coffee green beans based on grade. The classification uses the K-Nearest Neighbor method with feature extraction of the average RGB (Red, Green, Blue) value. The grades used are grade 1, grade 2, grade 3, grade 4, grade 5, and grade 6, with defects in each grade.
The following are examples of images of good-quality and defective coffee beans in this study:
There are several stages in this research, namely calculating the average RGB (Red, Green, Blue) value, calculating the Euclidean distance, and classification.

Mean RGB
The first stage in this research is feature extraction, which takes the average RGB (Red, Green, Blue) value. At this stage the system calculates the average value for each of the 300 training and testing images (front and back). The results of these calculations are used as a reference in the classification of Arabica coffee green beans based on grade. The following is a sample table of RGB average values from 10 green bean images (front, back):

Euclidean Distance
The K-Nearest Neighbor classification in this study uses the Euclidean distance formula to calculate the distance between neighbors. The Euclidean distance between a training sample x and a testing sample y with n features is:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
The process of calculating the Euclidean distance between 5 training image samples (x) and 1 testing image sample (y) can be seen below:
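As a minimal sketch of such a calculation, the code below computes the Euclidean distance from one testing sample to five training samples. The study's actual sample values are not reproduced here; all mean-RGB values below are hypothetical and were chosen so that the distances come out as whole numbers.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical mean-RGB features: 5 training samples (x) and 1 testing sample (y).
training = [
    (128, 119, 95),
    (127, 118, 101),
    (119, 107, 95),
    (126, 117, 97),
    (133, 119, 96),
]
testing = (125, 115, 95)

distances = [euclidean_distance(x, testing) for x in training]
print(distances)  # [5.0, 7.0, 10.0, 3.0, 9.0]

# Index of the closest training sample, i.e. the neighbor used when k = 1.
nearest = min(range(len(distances)), key=distances.__getitem__)
print(nearest)  # 3
```

For the first training sample, for example, the differences are (3, 4, 0), so the distance is the square root of 9 + 16 + 0, which is 5.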

KNN Classification
The following are the results of the classification of grade 1 using the K-Nearest Neighbor method with Euclidean distance for k = 1 to k = 10.
In classifying the 150 testing data of Arabica coffee green beans by grade against the 150 training data with k = 1, 30 data did not match the target. Accuracy is obtained by:
$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{number of testing data}} \times 100\% = \frac{150 - 30}{150} \times 100\% = 80\%$$
The accuracy obtained using k=1 is 80%.
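The accuracy figure can be checked with a few lines of Python. The function names are hypothetical; the counts of 150 testing data and 30 mismatches are the ones reported above.

```python
def count_mismatches(predicted, targets):
    """Number of predictions that differ from their target label."""
    return sum(p != t for p, t in zip(predicted, targets))


def accuracy_percent(n_total, n_wrong):
    """Share of correct predictions, as a percentage."""
    return 100 * (n_total - n_wrong) / n_total


# Counts reported for k = 1: 150 testing data, 30 not matching the target.
print(accuracy_percent(150, 30))  # 80.0

# The same helper applied to a short, made-up list of predictions.
wrong = count_mismatches([1, 2, 2, 3], [1, 2, 3, 3])
print(wrong, accuracy_percent(4, wrong))  # 1 75.0
```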

CONCLUSION
Based on this research on a grade-based classification system for Arabica coffee green beans using the K-Nearest Neighbor method, implemented in Python on Google Colab, the accuracy obtained was 80% for k = 1, 73% for k = 2, 76.67% for k = 3, 70% for k = 4, 70.67% for k = 5, 68.67% for k = 6, 69.3% for k = 7, 70% for k = 8, 68% for k = 9, and 71.33% for k = 10. The results of this study are expected to assist coffee farmers in classifying Arabica coffee green beans based on grade in order to improve the quality and price of coffee, and the economic welfare of coffee farmers.