Learning Vector Quantization for Robusta and Arabica Coffee Classification

An artificial neural network (ANN) is a method for solving various kinds of decision problems based on training. Learning Vector Quantization (LVQ) is an ANN method that combines competitive and supervised learning: its competitive layer automatically learns to classify input vectors, and inputs that lie closest to one another are assigned to the same class. Two types of coffee beans are famous worldwide, Arabica and Robusta. Laypeople find them very difficult to tell apart because their shape and color look almost the same, yet several differences can be seen in the shape of the bean. Robusta beans tend to be rounder and smaller and have a rougher texture. Arabica beans are slightly flatter, longer, and larger, and have a smoother texture. These differences are the basis of this study, in which features are extracted from images of the two kinds of beans using first-order texture feature extraction based on six parameters: mean (mu), standard deviation, skewness, energy, entropy, and smoothness. Data were collected quantitatively as images of Arabica and Robusta coffee beans, 130 images in total, with an 80:20 ratio of training data to test data. The best-performing parameters found in the experiments were a learning rate of 0.01, a maximum of 10 epochs (iterations), and 30% of the data used for training, that is, 39 training images and 26 test images, giving a reported accuracy of 71% for the training process and a reported percentage of 96% for the test process.


INTRODUCTION
Coffee is a plantation product with high value compared with other plantation crops, and it plays an important role as a source of foreign exchange for the country. Coffee is a source of income for one and a half million coffee farmers in Indonesia [1]-[6]. Coffee has health benefits, namely reducing headaches, easing breathing, and boosting stamina. Coffee has long been, and remains, the most popular drink in the world. Coffee in Indonesia has grown every year, and Indonesia is the fourth largest coffee producer in the world [7]-[9].
There are two types of coffee, Robusta and Arabica, each with its own characteristics. Arabica coffee has lower caffeine levels and higher acid levels than Robusta coffee, so Arabica is more expensive. The shape of their beans also differs. Many coffee farmers and entrepreneurs, such as coffee shop owners, do not clearly know how to distinguish the two beans [5], [10], [11]. Before the coffee is marketed, a sorting process is carried out to separate Robusta and Arabica beans [4], [12]-[15]. Manual classification is done by human visual inspection of a 100 gram coffee sample, or around 100-200 coffee beans. With this approach, errors are likely to occur often because of eye fatigue. From this, the idea emerged to classify coffee beans using digital image processing and Artificial Neural Networks (ANN).
Various artificial intelligence methods [8], [10], [16] are widely used to determine the type of coffee beans. Several studies, for example, used texture analysis of coffee bean images together with a backpropagation neural network to classify the types of coffee beans. Backpropagation is an algorithm that carries out the learning process of an ANN so as to produce the smallest possible error value. The combination of digital image processing and ANN can provide optimal results.
Image processing [8] is combined with the neural network method. To process the data, the authors use an ANN with supervised learning, namely backpropagation, which can minimize the error rate by adjusting the network. The researchers therefore intend to create a digital image processing system to classify Arabica and Robusta coffee beans. The system is expected to help entrepreneurs in the coffee sector, such as coffee shops, explain to consumers the differences between Robusta and Arabica coffee beans.
Technology is developing very rapidly, and the widespread use of computers and other technologies can be seen in daily life. Computers now make life easier by helping with our work. Many kinds of computing are growing rapidly, one of which is the Artificial Neural Network. ANN is a technique in the field of artificial intelligence [10] that is built to resemble the way the human brain solves a problem through its synaptic weights [12], [14], [17], [18]. An ANN can recognize various activities from past data alone: the data are learned by the ANN so that it can make decisions on data it has never seen or studied.
One of several ANN methods is the perceptron, a single-layer network that has a processing section of connected neurons with several inputs and outputs. The perceptron sums the products of its inputs and a small number of weight parameters and compares the result with a threshold, as shown in Figure 1. A multilayer perceptron is a perceptron that has one or more layers between its input and its output. It is a newer version of the earlier single perceptron network, as shown in Figure 2.
Learning Vector Quantization (LVQ) is a method used to train a competitive layer, as shown in Figure 3. As already explained, LVQ [3] is a method in which supervised competitive training is carried out. The competitive layer is able to learn by itself to classify the input vectors it is given, and the process by which LVQ learns from the training data is called training. Through this training process, LVQ can recognize patterns originating from objects and produce optimally weighted values. The advantage of the LVQ method is that it can summarize large data into a smaller representation than other methods can. It is also fairly fast and requires fewer data samples. The method is most often used in pattern classification and recognition, where the model can be updated gradually.
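The threshold rule of the single perceptron described above can be illustrated with a minimal sketch. The function name, the weights, and the threshold below are hypothetical values chosen for illustration (they implement a logical AND of two inputs) and do not come from this study.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical weights and threshold: the unit fires only when both inputs are 1.
and_weights = [1.0, 1.0]
and_threshold = 1.5

outputs = [
    perceptron_output([a, b], and_weights, and_threshold)
    for a in (0, 1)
    for b in (0, 1)
]
```

LVQ differs from this unit in that it compares distances between an input vector and one weight vector per class, rather than comparing a single weighted sum with a threshold.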

RESEARCH METHOD
2.1. Data Collection
Data were collected quantitatively as images of Arabica and Robusta coffee beans, 130 images in total. The ratio of training data to test data is 80:20. The best-performing parameters found in the experiments were a learning rate of 0.01, a maximum of 10 epochs (iterations), and 30% of the data used for training, that is, 39 training images and 26 test images, as shown in Figure 4 and Figure 5.
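The 80:20 split can be sketched as follows. This is a minimal illustration, assuming a simple shuffled split with a fixed seed; the file names, the seed, and the `split_dataset` helper are hypothetical and are not taken from the study.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items reproducibly and split them into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 130 coffee bean images.
image_names = [f"bean_{i:03d}.jpg" for i in range(130)]
train_set, test_set = split_dataset(image_names)
```

With 130 images this yields 104 training images and 26 test images, which matches the counts reported in the paper.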

Proposed Method
This research uses a quantitative method with primary data. The samples are Arabica and Robusta coffee bean images, 55 of each to be used as training data and 5 images as test data. Texture can be described with first-order or second-order statistical features. This study uses first-order statistical feature extraction, in which the features are based on the characteristics of the image histogram. First-order features are generally used to describe macrostructure texture, where the macrostructure is a periodic repetition of a local pattern. The steps of the research using the LVQ method are as follows:
1. Obtain the images of coffee beans as research samples.
2. Group the images according to the type of coffee bean.
3. Apply the Learning Vector Quantization method by creating an LVQ network architecture with 6 input variables, namely skewness, deviation, brightness, mu, energy, and entropy, and 2 output variables, namely training characteristics and training targets. The calculations are made manually with Excel or using MATLAB.
4. Application GUI design. At this stage an application is designed to recognize coffee beans, with a display for the user to see.
5. Building the application GUI. The design is made using the MATLAB GUI tools. After the design is created, the application is implemented by writing the code in MATLAB 2021a.
6. Testing and analysis. The application that has been made is tested at this stage.
Testing is carried out by calculating the accuracy of classifying the coffee beans in the test data using several parameters: mean (mu), standard deviation, skewness, energy, entropy, and smoothness, together with the amount of training data used. The conclusion is drawn from the final results of the analysis, in the form of the accuracy that can be achieved in classifying coffee beans.
The preprocessing stage in Figure 6 is carried out when processing both training data and test data. Its purpose is to make it easier for the system to find objects. Figure 6 shows the flowchart of the preprocessing stage. The stage begins by inputting the original image of an Arabica or Robusta coffee bean. The system then converts the original image to grayscale, and from the grayscale image the necessary parameters are taken in binary form. Next, the system analyzes the image to obtain the area of the region. Once the region to be used is known, the system computes the required features, which are entered into a data table with 6 features. These features form the training feature database against which the test data are compared.
Learning Vector Quantization (LVQ) is the final stage, in which a class is obtained for the image under study. The class is assigned according to the Euclidean distance from the input vector to the other vectors. Figure 3 shows the classification process with LVQ. The preprocessed training data undergo first-order feature extraction. The resulting feature vectors are used for the network training process, and the result is finally saved in the database, which completes the image identification process. A test image is then input and classified according to the feature vectors in the database, so that the test image can be identified according to its class.
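The grayscale conversion and first-order feature extraction described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB implementation. It assumes the common histogram-based definitions of the six features, including smoothness as 1 - 1/(1 + variance), and the widely used 0.299/0.587/0.114 luma weights for grayscale conversion. The function names are hypothetical, and the binarization and region-area steps are omitted.

```python
import math


def to_grayscale(rgb_pixels):
    """Convert rows of (R, G, B) tuples to integer gray levels (assumed luma weights)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in rgb_pixels
    ]


def first_order_features(gray, levels=256):
    """Compute six first-order texture features from the gray-level histogram."""
    counts = [0] * levels
    for row in gray:
        for value in row:
            counts[value] += 1
    total = sum(counts)
    p = [c / total for c in counts]  # normalized histogram

    mean = sum(i * p_i for i, p_i in enumerate(p))
    variance = sum((i - mean) ** 2 * p_i for i, p_i in enumerate(p))
    std = math.sqrt(variance)
    third_moment = sum((i - mean) ** 3 * p_i for i, p_i in enumerate(p))
    skewness = third_moment / std ** 3 if std > 0 else 0.0
    energy = sum(p_i ** 2 for p_i in p)
    entropy = -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)
    smoothness = 1 - 1 / (1 + variance)  # assumed definition; variance is not normalized

    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "energy": energy,
        "entropy": entropy,
        "smoothness": smoothness,
    }


# Tiny synthetic image (not study data): one white row and one black row.
tiny_rgb = [[(255, 255, 255), (255, 255, 255)], [(0, 0, 0), (0, 0, 0)]]
tiny_features = first_order_features(to_grayscale(tiny_rgb))
```

Each image then contributes one six-element feature vector, which is the input format the LVQ network expects.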
In this research, an application for classifying Arabica and Robusta coffee beans is made. This requires image data of the coffee beans of each type. Before the research was carried out, the researcher photographed each coffee bean. Images that met the requirements were stored in separate folders so that they were not mixed up, and the researcher selected beans that were good representatives of Arabica and Robusta to serve as samples. The artificial neural network with the LVQ method for classifying coffee bean images was built using MATLAB 2021a. Data collection used the direct observation method: the researcher went into the field to observe the coffee bean samples directly and take images of them, as shown in Table 1. In Table 1, the outputs to be obtained are 1 and 2, where 1 represents Arabica coffee beans and 2 represents Robusta coffee beans.
The LVQ (Learning Vector Quantization) method is a pattern classification method in which each output unit represents a class. LVQ performs supervised learning of a competitive layer, and the competitive layer automatically learns to classify input vectors. The number of inputs follows the input variables of the first-order feature extraction: there are 6 inputs and 2 outputs, and the resulting LVQ network architecture looks like Figure 6.
During training, the weights are updated as follows:
- If $T = C_j$, then use equation (3).
- If $T \neq C_j$, then use equation (4).
c. Reduce alpha using equation (5).
The training process stops when it reaches the maximum epoch limit or the minimum learning rate ($\alpha$). At the end of the training process, the final weights ($w$) are obtained. These weights are later used in the test process. The LVQ training flowchart can be seen in Figure 7.
The test process then uses the LVQ method as follows:
1. Initialize the weights using the final weights produced by the previous training.
2. Initialize the initial condition, true = 0.
3. Perform steps 4 to 6 once for each data item.
4. Calculate the distance using equation (6):
$D(j) = \sqrt{\sum_{i=1}^{n} (x_i - w_{ij})^2}$ (6)
5. Determine the class with the shortest distance ($C_j$), based on the minimum value over all existing class distances $D(j)$.
6. Check using the conditions in equations (7) and (8).
- If $C_j \neq T$, then true = 0 (8)
7. Calculate the accuracy as the ratio of the number of correct outputs of the system to the total amount of data, as in equation (9).
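The training and test procedures above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB code. It assumes the standard LVQ1 update rule (move the winning weight vector toward the input when the winning class matches the target, away from it otherwise) and a multiplicative decay of alpha after each epoch, since equations (3) to (5) are not reproduced in the text. The function names, the decay factor, and the toy two-feature data are hypothetical.

```python
import math


def euclidean_distance(x, w):
    """Distance D(j) between an input vector x and a class weight vector w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def nearest_class(x, weights):
    """Return the class label C_j whose weight vector is closest to x."""
    return min(weights, key=lambda label: euclidean_distance(x, weights[label]))


def train_lvq(samples, targets, weights, alpha=0.01, max_epoch=10,
              decay=0.9, min_alpha=1e-6):
    """Train with the assumed LVQ1 rule; stop at max_epoch or when alpha is too small."""
    weights = {label: list(w) for label, w in weights.items()}  # copy, do not mutate input
    for _ in range(max_epoch):
        if alpha < min_alpha:
            break
        for x, t in zip(samples, targets):
            c = nearest_class(x, weights)
            sign = 1.0 if c == t else -1.0  # attract on a match, repel on a mismatch
            weights[c] = [wi + sign * alpha * (xi - wi)
                          for xi, wi in zip(x, weights[c])]
        alpha *= decay  # assumed schedule for reducing alpha
    return weights


def accuracy(samples, targets, weights):
    """Percentage of samples whose nearest class equals the target."""
    correct = sum(1 for x, t in zip(samples, targets)
                  if nearest_class(x, weights) == t)
    return 100.0 * correct / len(targets)


# Toy data (not study data): class 1 stands for Arabica, class 2 for Robusta.
toy_samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
toy_targets = [1, 1, 2, 2]
initial_weights = {1: [0.1, 0.2], 2: [0.9, 0.8]}  # one sample per class as a start

trained = train_lvq(toy_samples, toy_targets, initial_weights, alpha=0.1)
toy_accuracy = accuracy(toy_samples, toy_targets, trained)
```

In the study each sample would be a six-feature vector rather than the two-feature toy vectors used here, and the initial weights and alpha schedule would follow the authors' own choices.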

RESULTS AND DISCUSSION
When the test process has been completed, the accuracy of the system is known. The next step is to design the Arabica and Robusta Coffee Bean Detection application together with its actual appearance using the MATLAB GUI, and to implement it fully so that the MATLAB code and the program design work properly.
The flowchart above shows that the first stage is training, in which the network is trained on the training data with the selected parameters. If the training results do not match what the researchers want, the training steps are repeated with different parameters until the desired accuracy on the training images is obtained.
After suitable results are obtained, a test phase is carried out. At this stage the application performs the calculation on the predetermined test images, and accuracy is judged by the number of classification errors, that is, outputs that do not match the target of the test image. If the software achieves poor accuracy, the LVQ method is not suitable for this study. If the accuracy obtained by the software is good, the LVQ method is suitable for this study.
The next step is testing and analysis. At this stage, the system that has been made is tested. System testing measures the accuracy achieved in identifying the test images using the existing parameters. The initial parameters are those that achieved the best accuracy in the previous training process: a learning rate of 0.05, a maximum of 10 epochs (iterations), and 80% of the data (104 images) used for training.

Figure 8. Learning Rate Testing
The learning rate is tested so that the researchers can see how changing it affects the accuracy obtained. The permitted range of learning rate values is [0, 1], and the values tested are 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, and 0.1. If the value is too small, the resulting changes are insignificant and cannot be seen clearly. If the learning rate is too large, the accuracy is poor and the training process behaves abnormally. This test was carried out to find the learning rate with the highest accuracy. In this test, the researchers use a maximum of 10 iterations and 80% of the images for training, namely 104 images. The test results can be seen in Figure 8.
Based on Figure 8, the smallest learning rate with the highest accuracy on the test images is 0.01, so the learning rate recommended by this test is 0.01.
The next test was carried out so that the researchers could see whether changing NumEpoch (the number of iterations) changes the accuracy. The iterations used range from 10 to 100. In this test, the learning rate is 0.01 and the number of training images is 160. The test results are shown in Figure 9. From Figure 9, the smallest maximum number of iterations with the highest accuracy on the test images is 10. Therefore, the maximum number of iterations recommended by this test scenario is 10.
The final test was carried out so that the researchers could see whether the percentage of training data used affects the overall accuracy. The percentage of training data ranges from 10% to 80% of the total data, up to 104 images, because the other 20% is used as test data. The amounts of training data used are 10% with 13 images, 20% with 26 images, 30% with 39 images, 40% with 52 images, 50% with 65 images, 60% with 78 images, 70% with 91 images, and 80% with 104 images. This test uses a learning rate of 0.01 and a maximum of 10 iterations. The results can be seen in Figure 10. From Figure 10, the smallest number of training images with the highest accuracy is at 30% training images, so the number of training images recommended by this test is 30%, a total of 39 images. Based on the tests above, the highest accuracy is obtained with the best network architecture using a learning rate of 0.01 and 30% of the images for training.
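The training-set sizes in the third test follow directly from the 130-image total. The short sketch below reproduces that arithmetic; the variable names are illustrative only.

```python
TOTAL_IMAGES = 130
TEST_PERCENT = 20

# Images reserved for testing (20% of the total).
test_count = round(TOTAL_IMAGES * TEST_PERCENT / 100)

# Number of training images for each tested percentage, 10% through 80%.
training_sizes = {
    percent: round(TOTAL_IMAGES * percent / 100)
    for percent in range(10, 90, 10)
}
```

The resulting sizes (13, 26, 39, 52, 65, 78, 91, and 104 images) agree with the figures listed in the text, with 26 images held out for testing.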

Figure 11. Testing Result
In Figure 11, the X symbol represents the test results and the T symbol represents the training target. It can be concluded that the LVQ parameters chosen from the tests of the amount of training data and test data are the best parameters, because they produce high accuracy: out of 26 test images there are only 2 errors.
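Taking the counts in this paragraph at face value (26 test images, 2 misclassified), the corresponding accuracy can be worked out as the ratio of correct outputs to the total number of test images. This is a worked illustration of the arithmetic only.

```latex
\text{accuracy} = \frac{26 - 2}{26} \times 100\% = \frac{24}{26} \times 100\% \approx 92.3\%
```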

Figure 10. Testing the Effect of the Number of Trained Images

Table 1. Image Classification Weights using First Order Feature Extraction