A Classification of Batik Lasem Using Texture Feature Extraction Based on K-Nearest Neighbor

- In this study, batik is modeled using the GLCM method, which produces the features energy, contrast, correlation, homogeneity, and entropy. These features are then used as input to the classification of training and testing data with the K-NN method, using Euclidean distance. The first classification uses 4 features (energy, contrast, correlation, and homogeneity); the next classification uses 5 features, adding entropy. The two classifications are compared to determine which produces the best accuracy. Training and testing data were evaluated using the Recognition Rate calculation. The study achieved a 66% recognition rate on 50 test images and 100 training images.


INTRODUCTION
The development of the era has also influenced the development of batik. Batik was originally made by drawing with a canting, producing what we know as hand-drawn batik, but in modern times modern batik has appeared, combining hand-drawn batik with printed batik, or consisting simply of printed batik [1]. Indonesian batik motifs have distinctive features according to their respective regions, for example Jogja batik, Solo batik, Lasem batik, and Pekalongan batik. This difference in motifs is influenced by differences in geographical location, customs, arts, and culture [2]. One famous batik with a high degree of complexity is hand-drawn Lasem batik, so it is no wonder it is sought after by middle- and upper-class batik collectors. Batik Tulis Lasem is a distinctive hand-drawn batik with motifs and colors influenced by symbolic patterns of Chinese tradition combined with local motifs: batik in chicken-blood red with motifs such as gods, dragons, and hong birds, better known as the phoenix. There are many other distinctive features influenced by history, nature, and Javanese-Chinese culture.
Some famous Lasem batik motifs are the watu kricak motif, inspired by stone fragments from the construction of the Anyer-Panarukan road in the Daendels era; the latohan motif, inspired by the fruits of plants that live on the seashore; and the lokchan motif, a bird motif acculturated from Chinese culture, among many others. Since 2 October 2009, batik has been recognized by UNESCO as part of the cultural heritage of Indonesia [3].
Since February 26, 2015, the village of Babagan Lasem has been a batik tourism center for cultural and educational tourism, attracting visitors from within the country and abroad, from the general public to students. As a batik tourism center, it naturally offers learning tours about the history, production process, and types of batik motifs. Because there are so many types of motifs, tourists have difficulty recognizing the name of the motif on a piece of batik. Based on this, technology is needed that can help recognize batik motifs. Digital Image Processing is a discipline that studies image processing techniques. Images can be still images (photos) or moving images (videos), which are then processed digitally by computer. One such technique is image classification, a process of grouping all pixels in an image into groups so that they can be interpreted as a specific property [4]. To classify an image, information from the image is needed; this is obtained by extracting image features, typically shape, size, geometry, texture, and color [5]. For batik images, texture feature information can therefore be used to recognize patterns.
The digital image classification process consists of a preprocessing stage that converts the colored batik image into a grayscale image and resizes it, a feature extraction stage to obtain information from the image, and a final classification stage for grouping images by class. Several classification techniques have been proposed in previous studies. Kusuma et al. [6] classified the maturity level of tomatoes based on texture using GLCM with 4 output features (contrast, homogeneity, correlation, and energy), achieving 100% accuracy. Another study by Sutojo et al. [7] classified cattle types based on texture in cattle images using 5 GLCM features (contrast, homogeneity, correlation, entropy, and energy), achieving an average accuracy of 95%. These studies show that classification accuracy is influenced by the amount of data, the feature extraction, and the method used. In this study, the GLCM method is applied to extract texture features, and the accuracy of using 4 versus 5 features is analyzed. At the classification stage, the method applied is KNN. The advantages of KNN are that it is resilient to noisy training data and effective when the training data is large, while one of its weaknesses is the need to determine the parameter k (the number of nearest neighbors) [8]. Another study [9] applied grayscale, binary, and Canny processes, combining the invariant-moment calculation with Canny detection to enhance the result of a Wavelet Transform implementation.
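The preprocessing stage described above (RGB to grayscale conversion plus resizing) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `preprocess`, the target size, and the nearest-neighbour resampling are assumptions for the sketch.

```python
import numpy as np

def preprocess(rgb, size=(128, 128)):
    """Convert an RGB image to grayscale and resize it (nearest neighbour)."""
    # Luminosity grayscale conversion (ITU-R BT.601 weights).
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Nearest-neighbour resize by index sampling.
    rows = (np.arange(size[0]) * gray.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * gray.shape[1] / size[1]).astype(int)
    return gray[rows][:, cols].astype(np.uint8)

# Example: a random 256x256 RGB image reduced to a 128x128 grayscale image.
img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
small = preprocess(img)
print(small.shape)  # (128, 128)
```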

Image Classification
Image classification groups pixels into certain classes using predetermined references. By technique, classification is divided into two types: supervised and unsupervised. Supervised classification is carried out under the direction of an analyst, with grouping based on the characteristics of each class obtained by extracting information from the image. In unsupervised classification, the grouping is determined by the computer based on the proximity or similarity of pixel values in the image. The figure below describes the stages of image classification in general.

K-Nearest Neighbor (K-NN)
K-NN is a method for classifying objects based on the learning data closest to the object [6]. K-NN is a supervised method [10] that aims to find new patterns in data by relating existing data patterns to new data. K-NN decides the result from the training data with the highest number of members among the specified nearest neighbors (k). The distance between training data and test data can be calculated by various methods, including the Euclidean equation:

E(i, j) = sqrt( sum_{k=1}^{n} (x_ik - x_jk)^2 )

where E(i, j) is the Euclidean distance between vector i and vector j, k indexes the features, and n is the number of features in vectors i and j, as shown in [6].
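The Euclidean distance and the majority vote over the k nearest training samples can be sketched as below. This is a minimal illustration under assumed toy data; the function names and the two-feature vectors are hypothetical.

```python
import numpy as np

def euclidean(i, j):
    """E(i, j): Euclidean distance between feature vectors i and j."""
    return np.sqrt(np.sum((np.asarray(i) - np.asarray(j)) ** 2))

def knn_classify(test_vec, train_vecs, train_labels, k=3):
    """Label the test vector by majority vote among its k nearest neighbours."""
    dists = [euclidean(test_vec, v) for v in train_vecs]
    nearest = np.argsort(dists)[:k]          # indices of the k smallest distances
    votes = [train_labels[n] for n in nearest]
    return max(set(votes), key=votes.count)  # majority label

# Toy feature vectors (e.g. energy, contrast) for two motif classes.
train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["kawung", "kawung", "latohan", "latohan"]
print(knn_classify([0.85, 0.15], train, labels, k=3))  # kawung
```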

Gray Level Co-Occurrence Matrix (GLCM)
GLCM is a statistical tool for extracting second-order texture features [4] from images; it represents the neighboring relationship between pixels in an image over various orientation directions θ and spatial distances d [11]. The matrix is known to be effective for texture analysis. The GLCM of an image f[x, y] is a two-dimensional matrix in which each element represents the probability of intensity levels x and y co-occurring at a certain distance d and angle θ. If the GLCM has size [L + 1] x [L + 1], then the maximum intensity value of the image is L. In general, 4 directions are used in building the GLCM, namely the angles θ = 0°, 45°, 90°, and 135°; there is one GLCM for each selected pair of distance d and angle θ. From the GLCM, several characteristics are extracted, including energy, contrast, variance, entropy, homogeneity, and correlation [12].
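The co-occurrence counting described above, for the single offset d = 1 and θ = 0° (one step to the right), can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the 3x3 example image is hypothetical.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Co-occurrence counts of gray levels at offset (dx, dy): d = 1, angle 0°."""
    m = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            m[img[r, c], img[r + dy, c + dx]] += 1  # pair (current, neighbour)
    return m

# 3x3 image with 3-bit (8-level) intensities, as in the worked example style.
img = np.array([[0, 1, 1],
                [2, 2, 0],
                [1, 0, 2]])
m = glcm(img)
print(m.sum())  # 6.0 pairs: 2 horizontal pairs per row, 3 rows
```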

Recognition Rate
Evaluation at the classification stage is done by calculating the accuracy of the method used. Accuracy in classification is the percentage of data classified correctly according to its class after the classification test. The Recognition Rate is one way to calculate the level of accuracy:

Recognition Rate = (Σ correct / Σ sample) x 100%

where Σ correct is the number of correctly classified data and Σ sample is the amount of data used for testing.
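The Recognition Rate formula above can be sketched directly. The toy prediction lists below are hypothetical, chosen only to reproduce the 66% figure reported in the abstract (33 of 50 correct).

```python
def recognition_rate(predicted, actual):
    """Percentage of test samples whose predicted class matches the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical example: 33 of 50 test images classified correctly -> 66%.
pred = ["a"] * 33 + ["b"] * 17
true = ["a"] * 50
print(recognition_rate(pred, true))  # 66.0
```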

RESULTS AND DISCUSSION
In this research, the data is divided into two sets: training data and testing data. Training data is used for machine learning so that the machine has knowledge of the existing types of batik motifs, while testing data is used to test the ability of the machine. The batik motifs used are aseman, kawung baganan, latohan, sidomukti, and naga. The training and testing data together comprise 150 images of Batik Tulis Lasem, divided into 5 classes. Pictures were taken between about 9:00 a.m. and 10:00 a.m. in January. The data are grouped into the 5 batik motifs above; for each motif, 3 different cloths were photographed and 10 images taken from each, giving a dataset of 150 images of Batik Tulis Lasem. Testing uses 100 training images and 50 testing images.
The method used for texture feature extraction is the Gray Level Co-Occurrence Matrix (GLCM). Images that have been converted from RGB to grayscale are used for texture feature extraction. The attributes used are distance (d) and angle (θ), with d = 1 and θ = 0°, 45°, 90°, 135°. A simple example GLCM calculation uses a 3x3 image with 3-bit intensity levels, d = 1, and θ = 0°. The normalized GLCM is then used to calculate the texture feature values from the 3x3 matrix in Figure 10, using the following features: 1. Energy. Energy is obtained by squaring each value of the normalized GLCM and summing the results. An example energy feature extraction calculation gives the result in the table below; Table 3 is an example of the contrast formula calculation, where all values in the table are summed to obtain the contrast result.
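The normalization step and the energy feature described above can be sketched as below. The 3x3 count matrix is hypothetical, standing in for the worked example whose figures are not reproduced here.

```python
import numpy as np

def energy(p):
    """Energy: sum of the squared entries of the normalised GLCM."""
    return np.sum(p ** 2)

# Hypothetical symmetric GLCM counts; normalise so the entries sum to 1.
counts = np.array([[2., 1., 0.],
                   [1., 2., 1.],
                   [0., 1., 2.]])
p = counts / counts.sum()   # normalised GLCM, entries sum to 1
print(round(energy(p), 4))  # 0.16
```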

Correlation
Correlation is obtained by summing, for each matrix entry, (x − mean x) multiplied by (y − mean y) multiplied by the normalized GLCM value, then dividing by the standard deviation of x multiplied by the standard deviation of y. Before calculating the correlation, the mean (μ) and standard deviation (σ) must first be obtained; these results are then used to calculate the correlation. Example correlation feature extraction calculation: the mean x is obtained by summing each value of x multiplied by the normalized GLCM value (5); the mean y is obtained by summing each value of y multiplied by the normalized GLCM value. The standard deviation of x is obtained by summing, for each entry, (x − mean x) squared and multiplied by the normalized GLCM value, then taking the square root of the total; the standard deviation of y is obtained in the same way using y and the mean y.
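The mean, standard deviation, and correlation steps above can be sketched as one function over a normalised GLCM. The example matrix is hypothetical.

```python
import numpy as np

def correlation(p):
    """GLCM correlation from the normalised matrix p, via per-axis mean and std."""
    x, y = np.indices(p.shape)
    mu_x = np.sum(x * p)                          # mean x: sum of x * p(x, y)
    mu_y = np.sum(y * p)                          # mean y: sum of y * p(x, y)
    sd_x = np.sqrt(np.sum((x - mu_x) ** 2 * p))   # std of x, square root of total
    sd_y = np.sqrt(np.sum((y - mu_y) ** 2 * p))   # std of y
    return np.sum((x - mu_x) * (y - mu_y) * p) / (sd_x * sd_y)

p = np.array([[0.2, 0.1, 0.0],
              [0.1, 0.2, 0.1],
              [0.0, 0.1, 0.2]])
print(round(correlation(p), 4))  # 0.6667
```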

Homogeneity (IDM)
Homogeneity is obtained by summing each normalized GLCM value divided by 1 plus the absolute difference between x and y. An example homogeneity (IDM) feature extraction calculation uses the probability matrix from Table 4.
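The homogeneity (IDM) definition above can be sketched as follows; the example matrix is hypothetical, since Table 4 is not reproduced here.

```python
import numpy as np

def homogeneity(p):
    """IDM: sum of p(x, y) / (1 + |x - y|) over the normalised GLCM."""
    x, y = np.indices(p.shape)
    return np.sum(p / (1.0 + np.abs(x - y)))

p = np.array([[0.2, 0.1, 0.0],
              [0.1, 0.2, 0.1],
              [0.0, 0.1, 0.2]])
print(round(homogeneity(p), 4))  # 0.8
```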

Figure 1 .
Figure 1. A Common Scheme of Image Recognition

Figure 2 .
Figure 2. K-NN Scheme. Based on Figure 2, the KNN stages can be described as: 1. Determine the parameter k (number of nearest neighbors). 2. Calculate the squared Euclidean distance of the object to each of the given training data. 3. Sort the results of step 2 in ascending order (from smallest to largest distance). 4. Collect the Y categories (the nearest-neighbor classification is based on the value of k).
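The four stages listed above can be sketched step by step; this illustration uses stdlib Python only, with hypothetical two-feature training points.

```python
def knn_steps(obj, train, labels, k=3):
    # Step 1: the parameter k (number of nearest neighbours) is given.
    # Step 2: squared Euclidean distance of the object to every training sample.
    d2 = [sum((a - b) ** 2 for a, b in zip(obj, t)) for t in train]
    # Step 3: sort the results in ascending order (smallest distance first).
    order = sorted(range(len(d2)), key=d2.__getitem__)
    # Step 4: collect the categories of the k nearest neighbours and vote.
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

train = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1)]
labels = ["A", "A", "B", "B"]
print(knn_steps((0.05, 0.0), train, labels, k=3))  # A
```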

Table 2.
Sample calculation of energy feature extraction. Contrast is obtained by summing, for each entry, (x − y) squared multiplied by the normalized GLCM value. Example contrast feature extraction calculation: Con = Σ Σ (x − y)² · p(x, y). Entropy is obtained by summing each normalized GLCM value multiplied by the log of that value, then multiplying the total by −1. Example entropy feature extraction calculation: Ent = −Σ Σ p(x, y) · log p(x, y).
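The contrast and entropy formulas above can be sketched together over a normalised GLCM; the example matrix is hypothetical.

```python
import numpy as np

def contrast(p):
    """Con = sum over (x, y) of (x - y)^2 * p(x, y)."""
    x, y = np.indices(p.shape)
    return np.sum((x - y) ** 2 * p)

def entropy(p):
    """Ent = -sum of p(x, y) * log p(x, y), skipping zero entries."""
    nz = p[p > 0]  # log(0) is undefined, so zero entries contribute nothing
    return -np.sum(nz * np.log(nz))

p = np.array([[0.2, 0.1, 0.0],
              [0.1, 0.2, 0.1],
              [0.0, 0.1, 0.2]])
print(round(contrast(p), 4))  # 0.4
print(round(entropy(p), 4))
```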

Figure 4 .
Figure 4. (a) Group of Images based on motive, (b) Datasets of Batik Tulis Lasem Motive

Table 2 .
Experiment result using 100 training images and 50 testing images based on 5 GLCM features