Classification of Bird Based on Face Types Using Gray Level Co-Occurrence Matrix (GLCM) Feature Extraction Based on the k-Nearest Neighbor (K-NN) Algorithm

Indonesia is one of the countries with great fauna wealth, and its fauna are scattered throughout the country. One such group of fauna is birds. Birds are often kept as pets because of their characteristic faces, voices, and body features. This study classifies bird species from face images using Gray Level Co-Occurrence Matrix (GLCM) feature extraction and the k-Nearest Neighbor (K-NN) algorithm. The data consist of 66 images, divided into 55 training images and 11 testing images. The feature values are obtained from GLCM feature extraction, namely contrast, correlation, energy, homogeneity, and entropy, and are then classified using the K-NN algorithm with Euclidean distance. The classification results show that the highest accuracy, 54.54%, is obtained at K = 1 with a GLCM angle of 0°.


INTRODUCTION
In the classification of living things, animals form a kingdom of their own; living things are divided into two groups, namely Kingdom Plantae and Kingdom Animalia [1]. Kingdom Animalia itself consists of invertebrates and vertebrates [2]. The fauna wealth of Indonesia is very large: the insect, Hymenoptera, and mollusc groups comprise 151,847 species, 30 thousand species, and 5170 species, respectively. Among vertebrates, there are 720 species of mammals (13% of the world's species), 1605 birds (16% of the world's species), 723 reptiles (8% of the world's species), and 385 amphibians (6% of the world's species). There are also 1900 species of butterflies (10% of the world's species) [3], [4].
Among the methods that can be used for feature extraction and classification are the Gray Level Co-Occurrence Matrix (GLCM) and k-Nearest Neighbor (K-NN) methods. These methods are widely used for the classification of diseases [5], fruits [6], rice seed [7], handwriting [8], [9], leaves [10], batik [11], and animal species [12]. The GLCM process produces texture features, namely contrast, energy, entropy, homogeneity, and correlation. In addition to the GLCM method [13], the K-NN method classifies an object according to the objects that are closest to it. Therefore, the GLCM method and the K-NN algorithm are suitable for this study [5], [6], [12].
GLCM is a second-order texture method: it computes statistics over pairs of neighboring pixels in the original image, whereas a first-order calculation computes statistics on the pixel values alone and does not take neighboring pixels into account [14]. Both first-order and second-order extraction are statistical methods. Besides GLCM and K-NN, many other methods can be used for classifying face images of animals, such as Global Contrast Saliency. This method extracts the dominant region of an image object by taking into account the difference in color contrast of each region, so that the resulting saliency image shows the more dominant area of the object. Another is the Context Aware Saliency (CAS) method [15], which determines an object by separating it from the background; the object area becomes more dominant than the background, so the shape of the object can be seen more clearly. For classifying the faces of bird species, however, this study uses the GLCM and K-NN methods. In an application of GLCM to the recognition of students' mouth images, the researcher conducted four trials in which 10%, 20%, 30%, and 40% of the training data were used as testing data. K-NN, in turn, is a method for classifying data on the basis of previously classified data: it assigns a new object to the class of the training samples that are closest to it in terms of their attributes.
The authors chose GLCM for the feature extraction process on the faces of bird species and K-NN for classification and for determining the accuracy, because many previous studies have used other methods such as Global Contrast Saliency, Context Aware Saliency (CAS), Histogram of Oriented Gradient (HOG) [7], Support Vector Machine (SVM) [16], K-Means Clustering [17], and many more. This research applies digital image processing with the GLCM and K-NN methods, using 55 training images from 11 different classes as the reference, while the test data are 11 images that are not taken from the training data. Examples of the bird species are shown in Figure 1. Based on the above, the authors build a classification system for bird face types using the GLCM and K-NN methods and determine the accuracy of the method. The accuracy results are also reported so that they can be compared with other methods.

K-Nearest Neighbor (K-NN)
K-Nearest Neighbor (K-NN) is a supervised learning algorithm used for classification and regression based on the k nearest neighbors of a sample. The algorithm calculates the distance between the new data and the training data, and then assigns the new data to the class of the closest training data [7], [15], [18]. To measure how close a sample is to a class, a distance must be calculated; in K-NN this is commonly the Euclidean distance. The Euclidean distance [19]-[21] is a formula for finding the distance between two points in Euclidean space, as shown in (1), where d is the Euclidean distance, A is the training data, B is the test data, i is the number of closest neighbors, and n is the number of images.
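The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that (1) is the usual Euclidean distance, d(A, B) = sqrt(sum_i (A_i - B_i)^2), taken over the components of two feature vectors. The function names `euclidean_distance` and `knn_predict` and the two-feature sample vectors are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=1):
    """Return the majority label among the k training vectors nearest to query."""
    distances = sorted(
        (euclidean_distance(features, query), index)
        for index, features in enumerate(train_features)
    )
    nearest_labels = [train_labels[index] for _, index in distances[:k]]
    # On a tie, Counter.most_common returns the label encountered first,
    # which here is the label of the closer neighbor.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature vectors for two classes (not data from the study).
train_features = [(0.10, 0.90), (0.20, 0.80), (0.90, 0.10), (0.80, 0.20)]
train_labels = ["Bald Eagle", "Bald Eagle", "Black Swan", "Black Swan"]

print(knn_predict(train_features, train_labels, (0.15, 0.85), k=1))
print(knn_predict(train_features, train_labels, (0.85, 0.15), k=3))
```

With k = 1 the query simply takes the label of its single nearest training vector; larger k values let several neighbors vote, which is why the choice of k can change the predicted class.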

Gray Level Co-Occurrence Matrix (GLCM)
GLCM is a method for obtaining second-order statistical values by calculating the probability of the adjacency relationship between two pixels at a certain distance and angle. The widely used angles are 0°, 45°, 90°, and 135°. GLCM feature extraction [14], [22] produces several features such as contrast, correlation, energy, homogeneity, and entropy. Contrast represents the difference in color or gray level that appears in an image; it is 0 if the neighboring pixels have the same value. Correlation measures the linear relationship between the gray levels of neighboring pixels and lies between -1 and 1. Energy measures uniformity in the image: the more similar the pixel values in the image, the higher the energy. Homogeneity also represents a measure of uniformity and has a high value if all pixels have a uniform value. Entropy measures the randomness of the gray-level distribution.
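To make these definitions concrete, the following pure-Python sketch builds a co-occurrence matrix at distance 1 for one of the four angles and derives the five features from it. It is an illustration under stated assumptions rather than the implementation used in the study: the matrix is assumed to be symmetric and normalized, energy is taken as the sum of squared entries, entropy uses a base-2 logarithm, and the names `OFFSETS`, `glcm`, and `glcm_features` are hypothetical.

```python
import math

# Pixel offsets (row, column) for distance 1 at the four usual angles.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, angle=0):
    """Symmetric, normalized co-occurrence matrix for one angle at distance 1."""
    d_row, d_col = OFFSETS[angle]
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image has no pixel pairs for this angle")
    return [[value / total for value in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy, homogeneity, and entropy of a GLCM p."""
    levels = range(len(p))
    cells = [(i, j, p[i][j]) for i in levels for j in levels]
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mean_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mean_j) ** 2 * v for _, j, v in cells)
    covariance = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
    spread = math.sqrt(var_i * var_j)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined (NaN) for a perfectly flat image.
        "correlation": covariance / spread if spread > 0 else float("nan"),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Two tiny 2x2 images with 2 gray levels (illustrative values only).
flat = glcm_features(glcm([[1, 1], [1, 1]], levels=2, angle=0))
checker = glcm_features(glcm([[0, 1], [1, 0]], levels=2, angle=0))
print(flat["contrast"], flat["homogeneity"])
print(checker["contrast"], checker["correlation"])
```

The flat image illustrates the statements above: its contrast is 0 and its homogeneity and energy are at their maximum, while the checkerboard image, whose horizontal neighbors always differ, has nonzero contrast and a correlation of -1.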

Accuracy
In this paper, the testing process is evaluated using accuracy, as in (2).
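As a minimal sketch, assuming that (2) is the usual ratio of correctly classified test images to the total number of test images, expressed as a percentage, accuracy can be computed as follows. The function name `accuracy` and the label lists are hypothetical. The example uses 11 test labels with 6 correct predictions, which gives roughly 54.5% and would be in line with the best accuracy reported in this paper, although the number of correctly classified images is not stated.

```python
def accuracy(predicted, actual):
    """Percentage of test samples whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


# Hypothetical labels for 11 test images, one per class; 6 are predicted correctly.
actual = list(range(11))
predicted = [0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0]
print(f"{accuracy(predicted, actual):.2f}%")
```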

Data Collection
The research data were obtained from Google search and consist of bird face images in 11 classes, with 5 images for each class. The images were selected on the basis of their suitability and texture. These data are needed as research material for the classification of bird faces. Because there are many types of birds, the scope is limited so that it is neither too broad nor too narrow: the researchers use images taken from the front and side of the animal's face, as shown in Figure 2.

Proposed Scheme
The feature extraction process produces a matrix value and feature values that serve as a reference for how accurate the method is, as shown in Figure 3. The stages of testing the methods in this study are as follows:
1. An image is input for testing.
2. The input image is preprocessed.
3. The color (RGB) image is converted to grayscale.
4. Feature extraction is carried out using the graycomatrix function, which produces the matrix values that distinguish one image from another, and the graycoprops function, which produces the GLCM features.
5. Classification, as the end result of the process, assigns the image to its class.
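The grayscale conversion in stage 3 can be sketched in plain Python as shown below. This is an illustrative stand-in, not the toolbox functions named above: it assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for the conversion and a reduction to 8 gray levels before the co-occurrence matrix is built. The names `rgb_to_gray` and `quantize` and the choice of 8 levels are assumptions.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples in 0..255 to integer gray values 0..255."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in rgb_image
    ]


def quantize(gray_image, levels=8, max_value=255):
    """Map gray values 0..max_value onto the bins 0..levels-1."""
    return [
        [min(value * levels // (max_value + 1), levels - 1) for value in row]
        for row in gray_image
    ]


# A hypothetical 1x3 color image: white, black, and pure red pixels.
gray = rgb_to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]])
print(gray)
print(quantize(gray))
```

Reducing the number of gray levels keeps the co-occurrence matrix small (8 x 8 here instead of 256 x 256), at the cost of merging similar gray values into one bin.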

RESULTS AND DISCUSSION
In this paper, the data are public images taken from internet searches. The 55 training images consist of 5 images each of Bald Eagle, Scarlet Macaw, Cockatoo, Indigo Bunting, Black Swan, Lilac Roller, Long Eared Owl, White Cheeked Turaco, Taiwan Magpie, Golden Chlorophonia, and Bearded Barbet. The results of this study are considered quite satisfactory in terms of accuracy. The results of the GLCM feature extraction at angles of 0°, 45°, 90°, and 135° are given in Table 1 and Table 2.
In this study, the K-NN classification process is used to determine the accuracy and the class assignment based on the face image of the bird species. The GLCM features extracted from the training and testing data (contrast, correlation, energy, homogeneity, and entropy) are used as parameters to classify the face image of a bird species for values of k = 1 to k = 10. The K-NN algorithm uses the Euclidean distance formula to calculate the distance between the training data and the test data; a test image is assigned to the bird type whose training data are closest to it. The K-NN classification process is carried out at an angle of 0° with k = 1 to 10, using 55 training images and 11 testing images, and results in an accuracy value. Figure 4 shows the results of image testing using the GLCM and K-NN methods.
The accuracy obtained with this method is considered quite high. According to Figure 4, the highest accuracy, 54.54%, is obtained at K = 1 with an angle of 0°, and the lowest accuracy, 0%, at K = 3 with an angle of 90°. It can be concluded that both the value of K and the angle affect the accuracy in this study.
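The dependence of accuracy on K can be reproduced in miniature with the sketch below, which evaluates a K-NN classifier for several K values on a toy data set. The one-dimensional feature values, the class names "A" and "B", and the function names `knn_predict` and `accuracy_by_k` are hypothetical; they only illustrate the evaluation loop and are not the data or results of this study.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train is a list of (feature_vector, label); return the majority label."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy_by_k(train, test, k_values):
    """Accuracy in percent on the test set for every k in k_values."""
    results = {}
    for k in k_values:
        correct = sum(
            knn_predict(train, features, k) == label for features, label in test
        )
        results[k] = 100.0 * correct / len(test)
    return results


# Hypothetical one-feature training and test samples (not data from the study).
train = [((0.0,), "A"), ((0.2,), "A"), ((0.45,), "B"), ((1.0,), "B"), ((1.2,), "B")]
test = [((0.1,), "A"), ((0.4,), "A"), ((1.1,), "B")]

results = accuracy_by_k(train, test, [1, 3, 5])
for k, value in results.items():
    print(f"k = {k}: {value:.2f}%")
```

In this toy setting the accuracy changes with each K: a single misleading neighbor hurts K = 1, while a K as large as the whole training set always predicts the majority class. Running the same loop once per GLCM angle would give the kind of K-by-angle comparison plotted in Figure 4.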

CONCLUSION
In this study, feature extraction with the Gray Level Co-Occurrence Matrix (GLCM) combined with the k-Nearest Neighbor (K-NN) algorithm is used to classify bird species based on their faces. Using several angles for feature extraction and values of k = 1 to k = 10 results in an accuracy of up to 54.54%, which can be regarded as a very good classification value under the Confusion Matrix testing technique. The system can classify various types of birds from front and side images. The angles provide texture information from each of the images tested, which makes it easier to classify the types. The reported graph gives the accuracy on the testing data using K-NN, with the aim of determining the class to which each tested bird belongs.