CNN for Image Identification of Hiragana Based on Pattern Recognition

- Hiragana is one of the Japanese scripts. In this study, a CNN (Convolutional Neural Network) is used as the identification method, with thresholding for preprocessing, followed by a normalization stage and a filtering stage to remove noise from the images. The network uses max-pooling and dense layers and is trained with the Adam optimizer. We use 1000 images of 50 hiragana characters, split 950:50 into 950 training images and 50 test images. Our experiment yields an accuracy of 95%.


INTRODUCTION
Image recognition is very important in the evolution of artificial intelligence. In particular, finding an efficient model for reading handwritten characters is an ongoing area of study across very different writing systems [1] [2], whose symbols vary drastically [3]. One of these is Japanese. Japan is a developed country that many people admire for its cleanliness and for its works of art, which are in demand worldwide [4]. Japanese is also an interesting language for people who like Japanese culture and its characters; for example, comics, anime, and films produced in Japan attract people to learn Japanese [5], including the hiragana script. Japanese uses three scripts: kanji, hiragana, and katakana. Hiragana and katakana serve as the syllabaries (alphabets) of Japanese, while kanji are used for the roots of words. Writing in Japanese also differs from writing in Indonesian: Indonesian is written from left to right, whereas Japanese is traditionally written from top to bottom, with each new column placed to the left of the previous one, so the text reads from right to left. Hiragana was historically used as a flowing, cursive script and was widely used by Japanese women. Hiragana represents syllables, has its own rules, and consists of 46 basic letters. It is used to write verbs, adjectives, and adverbs, and is also used in formal writing.
Hiragana letters are the basic letters used to write native Japanese vocabulary. Japanese sentences combine hiragana, kanji, and katakana [6] [7]. Hiragana and katakana are syllabic scripts, while kanji are pictographic characters [8]. In addition, adding other marks to a character can change its meaning, so Japanese has many possible character combinations.
For character recognition, a neural network is one suitable method. A neural network [9] works in a way likened to the human brain: it must first be trained before it gives correct results with a high degree of accuracy. Because each analyzed pixel is matched against what the network has learned from the training data, the neural network approach is well suited to this kind of classification problem and to recognizing damaged writing, and it is more efficient than other methods.
A Convolutional Neural Network (CNN) is an extension of the neural network that performs well at handwriting recognition [10]. This study applies the CNN method [11]. In the last few years, CNNs have made very big breakthroughs, significantly improving image classification and processing. This progress is supported by strong computing power and a variety of training techniques. Based on the results of previous studies, the CNN algorithm has been described as the best model for problems such as object detection. The dataset used here contains 50 different characters with 20 different samples of each character. The aim of this study is to show that, even with a small training sample, a convolutional neural network can learn to classify 50 different hiragana characters. CNN architectures also differ in depth, for example CNNs with one, two, three, or four convolutional layers. However, a weakness of CNNs is their long training process [9], although today's hardware largely overcomes this. Another problem is the lack of suitable training data, because such datasets are difficult to obtain [11]. To achieve the goals of this study, we reviewed journal articles published online on hiragana pattern recognition with CNNs. After studying the methods and concepts sufficiently, we designed a GUI and a CNN architecture that implement these theories and concepts.

Convolutional Neural Network (CNN)
Artificial intelligence is a technique for imitating human intelligence, or the workings of the human brain, in machines in order to solve problems. For this purpose, at least three methods have been developed: fuzzy logic, evolutionary computation, and machine learning. Within artificial neural networks (ANNs), deep learning trains machines through many internal layers [12]-[15]. Deep learning is a branch of machine learning that uses deep neural networks to solve problems in the machine learning field [5], [16], [17]. Machine learning can be defined as a method that improves its performance with experience in order to make accurate predictions. Here, experience refers to knowledge acquired before testing, that is, what the model has already learned before it is evaluated.
A Convolutional Neural Network is an Artificial Neural Network (ANN) method inspired by the human nervous system, which can distinguish and recognize objects [18]. Convolutional Neural Networks can also be used to detect and recognize objects in images. Because of its depth, a Convolutional Neural Network is considered a Deep Neural Network, and it is widely applied to images [16], [19]. Like other neural networks, a CNN learns rules that map an input to an output (a feature map). These rules are organized into three kinds of layers: convolution, activation, and pooling [10] (see Figure 2). A CNN performs convolution by sliding a filter of a fixed size across the image; at each position, the pixels under the filter are multiplied by the filter values and summed, producing new information about that part of the image. In effect, the image is split into small patches of pixels. Each value produced by the convolution is then used as input to build a feature representation, which is what lets the network recognize an object when it appears. The result is a new array, but this array is still large, so a further step called max pooling reduces it. Finally, the reduced array is passed to fully connected layers, and the last layer decides which class the image matches.
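The pipeline described above (convolution, activation, pooling, then a fully connected layer) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: it assumes a single-channel image, one hand-picked 3x3 filter, 2x2 max pooling, and a softmax output, and the names `conv2d`, `relu`, `max_pool`, `dense` and `forward` are hypothetical. In a real CNN the filter and dense weights are learned during training rather than written by hand.

```python
import math

def conv2d(image, kernel):
    """Valid 2-D convolution: stride 1, no padding. Like most CNN libraries,
    the kernel is not flipped (strictly speaking, cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the patch under the filter element-wise, then sum.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    # Activation layer: negative responses are set to zero.
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    # Keep the largest value in each non-overlapping size x size window;
    # rows/columns that do not fill a whole window are dropped.
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    return [v for row in fmap for v in row]

def dense(x, weights, biases):
    # Fully connected layer: one weight vector and one bias per output class.
    return [sum(w * xi for w, xi in zip(w_k, x)) + b for w_k, b in zip(weights, biases)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases):
    """Convolution -> ReLU -> max pooling -> flatten -> dense -> softmax."""
    pooled = max_pool(relu(conv2d(image, kernel)))
    return softmax(dense(flatten(pooled), weights, biases))

# Usage: a 6x6 image containing a vertical stroke in column 2,
# a hand-written vertical-line filter, and two hypothetical classes.
image = [[1.0 if c == 2 else 0.0 for c in range(6)] for _ in range(6)]
kernel = [[-1.0, 1.0, -1.0]] * 3
weights = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
biases = [0.0, 0.0]
probs = forward(image, kernel, weights, biases)
```

Here the filter responds to a vertical stroke one pixel wide, so after ReLU and pooling only the pooled cells covering the stroke are non-zero, and the hypothetical class whose weights favor those cells receives almost all of the probability.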

Proposed Method
The first stage of this research is to collect data on hiragana letters, which is later used as training and test data. The data was collected from internet sources, with some editing. The data collection is limited to basic hiragana letters, which are grouped into 50 classes. Each class contains approximately 20 character images smaller than 83x84 pixels, stored in .JPG format (viewed with the Windows Photos application available on Windows). In total, 1000 handwritten hiragana images were collected and divided into two sets: 950 for training and 50 for testing.
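How the 50 test images were selected is not stated. One reasonable reading, assumed in the sketch below, is a stratified split that holds out one image per class, which yields exactly 950 training and 50 test images from 50 classes of 20 images. The function `stratified_split` and the file names are hypothetical.

```python
import random

def stratified_split(samples_by_class, test_per_class=1, seed=0):
    """Split {label: [sample, ...]} into train/test lists of (sample, label)
    pairs, holding out `test_per_class` randomly chosen samples per class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(samples_by_class):
        shuffled = list(samples_by_class[label])
        rng.shuffle(shuffled)
        test.extend((s, label) for s in shuffled[:test_per_class])
        train.extend((s, label) for s in shuffled[test_per_class:])
    return train, test

# Hypothetical stand-in for the dataset: 50 classes x 20 image file names.
dataset = {c: [f"class{c:02d}_{i:02d}.jpg" for i in range(20)] for c in range(50)}
train, test = stratified_split(dataset)
print(len(train), len(test))  # 950 50
```

Holding out four images per class (`test_per_class=4`) gives the 800:200 split mentioned in the results.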
After the images have been grouped into classes and split into training and test data, they go through the preprocessing stage. Preprocessing is the initial step of handwriting recognition and consists of image normalization, thresholding, filtering, and labeling.
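The four preprocessing steps can be sketched in plain Python on a grayscale image stored as a list of rows of 0-255 intensities. The normalization, threshold value, and filter are not specified, so this sketch assumes intensity scaling to [0, 1], a fixed threshold of 0.5 with dark ink mapped to 1, and a 3x3 median filter for noise removal; labeling is shown as one-hot encoding of the class index. All function names are hypothetical.

```python
from statistics import median

def normalize(gray):
    """Scale 8-bit intensities (0-255) to the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in gray]

def threshold(img, t=0.5):
    """Binarize: dark ink pixels (below t) become 1, light background 0."""
    return [[1 if p < t else 0 for p in row] for row in img]

def median_filter(img):
    """3x3 median filter for removing isolated noise pixels.
    Border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [img[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = median(window)
    return out

def one_hot(label, num_classes=50):
    """Labeling: class index -> one-hot vector, as categorical cross-entropy expects."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# Usage: a white 5x5 image with a single black noise pixel in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
binary = threshold(normalize(noisy))
clean = median_filter(binary)
```

A median filter removes isolated noise pixels, but it can also erase strokes only one pixel wide, so the filter size has to be chosen with the stroke width in mind.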

RESULTS AND DISCUSSION
The collected dataset consists of JPG files. The data for each character is divided into training data and testing data. This study uses 45 basic hiragana characters (vowels and consonant-vowel syllables) and 5 hiragana characters with tenten or maru marks. Both the training and the test data are drawn from these 50 characters, with 20 samples of each, giving 1000 samples in total. The test was repeated several times using different files, and the resulting accuracy was recorded. Testing is done by taking several samples from the dataset and computing the accuracy according to equation (1).
The preprocessing stage produces the processed character images. At the labeling stage, each hiragana letter is then labeled with its class, so that the data matches the predetermined classes, as can be seen in Figure 8.
After labeling and separating the training and test data, the modeling stage trains a CNN (Convolutional Neural Network) on the dataset by building a network architecture and measuring the accuracy of the results. The input data is taken from the previously collected dataset of handwritten hiragana letters, which serves as the training database for the CNN. At this stage there are 950 training images, i.e. on average 19 of the 20 samples of each character. Training runs for 200 epochs, i.e. 200 passes over the training data, with the aim of minimizing the error and thus obtaining more accurate results. The loss function is 'categorical_crossentropy', and the Adam optimizer is used.
Convolution combines two sequences of numbers to produce a new sequence. Here, the image is represented as a matrix (array) of pixel values.
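In Keras, `'categorical_crossentropy'` and the Adam optimizer are built in. To show what they compute, the sketch below reimplements both in plain Python for a flat list of parameters. The hyperparameters are the default values of Keras's Adam (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7), assumed here because they are not reported; the `Adam` class is a hypothetical illustration, not the Keras API.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(t * log(p)) with a one-hot target y_true.
    Predictions are clipped to eps to avoid log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

class Adam:
    """Minimal Adam optimizer for a flat list of float parameters."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        if self.m is None:  # first call: zero-initialize the moment estimates
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            # Exponential moving averages of the gradient and squared gradient.
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            # Bias correction compensates for the zero initialization.
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated

# Usage: loss of one prediction, and Adam minimizing f(x) = (x - 3)^2.
loss = categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1])
opt = Adam(lr=0.1)
x = [0.0]
for _ in range(200):
    x = opt.step(x, [2 * (x[0] - 3.0)])  # gradient of (x - 3)^2
```

On its first step Adam moves each parameter by about the learning rate, whatever the gradient's magnitude, because the bias-corrected ratio m_hat / sqrt(v_hat) starts at plus or minus one.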
The input images have a size of 83x84 pixels, which gives their height and width. Each input image has three color channels, red, green, and blue (RGB), and each pixel has its own value in each channel. The input image is convolved with a predetermined filter. A filter is another block with a smaller height and width but the same depth as the original image. A filter is used to detect a particular pattern: its values are multiplied with the input matrix values it covers, so the values in its rows and columns determine which pattern it recognizes.
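The point that a filter has the same depth as the image can be made concrete with a sketch of convolution over an RGB image stored as rows of pixels, each pixel a list of three channel values. It assumes a 3x3 filter, stride 1 and no padding (these values are not given) and treats 83x84 as 83 rows by 84 columns; `conv2d_multichannel` and `output_size` are hypothetical names.

```python
def conv2d_multichannel(image, kernel, bias=0.0):
    """Valid convolution (stride 1, no padding) of an H x W x C image with a
    kh x kw x C filter. Because the filter has the same depth C as the image,
    each output value sums over all channels, giving one 2-D feature map."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernel), len(kernel[0])
    if len(kernel[0][0]) != c:
        raise ValueError("filter depth must equal image depth")
    return [[sum(image[i + di][j + dj][ch] * kernel[di][dj][ch]
                 for di in range(kh) for dj in range(kw) for ch in range(c)) + bias
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def output_size(n, k, stride=1, padding=0):
    """Spatial size of a convolution output: (n + 2*padding - k) // stride + 1."""
    return (n + 2 * padding - k) // stride + 1

# Usage: an all-ones 83x84 RGB image and an all-ones 3x3x3 filter.
image = [[[1.0, 1.0, 1.0] for _ in range(84)] for _ in range(83)]
kernel = [[[1.0, 1.0, 1.0] for _ in range(3)] for _ in range(3)]
fmap = conv2d_multichannel(image, kernel)
print(len(fmap), len(fmap[0]), fmap[0][0])  # 81 82 27.0
```

The output size follows (n + 2p - k) / s + 1 for input size n, filter size k, padding p and stride s, so an 83x84 input with a 3x3 filter gives an 81x82 feature map; a padding of 1 would keep it at 83x84.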
In the last stage, validation was carried out on the 50 hiragana test images, giving an accuracy of 95%. An earlier experiment with an 800:200 split gave an accuracy of 82%; details are given in Table 1. The results of the second experiment, with a 950:50 split (950 training images and 50 test images), are given in Table 2. After these stages, the accuracy and loss of the model trained with the Adam optimizer were evaluated.
This experiment used 50 training epochs, with 950 training samples and 50 test samples. As shown in the figure above, the training loss is 33% and the highest training accuracy is 100%. On the validation data, the loss is 19% and the accuracy is 95%, so the accuracy obtained is 95%. For a classification problem with 50 classes, this accuracy can be considered good, as can be seen in Figure 8.
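The accuracy values reported above are assumed, in the sketch below, to follow the standard definition of classification accuracy: the fraction of test images whose predicted class (the arg-max of the softmax output) matches the true label. The helpers `accuracy` and `argmax` are hypothetical.

```python
def argmax(probs):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(probs)), key=probs.__getitem__)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Usage: softmax outputs for three test images and their true labels.
outputs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
predicted = [argmax(o) for o in outputs]  # [1, 0, 2]
score = accuracy([1, 0, 1], predicted)
```

With a 50-image test set, each misclassified image changes accuracy by 2 percentage points; with the 200 test images of the 800:200 split, each error changes it by 0.5 points.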

CONCLUSION
The Convolutional Neural Network (CNN) model takes 83x84 input images and is trained for 200 epochs. The dataset contains 1000 images: 950 for training and 50 for testing (validation). In recognizing handwritten hiragana characters, the model reaches 100% training accuracy and 95% validation accuracy. With an 800:200 split, the accuracy is only 82%. This suggests that using more training data yields higher accuracy.