Butterfly Image Classification Using Color Quantization Method on HSV Color Space and Local Binary Pattern

Many methods have been developed for image research. Image detection, which extracts new information from images, is widely used in research fields such as health and agriculture, and various methods continue to be developed to obtain better results. As its research contribution, this study tests a combination of methods: color feature extraction combined with texture feature extraction. Color features are extracted by color quantization in the HSV color space, producing 72 features, and texture features are extracted with the local binary pattern (LBP), producing 256 features. Merging the two yields 328 features, which are then classified. Classification of the butterfly images achieves an accuracy of 72%. A subsequent performance test gives precision, recall and F-measure values of 76%, 72% and 74%, respectively.


Dhian Satria Yudha Kartika, Darlis Heru Murti, and Anny Yuniarti are with the Department of Informatic Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS Sukolilo, Surabaya 60111, Indonesia. E-mail: agen2009@gmail.com; darlis@its-sby.edu; anny@if.its.ac.id.

I. INTRODUCTION
Image processing research is applied in many fields and is growing rapidly, since existing findings can still be developed further. One example is face recognition using the Extended Symmetric Local Graph Structure (ESLGS), an improvement of the earlier Symmetric Local Graph Structure (SLGS) method [1]. The proposed method raised face recognition accuracy from 80.59% to 84.24% and offers better accuracy and lower complexity than other operators. Another image processing study detected tuna based on texture and shape using the gray-level co-occurrence matrix (GLCM) [2]. It adds geometric features extracted from the region of interest (ROI), and combining GLCM and ROI features yielded an accuracy of 86.67%. Related work on butterflies includes the automatic identification of butterfly species using the local binary pattern (LBP) to capture textural characteristics, followed by classification with an artificial neural network (ANN) [3]. In that study, Kaya grouped a dataset of 50 butterfly images into 5 species. Features were extracted from positions on the butterfly wings to build an LBP matrix describing the grayscale distribution on the wings, and the wings were divided into 64 sub-blocks for feature extraction and comparison.
Kaya's study uses only one type of feature; results could improve if other features, such as color and shape, were added. The aim of Kaya's research was to increase classification accuracy, which reached up to 98% [3]. Earlier studies by Kaya used several methods to detect and classify butterfly images, including "A computer vision system for the automatic identification of butterfly species via Gabor-filter-based texture features and extreme learning machine: GF + ELM" [4]. That study notes that conventional identification, which applies chemical substances to the specimens, takes a long time and is expensive, and therefore proposes an alternative based on the Gabor Filter (GF) and the Extreme Learning Machine (ELM). The GF is used for texture detection because it achieves optimal localization in both the spatial domain (where the pixels of an image are manipulated to generate a new image) and the frequency domain.
The GF + ELM approach achieved 97% accuracy [4]. Besides [3], Kayci and Kaya also used the gray-level co-occurrence matrix (GLCM) for automatic identification of butterfly images, with an accuracy of 93.2% [5]. Kaya further compared GLCM and LBP on butterfly images, obtaining 98.25% accuracy for GLCM and 96.45% for LBP [6]. Although LBP scored slightly lower than GLCM in that comparison, LBP describes images well and is widely used in computer vision, image processing, image retrieval, remote sensing and biomedical image analysis [7]. The LBP operator is tolerant to illumination changes and fast to compute, which allows images to be analyzed in real time [8].
In a related study on color feature extraction, Kartika used the HSV color space to classify a koi fish dataset. That study notes that the basic features commonly used in image research are color, shape and texture [9]. In Satria's research, color-based classification of koi fish achieved an accuracy of 97.92%, with classification tested using the Weka tool. Color feature extraction converts colors from the RGB to the HSV color space and then applies color quantization. Quantization reduces the size of the color feature set, which speeds up computation [10]. In related research, Yousef compared color feature extraction using hue alone with using hue, saturation and value together; adding saturation and value increases the feature dimension and provides more information about the image [11].
Therefore, this research proposes combining color features extracted by color quantization in the HSV color space with texture features extracted by the local binary pattern. The combined features are classified with a Support Vector Machine (SVM), and the classification accuracy is calculated. In addition to accuracy, performance is analyzed with a confusion matrix, from which precision, recall and F-measure are calculated.

A. Method
As shown in Figure 1 (research methodology), the prepared dataset is first preprocessed to remove noise before color and texture feature extraction. In the butterfly dataset, noise refers to background elements such as the twigs, leaves and flowers on which the butterflies perch. After the noise is removed, the dataset is normalized by resizing every image to 420x315 pixels. Normalization ensures that all images entering feature extraction share the same parameters, so that the resulting output is standardized [9].
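The preprocessing step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: images are assumed to be row-major lists of (r, g, b) tuples, masks are assumed to be lists of 0/1 values like those from [12], and nearest-neighbor resampling is an assumption because the interpolation method is not stated. The names `apply_mask`, `resize_nearest` and `preprocess` are hypothetical.

```python
def apply_mask(image, mask):
    """Black out background pixels (mask == 0) and crop to the mask's bounding box.

    image: list of rows of (r, g, b) tuples; mask: list of rows of 0/1 values.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [
        [image[y][x] if mask[y][x] else (0, 0, 0) for x in range(left, right + 1)]
        for y in range(top, bottom + 1)
    ]


def resize_nearest(image, width=420, height=315):
    """Resample an image to width x height pixels by nearest-neighbor lookup."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def preprocess(image, mask, width=420, height=315):
    """Remove background noise with the mask, then normalize the image size."""
    return resize_nearest(apply_mask(image, mask), width, height)


if __name__ == "__main__":
    img = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
    mask = [[1, 0], [1, 1]]
    out = preprocess(img, mask)
    print(len(out), len(out[0]))  # 315 420
```

Cropping before resizing means the butterfly fills the normalized frame regardless of where it sat in the original photograph.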
After all images are normalized, color and texture features are extracted, and the features of both types are combined for classification. Before classification, the dataset is divided into training data and testing data. The training data are used to build a classification model, which is then evaluated on the previously separated testing data to obtain the accuracy. Beyond accuracy, the system is also tested for performance, which assesses how well the results it produces match the expected results. In this study, performance is analyzed with a confusion matrix.

B. Materials
This study uses the dataset from Wang's research [12], "Learning Models for Object Recognition from Natural Language Descriptions". At this stage, the data are collected and analyzed for use as the dataset. The dataset consists of 890 butterfly images in JPEG and PNG format, captured from various positions: top, front, rear, right or left side. The 890 images are divided into 10 classes: Danaus plexippus, Heliconius charitonius, Heliconius erato, Junonia coenia, Lycaena phlaeas, Nymphalis antiopa, Papilio cresphontes, Pieris rapae, Vanessa atalanta and Vanessa cardui. Figure 2 illustrates the preprocessing stages that prepare the data for color and texture feature extraction; all 890 images are processed in the same way. Figure 2a shows an original image from the dataset. Figure 2b shows the corresponding mask from previous research [12], which is used here to crop the image and obtain Figure 2c. Figure 2c is the image after noise removal, ready for feature extraction.
In color feature extraction, the most important information is the color of the image. In texture feature extraction, the information taken is the texture, or pattern, of each image. Butterflies have distinctive textures, and the pattern of each class differs from the others. Both extraction processes convert this information into numerical feature values, and each process is explained in more detail in the following subsections.

C. HSV Color Space
The proposed method uses HSV color space optimization for image feature extraction [10]. Before HSV was used, previous research applied K-means to extract two colors, so the resulting image contained only black and white. In a digital image, color is represented by red, green and blue (RGB) components: black corresponds to (0, 0, 0) and white, the brightest color, to (255, 255, 255). However, this two-color result is not preferred by humans because it does not match the original colors, so related image research proposes the HSV color space, which represents color by Hue, Saturation and Value [10]. Hue is the kind of color, saturation is the amount (purity) of the color, and value is the amount of light.
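The RGB-to-HSV conversion can be illustrated with Python's standard `colorsys` module. This is a sketch rather than the paper's implementation (which ran in MATLAB); the helper name `rgb_to_hsv_pixel` is hypothetical, and it assumes 8-bit channels and reports hue in degrees, a common convention for HSV quantization.

```python
import colorsys


def rgb_to_hsv_pixel(r, g, b):
    """Convert 8-bit RGB to (hue in degrees [0, 360), saturation [0, 1], value [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


if __name__ == "__main__":
    print(rgb_to_hsv_pixel(255, 0, 0))  # (0.0, 1.0, 1.0): pure red
    print(rgb_to_hsv_pixel(0, 0, 0))    # (0.0, 0.0, 0.0): black has no hue
```

Black and white both map to hue 0 with zero saturation and differ only in value, which shows why value is kept alongside hue.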

D. Color Quantization
The HSV values converted from RGB are then quantized to reduce computation without reducing image quality [11]. One quantization technique separates out unused values. The extraction process divides the color space into 72 bins, using Equation (1) for hue, Equation (2) for saturation and Equation (3) for value.
Color quantization thus reduces computation while preserving image quality: each channel is split into a small number of levels. As shown in previous studies [10], this combination gives efficient computation and good performance.
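The sketch below assumes a widely used 8 x 3 x 3 scheme (8 hue levels, 3 saturation levels, 3 value levels); its bin edges are assumptions and may differ from those in Equations (1) to (3). Each pixel receives the index G = 9H + 3S + V in the range 0 to 71, and the color feature is the normalized 72-bin histogram of these indices (normalization is also an assumption). `quantize` and `color_histogram` are hypothetical names.

```python
import bisect
import colorsys

# Assumed bin edges for a common 72-bin HSV scheme (not taken from the paper).
HUE_EDGES = [20, 40, 75, 155, 190, 270, 295, 316]  # degrees
SV_EDGES = [0.2, 0.7]                               # shared by saturation and value


def quantize(h, s, v):
    """Map hue (degrees), saturation and value to a bin index G = 9H + 3S + V (0..71)."""
    hq = bisect.bisect_right(HUE_EDGES, h) % 8  # [316, 360) wraps to level 0 with [0, 20)
    sq = bisect.bisect_right(SV_EDGES, s)       # 0, 1 or 2
    vq = bisect.bisect_right(SV_EDGES, v)       # 0, 1 or 2
    return 9 * hq + 3 * sq + vq


def color_histogram(pixels):
    """Normalized 72-bin histogram of quantized HSV colors for an iterable of (r, g, b)."""
    hist = [0.0] * 72
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[quantize(h * 360.0, s, v)] += 1
        count += 1
    return [c / count for c in hist] if count else hist


if __name__ == "__main__":
    feats = color_histogram([(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 0)])
    print(len(feats), feats[8], feats[35], feats[0])  # 72 0.5 0.25 0.25
```

Background pixels blacked out during preprocessing all fall into bin 0, so in practice one may want to skip them before counting.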

E. Local Binary Pattern
LBP is a texture analysis method that combines statistical and structural models. Its rotation-invariant form (LBPROT) has the advantage of not restricting how images are acquired, so images can come from multiple sources, for example the internet or direct photographs of the objects. For this reason, this butterfly image research uses the LBPROT method.
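In the standard formulation of Ojala et al. [14], written with the symbols defined below, each neighbor is thresholded against the center pixel and weighted by a power of two; the standard rotation-invariant form then takes the minimum over all circular bit rotations, where ROR(x, i) rotates the P-bit number x circularly to the right by i bits:

$$
LBP_{P,R} = \sum_{p=0}^{P-1} s\left(I_{p,R} - I_C\right) 2^{p},
\qquad
s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}
$$

$$
LBP^{ri}_{P,R} = \min\left\{ \mathrm{ROR}\left(LBP_{P,R},\, i\right) \;\middle|\; i = 0, 1, \ldots, P-1 \right\}
$$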
Here, P is the number of neighbors, R is the radius between the center pixel and its neighbors, LBP_{P,R} is the decimal value obtained from the binary pattern, I_C is the intensity of the center pixel, I_{p,R} is the intensity of neighbor p (p = 0, 1, ..., P-1) at radius R, and s(x) is the thresholding function [13]. The LBP concept was first introduced by Ojala [14], who showed that LBP is an effective texture descriptor. For each pixel in the image, a binary code is generated by thresholding its neighbors at the value of that center pixel.
Texture feature extraction follows the preprocessing and normalization stages: each resized image is converted to grayscale. Texture features are then extracted with the rotation-invariant local binary pattern (LBP), which computes a code for each pixel from the values of its neighbors relative to the center point. The eight neighbors are weighted by 1, 2, 4, 8, 16, 32, 64 and 128, so each code lies between 0 and 255 and the texture feature vector has 256 bins.
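A minimal pure-Python sketch of the texture step follows. It assumes the basic 3 x 3 neighborhood (P = 8, R = 1) with diagonal neighbors taken at the corner pixels rather than interpolated, BT.601 luma weights for grayscale conversion, and border pixels skipped; these choices and the names `to_grayscale`, `rotate_right`, `lbp_code` and `lbp_histogram` are assumptions, not the authors' code. With rotation invariance enabled, each code is replaced by its minimum over the eight circular bit rotations, so only 36 of the 256 bins can be non-empty, while the histogram length stays 256 as in the text.

```python
# Neighbor offsets (dy, dx) in circular order, so that bit p carries weight 2**p.
NEIGHBORS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to gray levels using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def rotate_right(code, i, bits=8):
    """Circularly rotate a bits-wide binary code to the right by i positions."""
    return ((code >> i) | (code << (bits - i))) & ((1 << bits) - 1)


def lbp_code(gray, y, x, rotation_invariant=True):
    """LBP(8,1) code of interior pixel (y, x); s(I_p - I_c) = 1 when I_p >= I_c."""
    center = gray[y][x]
    code = 0
    for p, (dy, dx) in enumerate(NEIGHBORS):
        if gray[y + dy][x + dx] >= center:
            code |= 1 << p  # weights 1, 2, 4, ..., 128
    if rotation_invariant:
        code = min(rotate_right(code, i) for i in range(8))
    return code


def lbp_histogram(gray, rotation_invariant=True):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x, rotation_invariant)] += 1
    return hist


if __name__ == "__main__":
    patch = [[0, 9, 0], [0, 5, 0], [0, 0, 0]]  # only the top neighbor is >= center
    print(lbp_code(patch, 1, 1, rotation_invariant=False))  # 4
    print(lbp_code(patch, 1, 1))                            # 1
```

The example shows the effect of rotation invariance: a single bright neighbor produces the same code whichever side it lies on.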

F. Performance Analysis
In the performance analysis stage, the classification process is carried out and its results are tested. Classification gives the level of accuracy, and the performance analysis uses the confusion matrix to examine the results for each class. Besides the combination of color and texture features, each feature type is also analyzed separately. Classification of the extracted features is performed in MATLAB.
Before classification, the color and texture features of all 890 images are extracted and merged. Each image already has a label (class) corresponding to its butterfly species. The data are then divided into two parts: 790 images for training and 100 images for testing. The training data serve as the reference for classifying the testing data.
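The merging and splitting steps can be sketched as follows. The text states only that 790 images are used for training and 100 for testing; drawing 10 test images per class (10 classes x 10 = 100) with a fixed random seed is an assumption, as are the names `combine_features` and `stratified_split`.

```python
import random


def combine_features(color_hist, texture_hist):
    """Concatenate 72 color bins and 256 LBP bins into one 328-value feature vector."""
    if len(color_hist) != 72 or len(texture_hist) != 256:
        raise ValueError("expected 72 color bins and 256 texture bins")
    return list(color_hist) + list(texture_hist)


def stratified_split(features, labels, test_per_class=10, seed=0):
    """Hold out test_per_class random samples of every class; the rest is training data."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    test_idx = []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        test_idx.extend(idx[:test_per_class])
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]

    def take(indices):
        return [features[i] for i in indices], [labels[i] for i in indices]

    return take(train_idx), take(test_idx)
```

With 890 labeled vectors and the default of 10 test images per class, this yields 790 training and 100 testing samples, matching the counts in the text, provided every class has at least 10 images.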

A. Color Feature Extraction
Feature extraction can start once normalization is complete. Color feature extraction processes each butterfly image starting from its RGB color values: the RGB values are converted into the three-dimensional HSV color space, and the color quantization values are then calculated as described in Equations (1), (2) and (3). The results of color feature extraction are shown in Table 1.

B. Texture Feature Extraction
Texture feature extraction follows the preprocessing and normalization stages: each resized image is converted to grayscale, and the rotation-invariant local binary pattern (LBP) of Equations 4 and 5 is computed from the neighbors of each center pixel. With eight neighbors weighted by 1, 2, 4, 8, 16, 32, 64 and 128, the texture features form a 256-bin histogram.

C. Evaluation
The classification step gives the accuracy of each feature type and of their combination, which indicates how well each test scenario performs. The system is then tested with a confusion matrix, from which precision, recall and F-measure are calculated; the results are given in Table 3. The confusion matrix for the combined color and texture features in Table 3 shows a per-class accuracy (recall) of 100% for classes 3 and 8, meaning that every test image of these classes matches the classification model built from the training data. Class 4 reaches only 40%; this low value indicates that most class 4 test images do not match the model built from the training data. Classes 4 and 10 show similarities in the class 4 test data, possibly because the merged features of these images have similar values. In addition to the per-class performance, Table 4 lists the accuracy, which indicates how well the system matches the testing data to the model learned from the training data. Table 4 gives the accuracy of each feature type (color and texture) and of their combination: for 420x315-pixel images, the combined features reach 72%, the color features 75% and the texture features 60%.
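The metrics can be computed from a confusion matrix as sketched below. Rows are true classes and columns are predicted classes, so the per-class accuracy discussed for classes 3, 4 and 8 is the diagonal entry divided by its row sum, i.e. the class recall. Macro averaging over classes and computing the F-measure from the averaged precision and recall are assumptions, since the text does not state which convention was used. As a consistency check, F = 2PR/(P + R) with P = 0.76 and R = 0.72 gives 1.0944/1.48 ≈ 0.74, the reported F-measure. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count matrix m[t][p]: test samples of true class t predicted as class p."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_scores(m):
    """Return accuracy and macro-averaged precision, recall and F-measure."""
    k = len(m)
    correct = sum(m[i][i] for i in range(k))
    total = sum(sum(row) for row in m)
    precisions, recalls = [], []
    for i in range(k):
        predicted = sum(m[r][i] for r in range(k))  # column sum
        actual = sum(m[i])                           # row sum
        precisions.append(m[i][i] / predicted if predicted else 0.0)
        recalls.append(m[i][i] / actual if actual else 0.0)
    p = sum(precisions) / k
    r = sum(recalls) / k
    return correct / total, p, r, f_measure(p, r)


if __name__ == "__main__":
    m = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1])
    print(m)                                # [[1, 1], [0, 2]]
    print(round(f_measure(0.76, 0.72), 2))  # 0.74
```

When every class has the same number of test images (for example 10 per class), macro-averaged recall equals overall accuracy, which is consistent with the reported recall and accuracy both being 72%.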

IV. CONCLUSION
Combining color and texture feature extraction results is a combination of methods that has not been used in previous studies. Classifying butterflies with the combined color and texture features yields an accuracy of 72%. Classification with the color features alone yields an accuracy of 76%, and with the texture features alone 60%.
The color features give better results than the texture features, and this affects the accuracy of the merged features. Future work should focus on improving texture feature extraction to obtain better results.