A Case Study of Applying Customer Segmentation in A Medical Equipment Industry

The purpose of this paper is to apply the LRFM (length, recency, frequency, monetary) model to customers in the medical equipment industry and to identify the differences between customer segments. The study combines LRFM with clustering to segment customers, using transaction data from the medical device industry in Indonesia, from which the length, recency, frequency, and monetary values are extracted. The validation process yields four as the optimal number of clusters, which serves as the basis for customer segmentation. K-Means is used as the clustering method and a Decision Tree as the classification method, followed by the application of IF-THEN rules. Each resulting segment is characterized by its LRFM criteria and matched with a marketing strategy appropriate for the company. The result is four LRFM-based customer segments, profiled as Best, Frequent, Low and Uncertain. The study provides guidance on LRFM-based customer identification that medical equipment companies can use to develop strategies suited to each segment, to improve their customer relationship management, and to find new ways of marketing products.


A. Background
Customers are an important factor in industry and business. A company's progress and profits depend on the interests and number of its customers. A company that pays attention to customer needs as well as market needs is in a stronger position to earn profit. In the past, many companies focused on the product, so that a good-quality product could be marketed. With a quality product, sales were expected to increase and profits to be easier to obtain.
In recent years there has been a change in orientation among companies. The focus of modern companies in various industries has shifted from being product oriented to being customer oriented [1]. This change occurred quite quickly due to increased interest in Business Intelligence (BI) in general and Customer Relationship Management (CRM) in particular [2]. The first reason for this trend is that customer activities can be recorded in data storage, and sources of information about customer demographics and lifestyles are already available. This information can later be analyzed using CRM, which is why many companies are interested in this trend. The second reason is that companies can understand customers and create value for them in order to reach target markets [3]. According to Craven (2003) [4], orientation toward customers and competitors is one method a company can use if it wants to excel in competition.
Customer Relationship Management (CRM) is one means of establishing an ongoing relationship between a company and its customers. CRM helps companies learn what customers expect and need, and covers customer management strategies ranging from marketing and sales to after-sales service. Its aim is to increase customer satisfaction, which in turn leads to customer loyalty [5].
Customer needs and desires must be known in the business world. Each customer has different abilities and needs, so customer segmentation is required. Customer segmentation determines the priority level of customers in the related industry, so that the company can choose the types of customers that provide the most benefit. Customer segmentation is obtained with a clustering technique, which groups data into clusters so that the data within a cluster are similar [6].

B. Previous Research
In this research, the K-Means method is used to form customer segments, and Decision Tree classification is used to identify potential attributes for follow-up and to sort new and old customers in each customer segment at PT. Edison Duta Sarana. K-Means is one of the best-known algorithms for cluster analysis. It has been widely used in various fields, including data mining, statistical data analysis, and other business applications. Cluster analysis is a statistical technique used to identify a set of groups that both minimizes variation within groups and maximizes variation between groups, based on a distance or dissimilarity function; the aim is to find the optimal set of clusters [7].
The study by Li, D. C. et al. (2011) [8], entitled "A two-stage clustering method to analyze customer characteristics to build discriminative customer management: A case of textile manufacturing business", uses systematic analytical methods to analyze consumer characteristics in the textile industry with the LRFM customer relationship model, which consists of four dimensions: relation length (L), recent transaction time (R), buying frequency (F), and monetary (M). The results indicate that grouping customers with the LRFM method has statistically significant explanatory power for marketing strategies, and that the research can be used to differentiate customer relationship management.
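The four LRFM dimensions described above can be sketched in code. This is a minimal illustration, not the procedure used in the study: the transaction records, the field layout, and the exact definitions (L as days between a customer's first and last transaction, R as days between the last transaction and the end of the observation period) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, transaction_date, amount).
transactions = [
    ("C001", date(2018, 1, 10), 500.0),
    ("C001", date(2018, 6, 10), 300.0),
    ("C001", date(2019, 3, 1), 200.0),
    ("C002", date(2019, 5, 20), 1000.0),
]


def lrfm(transactions, period_end):
    """Return {customer_id: {"L", "R", "F", "M"}} under the assumed definitions."""
    by_customer = defaultdict(list)
    for customer, day, amount in transactions:
        by_customer[customer].append((day, amount))

    table = {}
    for customer, rows in by_customer.items():
        days = [day for day, _ in rows]
        first, last = min(days), max(days)
        table[customer] = {
            "L": (last - first).days,       # relation length in days
            "R": (period_end - last).days,  # days since the last transaction
            "F": len(rows),                 # number of transactions
            "M": sum(amount for _, amount in rows),  # total amount spent
        }
    return table


scores = lrfm(transactions, period_end=date(2019, 12, 31))
print(scores["C001"])
```

Other conventions exist, for example measuring R from the start of the period or using the average rather than the total amount for M, so the definitions should be matched to the ones a given study actually adopts.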
The study in [9] applied the LRFM model with the Self Organizing Maps (SOM) technique to market segmentation for a children's dental clinic in Taiwan. Twelve clusters were formed from 2258 patients, and the average LRFM values were then calculated for each cluster and for all patients. Three clusters, comprising 454 patients, have above-average LRFM values and can be seen as the main patient group.
Clustering is used to group objects based on their similarities and differences. One method that can be used is K-Means. The study in [10] segmented customers using the K-Means and Particle Swarm Optimization (PSO) methods. It showed how to group customers through RFM variables to indicate their level of customer interest. Businesses can then use this level of interest to improve the quality of their service to customers.
The K-Means method is also used by Cheng & Chen (2009) [11] to segment customer value through the RFM model and RS theory in the electronics industry in Taiwan. First, the RFM method produces quantitative values as input attributes; then the K-Means algorithm is used to group customer values. Finally, the LEM2 algorithm is used for classification, which helps companies drive good CRM. The result identifies which groups of customers are more important and which customers contribute more to the company's revenue. Cheng and Chen hope that this research can help companies focus on target customers and then obtain maximum profits with win-win solutions for customers and companies.
Table 4. Sample data before transformation.
Table 5. Sample data after transformation.
The study in [12], entitled "Hybrid soft computing approach based on clustering, rule mining and decision tree analysis for customer segmentation problems: Real case of customer-centric industries", performs a new segmentation in a customer-focused company. Three methods were used: K-Means; a decision-making and data filtering system; and decision tree analysis with IF-THEN rules. The result is the application of the proposed approach to real-life cases.
The study in [13] also uses the RFM technique and the K-Means method to find profitable customer profiles for hotels in Antalya, Turkey. The results show that RFM can group customers effectively. This grouping can then guide hotel managers in producing new strategies to improve their capabilities and their services to customers.

A. Research Methodology
The research methodology defines the processes, structured on the basis of the chosen methods and the literature study, that are followed in this research so that the research process can be understood by other parties.

1) Stage of problem identification
Before starting the research process, the author must understand the purpose of the research, so that the research solves the problems identified beforehand and states its purpose clearly.

3) Stage of data collection
Data collection is the initial stage that feeds the subsequent segmentation process. The author obtained customer data from the company database, covering 2018 and the running year 2019. The transaction data consist of 2115 records from PT. Edison Duta Sarana. The research focuses on clustering, which is used to classify customer priorities for the company's marketing strategy.

4) Data Processing and Analysis Discussion
This stage describes how the collected data are processed and then clustered, which is the core discussion of this research, as shown in Figure 1.
a. Preprocessing Data. The data obtained from collection are raw and unstructured, especially the customer data. Data preprocessing is therefore needed in several forms, namely data integration, data cleaning, data reduction and data transformation.
b. Variable Weighting. After preprocessing, the variables are weighted to determine the weight of each attribute, so that it is known which attribute is most influential in the subsequent clustering and classification processes.
c. Data Clustering. The clustering stage determines how many segments can be obtained using the K-Means method. The attributes obtained in the previous process are clustered under several scenarios (K = 1, K = 2, ..., K = 10) to find the optimal value among the clustering runs.
d. Clustering Process. Clustering is performed with the K-Means method. The data used are the LRFM data after preprocessing, which includes data integration, data cleaning, data reduction and data transformation. The clustering scenarios run up to K = 10 to support validation with the Davies-Bouldin Index (DBI). Running more scenarios makes the differences in DBI between cluster counts visible, so that a value that is small and changes insignificantly can be taken as the valid cluster reference.
e. Data Classification. The classification stage produces the predicted status of the customer groups.
f. Classification Process. Classification is used to predict the group of each customer. From these results it can be observed how accurately existing customers are predicted to belong to each group. The tool used at this stage is RapidMiner with an open-source license.
g. IF-Then Rules Process. IF-Then rules are obtained using Decision Tree to Rules. The process helps the company assign customers to groups by considering the value of each attribute used in the IF-Then rules. The data used are the LRFM data that have been scaled using the membership function.

A. Preprocessing Data
In this process, data preprocessing is carried out in several steps, namely data integration, data cleaning, data reduction and data transformation. The data to be preprocessed can be seen in Table 1, which is a snapshot of the data used in this study. These data have not been preprocessed, so they are still raw data.
The data in Table 1 are then processed in the following steps.
Data integration merges the data from several Excel files obtained from the earlier data export.
Data cleaning removes data that contain null values and data that are not used in the computation.
Data reduction refers to Table 2, which lists the attributes used in this research.
Data transformation standardizes the data into a common format, and the results are then used in the subsequent calculations. Data transformation uses standardization. In addition, the data are scaled for the classification process, as shown in Table 3.
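The source does not state which standardization formula is applied. A common choice is the z-score, which rescales each variable to zero mean and unit standard deviation so that no single LRFM variable dominates the distance computation in K-Means. The sketch below assumes z-score standardization with the population standard deviation; the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score a list of numbers: (x - mean) / population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical monetary values for three customers.
monetary = [200.0, 400.0, 600.0]
z = standardize(monetary)
print(z)
```

Each of the four LRFM columns would be standardized separately before clustering.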

B. Variable Weighting
In this stage, the variables are weighted to identify the value of each variable in this study.
Tables 7a and 7b show the values of the variables under the various measurement methods tested. The Length variable has the highest value under each method, followed by recency, frequency and monetary respectively. The Length variable is therefore the most influential in this study.

C. Clustering
In the clustering stage, the K-Means algorithm is run for K = 2, ..., K = 10. With = 100. This is done to obtain the result of each cluster count tested, so that the best value can be selected based on the Davies-Bouldin Index.
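For readers unfamiliar with K-Means, the following is a minimal pure-Python sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then recompute centroids). It is not the RapidMiner implementation used in the study; the function names, the random initialization, the iteration cap and the sample points are all assumptions of this sketch.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=50, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical standardized points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

A sweep like the one described above would call `k_means` once for each K from 2 to 10 on the standardized LRFM rows and keep the labels of every run for validation.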

D. Cluster Validation
This stage continues from the clustering results obtained in the previous stage (point 3.3). Validation uses the Davies-Bouldin Index (DBI) as the reference for selecting the appropriate number of clusters.
Based on Table 9, the results are inconclusive. K = 2 gets a DBI value of 0.119, while K = 3, ..., K = 10 have much greater values than K = 2. The value chosen is therefore the most stable one, which does not differ much from the values of neighboring cluster counts. K = 4, with a DBI value of 0.621, is the most realistic choice in view of its nearest neighbors, because its value does not differ significantly from those of K = 3 and K = 5.
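As background for the validation step, the Davies-Bouldin Index averages, over all clusters, the worst-case ratio of within-cluster scatter to between-centroid separation; lower values indicate more compact, better separated clusters. The sketch below follows the usual definition with Euclidean distances. It is an illustration with hypothetical points, not a reproduction of the values in Table 9.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for points (tuples) and their cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    # Centroid and mean distance-to-centroid (scatter) for each cluster.
    centroids = {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in clusters.items()
    }
    scatter = {
        lab: sum(dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }

    # For each cluster, take the worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well separated
bad = davies_bouldin(points, [0, 1, 0, 1])   # stretched, overlapping
print(good, bad)
```

The function needs at least two clusters with distinct centroids. In a sweep over K, it would be applied to the labels of each run and the resulting values compared.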

E. Cluster Results
Based on the results of cluster validation, four clusters are used in this research. The results are shown in Table 10, together with a visualization of each cluster. Table 10 gives the average value of each variable, including length, recency, frequency, monetary and the LRFM score. Cluster 1 has the highest L, F and M values of all clusters. Cluster 2 has only a high R value, which indicates the distance between the last transaction and the study period. Cluster 3 has large L, F and M scores, but not as large as those of cluster 1. Cluster 0 has a low LRFM score but a higher R value than cluster 2, which indicates that the last transaction is still close to the research period. These characteristics can be identified from the average value of each variable.
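A cluster profile of this kind is simply the per-cluster mean of each variable. The sketch below shows one way to compute it. The rows and labels are hypothetical and do not reproduce Table 10.

```python
from collections import defaultdict


def cluster_profile(rows, labels):
    """Return {cluster_label: (mean L, mean R, mean F, mean M)}."""
    grouped = defaultdict(list)
    for row, lab in zip(rows, labels):
        grouped[lab].append(row)
    return {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in grouped.items()
    }


# Hypothetical (L, R, F, M) rows and the cluster label assigned to each.
rows = [
    (400, 30, 12, 9000.0),
    (380, 50, 10, 7000.0),
    (20, 300, 1, 500.0),
    (40, 280, 2, 700.0),
]
labels = [1, 1, 2, 2]
profile = cluster_profile(rows, labels)
print(profile)
```

Comparing each cluster's means with the overall means is what allows a cluster to be described as high or low on a given variable.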

F. Segmentation Groups
At this stage, each segment obtained is given a name. The names are best, frequent, low and uncertain. The data are shown in Table 11.
Table 11 names each cluster according to its priority, based on the LRFM values in Table 4.16. The cluster with the largest number of samples is cluster 1, the low cluster, with 45.3%. It is followed by the frequent cluster with 36.26%, the uncertain cluster with 18.3% and the best cluster with 0.27%. The best and frequent clusters have higher LRFM scores than the other two clusters, but the best cluster has higher F and M values than the frequent cluster.

G. Classification
In the classification stage, a decision tree algorithm is used to test how accurately customers are assigned to the groups from the clustering process, based on the value of each variable. This process uses 364 records. Table 12 shows training samples, 255 of the 364 records, or 70% of the data. Table 13 shows test samples, 109 of the 364 records, or 30% of the data. In both tables the variables are defined as follows: VL (Very Low), VH (Very High), M (Medium), H (High) and M (Monetary).
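The split sizes above follow directly from the 70/30 proportion applied to 364 records. The sketch below shows a shuffled split that reproduces those sizes. Rounding the cut point and the fixed seed are assumptions of this example, and the study's own split in RapidMiner may have been drawn differently.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


customers = list(range(364))  # stand-ins for the 364 customer records
train, test = split_train_test(customers)
print(len(train), len(test))
```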

H. Classification Validation
This stage validates the training and test processes from point 3.7.
Table 14 shows the training process with 255 records: 3 customers from the uncertain group were predicted as frequent, which resulted in a score of 97.25% for the uncertain group. The accuracy on the training data is 98.82%. Table 15 shows the test process with 109 records: 2 customers from the uncertain group were predicted to be in the frequent group. The accuracy of this process is 98.17%. Table 16 gives the validation results for all 364 records. The accuracy value in Table 11 shows that the performance of the cross validation is in the appropriate range of values. This process helps validate against data not seen in the earlier training and test processes. The resulting accuracy is 96.98%, with an error of 3.02%.
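The reported accuracies can be checked from the counts given in the text, assuming the misclassified customers mentioned are the only errors in each set (3 of 255 for training, 2 of 109 for testing). The helper name below is introduced only for this check.

```python
def accuracy_percent(correct, total):
    """Share of correct predictions as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


train_accuracy = accuracy_percent(255 - 3, 255)  # 3 training customers misclassified
test_accuracy = accuracy_percent(109 - 2, 109)   # 2 test customers misclassified
cv_error = round(100 - 96.98, 2)                 # error implied by the reported 96.98%

print(train_accuracy, test_accuracy, cv_error)
```

The three values agree with the 98.82%, 98.17% and 3.02% reported above.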

I. IF-Then Rules
At this stage, IF-Then rules are derived to find the rules that assign each customer to a group based on the customer's criteria. The groups referred to are those generated by the clustering process in Table 11.
In Table 17, L is length, R is recency, F is frequency and M is monetary. The rules map the values of these variables to the groups generated in the previous clustering process.
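To show how rules of this kind can be applied, the sketch below evaluates an ordered list of IF-THEN rules against a customer's scaled LRFM levels. The rules themselves are hypothetical placeholders invented for the example, and the study's actual rules are the ones in Table 17. The level labels VL, M, H and VH follow the scaling labels used in the classification tables.

```python
# Hypothetical rules for illustration only; they are NOT the rules of Table 17.
# Each rule is (conditions on scaled levels, resulting segment).
RULES = [
    ({"F": "VH", "M": "VH"}, "Best"),
    ({"L": "H", "F": "H"}, "Frequent"),
    ({"L": "VL", "R": "VL"}, "Low"),
]
DEFAULT_SEGMENT = "Uncertain"  # assumed fallback when no rule matches


def classify(customer, rules=RULES, default=DEFAULT_SEGMENT):
    """Return the segment of the first rule whose conditions all hold."""
    for conditions, segment in rules:
        if all(customer.get(var) == level for var, level in conditions.items()):
            return segment
    return default


print(classify({"L": "H", "R": "VL", "F": "VH", "M": "VH"}))
print(classify({"L": "M", "R": "H", "F": "VL", "M": "VL"}))
```

Because the rules are checked in order, the first match wins, which mirrors following a single path from the root of a decision tree to a leaf.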

J. Strategy analysis
At this stage, customers are characterized according to the group they belong to, and a strategy suited to the conditions of each group is then defined on the basis of these characteristics.
To interpret customer segmentation results based on the RFM model, Marcus (1998) [15] proposes a customer value matrix based on the frequency (F) and monetary (M) variables that forms four main types of customers: best customers (F ↑ M ↑), spender customers (F ↓ M ↑), uncertain customers (F ↓ M ↓) and frequent customers (F ↑ M ↓). In fact, [16] clusters. This shows that the two clusters have a very close long-term relationship with the company, so the customers in these clusters can be identified as loyal customers. Cluster 2 contains only 0.27% of customers, yet these customers generated the highest total income during the study period. Cluster 4 contains 36.26% of customers, with average income.
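The four quadrants of the customer value matrix can be expressed as a small function. This sketch assumes that "high" (↑) means at or above the average and "low" (↓) means below it. The text does not spell out that threshold, and the sample averages are hypothetical.

```python
def customer_value_type(frequency, monetary, avg_frequency, avg_monetary):
    """Map a customer's F and M to one of the four customer value matrix types."""
    high_f = frequency >= avg_frequency
    high_m = monetary >= avg_monetary
    if high_f and high_m:
        return "best"       # F up, M up
    if high_m:
        return "spender"    # F down, M up
    if high_f:
        return "frequent"   # F up, M down
    return "uncertain"      # F down, M down


# Hypothetical averages across all customers.
AVG_F, AVG_M = 5, 1000.0
print(customer_value_type(12, 9000.0, AVG_F, AVG_M))
print(customer_value_type(8, 400.0, AVG_F, AVG_M))
```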
Using this information, the company must maintain its relationships with customers, especially those in cluster 2, who produce high value for the company, in order to increase profits. It is important to look at customers' product purchasing habits so that promotions or discounts can be offered on the specific products they usually buy. With this strategy, ordinary customers will also be motivated to increase their transactions in the expectation of receiving the same special treatment as customers in cluster 2 [17]. For example, free servicing of purchased products increases customer loyalty. Free gifts and sponsorship at customers' scientific events likewise strengthen the sense that the company is paying attention to the customer.
Cluster 4 has an average M (monetary) value that is not as high as that of cluster 2. To increase the monetary value of each transaction, promotions can be applied to products that are usually purchased. To qualify for such a promotion, the customer must reach a certain transaction value [18]. For example, purchases of instruments and electromedical products would receive a 30 percent discount on each transaction. This is expected to increase the monetary value in cluster 4. Giving rewards when customers reach a certain value will increase customer loyalty. Price cuts on subsequent transactions will also increase purchase frequency and customer monetary value.
In cluster 1, the average value of L (length) is low and R (recency) is close to the study period. This shows that the customers in cluster 1 are new customers who have traded only recently, as indicated by the proximity of the R value to the study period. The low F and M values also indicate that these customers do not yet transact routinely with the company. Such customers can switch to another company if the company does not strengthen its relationship with them. Customers in this cluster have the potential to become best customers if the company adopts the right strategy, and can be lost if the company ignores them. To avoid this, the company must pay attention to prices in order to attract the new customers in cluster 1, so that attractive price promotions make them more interested in buying from the company repeatedly.
Cluster 3 has a low L (length) value and a high R (recency) value. This means that the customers are new customers who, on average, have not made repeat orders, as shown by the high R values: the last transaction lies far from the research period. To bring these customers back, the company needs information about why they left. To implement this strategy, the company must understand the customers' buying habits in order to identify the reasons they left. The company can also invite these customers to buy its products again. By paying attention to customer buying patterns and feedback, the strategy for cluster 3 can be adjusted, for instance by giving promotions or discounts to customers who are not satisfied with the price and informing them of a purchase bonus if they make a new transaction.

IV. CONCLUSION
Based on the results of this research, the following conclusions are obtained. The number of segments for the medical equipment company, based on the clustering results, is four. The segments are named best, frequent, low and uncertain.
The classification results are used to predict the segment of each customer and to generate IF-THEN rules for customers based on the LRFM variables.
The IF-Then rules make it easier for the company to identify customers based on their values for each LRFM variable. New and old customers can be identified using the IF-Then rules method.
The strategy recommendations can be used by the company to improve its services to customers and to increase profits. The strategy can also help the company win market share from competitors.
This study can also be used by other medical companies, under different market conditions, to identify their valuable customers.