Segmentation Analysis of Students in X Course with RFM Model and Clustering

In the business world, competition to retain and acquire customers has become tougher. The entry of new players into the market is driven by the development of the internet and advertising. The X guitar course is an institution engaged in non-formal education services. Its customers are the course students who have made payment transactions. A map of customer segmentation is one of the most important components in finding the main needs of each customer. Knowing the main needs of each customer is expected to increase customer loyalty. Customer segmentation can be done with a clustering method through a data mining approach based on the RFM (Recency, Frequency and Monetary) model. Recency is derived from the date of the last payment transaction. Frequency is the number of course payment transactions. Monetary is the nominal amount of the transactions. The RFM data is combined with the Fuzzy Gustafson-Kessel (FGK) and K-Means clustering methods to produce k clusters of customers. The resulting segments, validated with the Global Silhouette Index, are expected to represent the varied needs of customers. The customer population of the course is 225 students. The study concludes that clustering the RFM scores into 3 FGK clusters is the optimum cluster model, with the largest Silhouette Index of 0.523. This research is expected to provide an in-depth analysis of customer segmentation for the X guitar course.


Introduction
Competition in the business world mainly focuses on retaining and acquiring customers. Every business player is required to observe customers' changing needs. The tight competition can be seen from the increasing number of new players entering certain industries, which has been driven by the development of internet technology. The development of the internet in Indonesia encourages a variety of innovations in various areas of social life, including business activities. The internet is a growing technology that is able to create new business relationships and market opportunities. It is a useful tool for gathering information on customers, competitors and potential markets, and it can inform about various products and services (S. Pratiminingsih et al., 2013). The Agency on Statistics (Badan Pusat Statistik) states that SMEs are businesses with a net worth of at most IDR 200,000,000, excluding land and buildings for business premises (Rahmana A, 2008).
The creative economy has quite promising potential to support the national economy (B.E Kreatif, 2017). The Gross Domestic Product (GDP) generated by the creative economy reached IDR 1,009 trillion in 2017, an increase from IDR 922.59 trillion in the previous year. The number of workers involved in the creative economy reached 16.91 million in 2016 and then increased to 17.43 million. By the end of 2018, the contribution of the creative economy to national GDP was estimated to reach IDR 1,105 trillion, increasing again to IDR 1,211 trillion by 2019. Technology reflects the potential of the digital economy, such as e-commerce, online game services, food delivery services and digital video services, which can drive the growth of the creative economy. The government targets the value of Indonesia's digital economy in 2020 to reach USD 130 billion, equivalent to IDR 1,888 trillion. This value is equivalent to 11% of the national GDP. The creative industry in the music sector is one of the fast-growing MSME subsectors. This was reinforced by the increase in the contribution of the music subsector by 7.26% to the GDP of the creative economy in 2015. Non-formal education is an educational activity organized outside the formal education system. A course is an educational activity that takes place in the community and is carried out deliberately, in an organized and systematic way, to provide one or a series of specific lessons to certain adults or teenagers in a relatively short time (B.E Kreatif, 2017). One form of non-formal education is music courses.
Music is a work of sound art in the form of songs or musical compositions, which express the thoughts and feelings of the creator through the elements of music, namely rhythm, melody, harmony, song form/structure, and expression (D. Jamalus, 1988). As part of human life, music is studied in the existing social environment. Humans use and facilitate music as a situational factor for social development (V.J Koneni, 1982). As part of the culture, the development of music is very dynamic and triggers market demand. A course institution is needed to accommodate market demand. Supporting data needed at the spatial level of the region can determine the sustainability of market demand, especially in a big city such as Surabaya.
Economic growth is one of the macro indicators of real economic performance in a region. The rate of economic growth is calculated from the change in GDP at constant prices in the relevant year against the previous year. Economic growth can be seen as an increase in the number of goods and services produced by all business activities in an area over a period of a year. Of the 17 existing business sectors, a sample of 4 business sectors in Surabaya is shown in Table 1. In Table 1, one of the business sectors experienced positive growth while the rest experienced contraction. The variance in Table 1 measures the spread of the GDP growth rate in each business sector. The education service sector has a growth variance of 0.05, the lowest when compared to the sixteen other sectors. This indicates that the education service sector has more stable business sustainability.
The X course, established on January 17th, 2017, is a business engaged in the creative economy in the non-formal education services sector. The business owners saw an opportunity in the many people in Surabaya who want to learn music, especially guitar, but are constrained by several factors. The main factor is the high price of guitar courses. The X course came to bridge the demand by offering more affordable course fees for Surabaya citizens. Students can take courses at the tutoring place or call the teacher to come to their house. In addition, from January 1st, 2019, the X course opened another service division, namely the Research Consultation Service, to assist students in conducting research both online and face-to-face.
At present, the X course is centered in Pucang Anom Timur, Gubeng District, with two branches in Pondok Rosan, Wiyung District and Simorejo Sari, Sukomanunggal District, all in Surabaya city. As of October 31st, 2019, 225 students had taken courses, with 52 active students in October 2019.
Every day there was a guitar learning class (excluding holidays on Tuesday), in which either the teacher came to the student's house or the students came to the tutoring place, with a turnover of around IDR 12 million per month.
There were business competitors before the X course was established in Surabaya. All competitors certainly have an influence on the development of this business. Until now, the X course has not yet implemented a specific business strategy to retain and satisfy students. Therefore, a retention strategy by maintaining student loyalty, which is to be carried out in this research, is expected to reduce the effects of competitors. The strategy of obtaining and retaining certain customers (students) is considered important to create added value for both the company and the customer (S.I Shim, W.S.Kwon and S.Forsythe, 2013).

Customer segmentation is an alternative in treating individual customers (K. K. Tsiptsis and A. Chorianopoulos, 2011).
The X course is one of the creative business fields that need to manage relationships with its customers.
The owner cannot directly meet with all customers but can only meet with some customers and all teachers.
Therefore, the teacher as a "distributor" plays an important role in helping the achievement of student targets.
Not all registered students make course payments (transactions) often. The company has never analyzed student behavior in paying for courses, so no strategy has been launched to maintain student loyalty. The relationship between the owner and the teachers also depends very much on subjective communication. There are adverse effects if business owners plan and implement student retention strategies erroneously, which include losing students and having students move to competitors.
The problems experienced by the X course in managing customer relationships can be addressed by a segmentation process that extracts student payment history data for a certain period. Students will be grouped into several segments, which are distinguished by student behavior in making course payments.
This student behavior can be described O.2013). The formed segment is expected to represent varied consumer needs. The observation object in this research is the X guitar course service in Surabaya. The selection of these services is based on the analysis of Table 1 where the guitar course is one of the educational services.
Previous research regarding the application of customer segmentation, customer satisfaction and guitar course case studies has been carried out, for example by Wei, et al. (2016). This research proposes an analysis of segmentation and satisfaction of students in the X course by using the RFM model and clustering. The customer segmentation carried out in this research aims to be a robust method of representing various customer needs. This research is expected to contribute to the development of a more targeted and optimal CRM (Customer Relationship Management) system in the X course, thereby helping the X course increase nominal payment transactions by valuable customers and retain customers who make a large profit contribution to the course.

Customer Relationship Management
Based on the SIPOC (supplier, input, process, output, and customer) diagram, the customer is the party that uses the output of the process. Each customer has its own characteristics, and strategies are needed to manage relationships with customers. Customer relationship management, often known as CRM, is one of the strategic approaches to properly handling relationships with key customers and other customer groups.
CRM provides a better opportunity for business players to use available information to find out the type of customer and create added value for each customer. There are three phases in managing customer relationships (R. Kalakota and M. Robinson, 2001):
1. Acquire, the phase of the company's strategy to get new customers. Generally, the acceptance of the company's best services and superior products determines the number and frequency of new customers.
2. Enhance, the phase of the company's strategy to increase the profitability of existing customers. Business people can make efforts to establish long-term relationships with customers.
3. Retain, the phase of the company's strategy to retain customers who have high profitability. This phase focuses on providing whatever the main customers want, not on market demand.
Customers can be classified into the following groups:
• Most Valuable Customer (MVC), the customer who has the highest value for business sustainability. This customer group makes the biggest contribution to the company's profit.
• Most Growable Customer (MGC), a customer who has great potential to become an MVC in the future. Business people are often not aware of customers in this group.
• Below Zero (BZ), the customer group that provides the least profit compared to other customer groups.
• Migrators, a group of customers that needs to be analyzed further so that the actual category can be known. This customer group is located between BZ and BWC.
Increasing customer profitability growth is one of the main targets of CRM. This goal can be achieved if the company is able to continue improving the ability to know and understand customer behavior. Therefore, we need a CRM strategy that is able to achieve those goals, one of which is by using customer segmentation.
Customer segmentation aims to match the potential of these customers with the services and marketing strategies undertaken, so that the marketing strategies and services provided are effective (M. J. Berry and G. S. Linoff, 2004).

Customer Segmentation
Customer segmentation is the process of distinguishing customer profiles and characteristics. Forming segments with the help of data mining can enrich the segmentation results (K. K. Tsiptsis and A. Chorianopoulos, 2011). The purpose of segmentation is to adjust products, services, and marketing messages for each segment (M. J. Berry and G. S. Linoff, 2004). Customer segmentation is a preparatory step for classifying each customer according to the specified customer group (S. Jansen, 2001). The segmentation process places customers according to the characteristics of similar customer groups. Customer characteristic variables are as follows (M. J. Berry and G. S. Linoff, 2004):
• Demographic, including age, gender, number of family members, size of residence, income, profession, last education, homeownership status, social status (position or title), religion and citizenship;
• Psychographic, including personality type, lifestyle, moral values;
• Behavioral, including the benefit sought from the product/service, the purchase status, the level of use of the product/service, the frequency of purchase;
• Geographic, including country, province, city, district, postal code, climate.
In addition to external or market research data, transaction and customer payment data can also be used to gain insight into customer behavior.
In this way, the segmentation allocates customers to groups based on the amount of their spending. This can be used to identify high-value customers and prioritize services (K. K. Tsiptsis and A. Chorianopoulos, 2011). The company needs to know the customer profile. Customer profiles are very closely related to these customer segments (A. M. Scridon, 2008). Several strategies for analyzing customer profiles are as follows:
• RFM analysis, one of the most commonly used types of customer profiling. RFM is a method that segments customers by the time of their last transaction, the number of transactions, and the nominal amount of the transactions made;
• Demographic analysis, which is very closely related to the geography or location where the customer originates. However, in some demographic research, it can also mean segmenting by age, gender, income, and marital status;
• Life stage analysis, an analysis related to customer behavior. The behavior of each customer differs, which makes it worthwhile for business players to understand.
Geographic differences in customer locations have become an important practical component of marketing strategies. This is largely due to organizational expansion goals, which force managers to consider increasingly complex delivery and advertising layouts for new product launches and management (B. J. Bronnenberg and P. Albuquerque, 2003). Researchers in the fields of marketing and economics have developed an interest in the spatial aspects of growth and market structure. The resulting research tradition has been called "the new economic geography".
This flow of research began in 1970 in the field of industrial organizations. The X guitar course needs to treat their customers differently because of different customer locations. Therefore, the map will be formed based on the distribution of students in each segment so that it can be a reference for service providers in treating customers with different geographies.
The process of studying customer behavior is one of the challenges of a company's marketing team (J. Blythe and P. Megick, 2010). There are many ways to understand customer behavior, one of which is the customer's behavior in making transactions, or the time at which the customer conducts transactions. To help with this analysis, this study displays customer transaction habits per week. This will make it easier for the X guitar course to understand customer habits related to certain days, weeks or months in which more or fewer customers make transactions.

RFM (Recency, Frequency & Monetary) Model
The object of observation in this study is the X guitar course service in Surabaya. The formed segment is expected to represent various consumer needs. The RFM scale attribute can be seen in Table 2. The RFM model was originally developed by (A. Hughes, 1994) and (J. R. Bult and T. Wansbeek, 1995) to differentiate customer profitability based on several attributes, namely how recently the customer was active, the frequency of customer payments and the nominal amount of money paid by customers. The attributes of the RFM model are explained as follows:
• Recency of the last payment (R), an attribute that states the interval between the customer's last payment date and the current date. The closer the last payment is to the current date, the higher the R score;
• Frequency of the payment (F), an attribute that shows how often customers make payments. For example, customer A, who makes payments four times a month, pays more often than customer B, who makes payments once a month. The more frequent the payments, the higher the F score;
• Monetary (M), an attribute that shows the nominal amount of money paid by the customer. The larger the amount paid, the higher the M score.
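The scoring idea in the attribute descriptions can be illustrated with a minimal Python sketch. The study itself used RStudio and the scale in Table 2, which is not reproduced here, so the rank-based 1 to 5 rule below is an assumption, and `quintile_scores`, `rfm_code` and the sample values are hypothetical.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 (worst) to 5 (best) by its rank-based bin.

    Assumption: five rank-based bins; the study's Table 2 may instead
    use fixed cut-offs. Tied values receive the same score.
    """
    n = len(values)
    scores = []
    for v in values:
        if higher_is_better:
            worse = sum(1 for w in values if w < v)
        else:
            worse = sum(1 for w in values if w > v)
        scores.append(1 + (worse * 5) // n)
    return scores


def rfm_code(r, f, m):
    """Concatenate the three scores into a code such as '545'."""
    return f"{r}{f}{m}"


# Illustrative values only (not taken from the study's data).
recency_days = [5, 40, 200, 90, 10]       # fewer days is better
frequency = [4, 1, 1, 2, 6]               # more payments is better
monetary = [600, 150, 150, 300, 1400]     # in thousands of IDR, larger is better

r_scores = quintile_scores(recency_days, higher_is_better=False)
f_scores = quintile_scores(frequency)
m_scores = quintile_scores(monetary)
codes = [rfm_code(r, f, m) for r, f, m in zip(r_scores, f_scores, m_scores)]
```

A more recent payment (fewer days) gets a higher R score, while larger counts and amounts get higher F and M scores, matching the direction of each attribute described above.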

Clustering Analysis
Cluster analysis is a technique for grouping data according to certain characteristics (R. A. Johnson and D. W. Wichern, 2007). The result must have high homogeneity within a group and high heterogeneity between groups. Cluster analysis allocates a set of individuals to mutually exclusive groups so that the individuals within a group are similar to each other, while the individuals in different groups are dissimilar (S. Sharma, 1996). This grouping is usually called a partition (B. Ruswandi, 2008). The similarity measures that can be used include the Euclidean and Mahalanobis distances.
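For reference, the two distance measures named above are commonly written as follows. The symbols are chosen here for illustration and are not the paper's own notation: x_i and x_j are observation vectors and S is the sample covariance matrix.

```latex
d_E(\mathbf{x}_i,\mathbf{x}_j) = \sqrt{(\mathbf{x}_i-\mathbf{x}_j)^{\top}(\mathbf{x}_i-\mathbf{x}_j)},
\qquad
d_M(\mathbf{x}_i,\mathbf{x}_j) = \sqrt{(\mathbf{x}_i-\mathbf{x}_j)^{\top}\,\mathbf{S}^{-1}\,(\mathbf{x}_i-\mathbf{x}_j)}
```

The Mahalanobis distance reduces to the Euclidean distance when S is the identity matrix. Otherwise it rescales each direction by the variability of the data.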
Clustering methods can be grouped by their distance measurement technique. Distance-based methods consist of hierarchical (agglomerative) methods, including complete linkage, average linkage, and the Ward method, and non-hierarchical methods, including K-Means clustering (M. R. Anderberg, 2014). Hierarchical clustering and K-Means clustering only pay attention to the distance between the objects of observation without considering other statistical aspects, such as the distribution of the data or objects in overlapping clusters.
K-Means clustering is a very popular and common method. This method groups objects into k clusters, and the division of clusters is based on the difference between an object's values and the center (mean) of the cluster.
1. Enter the data that will be grouped.
3. Set the initial objective function as 0 and the iteration as 1.
4. Form the matrix as the initial partition matrix.
5. Calculate the center of each cluster k.
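The K-Means method described above can be sketched in plain Python as follows. This is a minimal illustration of the standard assign-and-update iteration (Lloyd's algorithm), not the study's RStudio script, and it does not cover the fuzzy partition-matrix variant. The function name, the random initialization and the sample points are assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples.

    Assumption: initial centers are k randomly chosen observations.
    Returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return labels, centers


# Illustrative (R, F, M) score triples forming two obvious groups.
sample_points = [(1, 1, 1), (1, 2, 1), (2, 1, 2), (5, 5, 5), (5, 4, 5), (4, 5, 4)]
sample_labels, sample_centers = kmeans(sample_points, k=2)
```

Because the result depends on the initial centers, it is common to run the procedure several times with different seeds and keep the best partition.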
Performance measurement of clustering results is a way to determine the validity of the clustering. One of the evaluation methods for measuring clustering performance is the global silhouette, whose formula is shown in Eq. (7) [36]. This method is used to evaluate the quality of the clusters produced by the clustering process. The validity of a clustering can be seen from how well each cluster is separated and from the homogeneity among cluster members. The silhouette value ranges from -1 to 1, and the results of clustering are good if the silhouette value is positive (between 0 and 1) [35]. This indicates that the data is in the right group.
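A minimal Python sketch of the silhouette computation follows. It assumes the common definition in which each observation's silhouette is s(i) = (b - a) / max(a, b), and the global silhouette is the average of the per-cluster mean silhouettes. The paper's Eq. (7) is not reproduced here, so this form, the function name and the sample data are assumptions.

```python
import math


def global_silhouette(points, labels):
    """Average of per-cluster mean silhouettes, using Euclidean distance.

    Assumptions: at least two clusters are present and the points are
    not all identical. Singleton clusters get s(i) = 0 by convention.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    def mean_dist(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    cluster_means = []
    for lab, members in clusters.items():
        s_values = []
        for p in members:
            if len(members) == 1:
                s_values.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(mean_dist(p, other) for o, other in clusters.items() if o != lab)
            s_values.append((b - a) / max(a, b))
        cluster_means.append(sum(s_values) / len(s_values))
    return sum(cluster_means) / len(cluster_means)


# Illustrative data: two tight, well-separated pairs of points.
demo_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = global_silhouette(demo_points, [0, 0, 1, 1])  # sensible grouping
bad = global_silhouette(demo_points, [0, 1, 0, 1])   # mixed-up grouping
```

A sensible grouping yields a value close to 1, while a grouping that mixes distant points yields a negative value, which is the behavior the paragraph above describes.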

Research Methodology
The steps of analysis used in this research are shown in Figure 2. In general, the flowchart of this research is based on KDD (Knowledge Discovery in Databases), or data mining.
There are 3 steps in KDD, namely preprocessing, data mining and postprocessing [10]. Data mining is the discovery of new information by looking for certain patterns or rules in a very large amount of data [19]. Data mining plays a role in finding interesting and hidden patterns in a large data set stored in a database, data warehouse, or other data storage [19]. In this research, we collected secondary data from the X guitar course, a provider of guitar learning services that has been operating since January 7th, 2017 in Surabaya city. The data comprises historical data on student payment transactions for courses from February 9th, 2018 until October 31st, 2019, with a total of 634 payment transactions from 216 unique users registered as course students in that period. The data is presented in an Excel file with the attributes of transaction date, student's home district, student's name and nominal of payment.
There are four research variables used in this research, which are shown in Table 3. The payment date variable has an ordinal scale because the dates can be ordered. The district and student name variables have a nominal scale because there are no degrees of difference. Because it has an absolute zero, the nominal of payment has a ratio scale. The first variable uses a date format giving the year, month and day of the transaction. The second and third variables have a general format (no special treatment).
The fourth variable has a number format because it contains the amount of money in IDR.
The first step is the preprocessing of secondary data. This step focuses on data cleaning, which includes adjusting the format of the Excel file to the input format of the RStudio program. Each research variable derived from the secondary data must conform to the input format. The first predictor variable is the transaction date, which is initially recorded irregularly in the dd-mm-yy (day, month, year) format and must be changed to the yyyy-mm-dd format. The second predictor variable is the district of the student. The third predictor variable is the student's name, which is initially recorded very irregularly (informal recording), so a more formal format with an initial capital letter is needed. The fourth predictor variable is the nominal of payment, which was originally in the currency form IDR XXX,XXX and was changed to the number format XXX (in 1,000 IDR).
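The three cleaning rules described above can be sketched in Python with the standard library only. The study performed this step in Excel and RStudio, so the function names and sample inputs below are hypothetical, and the sketch assumes every raw value follows the stated pattern.

```python
from datetime import datetime


def clean_date(raw):
    """Convert a 'dd-mm-yy' string to ISO 'yyyy-mm-dd'."""
    return datetime.strptime(raw.strip(), "%d-%m-%y").strftime("%Y-%m-%d")


def clean_name(raw):
    """Collapse extra whitespace and capitalize the first letter of each word."""
    return " ".join(word.capitalize() for word in raw.split())


def clean_payment(raw):
    """Convert a string such as 'IDR 150,000' to 150 (thousands of IDR)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return int(digits) // 1000


# Illustrative inputs only (the name is made up, not from the study's data).
iso_date = clean_date("09-02-18")
tidy_name = clean_name("  budi   SANTOSO ")
amount_thousands = clean_payment("IDR 150,000")
```

Storing dates in yyyy-mm-dd form makes them sort chronologically as plain text, which simplifies the later recency calculation.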
The next step is the secondary data processing stage (data mining). The processing is divided into 2 phases, namely RFM modeling followed by the clustering process. All phases are carried out in RStudio. The RFM modeling phase begins by reading the data from a CSV file. The RFM table contains columns for students' names, recency (in days), number of transactions, nominal of payment for each transaction, recency score, frequency score, and monetary score.
After the RFM scores have been obtained, the next phase is the clustering phase, which segments the customers based on their similarities. This phase aims to group the 216 students into several segments based on their respective R, F, and M scores. This phase uses FGK clustering and K-Means clustering for comparison.

Collection of Secondary Data
The collection of customer data is important in business. In this research, the main objectives of collecting data for business purposes are as follows: 1) to increase the nominal payment transactions by valuable customers; and 2) to retain customers who make a large profit contribution to the course.
These two business objectives need to be linked to the customer relationship management strategy in the customer segmentation process and embodied in the objectives of the data mining process. This customer segmentation aims to build consumer profiles related to customer payment patterns and transaction history so that customers can be divided into profitable (valuable) and unprofitable segments. An example from the 642 rows of initial input data for student payment transactions obtained from the X course can be seen in Table 4.
Table 5. Second input of payment transactions after data cleaning

Preprocessing
This process is the attribute selection and data cleaning stage. From a total of 5 attributes contained in the raw data, 4 attributes are selected. The attributes used for the next process are transaction date, student's home district, student's name, and amount of payment. In the data cleaning stage, 8 transaction rows were excluded because the students' districts were outside the target marketing area, such as Blitar, Lamongan, Pamekasan and Makassar. An example of the 4 attributes and 634 rows of RFM input data in the form of student payment transactions obtained from the X course can be seen in Table 5.

Data Mining
After the second input of the payment transactions is completed, it is processed in RStudio to obtain the R, F, and M values. The R (recency) value is the difference between the current time (November 5th, 2019) and the last time each student made a payment transaction. The R value is obtained by using functions from the lubridate package. The F (frequency) value is the number of payment transactions made by each student. The F value is obtained by counting the transaction dates with the COUNT and GROUP BY functions. The M (monetary) value is the total cost paid by each student. The M value is obtained by adding up the cost with the SUM function. When the script is run in RStudio, it produces an output in the form of RFM scores for each of the 216 students of the X course.
The RFM scores become the input for the next stage, which is the clustering stage. The clustering stage aims to group the 216 students into several segments based on their respective R, F, and M scores. The numbers of segments to be formed for comparison are 3, 4 and 5 clusters. The clustering methods are Fuzzy Gustafson-Kessel clustering and K-Means clustering, whose performance is compared by using the global silhouette.
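The R, F and M calculation described above can be sketched in Python as a simple group-by over the cleaned transactions. The study used RStudio, so this is an illustrative equivalent rather than the original script. The function name, the tuple layout and the sample rows are hypothetical, while the reference date of November 5th, 2019 comes from the text.

```python
from datetime import date


def rfm_table(transactions, today=date(2019, 11, 5)):
    """Aggregate (name, 'yyyy-mm-dd', amount) rows into per-student R, F, M.

    R: days between `today` and the student's last payment.
    F: number of payment transactions.
    M: total amount paid (same unit as the input amounts).
    """
    table = {}
    for name, iso_date, amount in transactions:
        paid_on = date.fromisoformat(iso_date)
        row = table.setdefault(name, {"last": paid_on, "frequency": 0, "monetary": 0})
        row["last"] = max(row["last"], paid_on)
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        name: {
            "recency_days": (today - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for name, row in table.items()
    }


# Illustrative rows only (amounts in thousands of IDR).
sample_rows = [
    ("Student A", "2019-10-31", 150),
    ("Student A", "2019-09-30", 150),
    ("Student B", "2018-02-09", 300),
]
rfm = rfm_table(sample_rows)
```

The resulting values would then be converted to scores and passed to the clustering stage.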
The highest or optimum silhouette value is 0.523 which is obtained by using the FGK method for clustering into 3 segments. Table 7 shows the example of cluster members obtained by using the FGK method for clustering into 3 segments. There are 70 students as the members of segment 1, 83 students as the members of segment 2, and 63 students as the members of segment 3.

Postprocessing
Visualization aims to help decision-makers (the X course) in analyzing customers. Several visualizations are used in this research:
1. Boxplot, which describes the distribution of each segment in terms of skewness, central tendency and the spread of the observational data;
2. Monetary segment size, to ensure that the logic used for classifying customer transactions is sound and practical;
3. RFM score distribution, which shows the tendency of certain RFM scores to become members of certain segments;
4. Calendar heatmap, which shows the timing of customers' payments;
5. 2D plot, which shows the closeness between the objects of observation;
6. Hotspot map, which is a geographical visualization of students as customers.
Figure 3 is a combined boxplot for recency, frequency and monetary. The leftmost boxplot shows recency, where segment 1 has the narrowest whisker compared to other segments and overlaps with segment 3, which has the longest whisker. There is a similarity between the recency of students in segment 1 and segment 3. The middle boxplot shows frequency, where segment 3 has the widest and highest whisker. This emphasizes the striking difference between segment 3 and the other segments. The rightmost boxplot shows monetary, where segment 3 has the widest and tallest whisker. The middle value in segment 2 is lower than in segment 1. This indicates that the money paid by students in segment 2 tends to be lower than in segment 1.
The largest median is in segment 3, with transactions amounting to IDR 1,400,000, followed by IDR 412,500 in segment 2 and IDR 280,000 in segment 1.
Based on Figure 7, there are differences between segments 1, 2 and 3. A larger dot indicates a larger amount paid by the customer, and vice versa. Red represents the members of segment 1, green represents the members of segment 2 and blue represents the members of segment 3. It appears that segment 1 tends to have low recency and frequency, segment 2 tends to have high recency but low frequency and, in contrast, segment 3 tends to have the highest recency and frequency. There is a tendency for the lowest of the 18 categories of RFM scores to fall into segment 2 (111, 112, 113, 121, 122, 123, 132, 133, 134, 211, 212, 213, 214, 222, 224, 233, 311). Meanwhile, the highest RFM scores tend to fall into segment 3, even though there are 2 students who come from 2 categories of low RFM scores (145 and 154), so the initial conclusion can be drawn that segment 2 tends to be the opposite of segment 1 and segment 3.
Based on Figure 9, it can be seen that the students are geographically dispersed. According to interviews with the X course, this geographic analysis can be used in determining promotion and retention strategies if there are specific patterns in certain segments. Dots that mix red, green and blue contain students from more than 1 segment. The red dots are not clearly visible, while the blue dots are clearly visible in the district of Sawahan, which indicates that almost all students living in the district of Sawahan are members of segment 3.
The green dots are spread without any specific pattern, so there is no need for a special promotion strategy.
The following is a summary explanation of the descriptive statistical characteristics of each segment to identify the most valuable customers (segments).
Based on Table 8, the most valuable segment differs depending on the indicator. Recency is better when the average, median and range of the time since payment are shorter, so by recency the students in segment 1 are the most valuable customers. Frequency and monetary are better when the average, median and range are larger, so by these indicators the students in segment 3 are the most valuable customers. The following are suggestions that can be considered for future work:
1. An attribute that can capture the length of a customer's membership is needed. This attribute can help the X course assess the loyalty of students following the course.
2. The monetary attribute can be changed to profit. The customer with the largest monetary contribution may still be outweighed by the one with the largest profit contribution. In the future, the X course needs to record expenditure so that profits can be analyzed.
3. Dynamic visualization is needed so that the data can be changed immediately which will automatically change the output as well.