Analysis of Song Popularity in Business Digital Music Streaming for Increasing Quality Using Kohonen SOM Algorithm

Consumption of digital music services has grown dramatically in recent years: music streaming consumption increased by 76.4% from 2015 to 2016. Spotify, one of the most popular music streaming services, has seen its customer base grow from year to year. This growth enables business people and music producers to increase their profits by analyzing songs to find the audio attributes that make a song enjoyable for many listeners. Data processing and analysis use the Kohonen SOM algorithm, whose function here is to find the groups of audio attributes most liked by Spotify users, on the premise that good music is music that can be used as therapy. With LR = 0.1, PLR = 0.9, and epoch = 70 to 500, it can be concluded that cluster 2 is the cluster with the highest number of streams, containing 27 songs, and that the smallest DBI value is obtained at epoch = 200. Based on the statistical analysis, business people and music producers are expected to be able to increase their profits by improving music quality, with a focus on songs with mode = 0 (Minor) and on the loudness feature.

I. INTRODUCTION
In this era, digital technology and creativity have a big influence on the development of the music industry. Music has been an integral part of culture throughout human history, and it can be said that music has now become a lifestyle. With the development of technology, music can be listened to anywhere and at any time using a variety of media such as radio, music players, or digital music applications (online music streaming). Users can access a large number of tracks at no cost from computers connected to the official website of a music streaming service. The music industry badly needs this service because it is seen as a window onto the future of its business.
According to research conducted by Felix Richter, consumption of online music streaming increased by 76.4% from 2015 to 2016, while sales of CD albums, digital albums, and individual tracks fell by 16.3%, 20.1%, and 25.0%, respectively [1]. In 2017, the music industry generated $8.72 billion in the United States. In 2019, revenue from the digital music segment is predicted to reach US $12,424 million, and this revenue is expected to produce a market volume of US $13,627 million by 2023. For now, the largest market segment is music streaming, with a predicted market volume of US $10,472 million in 2019 [2]. With the growth of music streaming services (Spotify, Apple Music, etc.), the music industry continues to grow, and popular songs secure the largest share of the revenue generated [3]. With more than 24 million active users worldwide and nearly 6 million paying between $5 and $10 per month, Spotify is the largest music streaming service in the world [4].
1 Chyntia Kumalasari Puteri is with the Department of Management Technology, Institut Teknologi Sepuluh Nopember, Surabaya, 60111, Indonesia. E-mail: chyntiaakp@gmail.com.
2 M. Isa Irawan is with the Department of Mathematics, Institut Teknologi Sepuluh Nopember, Surabaya, 60111, Indonesia. E-mail: mii@its.ac.id.
This growth allows business people and music producers to increase their profits by analyzing music so that the music they produce is of good quality. One function of music is as a means of therapy. Music therapy is widely used to address various problems, for example to reduce stress [5] and to improve well-being [6]. It is therefore valuable for business people to analyze music so that it is liked by listeners and can also be used as music therapy.
Every song on Spotify has 13 audio features, which are provided by Spotify. The data used in this study are the Global Top 200 songs. To find out which audio features are most preferred by the user community, the data can be processed and analyzed with information and communication technology, in particular with the data mining technique of clustering. A commonly used and well-suited method for clustering analysis is the Kohonen Self Organizing Maps (SOM) method. Clustering with Kohonen SOM is able to group documents with similar contexts and contents [7]. This analysis is expected to provide recommendations for music producers so that they can improve the quality of their music and make it a means of therapy for the community.

A. Music as a Therapeutic Tool
Music continually evolves alongside the community in which it is made. The use of music for healing inspired the emergence of music therapy, a form of therapy delivered through music and musical activities to facilitate the therapeutic process in helping clients. Like other therapies, which are efforts designed to help people in a physical or mental context, music therapy encourages clients to interact, improvise, listen to, or actively play music [5].
So far, music therapy has been widely used to address various problems, such as reducing stress [8]. Music is also used as a medium for improving well-being [6] and as an intervention for developing the abilities of autistic children [9]. In addition, music provides a medium for relaxation, with communication through rhythm, listening, non-verbal cues, exploration, movement, and improvisation [10].

B. Spotify
As a music streaming platform, the Spotify application can be used on a variety of digital devices such as cellphones, desktops, tablets, speakers, smart TVs, and Bluetooth-based audio devices. Spotify also provides features such as Top Charts by Country, Viral 50, and many more. The following are the 13 audio features on the Spotify platform [11].
1) Danceability is how suitable the song is for dancing, in the range of 0 (least suitable for dancing) to 1 (most suitable for dancing).
2) Energy indicates the intensity of emotion and stress in the song, on a scale of 0 to 1 [12]. High energy in a song makes the listener feel energetic, and low energy does the opposite.
3) Key maps the tone of the song using standard Pitch Class notation (0 = C, 1 = C♯/D♭, 2 = D, etc.).
4) Loudness is the overall loudness of the song in decibels (dB), averaged across the whole song. Values typically range between -60 dB and 0 dB.
5) Mode is the modality of a song, where major is represented by 1 and minor is represented by 0.
6) Speechiness is the presence of spoken words in a song. A value from 0.33 to 0.66 indicates a song that may contain both music and speech, a value above 0.66 indicates a track that probably consists of spoken words, and a value below 0.33 indicates a song that most likely represents music rather than speech.
7) Acousticness is the degree to which the song is acoustic, that is, played without electronic amplification, in the range of 0 to 1.
8) Instrumentalness indicates how little vocal content a song contains, in the range of 0 to 1. The higher the value, the higher the probability that the song does not contain vocals.
9) Liveness measures the presence of an audience in the recording, where higher liveness scores increase the chance that the song was performed live.
10) Valence describes the mood of a song in terms of positive or negative emotion. In Spotify, valence ranges from 0 to 1. High-valence songs sound more positive, such as comfortable, cheerful, or happy. Conversely, songs with low valence sound more negative, such as sad, depressed, or angry [13].
11) Tempo is the approximate speed of the song in beats per minute (BPM). Tempo describes the relationship between speed and the intensity of emotion in a song. A song with a high tempo generally has higher and faster energy than a song with a low tempo [14].
12) Duration is the track length in milliseconds.
13) Time signature is an estimate of the overall time signature of a track.
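To make two of the feature encodings above concrete, the following minimal sketch decodes the key and mode values into a readable key name and applies the speechiness thresholds. The function names `key_name` and `speechiness_band` and the label strings are hypothetical helpers introduced for illustration; they are not part of Spotify's API.

```python
# Pitch-class names for the integer `key` feature (0 = C, 1 = C#/Db, 2 = D, ...).
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key: int, mode: int) -> str:
    """Return a readable key such as 'D minor' from the key and mode features."""
    quality = "major" if mode == 1 else "minor"   # mode: 1 = major, 0 = minor
    return f"{PITCH_CLASSES[key]} {quality}"


def speechiness_band(value: float) -> str:
    """Classify a speechiness value with the 0.33 and 0.66 thresholds."""
    if value > 0.66:
        return "mostly spoken words"
    if value >= 0.33:
        return "music and speech"
    return "mostly music"
```

For example, `key_name(2, 0)` describes a song in D minor, which is the mode category that the later analysis concentrates on.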

A. Kohonen Self Organizing Maps (SOM)
Kohonen Self Organizing Maps (SOM), also known as the Kohonen Network, were first introduced by Professor Teuvo Kohonen in 1982. SOM is a neural network method that uses unsupervised learning [15].
The Kohonen SOM architecture consists of two layers, the input layer and the output layer. Each neuron in the input layer is connected to every neuron in the output layer, as shown in Figure 1. Each neuron in the output layer represents one cluster of the given input. There are K processing neurons, one per group, and each input is related to each neuron through a weight. Because the Kohonen SOM architecture can be described topographically, each neuron in the SOM represents one group, which makes it possible to visualize the grouping.
4) Repeat steps four to six.
5) Calculate the distance D(j) of the input vector x to the connection weights w_{ij} of each output neuron j using Eq. 2:
$$D(j) = \sum_{i} (w_{ij} - x_i)^2 \qquad (2)$$
6) Determine the index J such that D(J) is minimum.
7) Update the values of w_{ij} for each unit j around J using Eq. 3:
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha \, [x_i - w_{ij}(\text{old})] \qquad (3)$$
8) Modify the learning rate α using Eq. 4.
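The distance, winner selection, weight update, and learning-rate steps can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the weights are initialized randomly in [0, 1], the neighborhood radius is 0 (only the winning unit is updated), and the learning rate is multiplied by a constant factor `lr_decay` after each epoch. The function names and the decay schedule are assumptions introduced here.

```python
import random


def best_matching_unit(weights, x):
    """Return the index J of the unit with the smallest squared distance (Eq. 2)."""
    distances = [sum((w - xi) ** 2 for w, xi in zip(unit, x))
                 for unit in weights]
    return distances.index(min(distances))


def train_som(data, n_clusters, lr=0.1, lr_decay=0.9, epochs=200, seed=0):
    """Train a simple Kohonen SOM and return one weight vector per cluster.

    Assumptions (not from the paper): random initial weights in [0, 1],
    neighborhood radius 0, and geometric decay alpha <- lr_decay * alpha.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    weights = [[rng.random() for _ in range(n_features)]
               for _ in range(n_clusters)]
    alpha = lr
    for _ in range(epochs):
        for x in data:
            winner = best_matching_unit(weights, x)
            # Eq. 3: move the winning weight vector toward the input vector.
            weights[winner] = [w + alpha * (xi - w)
                               for w, xi in zip(weights[winner], x)]
        alpha *= lr_decay  # assumed learning-rate modification step
    return weights


def assign_clusters(weights, data):
    """Label every input vector with the index of its winning unit."""
    return [best_matching_unit(weights, x) for x in data]
```

Calling `train_som(normalized_rows, n_clusters=4)` followed by `assign_clusters` would yield one cluster label per song, where `normalized_rows` stands for a hypothetical list of feature vectors already scaled to the range 0 to 1.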

B. Davies Bouldin Index (DBI)
DBI was introduced by David L. Davies and Donald W. Bouldin in 1979 and is used to evaluate clusters. The validity of a clustering result, that is, how good the result is, is calculated from quantities and derived properties of the data set. The Sum of Square Within Cluster (SSW), the cohesion metric of cluster i, is given by Eq. 5 [16]:
$$SSW_i = \frac{1}{m_i} \sum_{j=1}^{m_i} d(x_j, c_i) \qquad (5)$$
where m_i is the number of data points in cluster i, c_i is the centroid of cluster i, and d(x_j, c_i) is the distance from data point x_j to that centroid.
The separation between two clusters, for example clusters i and j, is measured with the Sum of Square Between Cluster (SSB), which is the distance between the centroids c_i and c_j, as in Eq. 6:
$$SSB_{i,j} = d(c_i, c_j) \qquad (6)$$
R_{i,j} is then the ratio that compares cluster i with cluster j. Its value is obtained from the cohesion and separation components. A good cluster is one that has the smallest possible cohesion and the largest possible separation. R_{i,j} is given by Eq. 7:
$$R_{i,j} = \frac{SSW_i + SSW_j}{SSB_{i,j}} \qquad (7)$$
The Davies-Bouldin Index (DBI) is then obtained from Eq. 8:
$$DBI = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} R_{i,j} \qquad (8)$$
The DBI value is therefore the average, over all K clusters, of the maximum R_{i,j}. From these definitions, it can be observed that the smaller the SSW value, the better the clustering results. In essence, the smallest possible DBI value (non-negative, ≥ 0) is sought.
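The DBI computation can be sketched as follows. This is a minimal sketch that assumes the commonly used form of the index: SSW as the mean Euclidean distance of cluster members to their centroid, SSB as the Euclidean distance between centroids, and DBI as the average over clusters of the largest ratio (SSW_i + SSW_j) / SSB_{i,j}. The function name `davies_bouldin_index` is introduced here for illustration, and the sketch requires at least two clusters with distinct centroids.

```python
import math


def davies_bouldin_index(data, labels):
    """Compute the Davies-Bouldin Index for labeled data (lower is better)."""
    clusters = sorted(set(labels))
    members = {c: [x for x, l in zip(data, labels) if l == c] for c in clusters}
    # Centroid of each cluster: the column-wise mean of its members.
    centroids = {c: [sum(col) / len(pts) for col in zip(*pts)]
                 for c, pts in members.items()}
    # SSW: mean distance of the members to their own centroid (cohesion).
    ssw = {c: sum(math.dist(x, centroids[c]) for x in members[c]) / len(members[c])
           for c in clusters}
    total = 0.0
    for i in clusters:
        # Worst-case ratio of cohesion to separation for cluster i.
        total += max((ssw[i] + ssw[j]) / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)
```

Computing this value for the clustering obtained at each epoch setting and keeping the setting with the smallest value mirrors the validation step described in this paper.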

C. Multiple Linear Regression Analysis
Multiple linear regression analysis is regression analysis with more than one independent variable. In this study, the multiple linear regression technique is used to determine whether two or more independent variables (X_1, X_2, X_3, …, X_12) affect the dependent variable (Y).
The purpose of this process is to understand the relationship between two or more variables and to predict the value of the dependent variable from known values of the independent variables, assuming a linear relationship [17]. The multiple linear regression model for the 13 variables in this study is shown in Eq. 9:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{12} X_{12} + \varepsilon \qquad (9)$$
where β_0 is the intercept, β_1, …, β_12 are the regression coefficients, and ε is the error term.
Multiple linear regression includes a test called the Partial Regression Coefficient Test (T Test). The T test is used to determine whether each independent variable has a significant impact on the dependent variable. This test is carried out using the criteria below:
1) If the probability value sig > 0.05, there is no significant impact.
2) If the probability value sig < 0.05, there is a significant impact.
3) If t_count > t_table, then H0 is rejected and H1 is accepted. If t_count < t_table, then H0 is accepted and H1 is rejected,
with:
H0: Partially, there is no significant impact.
H1: Partially, there is a significant impact.
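The regression fit and the t_count versus t_table criterion can be illustrated with a small ordinary least squares sketch in standard-library Python. It is an illustration only, not the statistical software used in the study: the function names `invert` and `ols_t_test` are hypothetical, the critical value `t_table` must be supplied by the caller from a t table, and the comparison uses the absolute value of t_count (a two-sided test), which is an assumption.

```python
import math


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_test(features, target, t_table):
    """Fit Y = b0 + b1*X1 + ... and flag coefficients with |t_count| > t_table.

    Returns one (coefficient, t_count, significant) tuple per coefficient,
    with the intercept first.
    """
    rows = [[1.0] + list(x) for x in features]   # prepend the intercept column
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [y - sum(b * v for b, v in zip(beta, r))
                 for r, y in zip(rows, target)]
    sigma2 = sum(e * e for e in residuals) / (n - p)   # residual variance
    results = []
    for k in range(p):
        std_error = math.sqrt(sigma2 * inv[k][k])
        t_count = beta[k] / std_error
        results.append((beta[k], t_count, abs(t_count) > t_table))
    return results
```

The sketch needs more observations than coefficients and a fit that is not exact, since a zero residual variance would make the standard errors zero.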

A. Data Collection
The data used are audio features of songs obtained from Spotify, an online repository of music.

B. Research Flowchart
The flow of this research is as follows:

1) Data Normalization
At this stage, the data are normalized to the range 0 to 1 before they are used as input to the clustering process.
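One common way to scale data to the range 0 to 1 is min-max normalization, sketched below. The paper does not state which normalization formula was used, so min-max scaling is an assumption, and `min_max_normalize` is a name introduced here for illustration.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` to the range [0, 1] (min-max scaling)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has no spread, so it is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Scaling each feature separately keeps features with large raw values, such as duration in milliseconds, from dominating the distance calculation in the clustering step.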

2) Clustering with Kohonen Self Organizing Maps (SOM)
At this stage, clustering is carried out using the Kohonen SOM algorithm. The input to this process is the normalized Top 200 Global Charts data, and the output is the clustering of that data. The clusters are then ranked to find out which cluster has the highest number of streams.
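The ranking step can be sketched as follows, assuming that clusters are compared by the average of the normalized stream counts of their songs. The function name `rank_clusters_by_streams` and its inputs are hypothetical.

```python
def rank_clusters_by_streams(labels, streams):
    """Return (cluster, mean_streams) pairs sorted from most to fewest streams."""
    totals, counts = {}, {}
    for label, value in zip(labels, streams):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    means = {label: totals[label] / counts[label] for label in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

The first pair in the returned list identifies the cluster with the highest average number of streams.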

3) Validation of Clustering with Davies-Bouldin Index (DBI)
At this stage, after the clustering process is complete, the clustering results are validated. The input to this process is the cluster results from the Top 200 Global Charts data. The output of this process is the minimum DBI value.

4) Analysis to Find Out the Influential Audio Features with Multiple Linear Regression Analysis
At this stage, the cluster with the highest number of streams is analyzed using multiple linear regression, and the resulting coefficients are tested using the T Test (a partial test to find out whether the audio features have a significant impact on the number of streams).
The process explained above is illustrated as a flowchart in Figure 2.

C. Kohonen SOM Clustering
In this clustering process, the initial stage is to determine the parameters of the Kohonen SOM algorithm so that the resulting clustering, which is validated using the Davies-Bouldin Index, is the best possible. Table 1 shows that when the Top 200 Global Chart songs are grouped into 4 clusters, the songs are distributed across the clusters as follows: Cluster 1 has 61 songs, Cluster 2 has 27 songs, Cluster 3 has 59 songs, and Cluster 4 has 53 songs.
Some of the clustering results are then shown for each cluster so that readers can see the mode category and the number of streams of each cluster: Table 2 for cluster 1, Table 3 for cluster 2, Table 4 for cluster 3, and Table 5 for cluster 4.

The 1st International Conference on Business and Management of Technology (IConBMT)
August 3rd 2019, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia

Based on Table 2, clustering with Kohonen SOM shows that cluster 1 is a group of songs with mode = 0 (Minor). Table 3 shows that cluster 2 is likewise a group of songs with mode = 0 (Minor). Table 4 shows that cluster 3 is a group of songs with mode = 1 (Major), and Table 5 shows that cluster 4 is likewise a group of songs with mode = 1 (Major).
To summarize the clustering results of the Kohonen SOM algorithm given in Table 2, Table 3, Table 4, and Table 5, the details of each cluster are shown in Table 6. Next, the clusters obtained in Table 1 are ranked by number of streams, calculated as the average for each cluster, to find out which cluster has the highest number of streams. Because this study uses 4 clusters, the ranking is divided into 4 streaming categories: Many Streams, Medium Stream, Enough Stream, and Few Stream. The results of the cluster ranking based on the number of streams are shown in Table 7. Based on Table 7, it can be concluded that the cluster with the highest number of streams is cluster 2, with a value of 0.881 and 27 songs. The percentage of each cluster is presented as a pie chart in Figure 3.

D. Clustering Validation
At this stage, the clustering is validated using the Davies-Bouldin Index so that the best clustering can be identified from the parameters above. The results of the clustering validation using the Davies-Bouldin Index (DBI) are shown in Table 8.

E. Analysis of Audio Features
Based on the 4 clusters produced by the Kohonen SOM algorithm at 200 epochs, the clustering results are analyzed to find out which of the 12 audio features provided by Spotify most strongly affect how much listeners like the songs. The analysis focuses on cluster 2, the cluster with many streams, under the condition that the songs are in mode = 0 (Minor), because the clustering results show that cluster 2 is a category of songs with mode = 0 (Minor). The remaining audio feature, mode, acts only as a categorical variable (Major and Minor). At this stage, the analysis is carried out using multiple linear regression, with the results shown in Table 9.
