Comparing the Result of Infrastructure and Facility Demand Distribution Similarity Test between Using Percentage Value and Real Value

Understanding Infrastructure & Facility Demand Behavior is important, and it is sometimes necessary to compare two Demand Behaviors. A Distribution Similarity test was originally developed for Transportation Trip Length Distributions, and it can also be used to compare Infrastructure & Facility Demand Behavior. The question remains whether the test must be applied to the Distribution of Real Value or to the Distribution of Percentage Value. The research results indicate that comparing Distribution Similarity must use the Distribution of Percentage Value. Similarity must be measured by an Accepted Goodness of Fit, expressed as a χ² value, and an Accepted Difference Value, expressed as an absolute difference value.


INTRODUCTION
In Infrastructure & Facility Asset Management (IFAM), Demand Behavior needs to be well understood. Infrastructure & Facility (I&F) Planning, Design, and Operation must be developed and executed in accordance with the I&F Demand, either the actual demand or the predicted demand, depending on the case treated (Chilongola et al, 2020;Hamzah et al, 2020;Suprayitno et al, 2006;Suprayitno, 2020;Susanti et al, 2017;Upa & Setyadi, 2020;Valguna et al, 2020).
In I&F Demand Behavior Analysis, it is often necessary to compare the Distribution of a Demand Behavior Characteristic between one case and another. Examples of items compared include the distributions of passenger age, passenger gender, passenger occupation, passenger education level, trip purpose, travel distance, and mode utilization (Avecedo & Nohara, 2004;Suprayitno, Pambudi & Cahyono, 2017;Susanti et al, 2019;Susanti et al, 2020;Upa et al, 2018).
A method for Distribution Comparison has been developed in Transportation Demand Modeling. The method was designed to determine the Minimum Sample Size for a Trip Length Distribution Survey: the sample size is found by trial and error, based on a Goodness of Fit statistical test. Demand Characteristics Similarity is measured by the Accepted CP&EV (Curve Pattern and Error Value), using a Goodness of Fit statistical test combined with an Error Acceptance Test (Blank, 1982;Siegel, 1956;Suprayitno, Ratnasari & Saraswati, 2017).
This method can certainly be used to compare I&F Demand Behavior Characteristics. For general use in IFAM Demand Behavior comparison, however, the method should be renamed Accepted Curve Pattern Similarity and Absolute Value Difference (CPS+AVD).
The authors are still sometimes asked by students, researchers, and academicians which type of data the calculation should be based on: the distribution expressed in percentage values or the distribution expressed in real values. The answer needs to be investigated.
This paper presents an investigation of the distribution similarity test using percentage distribution values and real distribution values.

RESEARCH METHOD
The research investigates whether, in distribution similarity testing, it is better to use the Real Value Distribution or the Percentage Value Distribution. The Accepted CP&EV method was used for the investigation. Two cases were tested: a Special Case and the Previous Work Case. The experiments on the two cases lead to a conclusion.

Experiment Objective
The objective of the experiment is to determine whether a comparison of distribution similarity should use the distribution in percentage values or the distribution in real values.

Statistical Test for Comparing Two Distributions
Comparing the similarity of two distributions is a problem of Statistical Inference. Two groups of statistical literature are referred to here. Both groups state that the similarity of two distributions must be investigated with the same χ² test, but there are certain differences between them (Blank, 1982;Siegel, 1956;Engmann & Cousineau, 2011;Susetyo, 2010;Siregar, 2016;Purwanto & Sulistyastuti, 2017).
One reference book treats the matter as a problem of Statistical Inference called Goodness-of-Fit: investigating whether or not two discrete distributions come from the same distribution. The test used is the χ² Goodness of Fit test, where the calculated $\chi^2 = \sum \left( (y_o - y_r)^2 / y_r \right)$, with df = kr - 1 (Blank, 1982). A separate study compared two tests of non-parametric distribution similarity, the Anderson-Darling test and the Kolmogorov-Smirnov test, and concluded that the Anderson-Darling test is more powerful than the Kolmogorov-Smirnov test (Engmann & Cousineau, 2011).
Three Indonesian statistics books explain statistical tests for the similarity of a Sample Distribution to a Reference Distribution. The data are treated as nominal data, expressing categorical or frequency data, so the test checks whether or not the Observed Frequency equals the Expected Frequency. This is tested with the χ² test, where the calculated $\chi^2 = \sum \left( (f_o - f_e)^2 / f_e \right)$, with $df = n - 1$. As an example, the three books compare an Observed Frequency with an Expected Uniform Frequency, a special case of comparing two distributions (Susetyo, 2010;Siregar, 2016;Purwanto & Sulistyastuti, 2017).
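The uniform-frequency example mentioned above can be sketched in a few lines of Python. The observed frequencies are hypothetical, and n is taken here to mean the number of categories, which is an assumption about the notation of the cited books.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical observed frequencies for four categories (illustrative only).
observed = [30, 20, 25, 25]
# Expected uniform frequency: the total spread evenly over the categories.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # assumption: n is the number of categories
print(chi2, df)
```

The calculated value would then be compared with the tabulated χ² value for the chosen significance level and degrees of freedom.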

Travel and Tourism Behavior Characteristics Comparison
Much research has been done on Travel and Tourism Behavior Characteristics, including urban bus travel behavior characteristics, commuter train travel behavior characteristics, and tourism voyage characteristics.
In addition, comparisons of travel and tourism behavior have been made. These include a comparison between Trans Maminasata and Trans Koetaradja user trip behaviors, a comparison of influence areas for motorcycle trips, and a comparison of tourism voyage characteristics between young voyagers and senior voyagers (Avecedo & Nohara, 2004;Suprayitno, Pambudi & Cahyono, 2017;Susanti et al, 2019;Susanti et al, 2020;Upa et al, 2018).

Distribution Similarity Test
A method to compare Distribution Similarity has been developed. The method was designed to determine the Minimum Sample Size. According to statistical theory, Distribution Similarity must be checked with a Goodness of Fit test based on the χ² test. Experiments indicate, however, that the Error can still be high even when the Goodness-of-Fit result is good. For that purpose, therefore, the Distribution Similarity test is supplemented with an Accepted Error Value test based on the mean absolute error. The two tests are presented as follows (Suprayitno, Ratnasari & Saraswati, 2017).

Goodness-of-Fit Test
The Goodness of Fit test is a statistical test that investigates whether or not two Distributions can be considered the same Distribution. The test used is the χ² test. The Goodness of Fit test is presented as follows (Suprayitno, Ratnasari & Saraswati, 2017).
Where:
χ² = calculated χ² value
χ²₀ = reference χ² value, at a certain degree of freedom and significance level
y_i = the tested y value
y0_i = the reference y value
n = number of samples
k = number of cases
υ = degree of freedom
α = significance level
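The Goodness of Fit decision can be sketched as follows. The sketch assumes the usual form of the statistic, the sum of (y_i - y0_i)² / y0_i over the classes, and accepts similarity when the calculated χ² is below the reference χ²₀. The degrees of freedom are assumed to be the number of classes minus one, which may differ from the cited method. The critical values are the standard tabulated values for a 5% significance level, and both distributions are hypothetical.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# indexed by degrees of freedom (standard table values).
CHI2_CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                     6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}

def goodness_of_fit(tested, reference):
    """Return (chi2, chi2_0, accepted) for two distributions given class by class."""
    if len(tested) != len(reference):
        raise ValueError("both distributions must have the same number of classes")
    chi2 = sum((y - y0) ** 2 / y0 for y, y0 in zip(tested, reference))
    dof = len(reference) - 1  # assumption: number of classes minus one
    chi2_0 = CHI2_CRITICAL_005[dof]
    return chi2, chi2_0, chi2 < chi2_0

# Hypothetical percentage distributions over five classes (illustrative only).
reference_pct = [10.0, 25.0, 30.0, 25.0, 10.0]
tested_pct = [12.0, 24.0, 29.0, 24.0, 11.0]

chi2, chi2_0, accepted = goodness_of_fit(tested_pct, reference_pct)
print(chi2, chi2_0, accepted)
```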

Acceptable Error Value Test
The Acceptable Error Value test investigates whether or not the Absolute Error of the Sample Distribution, compared with the reference distribution, is acceptable. An error of 2%, 5% or 10% is normally used as the acceptance threshold, depending on the case (Suprayitno, Ratnasari & Saraswati, 2017).
H0: if $|\bar{\Delta}| < e_0$, the error is accepted.
H1: if $|\bar{\Delta}| > e_0$, the error is not accepted.
Where:
$|\Delta|$ = absolute difference value
$|\bar{\Delta}|$ = the mean of the absolute difference values
$e_0$ = accepted difference value
n = number of distribution values (in percentage)
y_i = the tested distribution value
y0_i = the reference distribution value
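A minimal sketch of the Acceptable Error Value test follows. It assumes that the absolute difference is taken class by class between the tested and reference percentage values and then averaged over the n distribution values. The function names and the data are hypothetical.

```python
def mean_absolute_difference(tested_pct, reference_pct):
    """Mean of |y_i - y0_i| over all distribution values, in percentage points."""
    if len(tested_pct) != len(reference_pct):
        raise ValueError("both distributions must have the same number of values")
    diffs = [abs(y - y0) for y, y0 in zip(tested_pct, reference_pct)]
    return sum(diffs) / len(diffs)

def error_accepted(tested_pct, reference_pct, e0):
    """H0 is kept when the mean absolute difference is below the threshold e0."""
    return mean_absolute_difference(tested_pct, reference_pct) < e0

# Hypothetical percentage distributions over five classes (illustrative only).
reference_pct = [10.0, 25.0, 30.0, 25.0, 10.0]
tested_pct = [12.0, 24.0, 29.0, 24.0, 11.0]

mad = mean_absolute_difference(tested_pct, reference_pct)
print(mad, error_accepted(tested_pct, reference_pct, e0=2.0))
```

With these numbers the error would be accepted at a 2% threshold but not at a 1% threshold, which shows why the threshold has to be chosen for the case at hand.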

Research Cases
In order to explain the problem clearly, two cases were taken and investigated for this research: the Special Case and the Previous Case (Suprayitno, Ratnasari & Saraswati, 2017).
This special phenomenon, in which the Test Results differ between the Distribution of Real Value and the Distribution of Percentage Value, can be explained through Figure 1 and Table 4. The Real Value Distribution case has three different distribution graphs. These three graphs have the same pattern but different values, and so give three different Distributions. In the Distributions of Percentage Value, by contrast, the three have exactly the same distribution values, measured in percentage, and the three graphs are identical (see Fig. 1). The two Test Results are summarized in Table 4.
It can be concluded that comparing Distribution Pattern Similarity must be based on the Distribution of Percentage Value. The Distribution of Real Value can be used only when the Sample Size is the same as that of the Reference Distribution.
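The reasoning above can be made explicit with a short derivation. It is a sketch under assumed notation that is not in the original: the tested real values are taken to be the reference real values multiplied by a scale factor s, N_0 is the total of the reference values, and p_i are the percentage values.

```latex
% Assumed notation (not in the original): y^0_i are the reference real values,
% N_0 = \sum_i y^0_i is their total, and s > 0 is a scale factor.
\begin{align*}
y_i &= s\, y^0_i \quad (s \neq 1) \\
\chi^2_{\text{real}} &= \sum_i \frac{(y_i - y^0_i)^2}{y^0_i}
  = \sum_i \frac{(s-1)^2 (y^0_i)^2}{y^0_i}
  = (s-1)^2 N_0 > 0 \\
p_i &= 100\,\frac{y_i}{\sum_j y_j} = 100\,\frac{s\, y^0_i}{s\, N_0} = p^0_i
  \quad\Rightarrow\quad
  \chi^2_{\text{pct}} = \sum_i \frac{(p_i - p^0_i)^2}{p^0_i} = 0
\end{align*}
```

Under these assumptions the real-value statistic is non-zero and grows with the reference total even though the two patterns coincide, while the percentage-value statistic is exactly zero. The two agree only when s = 1, that is, when the sample size equals that of the reference.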

Experiment on Previous Sampling Case
The Previous Case, used to develop the Method for Determining Minimum Sample Size, was retested using Distributions of Real Value. The test results were then compared between the Distribution of Percentage Value and the Distribution of Real Value. Four Distribution Similarity Tests were made, for the 90%, 80%, 70%, and 60% Samples, and three different samples were taken for each percentage. The experiment indicates that, for this case, using the Distribution of Percentage Value gives the better result.
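The sampling design described above can be sketched as follows. The reference data, the class boundaries, and the random seed are all hypothetical and only illustrate the procedure; they are not the data of the Previous Case. The sample labels follow the 90A, 90B, 90C naming used for the cases.

```python
import random

def percentage_distribution(values, class_edges):
    """Percentage of values falling in each class [edge_k, edge_k+1)."""
    counts = [0] * (len(class_edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if class_edges[k] <= v < class_edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts)
    return [100.0 * c / total for c in counts]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical reference data: 1000 trip lengths with a mean of 6 km.
reference = [rng.expovariate(1 / 6.0) for _ in range(1000)]
class_edges = [0, 5, 10, 15, float("inf")]

# Three samples (A, B, C) for each of the 90%, 80%, 70% and 60% sample sizes.
samples = {}
for pct in (90, 80, 70, 60):
    size = len(reference) * pct // 100
    for letter in "ABC":
        samples[f"{pct}{letter}"] = rng.sample(reference, size)

reference_pct = percentage_distribution(reference, class_edges)
sample_pct = {name: percentage_distribution(s, class_edges)
              for name, s in samples.items()}
```

Each entry of `sample_pct` could then be tested against `reference_pct` with the two tests of the method.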
For this experiment, the Error Difference cannot be compared, since the two tests use different units: the previous calculations are in percentages, while the new calculations are in absolute numbers.
The new calculation results based on the Distribution of Real Value for the 90% Sample Case are presented in Table 5. For all three cases, 90A, 90B, and 90C, the χ² calculations give higher values than those calculated from the Percentage Value. This indicates that calculation based on the Distribution of Percentage Value gives a more accurate χ² value than calculation based on the Distribution of Real Value.
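A small numerical sketch shows how such a gap can arise. The counts below are hypothetical and are not the values of Table 5. The sample holds 90% of the reference total, so its real values are lower in every class even where the pattern is close.

```python
def chi_square(tested, reference):
    """Sum of (y_i - y0_i)^2 / y0_i over all classes."""
    return sum((y - y0) ** 2 / y0 for y, y0 in zip(tested, reference))

def to_percentage(counts):
    """Convert class counts to percentage values that sum to 100."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

# Hypothetical class counts (illustrative only): reference total 1000, sample total 900.
reference_counts = [100, 250, 300, 250, 100]
sample_counts = [92, 223, 268, 227, 90]

chi2_real = chi_square(sample_counts, reference_counts)
chi2_pct = chi_square(to_percentage(sample_counts), to_percentage(reference_counts))
print(chi2_real, chi2_pct)
```

In this sketch most of the real-value χ² comes from the difference in totals rather than from a difference in pattern, which is consistent with the higher real-value χ² reported for the 90% samples.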

Distribution χ² Calculation Difference
The calculation of the Distribution Similarity Test for the 80% Sample Case is presented in Table 6. Again, the calculated χ² values for the Real Value Distribution are worse than the calculated χ² values for the Percentage Value Distribution. Even so, according to the Test, the Distributions of the three 80% Samples are similar to the Reference Distribution.
The χ² calculation for the 60% Sample can be seen in Table 8. The χ² results for the Real Value Distribution are all worse than the χ² values for the Percentage Value Distribution.

The four χ² calculation results are summarized in Table 9. The χ² values based on the Percentage Distribution are better than those based on the Real Value Distribution. As in the first experiment, it can be concluded that comparing Distribution Similarity must be carried out on the Distribution of Percentage Value; it cannot be done on the Real Value Distribution.

CONCLUSION
Now that the research is finished, the principal conclusions are presented as follows.
• Comparing Distribution Similarity must be based on an accepted Curve Pattern Similarity (CPS) test and an accepted Average Difference Value (ADV) test.
• The accepted CPS test uses the χ² Goodness of Fit test (with a significance level of 1% to 10%), and the accepted ADV test uses the accepted average absolute value (with an accepted difference value of 1% to 10%).
• In comparing Distribution Similarity, the Distribution of Percentage Value always gives a better and correct answer compared with the Distribution of Real Value. Thus, comparison of Distribution Similarity must be based on the Percentage Value Distribution.
• It is better to name the method the Distribution Similarity Test based on Accepted Curve Pattern Similarity and Absolute Value Difference (Accepted CPS+AVD).
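The conclusions above can be summarized as one combined check. This is a minimal sketch and not the authors' implementation: the function names are hypothetical, the reference χ²₀ and the accepted difference e0 are supplied by the caller, and the counts are illustrative. Raw counts are converted to percentage values before either test is applied.

```python
def to_percentage(counts):
    """Convert class counts to percentage values that sum to 100."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

def distributions_similar(tested_counts, reference_counts, chi2_0, e0):
    """True when both the curve pattern test and the difference test are accepted."""
    if len(tested_counts) != len(reference_counts):
        raise ValueError("both distributions must have the same number of classes")
    y = to_percentage(tested_counts)
    y0 = to_percentage(reference_counts)
    chi2 = sum((a - b) ** 2 / b for a, b in zip(y, y0))
    mean_abs_diff = sum(abs(a - b) for a, b in zip(y, y0)) / len(y0)
    return chi2 < chi2_0 and mean_abs_diff < e0

# Hypothetical counts (illustrative only).
reference_counts = [100, 250, 300, 250, 100]
same_pattern = [50, 125, 150, 125, 50]    # half the total, same pattern
other_pattern = [300, 250, 200, 150, 100]  # same total, different pattern

# chi2_0 = 9.488 is the tabulated value for 4 degrees of freedom at 5%.
print(distributions_similar(same_pattern, reference_counts, chi2_0=9.488, e0=5.0))
print(distributions_similar(other_pattern, reference_counts, chi2_0=9.488, e0=5.0))
```

Because the conversion to percentage happens inside the function, a sample with the same pattern but a different total is judged similar, while a sample with the same total but a different pattern is not.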
After this research was finished, a further question arose: testing the Method on different real cases.
NOTE. This paper is part of the Working Papers for developing the Knowledge and Science of Infrastructure & Facility Asset Management. It is the result of a reflective collaboration between a Statistician and Civil Engineers from Institut Teknologi Sepuluh Nopember (ITS), Surabaya, Indonesia.