Reflection on Data Error Identification Methods for Field Survey Data on Commuter Train Passenger Travel Behavior

A good understanding of Demand Behavior is important for Infrastructure and Facility Planning. Therefore, a Field Survey of the Travel Behavior Characteristics of Commuter Train Passengers is very important. The data collected and processed must be correct, yet errors can easily occur during data collection and tabulation. How can such data errors be identified? The experiment suggests the following steps: use spreadsheet software, which is strongly recommended because it eases development of the whole process; tabulate the data numerically to ease error identification; establish codes so that data can be back-tracked; develop a field survey data table; develop a table of possible data errors; develop logical error tests; write the spreadsheet logical test functions; and run the error identification calculation.


INTRODUCTION
One important part of Infrastructure & Facility Asset Management is a good understanding of the Infrastructure & Facility Function and its Demand. For a Commuter Train as a Transportation Facility, the Main Function is to move Passengers from station A to station B with enough capacity, smoothly, safely, comfortably, and affordably. Therefore, it is important that the Travel Behavior Characteristics of Commuter Train Passengers are well understood.
A good understanding of Travel Behavior Characteristics is very useful for: defining the train station's influence area, defining the access and egress modes and their distances, defining the parking capacity needed, defining the public transport feeder needed, defining the facilities at the station, and other needs (Pratiwi & Suprayitno 2016; Suprayitno & Upa 2018; Susanti, Soemitro & Suprayitno 2017a).
The Travel Behavior Characteristics are obtained from a Field Survey. A correct and accurate picture of Travel Behavior can only be obtained from a complete, correct, and sufficiently large sample. Meanwhile, personal experience shows that Data Errors can easily occur. Therefore, "How can Data Errors be identified and corrected?" is an important question to address in Data Collection and Processing.
Travel Behavior Surveys in Australia, especially in Melbourne, were conducted using face-to-face interviews or self-completion drop-off questionnaires. Data collection by telephone, by internet, and by GPS, as done in Toronto, Chicago, and Germany, was observed and evaluated. None of those three techniques is perfect; each has advantages and disadvantages with regard to representativeness, response rates, data accuracy, and costs. The Sampling Method generally used in Melbourne can be considered good enough, but response rates could potentially be improved by using a mix of internet and telephone interviews for data collection to reach different demographic groups (Inbakaran & Kroen 2011). Meanwhile, in Indonesia the direct face-to-face interview is still the most used method, and it still seems to be the most appropriate one.
Research on Data Error Identification has been developed in various fields, among others Language Grammatical Error, Earth Science Models, Laboratory Data Capture Apparatus, Distributor Patterns in Business, and Ergonomics Science. The Theory, Method, and Validation of Error Identification have also been developed (Baber & Stanton 2002; Baker 2017; Cisco 2013; Kohmar 2016; Wang et al 2002; Schmaltz et al 2017).
This paper presents a Reflection on Field Survey Data Error Identification Procedures for Commuter Train Passenger Travel Behavior.

RESEARCH METHOD
The research was executed by following these steps: stating the background, defining the objective, reviewing the related references, developing the method, and finally drawing a conclusion.
The method was developed on the assumption that the Field Survey Data are tabulated in a Spreadsheet File; the spreadsheet software most used on microcomputers is Microsoft Excel.
Afterward, the method was developed following these steps: field survey procedure, data quality concept, data collected, probability of data correctness error, data correctness checking method, and data error identification method.

LITERATURE REVIEW
Two basic literature reviews, on Travel Behavior and on Data Error Identification, are presented below.

Travel Behavior Research on Urban Public Mass Transport Passenger
Travel Behavior surveys of Commuter Train Passengers and Urban Bus Passengers on several different train lines and bus lines have already been executed. Several of them are mentioned below (Silaen, Nasution & Suwantoro 2018; Suprayitno et al 2006; Suprayitno & Upa 2016; Upa, Suprayitno & Ryansyah 2018; Suprayitno, Saraswati & Ratnawati 2018; Susanti, Soemitro & Suprayitno 2017a; Susanti, Soemitro & Suprayitno 2018a).

Data Error Identification
Data Errors can happen in all aspects of life and for all phenomena. Examples include errors in language grammar, laboratory measurement apparatus, business, modeling, ergonomics science, product design, etc. (Baber & Stanton 2002; Baker 2017; Cisco 2013; Kohmar 2016; Wang et al 2002; Schmaltz et al 2017). As illustrations, some of them are presented below.
A gross error identification method for Laboratory Apparatus Measurement has been developed using Grey System Theory (Wang et al 2002). In Automated Evaluation of Scientific Writing (AESW), the method commonly used for grammar error identification is the attention-based encoder-decoder model. This method can be used for correction generation rather than only error identification. A new method, the character-based encoder-decoder, was developed and shown to perform better for AESW (Schmaltz et al 2017). The Product Design field has developed Task Analysis for Error Identification (TAEI). It is based on the communication between user and product, which can be seen as a form of problem solving. Each state of the dialogue offers the user potential actions. Therefore, analysing actions can be used for design or ergonomic error identification (Baber & Stanton 2002).
It can easily be noted that, in all cases presented above, the error identification task is different from the correction generation task. Correction must be based on the Error Identification result.

Method Development Step
The Method development was done by following these steps: formulating the Field Survey Procedure, formulating the Data Quality Concept, devising an Example of Typical Data Collected, formulating the Data Error Probability, formulating the Error Identification Procedure, and finally conducting a Method Trial.

Field Survey Procedure
The Field Survey Procedure generally follows these steps. It starts with survey design, followed by survey execution, data tabulation, and the data correction procedure, and finishes with data processing. These steps are presented in Figure 1 below.

Field Survey Data Quality Concept
A Field Survey is executed to collect primary data on certain characteristics of an object. The data collected must be able to picture the characteristics in question well. Therefore, the data collected have the following quality parameters:
Data Completeness: Every data item needed must be successfully collected. This is related to the Survey Questionnaire Design and the Survey Execution.
Data Correctness: All data tabulated and processed on the spreadsheet must be correct. Supposing the Survey Questionnaire is correct, errors can still arise in Field Data Collection and Data Tabulation. A Data Error Identification procedure must be established.
Sample Size: The whole data set must be able to picture the Surveyed Characteristics accurately. Even if all of the Tabulated Data are correct, accuracy still cannot be guaranteed unless a large enough sample is collected. A sample that is too small, even with correct data, can produce characteristics that differ from reality.
Accuracy depends entirely on Data Completeness, Data Correctness, and Sample Size. This paper discusses only the Error Identification Method.

Structure of Data Table or Basis Data
In general, a Data Table or Basis Data has the following structure. The data of one Respondent are written in one line of the Data Table. Thus 100 Respondents will produce 100 lines in the Data Table. Each Respondent characteristic is written in its own defined column. The columns always start with a column indicating the ID Number, followed by columns for the Respondent's Characteristics. In a Data Base System, technically, each column is called a Field and each line is called a Record (Schurmann 2006).
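The record-and-field structure described above can be sketched in Python as follows. This is only an illustration: the field names and the sample values are hypothetical and are not taken from the survey tables of this paper.

```python
# Hypothetical field list: the first field is always the ID Number,
# followed by the respondent's characteristics.
FIELDS = ["id", "name", "age", "education"]


def make_record(values):
    """Build one Record (one line of the Data Table) from a list of values."""
    if len(values) != len(FIELDS):
        raise ValueError("a record needs exactly one value per field")
    return dict(zip(FIELDS, values))


# Two respondents produce two lines (records) in the table.
# The values are invented for illustration only.
table = [
    make_record([1, "A", 25, 3]),
    make_record([2, "B", 31, 4]),
]

for record in table:
    print(record)
```

Keeping the ID Number as the first field is what allows a flagged record to be traced back to its survey form.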

Example of Typical Collected Data
As an example, Typical Data collected on Commuter Train Passenger Travel Behavior are presented below. Numerical data, such as age, travel distance, travel time, vehicle possession, and others, should be collected as numerical data. Other data should be posed in the questionnaire as multiple-choice items. An Example of Typical Data Collected is presented in Figure 2 below.

Field Survey Data Error Correction Procedure
The Field Survey procedure is presented in Figure 1 above. It contains a step called the Correction Procedure. Data Error Identification is part of the Data Error Correction Procedure. Therefore, the Error Identification Procedure must be developed on the basis of the Data Error Correction Procedure.
The Data Correction Procedure takes the Raw Data on the Spreadsheet as input and checks whether each data item on the Spreadsheet is correct or not. In this Check there are three possibilities. First, the Survey Form Data are actually correct and only the data input is wrong; the Record must then be corrected in the Raw Data Correction step, and the process continues with the Raw Data on the Spreadsheet being checked again. Second, the Survey Form Data are not correct but can in some way be corrected; the procedure then continues with the Survey Form Correction, followed by the Raw Data Correction. Third, the Survey Form Data contain an entirely incorrigible error; the Record must then be discarded, that is, deleted. The Data Correction Procedure is presented in Figure 3 below.
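The three possibilities above can be written as a small decision function. The sketch below is a minimal Python illustration; the function name, its two boolean inputs, and the returned labels are assumptions made here for clarity and are not part of the original procedure.

```python
def correction_action(form_is_correct, form_is_correctable):
    """Decide what to do with a record flagged as erroneous on the spreadsheet.

    form_is_correct:     the paper survey form holds the right value,
                         so only the data input was wrong.
    form_is_correctable: the survey form itself is wrong but can be repaired.
    """
    if form_is_correct:
        # Case 1: fix the spreadsheet entry, then re-check the raw data.
        return "correct raw data"
    if form_is_correctable:
        # Case 2: repair the survey form first, then the spreadsheet entry.
        return "correct survey form, then correct raw data"
    # Case 3: incorrigible error, the record is dropped.
    return "delete record"


print(correction_action(True, False))
print(correction_action(False, True))
print(correction_action(False, False))
```

After either of the first two actions the record would go back through the check, so the procedure is a loop that ends only when no error is flagged or the record is deleted.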

Data Error Identification Method
All Possible Errors must be identifiable. How can such errors be identified? All data must be tabulated in spreadsheet software, using numeric codes as much as possible to ease the Data Error Identification program.
The Data Error Identification procedure was developed to follow these steps. After establishing a Field Survey Data Table, the first step is to develop the Data Logical Errors, followed by developing the Logical Test Rules. Based on the Logical Test Rules, the Spreadsheet Logical Test Functions can then be written, and the procedure ends with executing the Error Identification Calculation. An example of the whole Error Identification Process is presented in the Method Trial sub-chapter below. The steps are presented in Figure 3 below.

Method Trial
The Method Trial was executed by following these steps: experiment case (data code and data collected), data logical error, logical test rule, spreadsheet function, and error identification calculation (identification of double counting error, identification of trip maker characteristics error, identification of trip error, and identification of trip maker ~ trip correlation error).

Experiment Case
A Virtual Experiment Case was established and used for the Method Trial. The case concerns surveying Passenger Travel Behavior Data for travel on a Commuter Train serving 6 stations, station 1 to station 6. The Case is very simplified. The Trip Maker Data are limited to name, age, and accomplished education. The Trip Data are limited to the Access and Egress Trips, each denoting the zone, station, distance, and mode.
All Data except the name are recorded as Data Codes. The Data Codes are presented in Table 4 below, while the Field Survey Data are presented in Table 5 below.

Data Logical Error and Logical Test Rule
After the Field Survey Data are tabulated, two steps have to be executed to prepare the Error Identification Calculation: developing the Data Logical Errors and developing the Logical Test Rules. These two are presented below.

Data Logical Error
The Data Logical Error step identifies the different logical errors that can exist among related data values. One example is the double counting error: a certain passenger is counted more than once. Another example is a Train heading north while the recorded trip heads south.
In general, Data Logical Errors can be classified into: double counting, trip maker characteristics data errors, trip characteristics data errors, and errors in the correlation between trip maker and trip characteristics data. An example of Data Logical Errors is presented in Table 5 below.
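Double counting can be screened for by grouping records whose characteristics are all identical. The following is a minimal Python sketch of that idea, not the spreadsheet calculation used in this paper; the field names and the sample values are hypothetical and invented for illustration.

```python
from collections import defaultdict


def find_possible_double_counts(records, key_fields):
    """Return lists of record IDs whose values match on every key field."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record["id"])
    # Only groups holding more than one ID are double counting candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Invented sample: records 1 and 3 carry exactly the same data.
sample = [
    {"id": 1, "name": "A", "age": 25, "education": 3},
    {"id": 2, "name": "B", "age": 31, "education": 4},
    {"id": 3, "name": "A", "age": 25, "education": 3},
]

print(find_possible_double_counts(sample, ["name", "age", "education"]))
# → [[1, 3]]
```

A match only signals a strong possibility of double counting; the flagged survey forms still have to be checked by hand, since two different passengers could in principle share all recorded characteristics.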

Logical Test Rule
The Logical Errors tabulated above have to be formulated as Logical Test Rules so that they can be programmed. One Logical Error from each Error Group is taken, for which a Logical Test Rule is formulated. The Logical Test Rules are presented in Table 6 below.
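As an illustration of how such rules become programmable, the sketch below restates the three spreadsheet tests of the Method Trial as Python functions. It follows the cell-reference form of the formulas and assumes that column D holds the age, column E the education code, columns G and K the boarding and alighting station numbers, and column I the access mode code. The function names are hypothetical, and the meaning of each code is defined in the paper's code table, not here.

```python
def check_age_education(age, edu):
    """Trip maker test: is the age plausible for the education code?"""
    if edu == 0 and age < 12:
        return "ok"
    if edu == 1 and age > 13:
        return "ok"
    if edu == 2 and age > 15:
        return "ok"
    if edu == 3 and age > 18:
        return "ok"
    if edu == 4 and age > 23:
        return "ok"
    return "X"


def check_stations(boarding, alighting):
    """Trip test: flag the record unless boarding < alighting station number."""
    return "X" if boarding >= alighting else "ok"


def check_mode_age(mode, age):
    """Correlation test: flag drivers below the age limit of their mode code."""
    if mode == 2 and age < 18:
        return "X"
    if mode == 3 and age < 20:
        return "X"
    return "ok"


print(check_age_education(20, 4))  # age 20 with education code 4
print(check_stations(3, 3))        # same boarding and alighting station
print(check_mode_age(2, 5))        # age 5 with access mode code 2
```

Each function returns the same "ok" / "X" marks as the spreadsheet, so the results can be compared cell by cell with the error identification tables.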
The checking procedure found a Double Counting Error for Respondent no 1 and Respondent no 7. The error lies in the fact that the data recorded for these two individuals are exactly the same. The probability of double counting for these 2 records is very strong. It must be checked whether double counting has actually occurred or not. The Double Counting Data Error calculation is presented in Table 6 below.

=IF(AND(edu=0;age<12);"ok";IF(AND(edu=1;age>13);"ok";IF(AND(edu=2;age>15);"ok";IF(AND(edu=3;age>18);"ok";IF(AND(edu=4;age>23);"ok";"X")))))
=IF(AND(E5=0;D5<12);"ok";IF(AND(E5=1;D5>13);"ok";IF(AND(E5=2;D5>15);"ok";IF(AND(E5=3;D5>18);"ok";IF(AND(E5=4;D5>23);"ok";"X")))))

The Data Error Identification calculation found an error for Respondent no 2. The error lies in the fact that Respondent no 2 was recorded as being 20 years old while having already finished higher education. The Error Identification calculation is presented in Table 7 below.

=IF(OR(G5=K5;G5>K5);"X";"ok")

The Error Identification found an error for Respondent no 6. The error lies in the fact that Respondent no 6 was recorded as boarding and alighting at the same station. The Error Identification Calculation is presented in Table 8 below.

=IF(AND(mode=2;age<18);"X";IF(AND(mode=3;age<20);"X";"ok"))
=IF(AND(I5=2;D5<18);"X";IF(AND(I5=3;D5<20);"X";"ok"))

The Error Identification found an error for Respondent no 6. The error lies in the fact that a 5-year-old child was recorded as riding a motorcycle to the origin station. The Error Identification Calculation is presented in Table 9 below.

Further research is still needed, among others on: trying the developed method on a real case, conducting experiments on minimum sample size, conducting experiments on minimum sample size calculation for various proportion values and proportion cases, and developing the whole data correction method.