Exploiting Marine BD to Develop MLDB and Its Application to Ship Basic Planning Support

Recently, the global marine logistics industry has changed significantly because of the global movement of goods. As the amount of available data and the attention paid to large-scale data analysis grow exponentially across many fields, vast amounts of marine big data (BD) can now be obtained. However, collections of BD are difficult to organize and frequently redundant, which is why a database is important. If these BD are utilized effectively, great innovation can be achieved in the marine industry. In this study, we develop a marine logistics database (MLDB) to support ship basic planning in the future. The database consists of BD sets, i.e., port, ship, route, international trade, and ship operation information from automatic identification system (AIS) data, integrated into a relational database. The effectiveness of the database is evaluated, and the data extracted from it that are necessary for ship basic planning are discussed.


I. INTRODUCTION
Big Data (BD) is more than a word, and the business benefit of utilizing BD is widely understood. Data-driven organizations have been found to perform 5% to 6% better per year [1]. BD is already playing a significant role in shaping the maritime industry's future. By analyzing such data, marine logistics players can find opportunities to improve efficiency and quality [2]. Moreover, it is widely known that BD can improve forecasts and can support demand forecasting and planning processes [3] [4].
A popular definition of BD is high-volume, high-velocity, and high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization [5] [6]. The characteristics of BD are defined as the three Vs (volume, velocity, and variety), as shown in Figure 1, and can be described as follows:
• Data volume is defined as the amount of data; many factors can contribute to its increase. It can reach hundreds of terabytes or petabytes of information generated everywhere [7].
• Data velocity is defined as the speed at which data are generated; acquiring, processing, and analyzing such data requires fast mechanisms. Velocity emphasizes the real-time processing power of BD for enterprise needs [8].
• Data variety is defined as the range of data types in BD, which includes structured and unstructured data such as text, audio, video, sensor data, posts, log files, and many more [8].
Significant potential and high value are hidden in the huge volumes of data now used in many areas, including the maritime field. Because of the global movement of goods, global marine logistics has changed significantly. Hence, it is essential to develop ships whose specifications meet market needs. At the same time, marine logistics BD, e.g., port, ship, and AIS data, can be acquired more efficiently. If these data are harnessed effectively, great innovation might be obtained. Because BD are frequently redundant, a database should be developed to organize them.
Considering the data available from marine BD, the objective of this study is to develop a marine logistics database (MLDB) by exploiting marine BD and to apply it to ship basic planning support by extracting data from the MLDB. Bulk carriers operating from Australia to Japan, Korea, and China are taken as an example. The effectiveness of the database is discussed.

A. BD in the marine field
Many studies have applied BD in the marine field. A high-accuracy block component measurement method for construction applications has been developed using point cloud data from a 3D scanner [9]. Aoyama et al. [10] proposed new methods of extracting and utilizing monitoring data by introducing two different monitoring technologies and considering the reliability of each for advanced shipbuilding construction management. Perera et al. [11] analyzed large ship performance datasets to propose a model for evaluating ship performance under various seagoing conditions in the operations field.
Ando et al. [12] and Yoshida et al. [13] proposed a data collection platform called the Ship Information Management System. They utilized the collected data for many purposes (e.g., energy efficiency determination, ship performance monitoring, and engine monitoring). Arifin et al. [14] [15] [16] developed a ship allocation model using marine logistics data that can forecast the demand for bulk carriers and examine the adequate principal particulars of ships for cargo transportation. Based on [12], BD application areas in the marine field are shown in Table 1. Although studies have employed BD to improve ship construction, operation, and performance, few have examined the use of BD for ship basic planning [14] [16] [17]. Therefore, the aim of this study is to develop ship allocation for ship basic planning support by using marine BD.

B. The basic concept of MLDB
The MLDB is developed by integrating marine BD, i.e., operation information from AIS data and ship, port, route, and international trade information, into a relational database. The MLDB and its application to ship basic planning are illustrated in Figure 2. To extract valuable information from the MLDB, the following requirements should be considered:
• The data structure of the MLDB should be defined so that the data integrated into the relational database (operation, ship, port, route, and trade data) are related to each other.
• Error cleaning is required to remove errors and ensure the quality of the data.
• The cargo volume should be estimated to confirm the effectiveness of the database in this study.

C. Input data of MLDB
The MLDB is developed by considering the following data:
• Automatic Identification System (AIS) data from MINT [18]: indicated speed, indicated draft, ship position, arrival and departure dates and times, and arrival and departure ports.
• Port data collected from Sea-web Port, e.g., port name, longitude, latitude, port dimensions, and cargo handling [19].
• Ship data collected from Sea-web Ship, e.g., ship name, DWT, IMO number, ship classification, principal dimensions, operator, shipbuilder, ship status, and build year [20].
• Route data collected from Sea-web Port and IHS-Fairplay, e.g., departure, arrival, route, and route distances between ports [19] [21].
• Trade data collected from UN Comtrade, e.g., commodity trade data, trade periods, trade value, commodity code, trade quantity, reporter, and partner [22].

D. Data structure of MLDB
The structure of MLDB in this study is shown in Figure 3. As shown in the figure below, all input data are organized and connected as a relational database. The relational concept faces challenges in handling BD and providing horizontal scalability, availability, and performance required by BD applications.
In this study, integrating the data and defining the relations among them ensures that valuable knowledge can be found in vast amounts of BD and easily extracted. For example, by integrating ship data and route data and using the data extracted from them, e.g., speed and distance, the actual speed of ships can be identified. Moreover, by integrating ship and port data with operation data, basic information on the operating state of ships can be analyzed.
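The kind of integration described above can be illustrated with a minimal sketch. The table and column names below are hypothetical (the actual MLDB schema is given only in Figure 3, which is not reproduced here), and the ship, ports, and distance are made-up sample values. The sketch joins operation, ship, and route tables and derives an average speed from route distance and elapsed time.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real MLDB tables and columns
# are not given in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ship (imo INTEGER PRIMARY KEY, name TEXT, dwt REAL);
CREATE TABLE route (
    dep_port TEXT, arr_port TEXT, distance_nm REAL,
    PRIMARY KEY (dep_port, arr_port)
);
CREATE TABLE operation (
    op_id INTEGER PRIMARY KEY,
    imo INTEGER REFERENCES ship(imo),
    dep_port TEXT, arr_port TEXT,
    dep_time TEXT, arr_time TEXT   -- ISO 8601 timestamps
);
""")

# Made-up sample rows for illustration only.
conn.execute("INSERT INTO ship VALUES (1000001, 'EXAMPLE BULKER', 180000)")
conn.execute("INSERT INTO route VALUES ('PORT_A', 'PORT_D', 3600)")
conn.execute(
    "INSERT INTO operation VALUES (1, 1000001, 'PORT_A', 'PORT_D', "
    "'2014-01-01 00:00:00', '2014-01-13 12:00:00')"
)

# julianday() returns fractional days; multiplying by 24 gives hours under way,
# so distance (nautical miles) / hours yields an average speed in knots.
rows = conn.execute("""
SELECT s.name,
       r.distance_nm / ((julianday(o.arr_time) - julianday(o.dep_time)) * 24.0)
           AS avg_speed_kn
FROM operation AS o
JOIN ship  AS s ON s.imo = o.imo
JOIN route AS r ON r.dep_port = o.dep_port AND r.arr_port = o.arr_port
""").fetchall()

for name, speed in rows:
    print(f"{name}: {speed:.1f} kn")
```

Keeping route distances in their own table, keyed by the port pair, means each distance is stored once and reused by every operation on that route, which is the redundancy reduction a relational design is meant to provide.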
A relational database effectively organizes data in tables (or relations). The relationships created with the tables enable a relational database to efficiently store vast amounts of data and effectively retrieve selected data. The relational database design process is described in the following steps: • Step 1. Define the data structure as shown in

E. Error cleaning
In a high-density shipping area where thousands of ships may transmit AIS messages, it is a challenge for the AIS system to collect, process, and download all the messages efficiently. As a result, many messages are lost, and erroneous data are sometimes collected. Examples of errors in AIS data are as follows:
• The draft value (d) is zero (0).
• Null information or blank space.
Therefore, to ensure the quality of the data, duplicate and NULL data should be eliminated, and the draft data should be evaluated.
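A minimal sketch of such a cleaning pass is given below. It is an illustration under assumptions, not the procedure used in the study: the record layout and field names are hypothetical, and "evaluating" the draft is simplified here to dropping records whose draft is zero or negative.

```python
def clean_ais_records(records, required=("imo", "dep_port", "arr_port", "draft")):
    """Drop records with missing fields, a non-positive draft, or exact duplicates.

    `records` is a list of dicts; the field names are hypothetical.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Null information or blank space in any required field.
        if any(rec.get(k) is None or str(rec.get(k)).strip() == "" for k in required):
            continue
        # A reported draft of zero (or less) is treated as an error.
        if float(rec["draft"]) <= 0:
            continue
        # Exact duplicates: identical values in every field.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


sample = [
    {"imo": 1000001, "dep_port": "PORT_A", "arr_port": "PORT_D", "draft": 17.5},
    {"imo": 1000001, "dep_port": "PORT_A", "arr_port": "PORT_D", "draft": 17.5},  # duplicate
    {"imo": 1000002, "dep_port": "PORT_B", "arr_port": "PORT_D", "draft": 0.0},   # zero draft
    {"imo": 1000003, "dep_port": "", "arr_port": "PORT_D", "draft": 12.0},        # blank field
    {"imo": 1000004, "dep_port": "PORT_B", "arr_port": None, "draft": 12.0},      # NULL field
]
cleaned = clean_ais_records(sample)
print(len(cleaned))
```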

F. Generating cargo information
Cargo information on an operating ship is essential for forecasting demand and understanding ship usage. However, that information is unavailable in AIS data. Therefore, the cargo type and volume of each operation are estimated. The cargo type is one of three types: iron ore, coal, and grain & others.
• Checking the data reliability
Confirming the reliability of the data is required for a good cargo volume estimation. In our study, data reliability was evaluated by using Equation (1), where di is defined as the draft rate, dsail(i) (m) is defined as the sailing draft, and dmax(i) (m) is defined as the maximum draft of the ship.
• Checking the cargo type by using the information of the port
The cargo of each operation was estimated by analyzing the cargo types in the port data. As shown in Table 2, the cargo type was identified by checking the combination of cargo handled at the departure and arrival ports. For example, in an operation from departure Port A to arrival Port D, the only common cargo is coal. Therefore, the cargo type was estimated as coal. However, in an operation from departure Port B to arrival Port D, the common cargos were iron ore and coal. In such a case, the cargo type was defined as multicargo and should be estimated by using the ship size.
• Checking the cargo type by using ship size information
If two or more common cargo types exist in the port data, the cargo type was estimated using the ship size. The distribution of ship size from Australia to Japan, Korea, and China for fixed cargo is shown in Figure 4 to Figure 6. Since ship size and cargo type are closely related, the cargo type of the remaining operations could be estimated.
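The two-stage check above can be sketched as follows. The port cargo lists stand in for Table 2, which is not reproduced here, so they are hypothetical values chosen only to match the Port A, Port B, and Port D example in the text. The ship-size rule (choose the candidate cargo whose fixed-cargo ships have the closest mean DWT) is an assumed stand-in for the distributions in Figure 4 to Figure 6, and the reference DWT values are made up.

```python
from statistics import mean

# Hypothetical port cargo lists consistent with the example in the text.
PORT_CARGO = {
    "PORT_A": {"coal"},
    "PORT_B": {"iron ore", "coal"},
    "PORT_D": {"iron ore", "coal"},
}

# Made-up DWT samples of ships whose cargo type is already fixed.
REFERENCE_DWT = {
    "iron ore": [170000, 180000, 205000],
    "coal": [75000, 82000, 90000],
}


def estimate_cargo_type(dep_port, arr_port, dwt):
    """Return the estimated cargo type, or None if the ports share no cargo."""
    candidates = PORT_CARGO[dep_port] & PORT_CARGO[arr_port]
    if not candidates:
        return None
    if len(candidates) == 1:
        # Only one common cargo: the type is fixed by the port information.
        return next(iter(candidates))
    # Multicargo: fall back to ship size (assumed nearest-mean-DWT rule).
    return min(sorted(candidates), key=lambda c: abs(mean(REFERENCE_DWT[c]) - dwt))


print(estimate_cargo_type("PORT_A", "PORT_D", 80000))
print(estimate_cargo_type("PORT_B", "PORT_D", 180000))
```

A nearest-mean rule is the simplest way to use a size distribution; a classifier fitted to the full distributions would be a natural refinement.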

• Cargo Volume Estimation
The cargo volume was estimated from the maximum draft and deadweight of the ship and the sailing draft extracted from AIS data, by using Equation (2), where Vi (ton) is defined as the cargo volume, DWTi is defined as the deadweight, and di is the draft rate.
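Equations (1) and (2) are not reproduced in this text. The sketch below therefore rests on an assumption: it uses the simplest forms consistent with the symbol definitions, namely di = dsail(i) / dmax(i) for the draft rate and Vi = DWTi × di for the cargo volume. The original equations may include further terms, and the plausibility bounds on the draft rate are hypothetical parameters, not values from the study.

```python
def draft_rate(d_sail, d_max):
    """Assumed form of Equation (1): sailing draft divided by maximum draft."""
    if d_max <= 0:
        raise ValueError("maximum draft must be positive")
    return d_sail / d_max


def is_plausible(rate, lower=0.0, upper=1.0):
    """Hypothetical reliability check: keep rates in the interval (lower, upper]."""
    return lower < rate <= upper


def cargo_volume(dwt, d_sail, d_max):
    """Assumed form of Equation (2): deadweight scaled by the draft rate (tons)."""
    return dwt * draft_rate(d_sail, d_max)


# Made-up example: a 180,000 DWT bulk carrier sailing at 16.2 m of an 18.0 m maximum draft.
rate = draft_rate(16.2, 18.0)
volume = cargo_volume(180000, 16.2, 18.0)
print(f"draft rate = {rate:.2f}, estimated cargo = {volume:.0f} t")
```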

A. Evaluation of the cargo estimation
To verify the cargo estimation in Section 2.6.4, we compare the estimation results with the actual trade values from UN Comtrade data for bulk carriers operating from Australia to Japan, Korea, and China in 2014. The estimation result covered around 93% for Australia-Japan, 93% for Australia-Korea, and 88% for Australia-China, as shown in

B. Extracted data for ship basic planning
The relational database structure of the MLDB allows a user to obtain valuable knowledge. The operation information and other information, e.g., LOA (m), DWT (ton), design speed (knot), B (m), and d (m), can be easily extracted. By harnessing the data extracted from the MLDB, the characteristics of the bulk carriers (Australia-Japan) and (Brazil-Japan) from 2014/01/01 to 2014/12/31 could be identified, as shown in Table 3. Moreover, critical information for predicting future ship demand can be obtained by using a ship allocation model. Ship basic planning support using the ship allocation model will be developed for future applications. The ship allocation model is developed by using the information from the MLDB and predicts the ship allocation when the user inputs the trade volume, economic situation, etc. Therefore, when the user inputs a future scenario, e.g., the state of the world economy, fuel price fluctuation, or canal and port expansion, a new ship allocation is generated so that effective ship specifications can be simulated and estimated. The architecture of the future application exploiting the data extracted from the MLDB is shown in Figure 10.
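As an illustration of such an extraction, the sketch below summarizes ship particulars per trade lane for operations in 2014. The schema, column names, and sample rows are all hypothetical and do not reproduce Table 3. It only shows the kind of query a relational structure makes straightforward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; not the actual MLDB schema.
conn.executescript("""
CREATE TABLE ship (
    imo INTEGER PRIMARY KEY, loa_m REAL, dwt_ton REAL, design_speed_kn REAL
);
CREATE TABLE operation (
    op_id INTEGER PRIMARY KEY, imo INTEGER REFERENCES ship(imo),
    dep_country TEXT, arr_country TEXT, dep_date TEXT
);
""")
conn.executemany("INSERT INTO ship VALUES (?, ?, ?, ?)", [
    (1000001, 292.0, 180000.0, 14.5),
    (1000002, 229.0, 82000.0, 14.0),
])
conn.executemany("INSERT INTO operation VALUES (?, ?, ?, ?, ?)", [
    (1, 1000001, "Australia", "Japan", "2014-02-01"),
    (2, 1000002, "Australia", "Japan", "2014-03-15"),
    (3, 1000001, "Australia", "Japan", "2014-06-20"),
    (4, 1000001, "Brazil", "Japan", "2015-01-10"),  # outside the 2014 window
])

# Per trade lane: number of operations, number of distinct ships, and the
# operation-weighted mean DWT (a ship counts once per voyage it makes).
summary = conn.execute("""
SELECT o.dep_country, o.arr_country,
       COUNT(*)              AS n_operations,
       COUNT(DISTINCT o.imo) AS n_ships,
       AVG(s.dwt_ton)        AS mean_dwt
FROM operation AS o
JOIN ship AS s ON s.imo = o.imo
WHERE o.dep_date BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY o.dep_country, o.arr_country
""").fetchall()

for row in summary:
    print(row)
```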

IV. CONCLUSION
In this study, an MLDB using marine logistics BD was developed. The integration of BD and the error cleaning, which are essential for the MLDB, were carried out. Estimation methods for cargo type and volume were proposed, and the information that is important for ship basic planning support was extracted. The proposed methods were evaluated by comparing the estimation results with the international trade data from UN Comtrade. The critical data for basic planning support were extracted from the MLDB and discussed. The architecture of a future application that extracts data from the MLDB was illustrated.