Comparison of Prediction Models for Delays of Trains by using Data Mining and Machine Learning Methods

Session

Computer Science and Communication Engineering

Description

On the one hand, a tight schedule is desirable and highly efficient for freight transport companies. On the other hand, a tight schedule increases the impact of delays and cancellations. Furthermore, predicting delays is extremely complex because they depend on many influencing factors. To address these issues, this work presents an approach to forecasting train delays using data mining and machine learning methods. For this purpose, an international rail freight transport company provided us with a large amount of historical data on freight and passenger train runs. To obtain a suitable prediction model, we apply a knowledge discovery in databases (KDD) process, which comprises the steps data selection, data preprocessing, data transformation, data mining, and interpretation/evaluation. After the data selection and data preprocessing steps, we transform categorical features via one-hot encoding as well as via embeddings of various sizes. Furthermore, we present a transformation method for cyclical continuous features such as the weekday. In the actual data mining step, we use the prepared historical data to perform a regression analysis that forecasts train delays, and we compare several regression models such as decision tree, random forest, extra trees, and gradient boosting regression. An adequate prediction model will be integrated into an agent-based model that tests the robustness of train networks.
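
The abstract does not say which transformation is used for cyclical features. As an illustration only, the sketch below shows one common choice, a sine/cosine encoding that places each weekday on the unit circle so that Sunday and Monday end up as close together as any other pair of adjacent days. The function names, the weekday numbering (0 = Monday to 6 = Sunday), and the choice of this encoding are assumptions, not the authors' method.

```python
import math

def encode_cyclical(value, period):
    """Map a cyclical value (e.g. weekday 0..6) to (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b, period):
    """Euclidean distance between the encoded forms of two cyclical values."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

# Assumed numbering: 0 = Monday, ..., 6 = Sunday.
WEEK = 7
sunday_to_monday = circle_distance(6, 0, WEEK)
monday_to_tuesday = circle_distance(0, 1, WEEK)
monday_to_thursday = circle_distance(0, 3, WEEK)

print(sunday_to_monday, monday_to_tuesday, monday_to_thursday)
```

With a plain integer encoding, Sunday (6) and Monday (0) would be six units apart. In this encoding the two adjacent-day distances are equal, while Monday and Thursday are farther apart.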

Keywords:

KDD, data mining, machine learning, prediction models

Session Chair

Bertan Karahoda

Session Co-Chair

Krenare Pireva

Proceedings Editor

Edmond Hajrizi

ISBN

978-9951-437-69-1

Location

Pristina, Kosovo

Start Date

27-10-2018 10:45 AM

End Date

27-10-2018 12:15 PM

DOI

10.33107/ubt-ic.2018.87
