Comparison of Prediction Models for Delays of Trains by using Data Mining and Machine Learning Methods
Session
Computer Science and Communication Engineering
Description
On the one hand, a tight schedule is desirable and very efficient for freight transport companies. On the other hand, a tight schedule increases the impact of delays and cancellations. Furthermore, predicting delays is extremely complex, because delays depend on many influencing factors. To address these issues, this work presents an approach to forecasting train delays using data mining and machine learning methods. For this purpose, an international rail freight transport company provided us with a large amount of historical data on freight and passenger train runs. To obtain a suitable prediction model, we apply a knowledge discovery in databases (KDD) process, which consists of the steps data selection, data preprocessing, data transformation, data mining, and interpretation/evaluation. After the data selection and data preprocessing steps, we transform categorical features via a one-hot encoder as well as via embeddings of various sizes. Furthermore, we present a transformation method for cyclical continuous features such as the weekday. In the actual data mining step, we use the prepared historical data to perform a regression analysis that forecasts train delays, and we compare several regression models: decision tree, random forest, extra trees, and gradient boosting regression. An adequate prediction model will be integrated into an agent-based model that tests the robustness of train networks.
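The abstract does not say which transformation is used for cyclical features. As an illustration only, the sketch below uses the common sine/cosine mapping onto the unit circle, which is an assumption and not necessarily the authors' method. The function names `encode_cyclical` and `circular_distance` and the weekday numbering (0 = Monday to 6 = Sunday) are hypothetical and chosen for this example.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. a weekday index) to a (sin, cos) pair.

    Assumption: the feature wraps around after `period` steps, so
    value 0 and value `period` describe the same point in the cycle.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b):
    """Euclidean distance between two encoded (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical numbering: 0 = Monday, ..., 6 = Sunday.
monday = encode_cyclical(0, 7)
tuesday = encode_cyclical(1, 7)
sunday = encode_cyclical(6, 7)

# With raw integers, Sunday (6) looks far from Monday (0).
# After encoding, both neighbours of Monday are equally close to it.
print(circular_distance(monday, tuesday))
print(circular_distance(monday, sunday))
```

With this kind of encoding, a regression model sees Sunday and Monday as adjacent days, which a plain integer index from 0 to 6 would not express.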
Keywords:
KDD, data mining, machine learning, prediction models
Session Chair
Bertan Karahoda
Session Co-Chair
Krenare Pireva
Proceedings Editor
Edmond Hajrizi
ISBN
978-9951-437-69-1
Location
Pristina, Kosovo
Start Date
27-10-2018 10:45 AM
End Date
27-10-2018 12:15 PM
DOI
10.33107/ubt-ic.2018.87
Recommended Citation
Leser, Dennis; Wastian, Matthias; Rößler, Matthias; and Landsied, Michael, "Comparison of Prediction Models for Delays of Trains by using Data Mining and Machine Learning Methods" (2018). UBT International Conference. 87.
https://knowledgecenter.ubt-uni.net/conference/2018/all-events/87