Session

Civil Engineering, Infrastructure and Environment

Description

In this paper, the K-means clustering algorithm is applied to classify customers into several groups such that similarity within a group is higher than similarity between groups. After the relevant client attributes are identified in a SQL Server database, PCA (Principal Component Analysis) is applied to reduce the number of features, and the K-means algorithm is then run in the MATLAB programming environment with a fixed number of clusters. Each centroid defines one cluster, and each data point is assigned to its nearest centroid based on the squared Euclidean distance. In this research, the initial centroids are randomly generated, and the separation between the resulting clusters is analyzed and illustrated using the Silhouette index. The analysis and results presented in this paper could be used to identify similarities in purchasing or service use within a customer cluster at a luxury goods company, to develop market segments, to identify repetitive behavior or trends in order to provide a full assessment of actions, and to create new customer loyalty campaigns.
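The assignment and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB and is not the authors' implementation. The data points, the function names (`sq_dist`, `kmeans`, `silhouette`), the choice to draw initial centroids from the data, and the use of squared Euclidean distance inside the Silhouette index are all assumptions made for illustration. The PCA step is omitted, so the points stand in for already reduced features.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """K-means with a fixed number of clusters k (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette value over all points; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [sq_dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(sq_dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up, already reduced 2-D customer features: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, k=2)
score = silhouette(points, labels)
print(labels, round(score, 3))
```

A mean silhouette value close to 1 indicates compact, well-separated clusters, while values near 0 indicate overlapping clusters.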

Keywords:

cluster analysis, K-means, Principal Component Analysis (PCA), Silhouette index

Session Chair

Feti Selmani

Session Co-Chair

Anjeza Alaj

Proceedings Editor

Edmond Hajrizi

ISBN

978-9951-550-19-2

First Page

86

Last Page

93

Location

Pristina, Kosovo

Start Date

26-10-2019 1:30 PM

End Date

26-10-2019 3:00 PM

DOI

10.33107/ubt-ic.2019.189

An Application of PCA Based K-Means Clustering for Customer Segmentation in One Luxury Goods Company