Albanian corpus dataset analysis using Apache Hadoop
Session
Computer Science and Communication Engineering
Description
Nowadays we are dealing with very large amounts of data generated in fields such as medicine, economics, and social media, and data analysis has become one of the most important disciplines today. Many companies offer services for storing this voluminous data, among them Prolifics, Clairvoyant, IBM, HP Enterprise, Teradata, Oracle, SAP, EMC, Amazon, Microsoft, Google, VMware, Splunk, and Alteryx [1]. The growth of this data continues exponentially, which has made it impossible to handle with a traditional database system, since it exceeds such a system's capacity. When we talk about large volumes of data, also known as Big Data, we are dealing with scales ranging from gigabytes to terabytes, petabytes, zettabytes, and beyond. Processing the data can involve multiple operations depending on the use case, such as collecting, classifying, indexing, exploring, and gathering results. The main problem is that no single machine, and not even a few machines, can process such a large amount of data within a finite period of time. This paper presents experimental work on big data problems using the Apache Hadoop approach as a solution. The objective is to work with Hadoop, with a particular focus on the MapReduce algorithm, and to analyse a dataset (an Albanian text corpus) created specifically for this case. The results gathered in this paper and the accompanying analyses show positive outcomes for the above approach to such big data problems.
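The paper's actual job is not reproduced on this page; purely as an illustration of the MapReduce word-counting approach named in the abstract and keywords, a minimal Java sketch against the standard org.apache.hadoop.mapreduce API might look as follows. The class names, job name, and input/output paths are illustrative assumptions, not taken from the paper.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative sketch only: counts word frequencies over a text corpus stored in HDFS.
public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token in a line of the corpus
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "albanian corpus word count"); // job name is illustrative
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. HDFS path to the corpus
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Such a job would typically be packaged into a jar, submitted with the hadoop jar command, and pointed at an HDFS input directory holding the corpus and an HDFS output directory for the per-word counts.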
Keywords:
Big Data, Hadoop Technology, Hadoop Distributed File System (HDFS), MapReduce, WordCount.
Session Chair
Bertan Karahoda
Session Co-Chair
Besnik Qehaja
Proceedings Editor
Edmond Hajrizi
ISBN
978-9951-437-96-7
Location
Lipjan, Kosovo
Start Date
31-10-2020 10:45 AM
End Date
31-10-2020 12:30 PM
DOI
10.33107/ubt-ic.2020.526
Recommended Citation
Avdimetaj, Fëllanza, "Albanian corpus dataset analysis using Apache Hadoop" (2020). UBT International Conference. 331.
https://knowledgecenter.ubt-uni.net/conference/2020/all_events/331