Comparative Study the Effect of Similarity Measures on K-Means Algorithm in Clustering Arabic Texts based on Keywords

Authors

  • Suhad Muhajer Kareem Basra University \ Science College \ Computer Science Dept.

Keywords:

Text mining, Arabic text clustering, K-means, Euclidean similarity, Cosine similarity

Abstract

Text clustering is an important and effective task in text mining. It aims to divide a large set of texts into subsets called clusters, such that the objects within a cluster are highly similar to one another but dissimilar to objects in other clusters. In this work, we propose a method for clustering Arabic texts using the well-known K-means algorithm. The method first analyzes each text to prepare it for the clustering algorithm, which is applied to 100 Arabic texts drawn from four groups (sport, art, crime, health). Rather than selecting the cluster centers randomly, the method selects them using a database of keywords for each field. Two similarity measures (Euclidean similarity and cosine similarity) are then used to calculate the distances between the centers and the texts when building the clusters. In addition, we evaluate the impact of the two measures on the K-means results using the F-measure, comparing Euclidean similarity and cosine similarity across factors such as the number of clusters and the number of groups. We found that K-means performs better with cosine similarity than with Euclidean similarity.
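The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the tiny English-token corpus, the keyword lists, the term-count vectors, and the names `vectorize`, `kmeans`, and `macro_f1` are all assumptions made for illustration. The abstract does not specify how texts are vectorized or how the F-measure is computed, so the sketch assumes plain term counts and a macro-averaged F1 in which cluster k is matched to the topic whose keywords seeded it.

```python
import math
from collections import Counter

# Hypothetical keyword database (one list per topic); not from the paper.
KEYWORDS = {"sport": ["match", "team", "goal"],
            "health": ["doctor", "hospital", "disease"]}
TOPICS = list(KEYWORDS)

# Hypothetical pre-tokenized texts with their true topic index.
DOCS = [("team goal match team".split(), 0),
        ("doctor hospital disease".split(), 1),
        ("match goal player".split(), 0),
        ("hospital disease patient doctor doctor".split(), 1)]

def vectorize(tokens, vocab):
    """Term-count vector over a fixed vocabulary (an assumed representation)."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def kmeans(vectors, centers, distance, max_iter=20):
    """K-means with caller-supplied initial centers instead of random ones."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda k: distance(v, centers[k]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def macro_f1(true_labels, pred_labels, n_classes):
    """Macro-averaged F1, treating cluster k as the prediction for topic k."""
    scores = []
    for k in range(n_classes):
        pairs = list(zip(true_labels, pred_labels))
        tp = sum(t == k and p == k for t, p in pairs)
        fp = sum(t != k and p == k for t, p in pairs)
        fn = sum(t == k and p != k for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / n_classes

vocab = sorted({t for tokens, _ in DOCS for t in tokens}
               | {w for words in KEYWORDS.values() for w in words})
vectors = [vectorize(tokens, vocab) for tokens, _ in DOCS]
truth = [label for _, label in DOCS]
# Each initial center is the vector of one topic's keywords.
seeds = [vectorize(KEYWORDS[topic], vocab) for topic in TOPICS]

labels_euclidean = kmeans(vectors, seeds, euclidean_distance)
labels_cosine = kmeans(vectors, seeds, cosine_distance)
f1_euclidean = macro_f1(truth, labels_euclidean, len(TOPICS))
f1_cosine = macro_f1(truth, labels_cosine, len(TOPICS))
print("Euclidean:", labels_euclidean, f1_euclidean)
print("Cosine:   ", labels_cosine, f1_cosine)
```

On a corpus this small and cleanly separated, the two measures cannot be expected to differ. The sketch only shows where the keyword-based seeding and the choice of measure enter the algorithm, and it says nothing about the results reported for the 100 Arabic texts.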

Published

2017-08-10

How to Cite

Muhajer Kareem, S. (2017). Comparative Study the Effect of Similarity Measures on K-Means Algorithm in Clustering Arabic Texts based on Keywords. Journal of Al-Qadisiyah for Computer Science and Mathematics, 7(1), 11–24. Retrieved from https://jqcsm.qu.edu.iq/index.php/journalcm/article/view/96

Section

Math Articles