Comparative Study the Effect of Similarity Measures on K-Means Algorithm in Clustering Arabic Texts based on Keywords

Suhad Muhajer Kareem

Authors

Suhad Muhajer Kareem, Basra University, Science College, Computer Science Dept.

Keywords:

Text mining, Arabic text clustering, K-means, Euclidean similarity, Cosine similarity

Abstract

Text clustering is one of the important and effective tasks in text mining. It aims to divide a large set of texts into subsets called clusters, such that the objects within a cluster are highly similar to one another but dissimilar to the objects in other clusters. In this work, we propose a method for clustering Arabic texts using the well-known K-means algorithm. The proposed method includes text analysis as a preliminary step that prepares the texts for the clustering algorithm, which is applied to 100 Arabic texts drawn from four groups (sport, art, crime, health). Rather than selecting the cluster centers randomly, our method selects them using a database of keywords for each field. Two similarity measures (Euclidean similarity and cosine similarity) are then used to calculate the distances between the centers and the texts in order to build the clusters. In addition, we evaluate the impact of the two similarity measures on the results of K-means using the F-measure, comparing Euclidean similarity and cosine similarity across factors such as the number of clusters and the number of groups. Finally, we found that the K-means algorithm performs better with cosine similarity than with Euclidean similarity.
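The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the English placeholder tokens, the two keyword lists, the term-frequency representation, the mean-based center update used for both measures, and the particular F-measure variant (best F1 per true class, weighted by class size). The function names `vectorize`, `kmeans`, and `f_measure` are hypothetical.

```python
import math
from collections import Counter

def vectorize(tokens, vocab):
    """Term-frequency vector of a token list over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; treated as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

def kmeans(vectors, centers, distance, max_iter=20):
    """K-means from the given initial centers; returns a cluster index per vector."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: distance(v, centers[k]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def f_measure(true_labels, pred_labels):
    """Assumed variant: best F1 per true class, weighted by class size."""
    n = len(true_labels)
    total = 0.0
    for cls in set(true_labels):
        in_class = {i for i, t in enumerate(true_labels) if t == cls}
        best = 0.0
        for clu in set(pred_labels):
            in_cluster = {i for i, p in enumerate(pred_labels) if p == clu}
            overlap = len(in_class & in_cluster)
            if overlap:
                precision = overlap / len(in_cluster)
                recall = overlap / len(in_class)
                best = max(best, 2 * precision * recall / (precision + recall))
        total += len(in_class) / n * best
    return total

# Hypothetical keyword database (placeholder English tokens, two of the four fields).
keywords = {"sport": ["match", "team", "goal"],
            "health": ["doctor", "disease", "hospital"]}
vocab = sorted({w for words in keywords.values() for w in words})

# Toy pre-tokenized texts and their true groups (made up for this sketch).
docs = [["team", "goal", "match", "team"],
        ["match", "goal", "doctor"],
        ["doctor", "hospital", "disease"],
        ["hospital", "doctor", "team"]]
truth = ["sport", "sport", "health", "health"]

vectors = [vectorize(d, vocab) for d in docs]
# Keyword-based seeding: one initial center per field instead of random centers.
seeds = [vectorize(keywords[field], vocab) for field in ("sport", "health")]

labels_euclidean = kmeans(vectors, seeds, euclidean_distance)
labels_cosine = kmeans(vectors, seeds, cosine_distance)
print("Euclidean:", labels_euclidean, f_measure(truth, labels_euclidean))
print("Cosine:   ", labels_cosine, f_measure(truth, labels_cosine))
```

On data this small the two measures produce the same clusters, so the sketch only shows where the distance function plugs in; it does not reproduce the comparison reported in the abstract.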

Journal of Al-Qadisiyah for computer science and mathematics (JQCSM)

ISSN 2521-3504 (Online), ISSN 2074-0204 (Print)

It is a scientific journal issued by the College of Computer Science and IT, University of Al-Qadisiyah.