Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Number of pages: 103; File format: Word; File code: 32136
Year: 2012; University degree: Master's degree; Category: Industrial Engineering
  • Summary of Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

    Master's Thesis in the field of Automation and Instrumentation Engineering

    Abstract

    Clustering places data into groups whose members are similar in some respect. Similarity between data within a cluster is maximal, and similarity between data in different clusters is minimal.

    Fuzzy c-means is a fuzzy clustering technique that, despite its sensitivity to initialization and its tendency to converge to local optima, is one of the most common methods because it is efficient and easy to implement. To address these problems, this thesis uses a hybrid method based on the bat algorithm and Fuzzy c-means. To validate it, the proposed method is applied to several well-known data sets, and the results are compared with those of tabu search, ant colony optimization, particle swarm optimization, simulated annealing and k-means. The results demonstrate the capability and robustness of the method.

    Introduction

    Categories and patterns are among the most important features of the world of information, and clustering is one of the best methods available for working with data. Its ability to explore the data space and recognize its structure has made clustering one of the most suitable mechanisms for working with very large volumes of data.

    In clustering, samples are divided into categories that are not known in advance. Clustering is therefore an unsupervised learning method: it categorizes data on its own, without prior knowledge and without observing pre-labeled samples.

    Clustering amounts to finding structure in unclassified data sets. In other words, it places data into groups whose members are similar in some respect, so that similarity between data within a cluster is maximal and similarity between data in different clusters is minimal. The similarity criterion here is distance: samples that are closer to each other are placed in the same cluster. How the distance between two data points is calculated is therefore very important in clustering, because it affects the quality of the final results.

    Distance, which represents dissimilarity, makes it possible to move through the data space and form clusters. Calculating the distance between two data points shows how close they are to each other and whether they belong in the same cluster. Various mathematical functions can be used to calculate distance, including Euclidean distance and Hamming distance.
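    As a small illustration of the two distance functions named above, the following Python sketch computes each of them. The function names and the sample inputs are chosen here for illustration only and are not taken from the thesis.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(hamming_distance("10110", "10011"))          # 2
```

    Euclidean distance suits continuous numeric features, while Hamming distance suits categorical or binary features. Which one is appropriate depends on the data being clustered.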

    1-1- Statement of the problem

    Clustering is finding structure in sets of unlabeled data, and it can be considered the most important problem in unsupervised learning. The idea of clustering was first proposed in the 1930s, and after the major advances made since then it is now found in a wide range of applications. A simple search on the web or even in a library database reveals how widely it is used. Clustering algorithms are applied in various fields; the following are examples:

    Data mining: discovering new information and structure in existing data

    Speech recognition: building a codebook from feature vectors, segmenting speech by speaker, or compressing speech

    Image classification: classification of medical or satellite images

    Web (WWW): classification of documents or of sites

    Biology: classification of animals and plants based on their characteristics

    Urban planning: classification of houses based on their type and geographical location

    Seismic studies: identifying earthquake-prone areas based on previous observations

    Libraries: classification of books

    Insurance: detection of fraud

    Marketing: grouping customers according to their needs, based on records of their recent purchases

    Because clustering is used more and more widely, new and more efficient methods continue to be presented, each for a specific application. Despite all these efforts, clustering is still not used as much as it should be in many sciences, and there is considerable room for its expansion.

    1-2- Research background

    We live in a world full of data, and every day we face large amounts of information that must be stored or displayed. One of the essential methods of controlling and managing these data is clustering, in which data with similar properties are placed in one category, or cluster. The idea of clustering was first presented in the 1930s; after the major advances made since then, it has attracted the attention of many researchers, appears in a wide range of applications, and has given rise to a variety of methods [1].

    Broadly, clustering algorithms can be divided into two categories: hard clustering and fuzzy clustering. In hard clustering, a data point belongs to one and only one cluster, while in fuzzy clustering a data point may belong to two or more clusters at the same time [2], [3], [4]. The Fuzzy c-means (FCM) algorithm is one of the best-known fuzzy clustering methods and is easy to implement. Unfortunately, its original version has limitations such as dependence on initial values and convergence to local optima [5], [6].

    The genetic algorithm does not have these limitations, and combining the two algorithms has given significant results, with convergence considerably faster than in earlier approaches [7]. By combining the genetic algorithm and particle swarm optimization (PSO), Kao and his colleagues devised a method that uses the genetic mutation and crossover operators. This method was able to solve a variety of continuous-function problems and brought significant improvements in finding the global optimum and in the convergence rate [8]. A method combining the genetic algorithm with fuzzy clustering was proposed by Asgarian et al. in 1386 SH (2007-2008). It addresses the strong dependence on the initial number of clusters and the initial locations of their centers, as well as the inability to cluster data that are equidistant from the centers of several clusters. Another advantage of this combination is reduced computational complexity [9].

    Another hybrid used in data mining problems combines Fuzzy c-means with PSO, which mitigates convergence to local optima and improves convergence speed [10], [11]. A more recent hybrid combines the FCM algorithm with a fuzzy memetic algorithm to improve clustering performance; its results show better solutions and higher stability [12]. The combination of FCM and simulated annealing (SA) is another example of a hybrid method, used in cancer diagnosis [13], [14], [15], [16].
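    To make the difference between hard and fuzzy assignment concrete, here is a minimal Python sketch of the standard Fuzzy c-means update equations. It is not the hybrid method proposed in the thesis. The function names, the toy data, the fuzzifier m = 2 and the fixed iteration count are all assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                   for dj in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Each center is the mean of the points, weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        centers = fcm_centers(points, fcm_memberships(points, centers, m), m)
    return centers, fcm_memberships(points, centers, m)


def harden(memberships):
    """Hard assignment: index of the cluster with the largest membership."""
    return [row.index(max(row)) for row in memberships]


# Toy data: two tight groups plus one point lying between them.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
          [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
          [3.0, 3.0]]
centers, u = fuzzy_c_means(points, [[0.0, 0.0], [6.0, 6.0]])
print("centers:", centers)
print("memberships of the middle point:", u[-1])
print("hard labels:", harden(u))
```

    The middle point receives a substantial membership in both clusters, whereas hard clustering would force it into exactly one. The result depends on `initial_centers`, which is the sensitivity to initialization noted above.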

    BY

    Esmat Barzegar

     

    Clustering is the division of data into groups of similar objects. Each cluster consists of objects that are similar to one another and dissimilar to objects in other groups. The Fuzzy c-means (FCM) algorithm is one of the most popular fuzzy clustering techniques because it is efficient, straightforward and easy to implement. However, FCM is sensitive to initialization and is easily trapped in local optima.

    In this thesis, a hybrid fuzzy clustering method based on FCM and the bat algorithm is proposed, which makes use of the merits of both algorithms.

  • Contents & References of Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

    Table of contents:

    1- Chapter One: Introduction

    1-1- Statement of the problem

    1-2- Background of the research

    1-3- Purpose of the research

    1-4- Importance of the research

    1-5- Thesis outline

    2- Chapter Two: Clustering based on the Fuzzy c-means algorithm

    2-1- Introduction

    2-2- Clustering of information

    2-2-1- Difference between clustering and classification

    2-2-3- Types of clusters

    2-2-4- Clustering steps

    2-2-5- Types of clustering methods

    2-2-6- Hierarchical clustering

    2-2-6-1- Divisive hierarchical clustering

    2-2-6-2- Agglomerative hierarchical clustering

    2-2-7- Partitional clustering

    2-2-7-1- k-means algorithm

    2-2-8- Overlapping clustering

    2-2-8-1- Fuzzy clustering

    3- Chapter Three: Optimization based on the bat algorithm

    3-1- Introduction

    3-2- Description of the optimization problem

    3-3- Methods of solving optimization problems

    3-3-1- Particle swarm optimization algorithm

    3-3-2- Honey bee mating algorithm

    3-3-3- Ant colony algorithm

    3-3-4- Tabu search algorithm

    3-3-5- Simulated annealing algorithm

    3-3-6- Bat algorithm

    3-3-7- Suggested solutions to improve the performance of the bat algorithm

    3-3-7-1- Selection of the initial population based on the opposite-number rule

    3-3-7-2- Self-adaptive mutation strategy

    3-4- Comparison criteria for optimization algorithms

    3-4-1- Efficiency

    3-4-2- Standard deviation

    3-4-3- Reliability

    3-4-4- Convergence speed

    3-5- Definition of various numerical problems

    3-5-1- Rosenbrock function

    3-5-2- Schwefel function

    3-5-3- Rastrigin function

    3-5-4- Ackley function

    3-5-5- Griewank function

    4- Chapter Four: Proposed algorithm

    4-1- Introduction

    4-2- Information clustering by the proposed hybrid method

    4-3- Setting the parameters of the proposed algorithm

    4-4- Examining the results of the proposed algorithm and comparing it with other algorithms

    4-4-1- Introducing the data sets used and the related simulation results

    4-4-1-1- Iris data set

    4-4-1-2- Wine data set

    4-4-1-3- CMC data set

    4-4-1-4- Vowel data set

    5- Chapter Five: Conclusion and suggestions

    5-1- Conclusion

    5-2- Suggestions for future work

    References:

    [1] M.R. Anderberg, "Cluster Analysis for Applications", Academic Press, New York, 1973.

    [2] J.A. Hartigan, "Statistical theory in clustering", Journal of Classification, 1985, Vol. 2, pp. 63-76.

    [3] J.R. Kettenring, "The Practice of Cluster Analysis", Journal of Classification, 2006, Vol. 23, pp. 3-30.

    [4] J.H. Ward, "Hierarchical Grouping to Optimize an Objective Function", Journal of the American Statistical Association, 1963, Vol. 58, pp. 236-244.

    [5] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Fifth Berkeley Symp. Math. Statistics and Probability, 1967, Vol. 2, pp. 281-297.

    [6] J. Bezdek, "Fuzzy mathematics in pattern classification", Ph.D. thesis, Cornell University, Ithaca, NY, 1973.

    [7] I. Karen, A.R. Yildiz, N. Kaya, N. Ozturk, F. Ozturk, "Hybrid approach for genetic algorithm and Taguchi's method based design optimization in the automotive industry", International Journal of Production Research, 2006, Vol. 44, pp. 4897-4914.

    [8] Yi-Tung Kao, Erwie Zahara, I-Wei Kao, "A hybridized approach to data clustering", Expert Systems with Applications, 2008, Vol. 34, pp. 1754-1762.

    [9] Ehsan Asgarian, Hossein Moinzadeh, Mohsen Siriani, Jafar Habibi, "A new approach for fuzzy clustering by genetic algorithm", 13th Annual Conference of the Computer Society of Iran, 1386 SH.

    [10] Hesam Izakian, Ajith Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem", Expert Systems with Applications, 2011, Vol. 38, pp. 1835-1838.

    [11] S.-K.S. Fan, E. Zahara, "A hybrid simplex search and particle swarm optimization for unconstrained optimization", European Journal of Operational Research, 2007, Vol. 181, pp. 527-548.

    [12] Fatemeh Golichenari, Mohammad Saniee Abadeh, "A new Method For Fuzzy Clustering Based on Fuzzy C-means Algorithm and Memetic Algorithm", 2007.

    [13] S. Kirkpatrick, C.D. Gelatt Jr., M.P. Vecchi, "Optimization by Simulated Annealing", Science, 1983, Vol. 220, No. 4598, pp. 671-680.

    [14] Saeed Parsa, Hamid Saadi, Hamid Mohamadi, "Scheduling jobs on computational grid using simulated annealing", 2007.

    [15] B. Suman, "Study of simulated annealing based algorithms for multi objective optimization of a constrained problem", Computers and Chemical Engineering, 2004, Vol. 28, No. 9, pp. 1849-1871.

    [16] R. Zhang, C. Wu, "A hybrid immune simulated annealing algorithm for the job shop scheduling problem", Applied Soft Computing, 2010, Vol. 10, pp. 79-89.

    [17] Aida Khayabani, Jamal Shahrabi, Rasool Aliannejad, Arash Sabbaghi, "The use of data mining in the diagnosis of tuberculosis", 3rd Iran Data Mining Conference, 1388 SH.

    [19] J.C. Bezdek, "Feature selection for binary data: Medical diagnosis with fuzzy sets", in Proc. Nat. Comput. Conf., AFIPS Press, 1972, pp. 1057-1068.

    [20] Massoud Esani, Maryam Ranjpour, Farid Yousefi, "Review of Fuzzy Clustering Algorithms", Iran Data Mining Conference, 1388 SH.

    [21] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2006.

    [22] Gabriela Czibula, Grigoreta Sofia Cojocar, Istvan Gergely Czibula, "A Partitional Clustering Algorithm for Crosscutting Concerns Identification", Proceedings of the 8th WSEAS Int. Conference on Software Engineering, Parallel and Distributed Systems, 2010, pp. 111-116.

    [23] Jiahai Wang, Yalan Zhou, "Stochastic optimal competitive Hopfield network for partitional clustering", Expert Systems with Applications, 2009, Vol. 36, pp. 2072-2080.

    [24] A.K. Jain, M.N. Murty, P.J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, 1999, Vol. 31, pp. 264-323.

    [25] Georgios P. Papamichail, Dimitrios P. Papamichail, "The k-means range algorithm for personalized data clustering in e-commerce", European Journal of Operational Research, 2007, Vol. 177, pp. 1400-1408.

    [26] Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori, "An alternative extension of the k-means algorithm for clustering", Int. J. Appl. Math. Comput., 2004, Vol. 14, pp. 241-247.

    [27] Agostino Tarsitano, "A computational study of several relocation methods for k-means algorithms", Pattern Recognition, 2003, Vol. 36, pp. 2955-2966.

    [28]H. Ralambondrainy, "A conceptual version of the K-means algorithm
