Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Number of pages: 103 File Format: word File Code: 32136
Year: 2012 University Degree: Master's degree Category: Industrial Engineering

Tags/Keywords: Particle swarm - Ant colony - Bat algorithm - Fuzzy clustering - Fuzzy c-means - Information clustering - Tabu search algorithm

Summary of Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Master's Thesis in the field of Automation and Instrumentation Engineering

Abstract

Clustering places data into groups whose members are similar in some respect. Similarity between data within a cluster is maximized, and similarity between data in different clusters is minimized.

Fuzzy c-means is a fuzzy clustering technique which, despite being sensitive to initialization and prone to converging to local optima, is one of the most common methods because it is efficient and easy to implement. To address these problems, this thesis uses a hybrid method based on the bat algorithm and Fuzzy c-means. For validation, the proposed method is applied to several well-known data sets and the results are compared with the tabu search, ant colony, particle swarm optimization, simulated annealing and k-means methods. The results demonstrate the capability and robustness of the method.

Introduction

Data and patterns are among the most important features of the world of information, and clustering is one of the best methods available for working with data. Its ability to explore the data space and recognize its structure has made clustering one of the most suitable mechanisms for working with the vast world of data.

In clustering, samples are divided into categories that are not known in advance. Clustering is therefore a learning method that categorizes data independently, without prior knowledge and without observing pre-defined samples.

Clustering finds structure in unclassified data sets. In other words, it places data into groups whose members are similar in some respect, so that similarity between data within a cluster is maximized and similarity between data in different clusters is minimized. The similarity criterion here is distance: samples that are closer to each other are placed in the same cluster. Calculating the distance between two data points is therefore very important in clustering, because it affects the quality of the final results.

Distance, which represents dissimilarity, makes it possible to move through the data space and form clusters. By calculating the distance between two data points, one can tell how close they are to each other and whether they belong in the same cluster. Various mathematical functions exist for calculating distance, such as the Euclidean distance and the Hamming distance.
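
The two distance functions named above can be sketched as follows. This is only an illustration, not code from the thesis; the function names euclidean_distance and hamming_distance are chosen here for the example.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(hamming_distance("karolin", "kathrin"))      # 3
```

Euclidean distance suits continuous features, while Hamming distance suits categorical or binary features, where only equality between positions is meaningful.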

1-1- Statement of the problem

Clustering finds structure in sets of unlabeled data and can be considered the most important problem in unsupervised learning. The idea of clustering was first proposed in the 1930s, and after major advances it is now used in a wide range of applications. A simple search on the web or even in a library database reveals how widely it is used. Clustering algorithms are used in many fields, including the following:

Data mining: discovering new information and structure from existing data

Speech recognition: building a code book from feature vectors, segmenting speech by speaker, or speech compression

Image classification: classification of medical or satellite images

Web (WWW): classification of documents, classification of sites and so on

Biology: classification of animals and plants based on their characteristics

Urban planning: classification of houses based on their type and geographical location

Seismology: identifying earthquake-prone areas based on previous observations

Library: classification of books

Insurance: identifying fraudsters

Marketing: classification of customers into categories according to their needs, based on their most recent purchases

Due to the increasing use of clustering, new and more efficient methods are being introduced, each suited to a specific application. Despite all these efforts, clustering is still not used as much as it should be in many sciences, and there is considerable room for its expansion.

1-2- Research background

We live in a world full of data, and every day we face large volumes of information to store or display. One of the essential methods for controlling and managing these data is clustering, in which data with similar properties are placed in the same category or cluster. The idea of clustering was first presented in the 1930s, and after major advances it now attracts the attention of many researchers. It is therefore used in a wide range of applications, and various methods have been proposed for it [1].

Broadly, clustering algorithms can be divided into two categories: hard clustering and fuzzy clustering. In hard clustering, a data point belongs to one and only one cluster, while in fuzzy clustering a data point may belong to two or more clusters at the same time [2], [3], [4]. The Fuzzy c-means algorithm is one of the best-known fuzzy clustering methods and is easy to implement. Unfortunately, its original version has limitations such as dependence on initial values and convergence to a local optimum [5], [6]. Genetic algorithms do not have these limitations, and combining the two algorithms has produced significant results, with convergence considerably faster than in earlier approaches [7].

By combining the genetic algorithm and PSO, Kao and colleagues devised a method that uses the genetic mutation and crossover operators. This method was able to solve various continuous-function problems, with significant improvements in finding the global optimum and in the convergence rate [8]. Asgarian and colleagues proposed a method that combines the genetic algorithm with a fuzzy method. It addresses the strong dependence on the initial number of clusters and the initial positions of their centers, as well as the inability to cluster data that are equidistant from the centers of several clusters. Another advantage of this combination is reduced computational complexity [9].

Another hybrid method used in data mining problems combines Fuzzy c-means with PSO, which mitigates convergence to local optima and improves convergence speed [10], [11]. A further hybrid combines the FCM algorithm with a memetic algorithm to improve clustering performance; the reported results show better solutions and greater stability [12]. The combination of FCM and simulated annealing (SA) is another hybrid method, which has been used in cancer diagnosis [13], [14], [15], [16].
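
The difference between hard and fuzzy assignment described above can be illustrated with a minimal sketch of the standard Fuzzy c-means updates. This is a textbook-style illustration, not code from the thesis; the function names, the fuzzifier m = 2, the tolerance and the random choice of initial centers are all assumptions made for the example.

```python
import numpy as np


def fcm_memberships(data, centers, m=2.0, eps=1e-12):
    """Membership of every sample in every cluster; each row sums to 1."""
    # Pairwise distances, shape (n_samples, n_clusters).
    dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, eps)  # avoid division by zero at a center
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fcm_centers(data, u, m=2.0):
    """Cluster centers as means of the data weighted by u**m."""
    w = u ** m
    return (w.T @ data) / w.sum(axis=0)[:, None]


def fuzzy_c_means(data, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Alternate the two updates until the centers stop moving."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: pick distinct samples as starting centers.
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)]
    for _ in range(max_iter):
        u = fcm_memberships(data, centers, m)
        new_centers = fcm_centers(data, u, m)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(data, centers, m)


# A point equidistant from two centers gets equal membership in both,
# whereas a hard assignment (argmax) would have to pick one arbitrarily.
point = np.array([[1.0, 0.0]])
two_centers = np.array([[0.0, 0.0], [2.0, 0.0]])
print(fcm_memberships(point, two_centers))  # [[0.5 0.5]]
```

Because the result depends on the starting centers, running fuzzy_c_means with different seed values can give different final centers, which is the initialization sensitivity the text refers to.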

BY

Esmat Barzegar

Clustering is a division of data into groups of similar objects. Each cluster consists of objects that are similar between themselves and dissimilar to objects of other groups. Fuzzy c-means (FCM) algorithm is one of the most popular fuzzy clustering techniques because it is efficient, straightforward and easy to implement. However, FCM is sensitive to initialization and is easily trapped in local optima.

In this thesis, a hybrid fuzzy clustering method based on FCM and Bat algorithm is proposed which makes use of the merits of both algorithms.
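
One way such a hybrid could be organized is sketched below: each bat encodes a full set of cluster centers, and the FCM objective serves as the fitness to be minimized. This is only an illustrative sketch under stated assumptions, not the method of the thesis, whose details are not given here. The function names, all parameter values, the Gaussian local walk and the sign convention of the velocity update (chosen so that bats move toward the best solution) are assumptions.

```python
import numpy as np


def fcm_objective(centers, data, m=2.0, eps=1e-12):
    """FCM cost for given centers, using the memberships optimal for them."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, eps)
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    return float(((u ** m) * d2).sum())


def bat_fcm(data, n_clusters, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
            alpha=0.9, gamma=0.9, pulse_rate0=0.5, m=2.0, seed=0):
    """Search for cluster centers with a bat-style metaheuristic (sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = data.min(axis=0), data.max(axis=0)
    shape = (n_clusters, data.shape[1])
    # Each bat is one candidate set of centers inside the data bounds.
    pos = rng.uniform(lo, hi, size=(n_bats,) + shape)
    vel = np.zeros_like(pos)
    loudness = np.ones(n_bats)
    pulse_rate = np.zeros(n_bats)
    cost = np.array([fcm_objective(p, data, m) for p in pos])
    best = pos[cost.argmin()].copy()
    best_cost = float(cost.min())
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Random frequency scales the pull toward the best solution.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] += (best - pos[i]) * freq
            cand = pos[i] + vel[i]
            if rng.random() > pulse_rate[i]:
                # Local random walk around the best solution (assumed scale).
                step = 0.01 * (hi - lo) * rng.normal(size=shape)
                cand = best + step * loudness.mean()
            cand = np.clip(cand, lo, hi)
            cand_cost = fcm_objective(cand, data, m)
            # Accept improvements with a probability tied to loudness.
            if cand_cost <= cost[i] and rng.random() < loudness[i]:
                pos[i], cost[i] = cand, cand_cost
                loudness[i] *= alpha
                pulse_rate[i] = pulse_rate0 * (1.0 - np.exp(-gamma * t))
            if cand_cost < best_cost:
                best, best_cost = cand.copy(), cand_cost
    return best, best_cost
```

The returned centers could then be passed to ordinary FCM iterations for local refinement, so that the population-based search reduces the dependence on initialization while FCM supplies fast local convergence.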
Contents & References of Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Table of contents:

1- Chapter One: Introduction

1-1- Statement of the problem

1-2- Research background

1-3- Purpose of the research

1-4- Importance of the research

1-5- Thesis outline

2- Chapter Two: Clustering based on the Fuzzy c-means algorithm

2-1- Introduction

2-2- Information clustering

2-2-1- Difference between clustering and classification

2-2-3- Types of clusters

2-2-4- Clustering steps

2-2-5- Types of clustering methods

2-2-6- Hierarchical clustering

2-2-6-1- Divisive hierarchical clustering

2-2-6-2- Agglomerative hierarchical clustering

2-2-7- Partitional clustering

2-2-7-1- k-means algorithm

2-2-8- Overlapping clustering

2-2-8-1- Fuzzy clustering

3- Chapter Three: Optimization based on the bat algorithm

3-1- Introduction

3-2- Description of the optimization problem

3-3- Methods of solving optimization problems

3-3-1- Particle swarm optimization algorithm

3-3-2- Honey bee mating algorithm

3-3-3- Ant colony algorithm

3-3-4- Tabu search algorithm

3-3-5- Simulated annealing algorithm

3-3-6- Bat algorithm

3-3-7- Suggested solutions to improve the performance of the bat algorithm

3-3-7-1- Selection of the initial population based on the opposite-number rule

3-3-7-2- Self-adjusting mutation strategy

3-4- Comparison criteria for optimization algorithms

3-4-1- Efficiency

3-4-2- Standard deviation

3-4-3- Reliability

3-4-4- Convergence speed

3-5- Definition of various numerical problems

3-5-1- Rosenbrock function

3-5-2- Schwefel function

3-5-3- Rastrigin function

3-5-4- Ackley function

3-5-5- Griewank function

4- Chapter Four: Proposed algorithm

4-1- Introduction

4-2- Information clustering by the proposed hybrid method

4-3- Setting the parameters of the proposed algorithm

4-4- Examining the results of the proposed algorithm and comparing it with other algorithms

4-4-1- Introducing the data used and the related simulation results

4-4-1-1- Iris data set

4-4-1-2- Wine data set

4-4-1-3- CMC data set

4-4-1-4- Vowel data set

5- Chapter Five: Conclusion and suggestions

5-1- Conclusion

5-2- Suggestions for future work

References:

[1] M. R. Anderberg, "Cluster Analysis for Applications", Academic Press, New York, 1973.

[2] J. A. Hartigan, "Statistical theory in clustering", Journal of Classification, 1985, Vol. 2, pp. 63-76.

[3] J. R. Kettenring, "The Practice of Cluster Analysis", Journal of Classification, 2006, Vol. 23, pp. 3-30.

[4] J. H. Ward, "Hierarchical Grouping to Optimize an Objective Function", Journal of the American Statistical Association, 1963, Vol. 58, pp. 236-244.

[5] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Fifth Berkeley Symp. Math. Statistics and Probability, 1967, Vol. 2, pp. 281-297.

[6] J. C. Bezdek, "Fuzzy mathematics in pattern classification", Ph.D. thesis, Cornell University, Ithaca, NY, 1973.

[7] I. Karen, A. R. Yildiz, N. Kaya, N. Ozturk, F. Ozturk, "Hybrid approach for genetic algorithm and Taguchi's method based design optimization in the automotive industry", International Journal of Production Research 4 (2006) 4897-4914.

[8] Yi-Tung Kao, Erwie Zahara, I-Wei Kao, "A hybridized approach to data clustering", Expert Systems with Applications, 2008, Vol. 34, pp. 1754-1762.

[9] Ehsan Asgarian, Hossein Moinzadeh, Mohsen Siriani, Jafar Habibi, "A new approach for fuzzy clustering by genetic algorithm", 13th annual conference of the Iranian Computer Association, 1386.

[10] Hesam Izakian, Ajith Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem", Expert Systems with Applications 38 (2011) 1835-1838.

[11] Shu-Kai S. Fan, Erwie Zahara, "A hybrid simplex search and particle swarm optimization for unconstrained optimization", European Journal of Operational Research 181 (2007) 527-548.

[12] Fatemeh Golichenari, Mohammad Saniee Abadeh, "A new Method For Fuzzy Clustering Based on Fuzzy C-means Algorithm and Memetic Algorithm", 2007.

[13] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, "Optimization by Simulated Annealing", Science, 1983, Vol. 220, No. 4598, pp. 671-680.

[14] Saeed Parsa, Hamid Saadi, Hamid Mohamadi, "Scheduling jobs on computational grid using simulated annealing", 2007.

[15] B. Suman, "Study of simulated annealing based algorithms for multi objective optimization of a constrained problem", Computers and Chemical Engineering, 2004, Vol. 28, Issue 9, pp. 1849-1871.

[16] R. Zhang, C. Wu, "A hybrid immune simulated annealing algorithm for the job shop scheduling problem", Applied Soft Computing, 2010, Vol. 10, pp. 79-89.

[17] Aida Khayabani, Jamal Shahrabi, Rasool Aliannejad, Arash Sabbaghi, "The use of data mining in the diagnosis of tuberculosis", 3rd Iran Data Mining Conference, 1388.

[19] J. C. Bezdek, "Feature selection for binary data: Medical diagnosis with fuzzy sets", in Proc. Nat. Comput. Conf., AFIPS Press, 1972, pp. 1057-1068.

[20] Massoud Esani, Maryam Ranjpour, Farid Yousefi, "Review of Fuzzy Clustering Algorithms", Iran Data Mining Conference, 1388.

[21] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Diane Cerra, 2006.

[22] Gabriela Czibula, Grigoreta Sofia Cojocar, Istvan Gergely Czibula, "A Partitional Clustering Algorithm for Crosscutting Concerns Identification", Proceedings of the 8th WSEAS Int. Conference on Software Engineering, Parallel and Distributed Systems, 2010, pp. 111-116.

[23] Jiahai Wang, Yalan Zhou, "Stochastic optimal competitive Hopfield network for partitional clustering", Expert Systems with Applications, 2009, Vol. 36, pp. 2072-2080.

[24] A. K. Jain, M. N. Murty, P. J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, 1999, Vol. 31, pp. 264-323.

[25] Georgios P. Papamichail, Dimitrios P. Papamichail, "The k-means range algorithm for personalized data clustering in e-commerce", European Journal of Operational Research, 2007, Vol. 177, pp. 1400-1408.

[26] Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori, "An alternative extension of the k-means algorithm for clustering", Int. J. Appl. Math. Comput, 2004, Vol. 14, pp. 241-247.

[27] Tarsitano Agostino, "A computational study of several relocation methods for k-means algorithms", Pattern Recognition, 2003, Vol. 36, pp. 2955-2966.

[28] H. Ralambondrainy, "A conceptual version of the K-means algorithm
