Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Number of pages: 102 File Format: word File Code: 30894
Year: 2012 University Degree: Master's degree Category: Industrial Engineering

Tags/Keywords: Bat algorithm - Clustering - Fuzzy c-means - Information clustering

Summary of Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Master's thesis in the field of automation and instrumentation engineering

Abstract

Clustering places data in groups whose members are similar in some respect. Similarity between data in the same cluster is maximal, and similarity between data in different clusters is minimal.

Fuzzy c-means (FCM) is a fuzzy clustering technique that, despite its sensitivity to initialization and its tendency to converge to local optima, is one of the most widely used methods because it is efficient and easy to implement. To address these problems, this thesis uses a hybrid method based on the bat algorithm and Fuzzy c-means. To validate it, the proposed method is applied to several well-known data sets and the results are compared with those of tabu search, ant colony optimization, particle swarm optimization, simulated annealing and k-means. The results demonstrate the capability and robustness of the method.

Introduction

Data and patterns are among the most important elements of the information world, and clustering is one of the best methods available for working with data. Its ability to explore the data space and recognize its structure makes clustering well suited to handling very large volumes of data.

In clustering, samples are divided into categories that are not known in advance. Clustering is therefore an unsupervised learning method: it categorizes data on its own, without prior knowledge and without seeing pre-labeled samples.

Clustering means finding structure in unclassified data sets. In other words, it places data in groups whose members are similar in some respect, so that similarity between data in the same cluster is maximal and similarity between data in different clusters is minimal. The similarity criterion here is distance: samples that are closer to each other are placed in the same cluster. Calculating the distance between two data points is therefore very important in clustering, because it affects the quality of the final results.

Distance, which represents dissimilarity, makes it possible to move through the data space and form clusters. Calculating the distance between two data points shows how close they are to each other and whether they belong in the same cluster. Various mathematical functions can be used to calculate distance, among them Euclidean distance and Hamming distance.
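As a minimal sketch of the two distance functions named above, the following Python snippet computes Euclidean and Hamming distance and uses the former to pick the closest cluster center. The function names and the `nearest_center` helper are illustrative assumptions, not code from the thesis.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (hypothetical helper).

    This is the hard-assignment rule: a point joins the cluster whose
    center is nearest under Euclidean distance.
    """
    return min(range(len(centers)),
               key=lambda i: euclidean_distance(point, centers[i]))
```

Euclidean distance suits continuous features, while Hamming distance suits categorical or binary features; which one is appropriate depends on the data being clustered.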

1-1- Problem statement

Clustering means finding structure in unlabeled data sets, and it can be considered the most important problem in unsupervised learning. The idea of clustering was first proposed in the 1930s, and after major advances it now appears in a wide range of applications. A simple search on the web, or even in a library database, shows how widely it is used. Clustering algorithms are applied in many fields, for example:

Data mining: discovering new information and structure in existing data

Speech recognition: building a codebook from feature vectors, separating speech by speaker, or compressing speech

Image segmentation: segmenting medical or satellite images

Web (WWW): classifying documents or websites

Biology: classifying animals and plants based on their characteristics

Urban planning: classifying houses based on their type and geographical location

Seismology: identifying hazard-prone areas based on previous observations

Library: classifying books

Insurance: detecting fraud

Marketing: grouping customers according to their needs, based on their most recent purchases

As the use of clustering grows, new and more efficient methods keep appearing, each designed for a specific application. Despite these efforts, clustering is still underused in many sciences, and there is considerable room for it to expand.

1-2- Research background

We live in a world full of data, and every day we face the task of storing or presenting large amounts of information. Clustering is one of the essential methods for controlling and managing this data: data with similar properties are placed in the same category, or cluster. The idea of clustering was first presented in the 1930s, and after major advances it has attracted the attention of many researchers. It now appears in many applications, and various methods have been proposed for it [1]. Broadly, clustering algorithms fall into two categories: hard clustering and fuzzy clustering. In hard clustering, a data point belongs to one and only one cluster, while in fuzzy clustering a data point may belong to two or more clusters at the same time [2], [3], [4].

The Fuzzy c-means (FCM) algorithm is a well-known fuzzy clustering method that is easy to implement. Its original version, however, depends on the initial values and tends to converge to a local optimum [5], [6]. The genetic algorithm does not have these limitations, and combining the two algorithms has produced significant results, with convergence that is far faster than in earlier approaches [7]. Kao and colleagues devised a method that combines the genetic algorithm with PSO and uses the genetic mutation and crossover operators. This method solved various continuous-function problems and markedly improved both the search for the global optimum and the convergence rate [8]. Asgarian et al. proposed a method that combines the genetic algorithm with fuzzy clustering (1386 SH, 2007/2008). It addresses the strong dependence on the initial number of clusters and the initial positions of their centers, as well as the inability to cluster data points that are equidistant from several cluster centers. Another advantage of this combination is lower computational complexity [9].

Another hybrid method used in data mining combines Fuzzy c-means with PSO, which reduced convergence to local optima and improved convergence speed [10], [11]. A further hybrid combines the FCM algorithm with a fuzzy memetic algorithm; its results show better solutions and higher stability [12]. The combination of FCM and simulated annealing (SA) is another example of a hybrid method, used in cancer diagnosis [13], [14], [15], [16]. In line with these efforts, this thesis combines the FCM algorithm with the bat algorithm in order to exploit the strengths of both in solving clustering problems.
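To make the contrast between hard and fuzzy clustering concrete, here is a minimal pure-Python sketch of the standard Fuzzy c-means update rules. It is not the thesis's FCM-BA method; the function names, the fuzzifier default `m = 2.0`, the random initialization and the stopping tolerance are all assumptions chosen for illustration.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_memberships(data, centers, m=2.0, eps=1e-12):
    """u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)); each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        # eps avoids division by zero when a point sits exactly on a center
        d = [max(_dist(x, c), eps) for c in centers]
        u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u


def update_centers(data, u, n_clusters, m=2.0):
    """Each center is the mean of all points weighted by u[i][j] ** m."""
    dim = len(data[0])
    centers = []
    for j in range(n_clusters):
        w = [u[i][j] ** m for i in range(len(data))]
        total = sum(w)
        centers.append([sum(wi * x[k] for wi, x in zip(w, data)) / total
                        for k in range(dim)])
    return centers


def fuzzy_c_means(data, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Alternate the two updates until the centers stop moving."""
    rng = random.Random(seed)
    # Assumed initialization: pick distinct data points as starting centers.
    centers = [list(x) for x in rng.sample(data, n_clusters)]
    for _ in range(max_iter):
        u = update_memberships(data, centers, m)
        new_centers = update_centers(data, u, n_clusters, m)
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(data, centers, m)
```

A point that is equidistant from two centers receives a membership of 0.5 in each, which hard clustering cannot express. Because the result depends on the randomly chosen starting centers, different seeds can end in different local optima; this is the sensitivity that hybrid approaches such as the ones surveyed above try to reduce.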

1-3- Research objective

The purpose of this research is to examine the existing clustering algorithms and present an algorithm that overcomes their limitations to an acceptable extent. These limitations concern, among others, the following:

× Performance for high-volume databases

× Discovery of clusters with different shapes

× Insensitivity to the order of input data

× Ability to interpret and use

1-4- Importance of the research

As database systems and tools for storing large volumes of data multiplied, a need was felt for automatic methods of discovering knowledge in that data. In addition, because operating on massive amounts of data is costly in human and material resources, methods requiring minimal user intervention became necessary. Extracting useful information from masses of data and turning it into the knowledge organizations need, especially for decision-making, called for new methods in this field. Data mining is one of the tools that help discover knowledge in databases. It can be described as extracting valid, understandable and reliable information from very large databases, which helps uncover hidden patterns and reliable relationships in the data for use in decision-making. Understanding and handling data is thus one of the central goals of data mining.

This process was introduced in the late 90s and has been discussed seriously in statistics since 1995. It is currently one of the most important tools for making effective use of large amounts of data, and its importance grows every day.
Contents & References of Presenting a new method in information clustering using a combination of bat algorithm and Fuzzy c-means

Contents:

1- Chapter One: Introduction
1-1- Problem statement
1-2- Research background
1-3- Research objective
1-4- Importance of the research
1-5- Thesis outline

2- Chapter Two: Clustering based on the Fuzzy c-means algorithm
2-1- Introduction
2-2- Information clustering
2-2-2- Clustering applications
2-2-3- Types of clusters
2-2-4- Clustering steps
2-2-5- Types of clustering methods
2-2-6- Hierarchical clustering
2-2-6-1- Divisive hierarchical clustering
2-2-6-2- Agglomerative hierarchical clustering
2-2-7- Partitional clustering
2-2-7-1- k-means algorithm
2-2-8- Overlapping clustering
2-2-8-1- Fuzzy clustering

3- Chapter Three: Optimization based on the bat algorithm
3-1- Introduction
3-2- Description of the optimization problem
3-3- Methods of solving optimization problems
3-3-1- Particle swarm optimization algorithm
3-3-2- Honey bee mating algorithm
3-3-3- Ant colony algorithm
3-3-4- Tabu search algorithm
3-3-5- Simulated annealing algorithm
3-3-6- Bat algorithm
3-3-7- Suggested solutions to improve the performance of the bat algorithm
3-3-7-1- Selection of the initial population based on the opposite-number rule
3-3-7-2- Self-adjusting mutation strategy
3-4- Comparison criteria for optimization algorithms
3-4-1- Efficiency
3-4-2- Standard deviation
3-4-3- Reliability
3-4-4- Convergence speed
3-5- Definition of various numerical problems
3-5-1- Rosenbrock function
3-5-2- Schwefel function
3-5-3- Rastrigin function
3-5-4- Ackley function
3-5-5- Griewank function

4- Chapter Four: Proposed algorithm
4-1- Introduction
4-2- Information clustering by the proposed hybrid method
4-3- Setting the parameters of the proposed algorithm
4-4- Examining the results of the proposed algorithm and comparing it with other algorithms
4-4-1- Introducing the data used and the related simulation results
4-4-1-1- Iris dataset
4-4-1-2- Wine dataset
4-4-1-3- CMC dataset
4-4-1-4- Vowel dataset

5- Chapter Five: Conclusion and suggestions
5-1- Conclusion
5-2- Suggestions for future work

List of tables

Table 2-1 Advantages and disadvantages of the k-means algorithm
Table 2-2 Advantages and disadvantages of the Fuzzy c-means algorithm
Table 2-3 Similarity criteria based on different distance functions
Table 3-1 Numerical functions used to test the algorithms
Table 4-1 Parameters related to the proposed algorithm
Table 4-2 Cluster centers obtained by running the FCM-BA algorithm on the Iris dataset
Table 4-3 Results of existing algorithms on the Iris dataset
Table 4-4 FCM-BA results for different parameter values on the Iris dataset
Table 4-5 Results of existing algorithms on the Wine dataset
Table 4-6 Cluster centers obtained by running the FCM-BA algorithm on the Wine dataset
Table 4-7 FCM-BA results for different parameter values on the Wine dataset
Table 4-8 Cluster centers obtained by running the proposed algorithm on the CMC dataset
Table 4-9 Results of existing algorithms on the CMC dataset
Table 4-10 FCM-BA results for different parameter values on the CMC dataset
Table 4-11 Cluster centers obtained by running the proposed algorithm on the Vowel dataset
Table 4-12 Results of existing algorithms on the Vowel dataset
Table 4-13 FCM-BA results for different parameter values on the Vowel dataset

References

[1] M. R. Anderberg, "Cluster Analysis for Applications", Academic Press, New York, 1973.

[2] J. A. Hartigan, "Statistical theory in clustering", Journal of Classification, Vol. 2, pp. 63-76, 1985.

[3] J. R. Kettenring, "The Practice of Cluster Analysis", Journal of Classification, Vol. 23, pp. 3-30, 2006.

[4] J. H. Ward Jr., "Hierarchical Grouping to Optimize an Objective Function", Journal of the American Statistical Association, Vol. 58, pp. 236-244, 1963.

[5] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281-297, 1967.

[6] J. Bezdek, "Fuzzy mathematics in pattern classification", Ph.D. thesis, Cornell University, Ithaca, NY, 1973.

[7] I. Karen, A. R. Yildiz, N. Kaya, N. Ozturk, F. Ozturk, "Hybrid approach for genetic algorithm and Taguchi's method based design optimization in the automotive industry", International Journal of Production Research, Vol. 44, pp. 4897-4914, 2006.

[8] Yi-Tung Kao, Erwie Zahara, I-Wei Kao, "A hybridized approach to data clustering", Expert Systems with Applications, Vol. 34, pp. 1754-1762, 2008.

[9] Ehsan Asgarian, Hossein Moinzadeh, Mohsen Siriani, Jafar Habibi, "A new approach for fuzzy clustering by genetic algorithm", 13th annual conference of the Iranian Computer Association, 1386 SH.

[10] Hesam Izakian, Ajith Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem", Expert Systems with Applications, Vol. 38, pp. 1835-1838, 2011.

[11] Shu-Kai S. Fan, Erwie Zahara, "A hybrid simplex search and particle swarm optimization for unconstrained optimization", European Journal of Operational Research, Vol. 181, pp. 527-548, 2007.

[12] Fatemeh Golichenari, Mohammad Saniee Abadeh, "A new Method For Fuzzy Clustering Based on Fuzzy C-means Algorithm and Memetic Algorithm", 2007.

[13] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, "Optimization by Simulated Annealing", Science, Vol. 220, No. 4598, pp. 671-680, 1983.

[14] Saeed Parsa, Hamid Saadi, Hamid Mohamadi, "Scheduling jobs on computational grid using simulated annealing", 2007.

[15] B. Suman, "Study of simulated annealing based algorithms for multi objective optimization of a constrained problem", Computers and Chemical Engineering, Vol. 28, Issue 9, pp. 1849-1871, 2004.

[16] R. Zhang, C. Wu, "A hybrid immune simulated annealing algorithm for the job shop scheduling problem", Applied Soft Computing, Vol. 10, pp. 79-89, 2010.

[17] Aida Khayabani, Jamal Shahrabi, Rasool Aliannejad, Arash Sabbaghi, "The use of data mining in the diagnosis of tuberculosis", 3rd Iran data mining conference, 1388 SH.

[19] J. C.

Using an improved imperialist competitive algorithm for image segmentation

Number of pages: 89 Category: Computer Engineering

Dissertation for Master's Degree in Computer Engineering - Artificial Intelligence Abstract Image segmentation is a basic process in many applications of image processing and machine vision, which can be considered as the first low-level processing step in digital image processing. Image segmentation has various applications such as medical image processing, face recognition, ...

Cluster optimization using evolutionary algorithms for web personalization

Number of pages: 79 Category: Computer Engineering

Master's Thesis Field: Computer Engineering Major: Software Abstract Information overload is a major problem on the current web. To deal with this problem, web personalization systems have been provided that adapt the content and services of a website to people based on their interests and browsing behavior. A fundamental component of any web personalization system is ...

Fuzzy clustering of data based on fuzzy logic

Number of pages: 46 Category: Computer Engineering

Dissertation for Master's Degree in Computer Engineering - Artificial Intelligence Abstract Data clustering is a method for classifying similar data, which has been used for many years in various sciences and many algorithms have been designed in this field. Recent clustering research leads to hybrid methods that are more robust and accurate. Hybrid clustering tries to first ...

Achieving quality of service in wireless sensor networks using cellular learning automata

Number of pages: 207 Category: Computer Engineering

Master's Thesis in Computer-Software (M.Sc) Abstract The quality of service in wireless sensor networks is very different compared to traditional networks. Some of the parameters that are used in evaluating the quality of service in these networks are: network coverage, optimal number of active nodes in the network, network lifetime and energy consumption. In this thesis, three ...

Presenting an ant colony algorithm to improve task completion time in the grid environment

Number of pages: 85 Category: Computer Engineering

Dissertation for M.Sc. Abstract In this thesis, we present a new method for grid computing based on the ant algorithm. The model we use in the grid environment is a continuous double auction. Because of their simplicity and dynamics, such models are used in many algorithms for controlling resources and scheduling tasks. Many of these models have weaknesses in their response time ...

Presenting a dynamic target tracking algorithm based on prediction in wireless sensor network

Number of pages: 104 Category: Computer Engineering

Master's Thesis of Computer Engineering - Computer Architecture Abstract With the advancement of electronic device manufacturing technology and the cost-effectiveness of large-scale sensor networks, wireless sensor networks provide research fields with rapid growth and great interest, which have attracted a lot of attention in recent years. Large-scale wireless sensor networks ...

Predicting exploitation and clustering of vulnerabilities by means of text mining

Number of pages: 105 Category: Computer Engineering

A master's thesis in the field of computer-software engineering. Software vulnerabilities can lead to financial and information losses. Due to the limited financial and human resources, prioritizing the damages is very important. Before this research, a large number of researchers have classified vulnerabilities based on empirical and statistical knowledge. However, the variable ...

Presenting a model to identify the influencing factors and their impact factor in the profit and loss of the third party car insurance of insurance companies by means of data mining methods, a case study of Iran Insurance Company.

Number of pages: 100 Category: Computer Engineering

Master's thesis in the field of computer - software engineering. Abstract: The review of car insurance information has shown that factors such as the type of car used, having a driver's license, the type of license and its compatibility or non-compatibility with the vehicle, the amount of the insurance premium, the amount of insurance policy obligations, the quality of the car ...

Creating a recommender system on the web using user profiles and machine learning methods

Number of pages: 85 Category: Computer Engineering

Computer Engineering Master's Thesis Abstract Web development that lacks an integrated structure creates many problems for users. Not finding the information needed by users in this huge warehouse is one of the problems of web users. In order to deal with these problems, web personalization systems have been provided, which by finding the behavior patterns of users without their ...