Fuzzy clustering of data based on fuzzy logic

Number of pages: 46 File Format: word File Code: 31064
Year: 2014 University Degree: Master's degree Category: Computer Engineering
  • Summary of Fuzzy clustering of data based on fuzzy logic

    Dissertation for Master's Degree in Computer Engineering - Artificial Intelligence

    Abstract

    Data clustering is a method for grouping similar data. It has been used for years across many sciences, and many algorithms have been designed for it. Recent clustering research has moved toward hybrid methods, which are more robust and accurate. Hybrid clustering first generates initial clusterings that are as diverse as possible, then combines the results by applying a consensus function. In this research, a combination of fuzzy clustering and the support vector machine (SVM) is used for classification.

    SVM is a supervised learning method used for data classification. It is a powerful, relatively recent model whose learning formulation is based on minimizing error. SVM training cost depends directly on the number of training samples, and if the number of cluster centers is large, training time and memory use increase greatly. In the combined network (FS-FCSVM), fuzzy clustering is first performed on the input data, and the network parameters are then trained with SVM, which yields a network with high generalization ability. Such systems need fewer rules than conventional fuzzy systems and take less time to compute.

    In this research, subtractive (differential) clustering is applied before fuzzy clustering. The main idea of subtractive clustering is to search for high-density regions in the feature space of the data: the point with the largest number of neighbors is selected as a cluster center. In other words, the technique is used to select feature points that are more distinctive and less similar to the other points.

    In this thesis, subtractive clustering is used to find the cluster centers and the number of clusters accurately, which reduces the number of fuzzy clustering iterations. These center points are also used as one part of the training data. The second part of the work concerns selecting the remaining training data, for which the membership matrix obtained from fuzzy clustering is used. The results show that, in addition to reducing training time, appropriate data selection makes the SVM more robust to noisy and outlier data and reduces the number of support vectors selected in a large data space.

    Keywords: support vector machine, fuzzy clustering, subtractive (differential) clustering.

    Chapter 1: Introduction

    Clustering

    Data clustering is one of the most common data mining techniques[1] and one of the most widely used methods in data analysis. Clustering is an automatic process that divides samples into groups whose members are similar to one another; these groups are called clusters. In other words, a cluster is a set of objects that are similar to each other and dissimilar to the objects in other clusters. Clustering is used in many fields, including pattern recognition[2], machine learning, data mining, information retrieval, and bioinformatics. Its purpose is to give the end user a clear view of what is happening in the database. Another application of clustering is identifying data that differ significantly from the rest. Clustering tries to divide a set of data into clusters in an unsupervised manner so that the similarity of data within each cluster is maximized and the similarity between data in different clusters is minimized [5,6]. Clustering algorithms [7,8] partition data objects (patterns, entities, samples, observations, units) into a certain number of clusters (groups, subsets, or categories). Everett [3] (2001) states that a cluster is a set of similar entities, and that entities from different clusters are not similar.

    Different criteria can be used to measure similarity. For example, distance can serve as the criterion, so that objects that are closer to each other are placed in the same cluster; this is called distance-based clustering.

    In Figure 1-1, each input sample belongs to exactly one cluster, and no sample belongs to more than one cluster. As another example, consider Figure 2-1. In this figure, each small circle represents a vehicle (object) characterized by its weight and maximum speed. Each oval is a cluster, and the text next to each oval is the label of that cluster. The coordinate system in which the samples are represented is called the feature space.

    As the figure shows, the vehicles are divided into three clusters. A representative can be defined for each cluster; for example, the average of the cargo vehicles can be computed and used as the representative of the cargo vehicle cluster. In fact, clustering algorithms often work as follows: a set of initial representatives is chosen for the input samples, and each sample is assigned to a cluster based on its similarity to these representatives. New representatives are then calculated for each cluster, and the samples are compared with them again to determine which cluster they belong to. This process is repeated until the cluster representatives no longer change.
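    The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: similarity is taken to be Euclidean distance, the representative of a cluster is the mean of its samples (a k-means style update), and the vehicle data, the function names and the choice of two clusters are hypothetical rather than taken from the thesis.

```python
import math
import random


def assign(samples, reps):
    """Return, for each sample, the index of its nearest representative."""
    return [min(range(len(reps)), key=lambda k: math.dist(s, reps[k]))
            for s in samples]


def update(samples, labels, reps):
    """Recompute each representative as the mean of the samples assigned to it."""
    new_reps = []
    for k, old in enumerate(reps):
        members = [s for s, lab in zip(samples, labels) if lab == k]
        if not members:            # keep the old representative for an empty cluster
            new_reps.append(old)
            continue
        dim = len(old)
        new_reps.append(tuple(sum(m[d] for m in members) / len(members)
                              for d in range(dim)))
    return new_reps


def cluster(samples, n_clusters, max_iter=100, seed=0):
    """Alternate assignment and update until the representatives stop changing."""
    rng = random.Random(seed)
    reps = [tuple(s) for s in rng.sample(samples, n_clusters)]
    labels = assign(samples, reps)
    for _ in range(max_iter):
        new_reps = update(samples, labels, reps)
        if new_reps == reps:       # representatives unchanged: converged
            break
        reps = new_reps
        labels = assign(samples, reps)
    return reps, labels


# Hypothetical vehicles described by (weight in tonnes, maximum speed in km/h).
vehicles = [(1.0, 180.0), (1.2, 200.0), (1.1, 190.0),   # light and fast
            (9.0, 90.0), (10.0, 80.0), (11.0, 85.0)]    # heavy and slow
reps, labels = cluster(vehicles, n_clusters=2)
```

    With this toy data the light, fast vehicles end up in one cluster and the heavy, slow ones in the other. Because the two features have very different scales, speed dominates the distance here; in practice the features would usually be normalized first.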

    Similarity criterion in clustering: if the similarity criterion in the objective function is defined in terms of distance, different definitions of distance can be used. Some examples of these functions are given below.
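    As an illustration of distance-based similarity, the sketch below implements a few distance functions in Python. The selection (Euclidean, Manhattan, Chebyshev and the general Minkowski form) is an assumption made for this example; these are common textbook choices and are not necessarily the functions listed in the thesis. The function names are likewise illustrative.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Maximum absolute difference over all coordinates."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """General form: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
distances = {
    "euclidean": euclidean(a, b),   # 5.0
    "manhattan": manhattan(a, b),   # 7.0
    "chebyshev": chebyshev(a, b),   # 4.0
}
```

    The same pair of points is "closer" or "farther" depending on the definition chosen, which is why the choice of distance affects the shape of the clusters an algorithm can find.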

    Fuzzy clustering

    Fuzzy clustering can be considered part of fuzzy data analysis, which has two branches: the analysis of fuzzy data, and the analysis of crisp (deterministic) data using fuzzy techniques.

    Fuzzy clustering extracts fuzzy models from data. The basic idea is to treat each cluster as a set of elements and then to relax the definition of membership: instead of an element belonging to exactly one cluster, each element may belong to several clusters with different degrees of membership. This yields groupings that are more consistent with reality. In classical clustering, each input sample belongs to one and only one cluster and cannot be a member of two or more clusters; in other words, the clusters do not overlap. Now consider a situation where a sample is equally similar to two or more clusters. In classical clustering, a decision must be made about which cluster the sample belongs to. The main difference between classical and fuzzy clustering is that in fuzzy clustering a sample can belong to more than one cluster [1].
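    The contrast between the two kinds of membership can be made concrete with a small Python sketch. It assumes Euclidean distance and the standard fuzzy c-means membership rule with fuzzifier m = 2; neither is stated at this point in the text, and the cluster centers, sample points and function names are hypothetical.

```python
import math


def hard_membership(sample, centers):
    """Classical clustering: 1 for the nearest center, 0 for all others."""
    dists = [math.dist(sample, c) for c in centers]
    nearest = dists.index(min(dists))   # ties are broken by taking the first
    return [1.0 if k == nearest else 0.0 for k in range(len(centers))]


def fuzzy_membership(sample, centers, m=2.0):
    """Fuzzy c-means style membership degrees; they sum to 1 over the clusters."""
    dists = [math.dist(sample, c) for c in centers]
    if 0.0 in dists:  # sample coincides with a center: full membership there
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]
midpoint = (2.0, 0.0)    # equally similar to both clusters
near_first = (1.0, 0.0)  # closer to the first cluster

hard_mid = hard_membership(midpoint, centers)      # forced to pick one cluster
fuzzy_mid = fuzzy_membership(midpoint, centers)    # shared equally
fuzzy_near = fuzzy_membership(near_first, centers)
```

    For the midpoint, the hard assignment must arbitrarily choose one cluster, while the fuzzy assignment gives it a degree of 0.5 in each. The point nearer the first center receives degrees of about 0.9 and 0.1.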

    Fuzzy clustering has numerous applications in data analysis and pattern recognition, and active research areas include its use in solving routing, allocation, and scheduling problems. This makes the need to study existing algorithms and to improve and modify them all the more evident [4].

    1-2-1 Basic Fuzzy Clustering Algorithms

    One of the first fuzzy clustering methods based on an objective function and the Euclidean distance was presented by Dunn in 1974 and later generalized by Bezdek. The resulting algorithm identifies spherical clouds of points in a p-dimensional space. The clusters are assumed to be approximately the same size. Each cluster is represented by its center. This representation is also called a prototype, because it is often regarded as representative of all the data assigned to the cluster. The cluster center is chosen as a mean: it is the sum of the elements, each weighted by its membership degree raised to the power m, divided by the sum of the membership degrees raised to the power m. The drawback of this algorithm is that it cannot identify clusters of different shapes, sizes, and densities. To identify other shapes, the identity matrix in the distance definition can be replaced by other matrices, such as a diagonal matrix to identify elliptical clusters. One advantage of the algorithm is its simplicity, which reduces computing time. In practice, a nearly final solution can be reached within a few iterations.
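    A minimal Python sketch of this kind of algorithm is given below. It assumes the standard fuzzy c-means formulation: the center update is the membership-weighted mean described above, and the membership update is the usual rule based on ratios of Euclidean distances, which the text does not spell out. The fuzzifier m = 2, the stopping tolerance, the random initialization, the sample data and all function names are assumptions made for illustration.

```python
import math
import random


def update_centers(data, u, m):
    """Center k = sum_i u[i][k]^m * x_i / sum_i u[i][k]^m."""
    n_clusters, dim = len(u[0]), len(data[0])
    centers = []
    for k in range(n_clusters):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(sum(w * x[d] for w, x in zip(weights, data)) / total
                             for d in range(dim)))
    return centers


def update_memberships(data, centers, m):
    """u[i][k] = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)), using Euclidean distance."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        # A tiny floor avoids division by zero when a sample sits on a center.
        dists = [max(math.dist(x, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d_k / d_j) ** power for d_j in dists)
                  for d_k in dists])
    return u


def fuzzy_c_means(data, n_clusters, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Alternate center and membership updates until the memberships settle."""
    rng = random.Random(seed)
    # Random initial membership matrix, each row normalised to sum to 1.
    u = []
    for _ in data:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        u.append([v / sum(row) for v in row])
    for _ in range(max_iter):
        centers = update_centers(data, u, m)
        new_u = update_memberships(data, centers, m)
        change = max(abs(a - b) for r_new, r_old in zip(new_u, u)
                     for a, b in zip(r_new, r_old))
        u = new_u
        if change < tol:
            break
    return centers, u


# Two hypothetical, roughly spherical groups of points in the plane.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, u = fuzzy_c_means(data, n_clusters=2)
```

    Each row of `u` holds the membership degrees of one sample and sums to 1. Because the distance is plain Euclidean, the sketch shares the limitation noted above: it favors spherical clusters of similar size.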

  • Contents & References of Fuzzy clustering of data based on fuzzy logic

    List:

    First chapter: Introduction

    Clustering

    Fuzzy clustering

    Basic fuzzy clustering algorithms

    Fuzzy clustering method

    A review of fuzzy clustering articles in recent years

    Subtractive (differential) clustering

    Support vector machine

    Working method of support vector machine

    Separable support vector machine

    Nonlinear support vector machine

    Chapter Two: An overview of the work done

    2-1 Introduction

    2-2 Work done

    Chapter Three: The Proposed Method

    3-1 Introduction

    3-2 The general framework of the proposed method

    Chapter Four: Simulation Results

    4-1 Introduction

    4-2 Database and simulation parameters

    Chapter Five: Conclusion and Future Work

    5-1 Observation

    5-2 Future Work

    References:

    [1] ... "Knowledge Discovery in Databases - Chapter 8: Data Clustering".

    [2] Pier Luca Lanzi: "Ingegneria della Conoscenza e Sistemi Esperti - Lezione 2: Apprendimento non supervisionato".

    [3] ... and Hornick M.F. and Meyer G., Data mining standards initiatives, Communications of the ACM, Vol. 45, No. 8, 2002.

    [4] F. Hoppner, F. Klawonn, R. Kruse, T. Runkler; Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, John Wiley & Sons, 2000.

    [5] H. Timm, C. Borgelt, C. Döring, and R. Kruse, "An Extension to Possibilistic Fuzzy Cluster Analysis", Fuzzy Sets and Systems 147, 3–16, 2004.

    [6] Chiu, S., "Fuzzy Model Identification Based on Cluster Estimation," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, Sept. 1994.

    [7] R.P. Paiva, A. Dourado, Interpretability and learning in neuro-fuzzy systems, Fuzzy Sets Syst. 147 (1), 17–38, 2004.

    [8] J.C. Dunn; "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well Separated Clusters". Journ. Cybern. 3, 95-104, 1974.

    [9] J. M. Geusebroek, G. J. Burghouts, and A.W. M. Smeulders. "The Amsterdam library of object images", Int. J. Comput. Vision, 61(1):103–112, January 2005.

    [10] C. Cortes, V. Vapnik, "Support-Vector Networks", Machine Learning, Vol. 20, pp. 273-297, 1995.

    [11] C.J.C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, Vol. 2, pp. 121-167, 1998.

    [12] B. Schölkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.

    [13] B. Schölkopf et al., "Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers", IEEE Trans. on Signal Processing, Vol. 45, No. 11, pp. 2758-2765, Nov. 1997.

    [14] C.-F. Lin, S.-D. Wang, "Fuzzy Support Vector Machines", IEEE Trans. on Neural Networks, Vol. 13, No. 2, pp. 464-471, March 2002.

     

    [15] S. Abe, T. Inoue, "Fuzzy Support Vector Machines for Multiclass Problems", European Symposium on Artificial Neural Networks (ESANN'2002), pp. 113-118, Bruges, Belgium, April 2002.

     

    [16] C. Juang, S. Chiu, and S. Shiu, "Fuzzy System Learned Through Fuzzy Clustering and Support Vector Machine for Human Skin Color Segmentation," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 37, No. 6, November 2007.

    [17] S. Chen and Y. Chang, "A New Method for Weighted Fuzzy Interpolative Reasoning Based on Weights-Learning Techniques", vol. 12, no. 12, pp. 820-832, 2004.

    [18] E. I. Papageorgiou, Ath. Markinos, and Th. Gemtos, "Learning Algorithms for Fuzzy Cognitive," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, No. 2, March 2012.

    [19] H.-P. Huang, Y.-H. Liu, "Fuzzy Support Vector Machines for Pattern Recognition and Data Mining", International Journal of Fuzzy Systems, Vol. 4, No. 3, pp. 826-835, Sep. 2002.

    [20] J. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 185–208.

    [21] Ftp.ics.uci.edu/pub.
