Collective decision-making method to improve the performance of the nearest neighbor algorithm

Number of pages: 60 File Format: word File Code: 31069
Year: 2011 University Degree: Master's degree Category: Computer Engineering
  • Summary of Collective decision-making method to improve the performance of the nearest neighbor algorithm

    Master's Thesis in Computer Engineering (Artificial Intelligence)

    Abstract

Experimental results have shown that combining several classifiers [1] can increase the efficiency of various algorithms. Many collective decision-making methods [2] have been presented that reduce the error of different classification methods [3]. However, such methods have not been able to increase the efficiency of the nearest neighbor algorithm [4]. In this thesis, a collective decision-making method is presented that improves efficiency by using a weighted combination of several classifiers. In this method, each member is a nearest neighbor classifier that uses only a subset of the feature set [5] of the samples. The algorithm then assigns a weight to each member and finally uses a weighted voting mechanism [6] to determine the output of the ensemble.
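As a rough illustration of this idea, the following minimal Python sketch trains several 1-nearest-neighbor classifiers on random feature subsets, weights each by its validation accuracy, and combines them by weighted voting. The use of scikit-learn's KNeighborsClassifier, the accuracy-based weights, and all names here are illustrative assumptions, not the thesis's exact algorithm.

```python
# Sketch: weighted ensemble of nearest-neighbor classifiers, each trained
# on a random feature subset; weights are validation accuracies (an
# illustrative choice, not necessarily the weighting scheme of the thesis).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_knn_ensemble(X_train, y_train, X_val, y_val, n_members=10,
                       subset_frac=0.5, rng=np.random.default_rng(0)):
    n_features = X_train.shape[1]
    k = max(1, int(subset_frac * n_features))
    members = []
    for _ in range(n_members):
        feats = rng.choice(n_features, size=k, replace=False)  # feature subset
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_train[:, feats], y_train)
        weight = clf.score(X_val[:, feats], y_val)  # validation accuracy as weight
        members.append((feats, clf, weight))
    return members

def predict_weighted_vote(members, X):
    # Accumulate each member's weight on the class it predicts, per sample.
    votes = [{} for _ in range(len(X))]
    for feats, clf, w in members:
        for i, p in enumerate(clf.predict(X[:, feats])):
            votes[i][p] = votes[i].get(p, 0.0) + w
    # The winning class is the one with the largest total weight.
    return np.array([max(v, key=v.get) for v in votes])
```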

    Chapter One

    Introduction

In today's world, the amount of digital information grows day by day. To manage and scientifically examine this information, intelligent and automatic processing is needed more than ever.

One of the most important processes needed in information and communication technology is the automatic classification of this information. Classification is used in many problems, such as information security, network intrusion detection, classifying users based on personal information, image processing, and, in general, recognizing any pattern from samples and prior information. This process can predict the category [1] of new samples added to the data set. Therefore, in artificial intelligence, special attention has been paid to developing various intelligent and automatic classification methods.

Classification is one of the most important branches of machine learning [2]. Classification refers to predicting the category labels [3] of unlabeled samples [4], based on a set of labeled training samples (which have already been categorized with the help of an expert). In fact, classification is a method whose purpose is to group objects into a number of categories or groups. In classification methods, the information obtained from the training samples is used to derive a mapping from the feature space [5] to the set of category labels, based on which unlabeled samples are assigned to one of the categories. Each sample X is described by a feature vector [6] X = (x1, x2, ..., xm); based on this vector, sample X has m features or attributes. These features can take integer, real, or nominal values [7]. The sample also has a label C, which represents the category to which sample X belongs. In some classification methods, a model is built from the training data that divides the feature space into regions, each region representing a category. In such methods, the model is used to predict the labels of unlabeled samples, and the training samples are not used directly. An example of this group is probabilistic classifiers [8]. Such algorithms use statistical inference to find the best category; unlike classifiers that only specify the best class, probabilistic algorithms assign to each category a probability that the sample belongs to it, and the winning class is the one with the highest probability. Probabilistic methods in machine learning are usually known as statistical algorithms. Another group of classification methods predicts a sample's category from the set of samples itself, without building a model. Such classification algorithms are called instance-based [9].
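To make the instance-based idea concrete, the following is a minimal sketch of the 1-nearest-neighbor rule, assuming Euclidean distance; it builds no model and uses the stored training samples directly.

```python
# Compact sketch of instance-based classification: a 1-nearest-neighbor
# rule that keeps the training samples themselves and builds no model.
import numpy as np

def nn_classify(X_train, y_train, x):
    # Euclidean distance from the query x to every stored training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    # The query is assigned the label of the closest stored sample.
    return y_train[np.argmin(dists)]
```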

So far, various classification algorithms have been presented. Among them are the nearest neighbor algorithm [10] [1], the Bayesian classifier [11] [2], the support vector machine [3], and the neural network [12] [4]. In artificial intelligence, different evaluation criteria are used in different problems and sub-branches of this science. The efficiency of a classifier, as one of the main issues of artificial intelligence, can be measured in various ways, which are reviewed in this section. Thus, in different problems, different criteria may be considered for measuring an algorithm's efficiency. Also, as is known, no single classifier can provide the best answer for all existing problems.

In the statistical evaluation of a classifier's efficiency, a set containing a certain number of labeled training samples is used. A part of these samples, or the whole set, is provided to the classifier as the training set [13]. After training, the classifier is tested on a subset of the samples, the test samples. Depending on the type of performance test, the samples in the test set may be members of the training set or disjoint from it.

Classification rate [14] or accuracy [15] is the most widely used and simplest measure of a classifier's performance. This criterion equals the ratio of the number of correctly classified samples to the total number of samples. Based on this definition, the classification error rate is obtained from the following relation:

$\text{error rate} = 1 - \text{accuracy} = 1 - \dfrac{\text{number of correctly classified samples}}{\text{total number of samples}}$ (1)

Precision [16] and recall [17] are also suitable criteria for evaluating classifiers, and have recently been used to assess the trade-off [18] between false positives [19] and true positives [20]. These criteria are introduced below.

Precision criterion: among the samples declared positive, the proportion that are actually positive:

$\text{precision} = \dfrac{TP}{TP + FP}$ (2)

Recall criterion: among the actually positive samples, the proportion declared positive:

$\text{recall} = \dfrac{TP}{TP + FN}$

In these criteria, the positive category is the category under investigation, and the negative category refers to all other categories.
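As an illustration of these criteria, the following sketch computes accuracy, precision, and recall from true and predicted labels, treating one designated class as the positive category; the helper names are illustrative and follow the standard definitions above.

```python
# Standard accuracy, precision, and recall from predictions, with one
# designated class treated as "positive" (the class under investigation).
import numpy as np

def accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def precision_recall(y_true, y_pred, positive):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```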


One method for the statistical evaluation of classifiers is cross-validation [5]. In this technique, to evaluate the classifier's efficiency, the samples are randomly divided into two complementary groups; the system is trained with one group and tested with the other. This prevents overfitting [23] of the model to the training data, so the evaluation results carry a higher degree of confidence. To increase confidence in the results further, cross-validation is repeated in several stages, using a different division of the samples in each stage. At the end, the results of all repetitions are averaged.

In the following, different cross-validation methods are explained; a combined code sketch follows the three descriptions.

Random subsampling validation [24]: In this method, the samples are randomly divided into a training [25] group and a test [26] group. The classifier is trained on the training samples, tested on the other set, and its efficiency is computed. This operation is performed several times, and finally the average is reported as the classifier's performance. Because the training and test sets are chosen randomly, the most important problem of this method is that some samples may never be selected for either group, while others may be selected more than once.

K-fold cross-validation [27]: In this method, the sample set is first divided into K parts (folds). In each step, K - 1 folds are used as the training set, and the system's efficiency is evaluated on the remaining fold. Finally, the system's efficiency equals the average efficiency over all steps. In this method, every sample is used for both training and testing.

Leave-one-out validation [28]: Another method is leave-one-out validation. In this method, each sample is selected once as the test sample, and all other samples are used for training. This is performed for every sample. In the end, the algorithm's efficiency equals the ratio of the number of correctly classified samples to the total.
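As referenced above, the following sketch illustrates all three validation schemes on toy data; the use of scikit-learn's splitters and the toy data are assumptions for illustration, and any equivalent splitting procedure would work.

```python
# One sketch covering the three evaluation schemes described above.
import numpy as np
from sklearn.model_selection import ShuffleSplit, KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # toy data
clf = KNeighborsClassifier(n_neighbors=1)

# Random subsampling: repeated random train/test splits; across repetitions
# some samples may never be tested while others are tested repeatedly.
rs = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
# K-fold: every sample is used exactly once for testing.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
# Leave-one-out: each sample is the test set exactly once.
loo = LeaveOneOut()

for name, cv in [("random subsampling", rs), ("5-fold", kf), ("leave-one-out", loo)]:
    scores = cross_val_score(clf, X, y, cv=cv)  # per-split accuracies
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```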

  • Contents & References of Collective decision-making method to improve the performance of the nearest neighbor algorithm

    List:

Chapter 1

Introduction

1-1- Introduction

1-2- Classification methods

1-3- Classifier evaluation

1-4- Cross-validation

1-5- Nearest neighbor algorithm

1-7- Chapters

Chapter 2

The nearest neighbor algorithm and existing methods for improving it

2-1- Nearest neighbor algorithm

2-2- Limitations of the nearest neighbor method

2-3- An overview of previously proposed solutions for improving the nearest neighbor algorithm

Chapter 3

Collective decision-making methods

3-1- Introduction

3-2- Different methods for creating a collective decision maker

3-3- Different structures in collective decision-making methods

3-4- Voting among classifiers

3-5- Some widely used collective decision-making methods

Chapter 4

Proposed ensemble method for the nearest neighbor algorithm

4-1- Introduction

4-2- Main idea

4-3- An ensemble of weighted nearest neighbor classifiers

Chapter 5

Results of implementation experiments and conclusions

5-1- Results

Chapter 6

Conclusion

List of sources

Abstract

     

    Source:

[1] T. Cover and P. Hart, "Nearest Neighbor Pattern Classification," IEEE Transactions on Information Theory, 1967, 13(1): 21-27.

[2] J. Su and H. Zhang, "Full Bayesian Network Classifiers," International Conference on Machine Learning (ACM), 2006, vol. 148, pp. 897-904.

[3] C.-W. Hsu and C.-J. Lin, "A Simple Decomposition Method for Support Vector Machines," Machine Learning, vol. 46, 2002.

[4] G. P. Zhang, "Neural Networks for Classification: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C, 2000, pp. 451-462.

[5] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," 1995, pp. 1137-1145.

[6] C. Domeniconi, J. Peng, and D. Gunopulos, "Locally Adaptive Metric Nearest Neighbor Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2002) 1281-1285.

[7] J. Wang, P. Neskovic, and L. N. Cooper, "Improving Nearest Neighbor Rule with a Simple Adaptive Distance Measure," Pattern Recognition Letters, 28 (2007) 207-213.

[8] J. Friedman, "Flexible Metric Nearest-Neighbor Classification," Technical Report 113, Department of Statistics, Stanford University, 1994.

[9] M. Zolghadri Jahromi, E. Parvinnia, and R. John, "A Method of Learning Weighted Similarity Function to Improve the Performance of Nearest Neighbor," Information Sciences, 179 (2009) 2964-2973.

[10] Q. Hu, D. Yu, J. Liu, and C. Wu, "Neighborhood Rough Set Based Heterogeneous Feature Subset Selection," Information Sciences, 178 (2008) 3577-3594.

[11] G. Chen, C. Z. Wang, and Q. H. Hu, "A New Approach to Attribute Reduction of Consistent and Inconsistent Covering Decision Systems with Covering Rough Sets," Information Sciences, 177 (2007) 3500-3518.

[12] O. Uncu and I. B. Turksen, "A Novel Feature Selection Approach: Combining Feature Wrappers and Filters," Information Sciences, 177 (2007) 449-466.

[13] S. Ghodratnama, M. R. Moosavi, M. Taheri, and M. Zolghadri Jahromi, "A Cost Sensitive Learning Algorithm for Intrusion Detection," Proceedings of ICEE 2010, art. no. 5507006, pp. 559-565.

[14] L. Breiman, "Bagging Predictors," Machine Learning, 24 (1996) 123-140.

[15] Y. Freund and R. Schapire, "Experiments with a New Boosting Algorithm," Thirteenth International Conference on Machine Learning, 1996.

[16] S. Shuo, L. Yuhai, H. Yuehua, Z. Shihua, and L. Yong, "Active Learning for kNN Based on Bagging Features," Fourth International Conference on Natural Computation (ICNC 2008), vol. 7, art. no. 4667945, pp. 61-64.

[17] N. García-Pedrajas and D. Ortiz-Boyer, "Boosting k-Nearest Neighbor Classifier by Means of Input Space Projection," Expert Systems with Applications, 2009, 36(7): 10570-10582.

[18] C. Domeniconi and B. Yan, "Nearest Neighbor Ensemble," Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, August 23-26, 2004.

[19] M. Zolghadri Jahromi and M. Taheri, "A Proposed Method for Learning Rule Weights in Fuzzy Rule Based Classification Systems," Fuzzy Sets and Systems, 2008, 159(4): 449-459.

[20] M. Taheri, E. Chitsaz, S. D. Katebi, and M. Zolghadri Jahromi, "A Novel Piecewise Linear Clustering Technique Based on Hyper Plane Adjustment," Communications in Computer and Information Science, 2009, vol. 6, part 1, pp. 1-8.

[21] UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/databases/

[22] T. Yang, L. Cao, and C. Zhang, "A Novel Prototype Reduction Method for the K-Nearest Neighbor Algorithm with K ≥ 1," Lecture Notes in Computer Science, 2010, vol. 6119, pp. 89-100.

[23] http://www.cs.waikato.ac.
