Collective decision-making method to improve the performance of the nearest neighbor algorithm

Number of pages: 60 File Format: word File Code: 31069
Year: 2011 University Degree: Master's degree Category: Computer Engineering
  • Summary of Collective decision-making method to improve the performance of the nearest neighbor algorithm

    Master's Thesis in Computer Engineering (Artificial Intelligence)

    Abstract

Experimental results have shown that combining several classifiers [1] can increase the efficiency of various algorithms. Many collective decision-making methods [2] have been presented that reduce the error of different classification methods [3]. However, such methods have not been able to increase the efficiency of the nearest neighbor algorithm [4]. In this thesis, a collective decision-making method is presented that improves efficiency by using a weighted combination of several classifiers. In this method, each member is a nearest neighbor classifier that uses only a subset of the feature set [5] of the samples. The algorithm then assigns a weight to each member and finally uses a weighted voting mechanism [6] to determine the output of the ensemble.
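As a rough illustration of this idea, the following minimal Python sketch trains several 1-nearest-neighbor classifiers on random feature subsets, weights each by its validation accuracy, and combines them by weighted voting. The use of scikit-learn's KNeighborsClassifier, the accuracy-based weights, and all names here are illustrative assumptions, not the thesis's exact algorithm.

```python
# Sketch: weighted ensemble of nearest-neighbor classifiers, each trained
# on a random feature subset; weights are validation accuracies (an
# illustrative choice, not necessarily the weighting scheme of the thesis).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_knn_ensemble(X_train, y_train, X_val, y_val, n_members=10,
                       subset_frac=0.5, rng=np.random.default_rng(0)):
    n_features = X_train.shape[1]
    k = max(1, int(subset_frac * n_features))
    members = []
    for _ in range(n_members):
        feats = rng.choice(n_features, size=k, replace=False)  # feature subset
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_train[:, feats], y_train)
        weight = clf.score(X_val[:, feats], y_val)  # validation accuracy as weight
        members.append((feats, clf, weight))
    return members

def predict_weighted_vote(members, X):
    # Accumulate each member's weight on the class it predicts, per sample.
    votes = [{} for _ in range(len(X))]
    for feats, clf, w in members:
        for i, p in enumerate(clf.predict(X[:, feats])):
            votes[i][p] = votes[i].get(p, 0.0) + w
    # The winning class is the one with the largest total weight.
    return np.array([max(v, key=v.get) for v in votes])
```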

    Chapter One

    Introduction

In today's world, the amount of digital information grows day by day. To manage and scientifically examine this information, intelligent and automatic processing is needed more than ever.

One of the most important processes needed in information and communication technology is the automatic classification of this information. Classification is used in many problems, such as information security, network intrusion detection, classifying users based on personal information, image processing, and, in general, recognizing any pattern from samples and prior information. This process can predict the category [1] of new samples added to the data set. Therefore, in artificial intelligence, special attention has been paid to developing various intelligent and automatic classification methods.

Classification is one of the most important branches of machine learning [2]. Classification refers to predicting the category labels [3] of unlabeled samples [4], based on a set of labeled training samples (which have already been categorized with the help of an expert). In fact, classification is a method whose purpose is to group objects into a number of categories or groups. In classification methods, the information obtained from the training samples is used to derive a mapping from the feature space [5] to the set of category labels, based on which unlabeled samples are assigned to one of the categories. Each sample X is described by a feature vector [6] X = (x1, x2, ..., xm); based on this vector, sample X has m features or attributes. These features can take integer, real, or nominal values [7]. The sample also has a label C, which represents the category to which sample X belongs. In some classification methods, a model is built from the training data that divides the feature space into regions, each region representing a category. In such methods, the model is used to predict the labels of unlabeled samples, and the training samples are not used directly. An example of this group is probabilistic classifiers [8]. Such algorithms use statistical inference to find the best category; unlike classifiers that only specify the best class, probabilistic algorithms assign to each category a probability that the sample belongs to it, and the winning class is the one with the highest probability. Probabilistic methods in machine learning are usually known as statistical algorithms. Another group of classification methods predicts a sample's category from the set of samples itself, without building a model. Such classification algorithms are called instance-based [9].
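To make the instance-based idea concrete, the following is a minimal sketch of the 1-nearest-neighbor rule, assuming Euclidean distance; it builds no model and uses the stored training samples directly.

```python
# Compact sketch of instance-based classification: a 1-nearest-neighbor
# rule that keeps the training samples themselves and builds no model.
import numpy as np

def nn_classify(X_train, y_train, x):
    # Euclidean distance from the query x to every stored training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    # The query is assigned the label of the closest stored sample.
    return y_train[np.argmin(dists)]
```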

So far, various classification algorithms have been presented. Among them are the nearest neighbor algorithm [10] [1], the Bayesian classifier [11] [2], the support vector machine [3], and the neural network [12] [4]. In artificial intelligence, different evaluation criteria are used in different problems and sub-branches of this science. The efficiency of a classifier, as one of the main issues of artificial intelligence, can be measured in various ways, which are reviewed in this section. Thus, in different problems, different criteria may be considered for measuring an algorithm's efficiency. Also, as is known, no single classifier can provide the best answer for all existing problems.

In the statistical evaluation of a classifier's efficiency, a set containing a certain number of labeled training samples is used. A part of these samples, or the whole set, is provided to the classifier as the training set [13]. After training, the classifier is tested on a subset of the samples, the test samples. Depending on the type of performance test, the samples in the test set may be members of the training set or disjoint from it.

Classification rate [14] or accuracy [15] is the most widely used and simplest measure of a classifier's performance. This criterion equals the ratio of the number of correctly classified samples to the total number of samples. Based on this definition, the classification error rate is obtained from the following relation:

$\text{error rate} = 1 - \text{accuracy} = 1 - \dfrac{\text{number of correctly classified samples}}{\text{total number of samples}}$ (1)

Precision [16] and recall [17] are also suitable criteria for evaluating classifiers, and have recently been used to assess the trade-off [18] between false positives [19] and true positives [20]. These criteria are introduced below.

Precision criterion: among the samples declared positive, the proportion that are actually positive:

$\text{precision} = \dfrac{TP}{TP + FP}$ (2)

Recall criterion: among the actually positive samples, the proportion declared positive:

$\text{recall} = \dfrac{TP}{TP + FN}$

In these criteria, the positive category is the category under investigation, and the negative category refers to all other categories.
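As an illustration of these criteria, the following sketch computes accuracy, precision, and recall from true and predicted labels, treating one designated class as the positive category; the helper names are illustrative and follow the standard definitions above.

```python
# Standard accuracy, precision, and recall from predictions, with one
# designated class treated as "positive" (the class under investigation).
import numpy as np

def accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def precision_recall(y_true, y_pred, positive):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```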


One method for the statistical evaluation of classifiers is cross-validation [5]. In this technique, to evaluate the classifier's efficiency, the samples are randomly divided into two complementary groups; the system is trained with one group and tested with the other. This prevents overfitting [23] of the model to the training data, so the evaluation results carry a higher degree of confidence. To increase confidence in the results further, cross-validation is repeated in several stages, using a different division of the samples in each stage. At the end, the results of all repetitions are averaged.

In the following, different cross-validation methods are explained; a combined code sketch follows the three descriptions.

Random subsampling validation [24]: In this method, the samples are randomly divided into a training [25] group and a test [26] group. The classifier is trained on the training samples, tested on the other set, and its efficiency is computed. This operation is performed several times, and finally the average is reported as the classifier's performance. Because the training and test sets are chosen randomly, the most important problem of this method is that some samples may never be selected for either group, while others may be selected more than once.

K-fold cross-validation [27]: In this method, the sample set is first divided into K parts (folds). In each step, K - 1 folds are used as the training set, and the system's efficiency is evaluated on the remaining fold. Finally, the system's efficiency equals the average efficiency over all steps. In this method, every sample is used for both training and testing.

Leave-one-out validation [28]: Another method is leave-one-out validation. In this method, each sample is selected once as the test sample, and all other samples are used for training. This is performed for every sample. In the end, the algorithm's efficiency equals the ratio of the number of correctly classified samples to the total.
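As referenced above, the following sketch illustrates all three validation schemes on toy data; the use of scikit-learn's splitters and the toy data are assumptions for illustration, and any equivalent splitting procedure would work.

```python
# One sketch covering the three evaluation schemes described above.
import numpy as np
from sklearn.model_selection import ShuffleSplit, KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # toy data
clf = KNeighborsClassifier(n_neighbors=1)

# Random subsampling: repeated random train/test splits; across repetitions
# some samples may never be tested while others are tested repeatedly.
rs = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
# K-fold: every sample is used exactly once for testing.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
# Leave-one-out: each sample is the test set exactly once.
loo = LeaveOneOut()

for name, cv in [("random subsampling", rs), ("5-fold", kf), ("leave-one-out", loo)]:
    scores = cross_val_score(clf, X, y, cv=cv)  # per-split accuracies
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```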

  • Contents & References of Collective decision-making method to improve the performance of the nearest neighbor algorithm

    List:

Chapter 1

Introduction

1-1- Introduction

1-2- Classification methods

1-3- Classifier evaluation

1-4- Cross-validation

1-5- Nearest neighbor algorithm

1-7- Chapters

Chapter 2

The nearest neighbor algorithm and existing methods for improving it

2-1- Nearest neighbor algorithm

2-2- Limitations of the nearest neighbor method

2-3- An overview of previously proposed solutions for improving the nearest neighbor algorithm

Chapter 3

Collective decision-making methods

3-1- Introduction

3-2- Different methods for creating a collective decision maker

3-3- Different structures in collective decision-making methods

3-4- Voting among classifiers

3-5- Some widely used collective decision-making methods

Chapter 4

Proposed ensemble method for the nearest neighbor algorithm

4-1- Introduction

4-2- Main idea

4-3- An ensemble of weighted nearest neighbor classifiers

Chapter 5

Results of implementation experiments and conclusions

5-1- Results

Chapter 6

Conclusion

List of sources

Abstract

     

    Source:

[1] T. Cover and P. Hart, "Nearest Neighbor Pattern Classification," IEEE Transactions on Information Theory, 1967, 13(1): 21-27.

[2] J. Su and H. Zhang, "Full Bayesian Network Classifiers," International Conference on Machine Learning (ACM), 2006, vol. 148, pp. 897-904.

[3] C.-W. Hsu and C.-J. Lin, "A Simple Decomposition Method for Support Vector Machines," Machine Learning, vol. 46, 2002.

[4] G. P. Zhang, "Neural Networks for Classification: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C, 2000, pp. 451-462.

[5] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," 1995, pp. 1137-1145.

[6] C. Domeniconi, J. Peng, and D. Gunopulos, "Locally Adaptive Metric Nearest Neighbor Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2002) 1281-1285.

[7] J. Wang, P. Neskovic, and L. N. Cooper, "Improving Nearest Neighbor Rule with a Simple Adaptive Distance Measure," Pattern Recognition Letters, 28 (2007) 207-213.

[8] J. Friedman, "Flexible Metric Nearest-Neighbor Classification," Technical Report 113, Department of Statistics, Stanford University, 1994.

[9] M. Zolghadri Jahromi, E. Parvinnia, and R. John, "A Method of Learning Weighted Similarity Function to Improve the Performance of Nearest Neighbor," Information Sciences, 179 (2009) 2964-2973.

[10] Q. Hu, D. Yu, J. Liu, and C. Wu, "Neighborhood Rough Set Based Heterogeneous Feature Subset Selection," Information Sciences, 178 (2008) 3577-3594.

[11] G. Chen, C. Z. Wang, and Q. H. Hu, "A New Approach to Attribute Reduction of Consistent and Inconsistent Covering Decision Systems with Covering Rough Sets," Information Sciences, 177 (2007) 3500-3518.

[12] O. Uncu and I. B. Turksen, "A Novel Feature Selection Approach: Combining Feature Wrappers and Filters," Information Sciences, 177 (2007) 449-466.

[13] S. Ghodratnama, M. R. Moosavi, M. Taheri, and M. Zolghadri Jahromi, "A Cost Sensitive Learning Algorithm for Intrusion Detection," Proceedings of ICEE 2010, art. no. 5507006, pp. 559-565.

[14] L. Breiman, "Bagging Predictors," Machine Learning, 24 (1996) 123-140.

[15] Y. Freund and R. Schapire, "Experiments with a New Boosting Algorithm," Thirteenth International Conference on Machine Learning, 1996.

[16] S. Shuo, L. Yuhai, H. Yuehua, Z. Shihua, and L. Yong, "Active Learning for kNN Based on Bagging Features," Fourth International Conference on Natural Computation (ICNC 2008), vol. 7, art. no. 4667945, pp. 61-64.

[17] N. García-Pedrajas and D. Ortiz-Boyer, "Boosting k-Nearest Neighbor Classifier by Means of Input Space Projection," Expert Systems with Applications, 2009, 36(7): 10570-10582.

[18] C. Domeniconi and B. Yan, "Nearest Neighbor Ensemble," Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, August 23-26, 2004.

[19] M. Zolghadri Jahromi and M. Taheri, "A Proposed Method for Learning Rule Weights in Fuzzy Rule Based Classification Systems," Fuzzy Sets and Systems, 2008, 159(4): 449-459.

[20] M. Taheri, E. Chitsaz, S. D. Katebi, and M. Zolghadri Jahromi, "A Novel Piecewise Linear Clustering Technique Based on Hyper Plane Adjustment," Communications in Computer and Information Science, 2009, vol. 6, part 1, pp. 1-8.

[21] UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/databases/

[22] T. Yang, L. Cao, and C. Zhang, "A Novel Prototype Reduction Method for the K-Nearest Neighbor Algorithm with K ≥ 1," Lecture Notes in Computer Science, 2010, vol. 6119, pp. 89-100.

[23] http://www.cs.waikato.ac.
