A fuzzy k-nearest neighbor data classification algorithm for privacy in cloud computing

Number of pages: 104; File format: Word; File code: 31031
Year: 2014; University degree: Master's degree; Category: Computer Engineering
  • Summary of A fuzzy k-nearest neighbor data classification algorithm for privacy in cloud computing

    Computer Engineering Master's Thesis

    Specialization: Software

    Abstract:

    Cloud computing environments and cloud databases store information on the web, and the best available solutions should be used to increase their security. The problem addressed here is the classification of confidential and top-secret data, followed by their encryption for storage in the cloud, a task in which speed and accuracy are both very important. In this research, a fuzzy data classification algorithm, the fuzzy k-nearest neighbor algorithm, is applied in the cloud environment. It was simulated in Java with the CloudSim simulator and produced an effective classification of data in the cloud environment. Owing to its favorable properties, the fuzzy k-nearest neighbor algorithm classified top-secret, confidential, and public data more quickly and accurately, prepared the data for appropriate encryption, and improved the efficiency of data classification for cloud storage. In the proposed method, the data are stored in the cloud, and the same database is classified by the fuzzy k-nearest neighbor algorithm into three classes (top secret, confidential, and public) and passed to the next step, which is encryption and storage in the cloud. The results show that the fuzzy k-nearest neighbor algorithm performs better than the standard k-nearest neighbor algorithm.

    Keywords:

    cloud computing, security, k-nearest neighbor classification, fuzzy k-nearest neighbor classification

    Chapter One

    Introduction

    1-1 Introduction

    Cloud computing has emerged as one of the most prominent topics in information technology. Today, Internet users access its services through very lightweight devices: they submit requests that may require heavy processing on demand and view the returned results without regard to where the service is delivered from or how it is provided. Cloud computing is built on TCP/IP and the Internet and comprises processors with large memories, fast data transmission networks, and reliable system architectures; without the standard protocols that govern the network, this technology could not exist (Gong, Liu, Zhang, Chen, Gong, 2010). Its services fall into three major categories: infrastructure as a service, platform as a service, and software as a service. Cloud computing is divided into five layers: client, application, platform, infrastructure, and servers. The high fault tolerance of this technology increases its compatibility with the network infrastructure. Its ease of use hides the complexity of the services and connects users to the data center through a simple interface. Virtualization and high security are further features of this technology (Dikaiakos, Pallis, Katsaros, Mehra, Vakali, 2009).

    Given the importance of cloud computing and its security, this research pursues that goal by classifying confidential data with the fuzzy nearest neighbor classification algorithm. In this section, we define the problem and introduce our idea.

    1-2 Defining the problem and stating the main research questions

    According to the definition of the National Institute of Standards and Technology (NIST), cloud computing is a model for providing convenient, on-demand network access to a pool of configurable computing resources (such as servers, networks, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or direct intervention by the service provider (Cachin and Haas, 2010).

    Cloud computing is a structure that lets us access applications that reside somewhere other than our own computer, on other machines connected to the Internet. Most of the time this place is a remote data center. Cloud computing promises reductions in operational and capital costs and, more importantly, allows IT departments to focus on strategic projects instead of maintaining data center operations. The cost of managing resources is much higher than the cost of the resources themselves, so it is better to rent resources from their owner through the cloud.

    Although cloud computing has many advantages, some aspects still cause many organizations to hold back from this technology, chiefly how to secure data in the cloud and how to ensure the security of the environment. Security in cloud computing can of course be regarded as relative. In fact, security is the biggest weakness of cloud computing, and many solutions have been proposed in this field.

    Encryption secures data to a certain extent, but because of the problems it causes in data recovery, security remains a daunting concern in the cloud. Many methods have been proposed so far to ensure security, but the worrying issue is the access to and manipulation of customer data by internal employees.

    Using the cloud carries security risks, but reputable companies work to maintain safety and security. Many techniques have been used for data security, and data encryption is among the most widely used: encrypting data before sending it protects it. Before implementing any data security measure in the cloud, it is better to understand the data security requirements, that is, which data need security and which do not. In this research, we classify the data with the fuzzy k-nearest neighbor algorithm and the standard k-nearest neighbor algorithm and then encrypt the data that need security with encryption algorithms.

    1-3 Background and necessity of research

    The standard k-nearest neighbor (KNN) algorithm has been used in the cloud environment to classify data by confidentiality. Data are divided into two classes, sensitive and non-sensitive. Non-sensitive data need no security, whereas sensitive data are encrypted with the RSA algorithm. KNN classification is a machine learning algorithm that has been studied in pattern recognition for decades (Manour, Tang Zhang, Turdin, 2013).
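
    The classify-then-encrypt flow described above can be illustrated with a minimal Python sketch. It is not the thesis implementation (which used Java and CloudSim): the function names, the sample records, and the tiny primes are assumptions chosen for illustration, and textbook RSA applied byte by byte like this is not secure. The labels are taken as given here, standing in for the output of a classifier.

```python
# Toy sketch: encrypt only the records labelled "sensitive".
# Textbook RSA with tiny primes, for illustration only; NOT secure.

def make_toy_rsa_keys(p=61, q=53, e=17):
    """Return (public_key, private_key) as ((n, e), (n, d))."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e mod phi (Python 3.8+)
    return (n, e), (n, d)

def encrypt(text, public_key):
    """Encrypt each UTF-8 byte separately: c = b^e mod n."""
    n, e = public_key
    return [pow(b, e, n) for b in text.encode("utf-8")]

def decrypt(blocks, private_key):
    """Invert encrypt(): b = c^d mod n, then decode the bytes."""
    n, d = private_key
    return bytes(pow(c, d, n) for c in blocks).decode("utf-8")

def prepare_for_cloud(records, public_key):
    """records: list of (text, label). Sensitive text is encrypted,
    non-sensitive text is passed through unchanged."""
    stored = []
    for text, label in records:
        if label == "sensitive":
            stored.append((encrypt(text, public_key), label))
        else:
            stored.append((text, label))
    return stored

# Hypothetical records with labels already assigned by a classifier.
public_key, private_key = make_toy_rsa_keys()
records = [("salary: 5000", "sensitive"), ("office hours", "non-sensitive")]
stored = prepare_for_cloud(records, public_key)
```

    Only the record labelled sensitive is turned into ciphertext; the non-sensitive record is stored unchanged, which mirrors the rule that non-sensitive data need no security.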

    KNN classification has two basic problems. First, the value of k must be determined. Second, the algorithm gives all of the nearest neighbors of the test sample the same degree of importance.
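
    Both problems can be seen in a small Python sketch of majority-vote KNN. The data points, the two labels, and the function name are hypothetical and serve only to illustrate the behavior; this is not the code used in the thesis.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """train: list of (feature_vector, label).
    Returns the majority label among the k nearest training samples.
    Every one of the k neighbors gets exactly one vote, however far away it is."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: one sensitive sample close to the query,
# two non-sensitive samples much farther away.
train = [
    ((1.0, 1.0), "sensitive"),
    ((3.0, 3.0), "non-sensitive"),
    ((3.5, 3.0), "non-sensitive"),
]
query = (1.5, 1.0)

label_k1 = knn_classify(train, query, k=1)
label_k3 = knn_classify(train, query, k=3)
```

    With k=1 the query is labelled sensitive, but with k=3 the two distant non-sensitive samples outvote the single close neighbor. The result therefore depends on the choice of k, and the equal vote ignores how close each neighbor actually is.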

    Because of these problems with the standard KNN algorithm, Keller, Gray and Givens introduced the fuzzy KNN classification algorithm. Most work with the fuzzy approach has been in supervised learning, with only a little in semi-supervised and unsupervised learning, even though labeling all the data is difficult, costly, and time-consuming. In this research, we apply the fuzzy nearest neighbor algorithm in the field of supervised learning. Unlike many existing classification methods and algorithms, the proposed method does not require any user-adjustable threshold. The experimental results also show that the fuzzy KNN algorithm is more efficient than the methods it was compared with.
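
    As a rough illustration of the fuzzy KNN idea, the Python sketch below follows the membership rule published by Keller, Gray and Givens (1985): each of the k nearest neighbors contributes to a class with weight 1/d^(2/(m-1)), where d is its distance to the query and m > 1 is the fuzzifier, and the weights are normalized so the memberships sum to 1. Several things here are assumptions rather than details taken from the thesis: the training samples carry crisp labels (the original paper also discusses fuzzy training memberships), m defaults to 2, a small eps guards against division by zero, and the data points are hypothetical. The three class names come from the abstract.

```python
import math

def fuzzy_knn(train, query, k, m=2.0, eps=1e-9):
    """train: list of (feature_vector, label) with crisp labels.
    Returns {label: membership}; memberships sum to 1."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    # Distance of every training sample to the query, nearest first.
    nearest = sorted((math.dist(x, query), label) for x, label in train)[:k]
    exponent = 2.0 / (m - 1.0)
    # Closer neighbors get larger weights: w = 1 / d^(2/(m-1)).
    weights = [(label, 1.0 / (d ** exponent + eps)) for d, label in nearest]
    total = sum(w for _, w in weights)
    memberships = {}
    for label, w in weights:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

# Hypothetical samples: one "top secret" point near the query,
# two "public" points farther away.
train = [
    ((1.0, 1.0), "top secret"),
    ((3.0, 3.0), "public"),
    ((3.5, 3.0), "public"),
]
memberships = fuzzy_knn(train, (1.5, 1.0), k=3)
predicted = max(memberships, key=memberships.get)
```

    In this example the single close neighbor dominates: the query receives a membership of about 0.93 in "top secret" even though two of the three neighbors are "public". The membership values also indicate how confident the assignment is, which a plain majority vote does not.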

    It should be noted that the standard KNN algorithm has already been used in cloud computing to classify data for the security of confidential data, but the fuzzy KNN algorithm has not been used for data classification in cloud computing. In this research, we use this algorithm to examine whether it improves the security of confidential data in cloud computing.

    Cloud computing reduces the operational management an organization must perform, but the organization remains responsible even when operations are carried out by one or more external parties in the cloud. As a result, the cloud service provider you choose should be one you can trust: one that is completely transparent, provides the information you need, answers all your questions, and withholds nothing.

  • Contents & References of A fuzzy k-nearest neighbor data classification algorithm for privacy in cloud computing

    Table of contents:

    Abstract

    Chapter One: Introduction

    1-1 Introduction

    1-2 Defining the problem and stating the main research questions

    1-3 Background and necessity of research

    1-4 Objectives

    1-5 Aspects of research innovation

    1-6 Stages of research

    1-7 Thesis structure

    Chapter Two: General topics of cloud computing, security and simulation

    2-1 Introduction

    2-2 A brief history of cloud computing

    2-3 The current state of cloud computing

    2-4 Characteristics of cloud computing

    2-4-1 Key features of cloud computing

    2-4-2 The main advantages of cloud computing

    2-4-3 Possible tasks in cloud computing

    2-5 Cloud computing architecture

    2-6 Security and challenges of cloud computing

    2-7 Security in cloud computing

    2-8 Weaknesses of cloud computing

    2-8-1 The need for a permanent Internet connection

    2-8-2 Not working with low-speed Internet

    2-8-3 Privacy

    2-9 Security disadvantages in cloud environments

    2-9-1 Data location

    2-9-2 Data separation

    2-10 Data security

    2-10-1 Control and access

    2-10-2 Encryption

    2-11 Introduction to simulation

    2-12 Some computer network simulation software

    2-13 Getting to know the CloudSim tool

    2-13-1 CloudSim architecture

    2-14 Virtual machine allocation models

    2-15 Classes available in CloudSim

    2-16 Summary

    Chapter Three: Review of past work on encryption algorithms

    3-1 Introduction

    3-2 Introduction of the method

    3-3 Previous work

    3-4 Objectives of the method

    3-5 Data classification

    3-5-1 Machine learning

    3-6 Definition of sensitive and non-sensitive data

    3-7 K-nearest neighbor classifier

    3-8 Encryption with the RSA method

    3-9 Cryptography

    3-9-1 Encryption algorithms

    3-10 RSA

    3-10-1 RSA algorithm steps

    3-11 Advanced Encryption Standard

    3-11-1 Description of cryptography

    3-12 Summary

    Chapter Four: Introduction of the proposed method

    4-1 Introduction

    4-2 Introduction of the new fuzzy k-nearest neighbor method for data classification in cloud computing

    4-2-1 Theory of fuzzy sets

    4-3 Differences in the results of classification algorithms

    4-4 Framework used

    4-5 Proposed method

    4-5-1 Training data and test data

    4-5-2 Saving in the cloud

    4-5-3 Working method of the KNN algorithm

    4-5-4 Working method of the F-KNN algorithm

    4-6 Summary

    Chapter Five: Tests and evaluation of results

    5-1 Introduction

    5-2 Test data location and implementation environment

    5-3 Comparison of the results obtained from the standard and fuzzy k-nearest neighbor algorithms

    5-4 Characteristics of the software-as-a-service layer

    5-5 Features of the platform-as-a-service layer for virtual management

    5-6 Properties of the infrastructure-as-a-service layer in cloud simulation

    5-7 Identification rate

    5-8 Simulation results

    5-9 Simulation time of the work steps

    5-10 Summary

    Chapter Six: Conclusion and suggestions

    6-1 Introduction

    6-2 The results of the research

    6-3 Suggestions

    References

    English glossary

    English abstract

    Source:

    Pakize, R., "Simulation of cloud computing", Pardis Danesh Publications, Tehran, (2013).

    Hosseinpour Fazalahi, S., "Machine Learning", Master's Thesis, Faculty of Computer Engineering, Tabriz University, (2013).

    Khushkwan, M., "An Introduction to Number Theory", Khushkwan Publications, Tehran, (2011).

    Latfis, Fakhri, P., "Cloud Computing", Parse Publications, Tehran, (1392 SH).

    S, Broberg, S., and Brandic, I., "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility", Vol. 2518, (2012).

    Cachin, C., and Haas, R., "Dependable Storage in the Intercloud", IBM Research Report RZ 3783, (2010).

    Hogben, G., "Cloud Computing: Benefits, risks and recommendations for information security", ENISA, pp. 147-152, (2009).

    Chang, Y., and Liu, H., "Semi-supervised classification algorithm based on the KNN", Conf. on Communication Software and Networks (ICCSN), pp. 9-12, (2011).

    "Nosing around the neighborhood: A new system structure and classification rule in partially exposed environments", IEEE Trans. PAMI-2, (2008).

    Gong, K., Zhang, Q., and Gong, Zh., "... computing", 39th International Conference on Parallel Processing Workshops, (2010).

    Gorgen, D., "Future Generation Computer Systems", Vol. 25 (6), pp. 599-616, (2009).

    Hathaway, L., "National Policy on the Use of the Advanced Encryption Standard (AES) to Protect National Security Systems and National Security Information", 2003-02-15, retrieved June (2011).

    Hunt, E., "Artificial Intelligence Federal Information Processing Standards Publication", New York Academic, November 26, 2001, retrieved October 2, (2002).

    Foster, I., Zhao, Y., Raicu, I., and Lu, S., "Cloud computing and grid computing 360-degree compared", IEEE Xplore, Austin (TX), 06 January, (2009).

    Jensen, R., and Cornelis, C., "Fuzzy-rough nearest neighbor classification and prediction", Theoretical Computer Science, Vol. 412, No. 42, pp. 5871-5884, (2011).

    Jensen, R., and Parthalain, N., "Fuzzy rough set based semi-supervised learning", Conf. on Fuzzy Systems (FUZZ-IEEE11), pp. 2465-2472, (2011).

    "... for dimensionality reduction", IEEE Transactions on Knowledge and Data, (2001).

    Keller, J.M., Gray, M.R., and Givens, J.A., Jr., "A Fuzzy K-Nearest Neighbor Algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-15, No. 4, pp. 580-585, July/Aug. (1985).

    Kelsey, J., Lucks, S., Schneier, B., Stay, M., Wagner, D., and Whiting, D., "Improved cryptanalysis of Rijndael", Fast Software Encryption, pp. 213-230, (2000).

    "Efficient Software Implementation of AES on 32-bit Platforms", Lecture Notes in Computer Science, Vol. 2523, (2003).

    Li, Y., Guan, C., Li, H., and Chin, Z., "A self-training semi-supervised SVM algorithm and its application in an EEG-based brain computer interface speller system", Pattern Recognition Letters, Vol. 29, No. 9, pp. 1285-1294, (2008).

    Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., Soman, S., Youseff, L., and Zagorodnov, D., "The Eucalyptus open-source cloud-computing system", in Cloud Computing and Its Applications (CCA08), (2008).

    Ram, C., and Sreenivaasan, G., "Security as a Service (SaaS): Securing user data by coprocessor and distributing the data", Trends in Information Sciences & Computing (TISC2010), pp. 152-155, (2010).

    Schneier, B., Kelsey, J., Whiting, D., Wagner, D., Hall, C., Ferguson, N., Kohno, T., and Stay, M., "The Twofish Team's Final Comments on AES Selection", (2000).

    Schwartz
