
Predicting exploitation and clustering of vulnerabilities by means of text mining

Number of pages: 105 File Format: word File Code: 31053
Year: 2011 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: Algorithm - Clustering - Computer - Computer security - Data mining algorithms - Database - Exploitation - Implementation - Operating system - Pattern recognition algorithms - Security systems - Software vulnerabilities - Support vector machine - Text mining


Summary of Predicting exploitation and clustering of vulnerabilities by means of text mining

Master thesis in the field of computer-software engineering

Abstract

Software vulnerabilities can lead to financial and information losses. Because financial and human resources are limited, prioritizing vulnerabilities by the damage they can cause is very important. Many earlier researchers have classified vulnerabilities based on empirical and statistical knowledge. However, the changing nature of vulnerabilities makes it impossible to provide a fixed ranking criterion for them.

Vulnerability reports are continuously recorded in various databases, yet existing automatic tools do not fully exploit their textual information. This research shows that the information in these texts can be used to build predictive models. Text mining is a suitable tool for extracting information that supports important management decisions.
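A common way to turn free-text vulnerability descriptions into features for such models is a bag-of-words representation weighted by TF-IDF. The sketch below assumes plain TF-IDF (term frequency times log inverse document frequency) over lower-cased word tokens; the thesis's own feature-extraction method (Chapter Three) may differ, and the three descriptions are hypothetical examples written for this illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of descriptions that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        # Terms found in every description get idf = log(1) = 0 and drop out.
        vectors.append({term: (n / length) * math.log(n_docs / df[term])
                        for term, n in counts.items()})
    return vectors


# Hypothetical descriptions, written only for this example.
descriptions = [
    "Buffer overflow in the FTP server allows remote code execution",
    "SQL injection in the login form allows remote attackers to read data",
    "Buffer overflow in the image parser allows denial of service",
]
vectors = tfidf_vectors(descriptions)
```

Words shared by every description (such as "allows") receive zero weight, while rarer, more distinctive words (such as "sql") receive the highest weights.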

In predicting exploitation with text mining, only one study had been published before this work: "Beyond Heuristics: Learning to Classify Vulnerabilities and Predict Exploits", presented at KDD 2010. Using text mining, that study answered the following questions: Will the vulnerability be exploited? When will it be exploited? It achieved good results compared with CVSS (the Common Vulnerability Scoring System, a well-known vulnerability metric). The present research answers those questions, as well as the following new ones, with high accuracy:

If a system has been exploited, when did the exploitation begin? (accuracy between 84% and 94.5%)

If a system is vulnerable, when will the developers release a patch for it? (accuracy between 68% and 91%)
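One way to make "when" questions answerable by a classifier is to discretize the number of days between a vulnerability's public disclosure and the event of interest (first exploit or patch release) into a few classes. The sketch below assumes both dates are known for each vulnerability; the function name and the bin boundaries (before disclosure, same day, within 30 days, later) are hypothetical choices for illustration, not the intervals used in the thesis.

```python
from datetime import date


def delay_class(disclosed: date, event: date) -> str:
    """Map the delay between disclosure and an event (exploit or patch)
    to a coarse class label. The bin edges are hypothetical."""
    delay = (event - disclosed).days
    if delay < 0:
        return "before disclosure"  # e.g. an exploit that predates disclosure
    if delay == 0:
        return "on disclosure day"
    if delay <= 30:
        return "within 30 days"
    return "after 30 days"


disclosed = date(2010, 1, 10)
print(delay_class(disclosed, date(2010, 1, 5)))   # before disclosure
print(delay_class(disclosed, date(2010, 1, 25)))  # within 30 days
print(delay_class(disclosed, date(2010, 3, 1)))   # after 30 days
```

Once every vulnerability carries such a label, predicting "when" becomes an ordinary multi-class classification problem over the same text features.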

Much research has been done on vulnerability clustering. The OSVDB database groups vulnerabilities into various categories, but none of these categories is based on the vulnerabilities' descriptions. In this research, vulnerabilities are clustered by their descriptions, yielding the following categories: buffer overflow, denial of service, data manipulation, remote control, improper configuration, password cracking, unauthorized access to information, and unauthorized access to a service. Assigning vulnerabilities to categories manually requires human expertise and is very tedious; the categorization presented here makes it possible to build software that assigns vulnerabilities to categories automatically.
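The thesis names an emergent self-organizing map (ESOM) as its clustering method. ESOM builds on Kohonen's self-organizing map (SOM), typically with a much larger grid whose cluster boundaries are read from a U-Matrix display. The sketch below is only a small, plain SOM in standard-library Python, assuming each description has already been converted to a numeric vector of equal length (for example TF-IDF weights scaled to [0, 1]); the grid size, linear learning-rate decay, and Gaussian neighbourhood are illustrative choices, not the thesis's settings.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is nearest to x (squared Euclidean)."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each grid node starts with a random weight vector in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit samples in a random order
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks over time
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull the node toward the sample
            step += 1
    return weights


# Hypothetical 2-D feature vectors standing in for description vectors.
vectors = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(vectors, rows=2, cols=2, epochs=30)
assignments = [best_matching_unit(som, v) for v in vectors]
```

After training, each description is assigned to its best-matching unit; neighbouring units with similar weight vectors form a cluster, which an analyst could then name (for instance "buffer overflow").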

This research uses two well-known vulnerability databases (OSVDB and CVE) together with vulnerability-history data provided by Stefan Frei. Support vector machine and random forest classifiers are used for prediction, and the emergent self-organizing map (ESOM) method is used for clustering.
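As a rough illustration of the prediction side, the sketch below trains a linear support vector machine with the Pegasos stochastic sub-gradient method in plain Python. This is an assumption for illustration only: the abstract does not say which SVM implementation or kernel was used, and the random forest classifier is not shown. Labels are +1 (exploited) and -1 (not exploited), and the toy two-dimensional points stand in for real text features. For simplicity the bias is handled as an extra constant feature, so it is regularized along with the other weights.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    rng = random.Random(seed)
    # Append a constant 1.0 feature so the last weight acts as a bias term.
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the L2 regularizer: shrink all weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient: only examples inside the margin push w.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score (bias included)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical, not from the thesis).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [2.5, 2.5]), predict(w, [-2.5, -2.5]))  # 1 -1
```

In practice a library SVM with sparse TF-IDF input would be used; the point here is only the shape of the learning problem.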

Chapter One

Introduction

1-1- Vulnerability

In computer security, a vulnerability is a weakness that allows an attacker to reduce a system's information assurance. Every year thousands of vulnerabilities are discovered and reported, and millions of dollars are spent worldwide to deal with them. Exploiting a vulnerability generally requires three factors: a susceptibility or flaw in the system, the attacker's access to the flaw, and the attacker's ability to exploit the flaw (1). Several definitions of vulnerability exist, including the following:

ISO 27005: A weakness of an asset or group of assets that can be exploited by an individual or a group of individuals (2). In this definition, an asset is anything of value to the organization, for example information resources that support the organization's mission.

IETF RFC 2828: A flaw or weakness in the design, implementation, operation, or management of a system, which can be exploited to violate the system's security policy (3).

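Committee on National Security Systems (CNSS) of the United States, CNSS Instruction No. 4009, dated April 26, 2010, National Information Assurance Glossary: A weakness in an information system (IS), system security procedures, internal controls, or implementation that could be exploited by a threat source (4).

ENISA: A weakness, design, or implementation error that can lead to an unexpected, undesirable event compromising the security of the computer system, network, application, or protocol involved (5).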

The Open Group: The probability that the attacker's capability exceeds the ability to resist it (6).

Factor Analysis of Information Risk (FAIR): The probability that an asset will be unable to resist the actions of a threat agent (7).

Data & Computer Security: Dictionary of Standards, Concepts and Terms, by Dennis Longley and Michael Shain, Stockton Press, ISBN 0-935859-17-9:

In computer security, a weakness in automated system security procedures, administrative controls, Internet controls, etc., that could be exploited by a threat to gain unauthorized access to information or to disrupt critical processing.

In computer security, a weakness in the physical layout, organization, procedures, personnel, management, administration, hardware, or software that could be exploited to cause harm to the system or activity.

In computer security, any weakness or flaw existing in a system; also the attack or harmful event, or the opportunity available to a threat agent to mount that attack.

1-1-2- Classification of vulnerabilities

Vulnerabilities are divided into the following categories based on the type of asset (2):

Hardware, for example: susceptibility to humidity, susceptibility to dust, susceptibility to unprotected storage.

Software, for example: inadequate testing, lack of an audit trail.

Network, for example: unprotected communication lines, insecure network architecture.

Employees, for example: inadequate onboarding process, inadequate security awareness.

Location, for example: flood-prone area, unreliable power source.

Organization, for example: lack of regular audits, lack of security awareness, lack of continuity plans.

1-1-3- Causes of creating vulnerabilities

Some sources and causes of creating vulnerabilities are:

System complexity: large, complex systems increase the probability of flaws and unintended access points (8).

Familiarity: using common, well-known code, software, operating systems, or hardware increases the probability that an attacker has, or can find, the knowledge and tools to exploit existing flaws (9).

Connectivity: the more physical connections, privileges, ports, protocols, and services a system has, and the longer each of them is accessible, the more exposed it is to vulnerabilities (7).

Flaws in password management: users choose weak passwords that can be discovered with little effort, store passwords where programs can access them, or reuse the same passwords across many applications and websites (8).

Design flaws in major operating systems: operating system designers sometimes enforce suboptimal policies for user and program management. For example, some operating systems by default grant every program and every user full access to the entire system (8). Such flaws allow viruses and malware to execute commands on behalf of the administrator (1).

Browsing Internet websites: some websites contain spyware or adware that can be installed automatically on visitors' computers. After such a website is visited, the system is infected, and personal information is collected and sent to third parties (10).

Software bugs: many software programs contain exploitable bugs, which may allow attackers to misuse the application (8).

Uncontrolled user input: some applications assume that all user input is safe. Programs that do not validate user input allow unwanted commands to be executed directly and the database to be manipulated (8).

Although existing software tools can provide a good overview of system vulnerabilities in some cases, they cannot replace human investigation of vulnerabilities.
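The "uncontrolled user input" case above is easiest to see with SQL injection. The sketch below uses Python's built-in sqlite3 module with a hypothetical users table: the first function pastes user input into the SQL text, so a crafted name changes the query's logic, while the second passes the input as a bound parameter, so it is treated purely as data.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # VULNERABLE: user input is concatenated directly into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver binds `name` as a value, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, malicious)))  # 2: the injected OR matches every row
print(find_user_safe(conn, malicious))         # []: no user has that literal name
```

Validating or parameterizing input closes this path; the same principle applies to shell commands and other interpreters that receive user-supplied strings.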

Contents & References of Predicting exploitation and clustering of vulnerabilities by means of text mining

Table of Contents:

Chapter One: Introduction
1-1- Vulnerability
1-1-1- Definition of vulnerability
1-1-2- Classification of vulnerabilities
1-1-3- Causes of creating vulnerabilities
1-1-4- Identification and removal of vulnerabilities
1-2- Required basic concepts
1-2-1- Text mining
1-2-2- Classification and prediction
1-2-3- Clustering
1-2-4- Feature selection
1-3- Research objective

Chapter Two: Review of previous research
2-1- Role of people and different processes in vulnerabilities
2-2- Vulnerability assessment and classification methods
2-2-1- Common Vulnerability Scoring System
2-3- Classification of vulnerabilities
2-4- Security predictions using vulnerability reports
2-5- Detection of vulnerabilities using software source code

Chapter Three: Data and feature extraction method
3-1- Research data
3-2- Feature extraction method for classification and prediction
3-3- Feature extraction method for clustering

Chapter Four: Method and results of tests
4-1- Method and results of classification and prediction tests
4-1-1- Offline exploitation prediction
4-1-2- Online exploitation prediction
4-1-3- Time prediction
4-2- Comparison of OSVDB and CVE
4-3- Evaluation of features
4-4- Vulnerability clustering
4-4-1- Analysis of categories in the OSVDB database
4-4-2- Presentation of vulnerability categories
4-4-3- Evaluation of the presented categories

Chapter Five: Discussion and conclusion
5-1- Prediction of exploitation of vulnerabilities
5-2- Vulnerability clustering
Conclusion
Suggestions for future research
References

References:

1. The Three Tenets of Cyber Security, U.S. Air Force Software Protection Initiative. http://www.spi.dod.mil/tenets.htm (last visited 2011-07-10).

2. ISO/IEC, Information technology - Security techniques - Information security risk management, ISO/IEC FDIS 27005:2008.

3. Internet Engineering Task Force, RFC 2828: Internet Security Glossary.

4. CNSS Instruction No. 4009, dated April 26, 2010.

5. Risk Management Glossary Vulnerability, (Last visited 2011-08-23) http://www.enisa.europa.eu/act/rm/cr/risk-management-inventory/glossary#G52 .

6. Technical Standard Risk Taxonomy ISBN 1-931624-77-1, Document Number: C081 Published by the Open Group, January 2009.

7. An Introduction to Factor Analysis of Information Risk (FAIR), Risk Management Insight LLC, November 2006. URL: www.riskmanagementinsight.com.

8. Vacca, J.R., 2009. Computer and Information Security Handbook, Morgan Kaufmann Publications Elsevier Inc p. 393, ISBN 978-0-12-374354-1.

9. Krsul, I., 1997, Computer Vulnerability Analysis: Thesis Proposal, The COAST Laboratory Department of Computer Sciences, Purdue University.

10. The Web Application Security Consortium Project, Web Application Security Statistics. http://projects.webappsec.org/w/page/13246989/Web-Application-Security-Statistics (last visited 2011-08-23).

11. Han, J., and Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann.

12. Witten, I.H., and Frank, E., 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco.

13. Kohonen, T., 1995. Self-Organizing Maps. Berlin, Germany: Springer-Verlag.

14. Ultsch, A., and Mörchen, F., 2005. ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM. Technical Report 46, CS Department, Philipps-University Marburg, Germany.

15. Duan, K.B., Rajapakse, J.C., Wang, H., and Azuaje, F., 2005. Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Transactions on Nanobioscience 4: 228–234. URL http://view.ncbi.nlm.nih.gov/pubmed/1622?0686.

16. Frei, S., Schatzmann, D., Plattner, B., and Trammell, B., 2009. Modeling the Security Ecosystem: The Dynamics of (In)Security. In Proc. of the Workshop on the Economics of Information Security (WEIS).

17. Arora, A., Krishnan, R., Telang, R., and Yang, Y., 2010. An Empirical Analysis of Software Vendors' Patch Release Behavior: Impact of Vulnerability Disclosure. Information Systems Research 21(1), 115–132.

18. Schryen, G., 2009. A Comprehensive and Comparative Analysis of the Patching Behavior of Open Source and Closed Source Software Vendors. Fifth International Conference on IT Security Incident Management and IT Forensics, 153-168.

19. Joh, H.C., and Malaiya, Y.K., 2009. Seasonal variation in the vulnerability discovery process. Proc. International Conference on Software Testing, Verification and Validation, 191-200.

20. United States Computer Emergency Readiness Team (US-CERT). US-CERT Vulnerability Note Field Descriptions, (last visited 2011-07-10). http://www.kb.cert.org/vuls/html/fieldhelp.

21. SANS Institute. SANS Critical Vulnerability Analysis Archive. (last visited 2011-07-10). http://www.sans.org/newsletters/cva/.

22. Microsoft Corporation. Microsoft Security Response Center Security Bulletin Severity Rating System. (last visited 2011-07-10). http://www.microsoft.com/technet/security/bulletin/rating.mspx.

23. Forum of Incident Response and Security Teams (FIRST). Common Vulnerability Scoring System (CVSS). http://www.first.org/cvss/ (last visited 2011-07-10).

24. Mell P., Scarfone K., and Romanosky S., 2007. The Common Vulnerability Scoring System (CVSS) and Its Applicability to Federal Agency Systems. NIST Interagency Report 7435.

25. Mell, P., Scarfone, K., Romanosky, S., 2006. Common Vulnerability Scoring System. IEEE Security and Privacy 4(6). 85-89.

26. Gallon, L., 2010, On the impact of environmental metrics on CVSS scores, IEEE International Conference on Privacy, Security, Risk and Trust, 987-992.

27. Fruhwirth, C., and Mannisto, T., 2009, Improving CVSS-based vulnerability prioritization and response with context information, Third International Symposium on Empirical Software Engineering and Measurement, 535-544.

28. Gallon, L., 2011, Vulnerability discrimination using CVSS framework, New Technologies, Mobility and Security (NTMS).

29. Joh, H.C. and Malaiya, Y.K., A framework for software security risk evaluation using the vulnerability lifecycle and CVSS metrics, 430-434.

30. Bishop M. A taxonomy of UNIX system and network vulnerabilities. Technical Report CSE-9510. Davis: Department of Computer Science, University of California; 1995.

31. Krsul IV. Software vulnerability analysis. Available from: http://www.krsul.org/ivan/articles/main.pdf; May 1998.

32. Venter HS, Eloff JHP. Harmonizing vulnerability categories. South African Computer Journal 2002;29. ISSN: 1015-7999:24–31. Computer Society of South Africa.

33. Kujawski P. Why networks must be secured. Cisco Systems, Inc.; 2003.

34. Microsoft Commerce Server 2002. The STRIDE threat model. Available from: http://msdn2.microsoft.com/en-us/library/ms954176.aspx; (last visited 2011-07-10).

35. SAINT Corporation. Available from: http://www.saintcorporation.com/; (Last visited 2011-07-10).

36. SFProtect. Available from: http://www.winnetmag.com/Article/ArticleID/8401/8401.html (last visited 2011-07-10).
