Predicting exploitation and clustering of vulnerabilities by means of text mining

Number of pages: 105 File Format: word File Code: 31053
Year: 2011 University Degree: Master's degree Category: Computer Engineering
  • Summary of Predicting exploitation and clustering of vulnerabilities by means of text mining

    Master's thesis in computer software engineering

    Abstract

    Software vulnerabilities can lead to financial and information losses. Because financial and human resources are limited, prioritizing vulnerabilities is very important. Many researchers have previously ranked vulnerabilities based on empirical and statistical knowledge. However, the changing nature of vulnerabilities makes it impossible to provide a fixed ranking criterion for them.

    Vulnerability reports are continuously recorded in various databases. Existing automatic tools do not fully exploit the textual information in these reports. This research shows that the information in the report texts can be used to build predictive models. Text mining is a suitable tool for obtaining information that supports important management decisions.
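    As an illustration of how report text can become input for a predictive model, the sketch below turns free-text descriptions into weighted term vectors. It is a minimal example under stated assumptions: the TF-IDF weighting, the function names `tokenize` and `tfidf_vectors`, and the three sample descriptions are all hypothetical and are not taken from the thesis, whose actual feature extraction is described in its Chapter three.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(descriptions):
    """Return one {term: weight} dict per description, using tf * log(N / df)."""
    token_lists = [tokenize(d) for d in descriptions]
    n_docs = len(token_lists)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical descriptions, written for illustration only.
reports = [
    "Buffer overflow in the FTP server allows remote code execution",
    "Denial of service in the FTP server via a malformed packet",
    "SQL injection in the login form allows remote attackers to read data",
]
vectors = tfidf_vectors(reports)
```

    Terms that occur in every description (such as "the") receive weight zero, while terms specific to one report (such as "overflow") receive the highest weight, which is why such vectors are a common starting point for classifiers.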

    In the field of predicting exploitation using text mining, only one study has been published so far. It was presented at KDD 2010 under the title "Beyond Heuristics: Learning to Classify Vulnerabilities and Predict Exploits". That study used text mining to answer the following questions: Will the vulnerability be exploited? When will the vulnerability be exploited? It achieved good results compared to CVSS, one of the best-known vulnerability metrics. In this research, the above questions and the following new questions are answered with high accuracy:

               If a system has been exploited, when did the exploitation begin? (Accuracy between 84% and 94.5%)

               If a system is vulnerable, when will the developers release a patch for it? (Accuracy between 68% and 91%)

    In the field of vulnerability clustering, a great deal of research has been done so far. The OSVDB database has various categories for vulnerabilities, but none of them is based on the descriptions of the vulnerabilities. In this research, vulnerabilities are clustered using their descriptions, and the resulting categories are: buffer overflow, denial of service, data manipulation, remote control, improper configuration, password cracking, unauthorized access to information, and unauthorized access to service. Manually assigning vulnerabilities to appropriate categories requires human expertise and is very tedious. The categorization presented in this research makes it possible to build software that automatically assigns vulnerabilities to appropriate categories.
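    To give a feel for clustering descriptions with a map-based method, the sketch below trains a very small self-organizing map. It is only an illustration under stated assumptions: it is a plain one-dimensional SOM rather than the emergent SOM used in the thesis, and the function names, parameters, and term-count vectors are hypothetical.

```python
import math
import random

def best_unit(units, vector):
    """Index of the unit whose weights are closest (squared Euclidean) to vector."""
    return min(
        range(len(units)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(units[i], vector)),
    )

def train_som(vectors, n_units=4, epochs=50, seed=0):
    """Train a one-dimensional SOM; returns the list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs              # goes from 1.0 towards 0
        rate = 0.5 * decay                        # learning rate
        radius = max(1.0, (n_units / 2) * decay)  # neighbourhood width
        for vector in vectors:
            winner = best_unit(units, vector)
            for i, unit in enumerate(units):
                # Units near the winner on the map move more than distant ones.
                pull = rate * math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    unit[d] += pull * (vector[d] - unit[d])
    return units

# Hypothetical term-count vectors over the terms (overflow, denial, password).
data = [[2, 0, 0], [2, 0, 0], [0, 3, 0], [0, 0, 1]]
units = train_som(data)
assignments = [best_unit(units, v) for v in data]
```

    Each description is assigned to its best-matching unit, so identical descriptions always land on the same unit. Grouping units into named categories such as "buffer overflow" would still require inspecting the trained map.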

    This research uses two well-known vulnerability databases (OSVDB and CVE), together with vulnerability history data provided by Stefan Frei. For prediction, support vector machine and random forest classifiers are used; for clustering, the emergent self-organizing map (ESOM) method is used.
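    The sketch below shows, in miniature, how a support vector machine can learn to separate "exploited" from "not exploited" descriptions. It is an assumption-laden illustration, not the thesis's setup: it trains a linear SVM with Pegasos-style sub-gradient updates on bag-of-words counts, and the function names, hyperparameters, training sentences, and labels are all hypothetical. The random forest classifier is not shown.

```python
import re
from collections import Counter

def features(text):
    """Bag-of-words counts for one description."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def train_linear_svm(samples, lam=0.01, epochs=20):
    """Minimize hinge loss + (lam / 2) * ||w||^2 with Pegasos-style updates.

    samples: list of (feature_dict, label) pairs with label in {+1, -1}.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(w.get(k, 0.0) * v for k, v in x.items())
            # Shrink every weight (gradient of the L2 regularizer).
            for k in w:
                w[k] *= 1.0 - eta * lam
            # Hinge-loss sub-gradient: only samples inside the margin push w.
            if margin < 1.0:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w, text):
    """Return +1 (exploited) or -1 (not exploited) for a description."""
    score = sum(w.get(k, 0.0) * v for k, v in features(text).items())
    return 1 if score > 0 else -1

# Hypothetical labelled descriptions: +1 = exploited, -1 = not exploited.
training = [
    ("remote code execution exploit published", 1),
    ("local misconfiguration reported", -1),
    ("remote overflow exploit published", 1),
    ("local crash reported", -1),
]
weights = train_linear_svm([(features(text), label) for text, label in training])
```

    With these toy sentences, `predict(weights, "remote exploit")` returns 1 and `predict(weights, "local crash")` returns -1. Real vulnerability reports share vocabulary across classes, so a realistic model needs far more data and proper evaluation.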

    Chapter One

    Introduction

    1-1- Vulnerability

    In computer security, a vulnerability is a weakness that allows an attacker to reduce a system's information assurance. Every year, thousands of vulnerabilities are discovered and reported, and millions of dollars are spent worldwide to deal with them. To exploit a vulnerability in a system, three elements are generally needed: a susceptibility or flaw in the system, the attacker's access to the flaw, and the attacker's capability to exploit the flaw (1). Vulnerability has been defined in several ways, including the following:

    ISO 27005: A weakness of an asset or group of assets that can be exploited by one or more threats (2). In this definition, an asset is anything of value to the organization, for example information resources that support the organization's mission.

    IETF RFC 2828: A flaw or weakness in the design, implementation, operation, or management of a system, which can be exploited to violate the system's security policy (3).

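    Committee on National Security Systems (CNSS) of the United States, in CNSS Instruction No. 4009 (National Information Assurance Glossary), dated April 26, 2010: A vulnerability is a weakness in an information system (IS), system security procedures, internal controls, or implementation that could be exploited (4).

    ENISA Risk Management Glossary: The existence of a weakness or a design or implementation error that can lead to an unexpected, undesirable event compromising the security of the computer system, network, application, or protocol involved (5).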

    The Open Group: The probability that a threat's capability exceeds the ability to resist it (6).

    Factor Analysis of Information Risk (FAIR): The probability that an asset will be unable to resist the actions of a threat agent (7).

    Data and Computer Security: Dictionary of Standards Concepts and Terms, by Dennis Longley and Michael Shain, Stockton Press, ISBN 0-935859-17-9, gives three definitions:

    In computer security, a weakness in automated systems security procedures, administrative controls, Internet controls, etc., that could be exploited by a threat to gain unauthorized access to information or to disrupt critical processing.

    In computer security, a weakness in the physical layout, organization, procedures, personnel, management, administration, hardware, or software that may be exploited to cause harm to the system or activity.

    In computer security, any weakness or flaw existing in a system; also the attack or harmful event, or the opportunity available to a threat agent to mount that attack.

    1-1-2- Classification of vulnerabilities

    Vulnerabilities are divided into the following categories based on the type of asset (2):

    Hardware, for example: susceptibility to humidity, susceptibility to dust, susceptibility to unprotected storage.

    Software, for example: insufficient testing, lack of an audit trail.

    Network, for example: unprotected communication lines, insecure network architecture.

    Personnel, for example: inadequate recruiting process, inadequate security awareness.

    Location, for example: flood-prone area, unreliable power source.

    Organization, for example: lack of regular audits, lack of security awareness, lack of continuity plans.

    1-1-3- Causes of vulnerabilities

    Some sources and causes of vulnerabilities are:

    System complexity: large, complex systems are more likely to contain flaws and unintended access points (8).

    Familiarity: using common, well-known code, software, operating systems, or hardware increases the probability that an attacker has or can find the knowledge and tools to exploit existing flaws (9).

    Connectivity: the more physical connections, privileges, ports, protocols, and services a system has, and the longer each remains accessible, the more exposed it is to vulnerabilities (7).

    Flaws in password management: users choose weak passwords that can be discovered with little effort, store them inside programs, or reuse the same password across many applications and websites (8).

    Design flaws in major operating systems: operating system designers often choose policies that minimize user and administrator involvement. For example, some operating systems use default-permit policies that grant every program and every user full access to the system (8). These flaws allow viruses and malware to execute commands on behalf of the administrator (1).

    Browsing Internet websites: some websites contain spyware or malicious advertisements that can be installed automatically on computer systems. After such a website is visited, the system is infected, and personal information is collected and sent to a third party (10).

    Software bugs: many software programs contain exploitable bugs. Such bugs may allow attackers to misuse the application (8).

    Unchecked user input: applications assume that all user input is safe. Programs that do not check user input in fact allow unintended commands to be executed directly and the database to be manipulated (8).

    Although existing software tools can provide a good overview of system vulnerabilities in some cases, they cannot replace human investigation of vulnerabilities.
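    The unchecked-input cause described above can be illustrated with a short example. The sketch below is not from the thesis; it uses Python's standard `sqlite3` module with a hypothetical `users` table to contrast a query built by string concatenation with a parameterized query.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Safer: the value is bound as a parameter and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

# Hypothetical in-memory database with two users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "x' OR '1'='1"
leaked = find_user_unsafe(conn, malicious)   # the injected OR clause matches every row
blocked = find_user_safe(conn, malicious)    # no user has this literal name
```

    Here `leaked` contains both users because the input changed the meaning of the query, while `blocked` is empty because the same input was treated purely as data.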

  • Contents & References of Predicting exploitation and clustering of vulnerabilities by means of text mining

    List:

    Chapter one: Introduction
    1-1- Vulnerability
    1-1-1- Definition of vulnerability
    1-1-2- Classification of vulnerabilities
    1-1-3- Causes of vulnerabilities
    1-1-4- Identification and removal of vulnerabilities
    1-2- Required basic concepts
    1-2-1- Text mining
    1-2-2- Classification and prediction
    1-2-3- Clustering
    1-2-4- Feature selection
    1-3- Research objective

    Chapter two: Review of previous research
    2-1- Role of different people and processes in vulnerabilities
    2-2- Vulnerability assessment and classification methods
    2-2-1- Common Vulnerability Scoring System (CVSS)
    2-3- Classification of vulnerabilities
    2-4- Security predictions using vulnerability reports
    2-5- Detection of vulnerabilities using software source code

    Chapter three: Data and feature extraction method
    3-1- Research data
    3-2- Feature extraction method for classification and prediction
    3-3- Feature extraction method for clustering

    Chapter four: Methods and results of experiments
    4-1- Method and results of classification and prediction experiments
    4-1-1- Offline exploitation prediction
    4-1-2- Online exploitation prediction
    4-1-3- Time prediction
    4-2- Comparison of OSVDB and CVE
    4-3- Evaluation of features
    4-4- Vulnerability clustering
    4-4-1- Analysis of categories in the OSVDB database
    4-4-2- Presentation of vulnerability categories
    4-4-3- Evaluation of the proposed categorization

    Chapter five: Discussion and conclusion
    5-1- Prediction of exploitation of vulnerabilities
    5-2- Vulnerability clustering
    Conclusion
    Suggestions for future research
    References

    References:

    1. The Three Tenets of Cyber Security, U.S. Air Force Software Protection Initiative. http://www.spi.dod.mil/tenets.htm (last visited 2011-07-10).

    2. ISO/IEC, Information technology - Security techniques - Information security risk management, ISO/IEC FIDIS 27005:2008.

    3. Internet Engineering Task Force, RFC 2828: Internet Security Glossary.

    4. CNSS Instruction No. 4009, dated April 26, 2010.

    5. ENISA, Risk Management Glossary: Vulnerability. http://www.enisa.europa.eu/act/rm/cr/risk-management-inventory/glossary#G52 (last visited 2011-08-23).

    6. Technical Standard Risk Taxonomy ISBN 1-931624-77-1, Document Number: C081 Published by the Open Group, January 2009.

    7. An Introduction to Factor Analysis of Information Risk (FAIR), Risk Management Insight LLC, November 2006. URL: www.riskmanagementinsight.com. 

    8. Vacca, J.R., 2009. Computer and Information Security Handbook, Morgan Kaufmann Publications Elsevier Inc p. 393, ISBN 978-0-12-374354-1.

    9. Krsul, I., 1997, Computer Vulnerability Analysis: Thesis Proposal, The COAST Laboratory Department of Computer Sciences, Purdue University.

    10. The Web Application Security Consortium Project, Web Application Security Statistics. http://projects.webappsec.org/w/page/13246989/Web-Application-Security-Statistics (last visited 2011-08-23).

    11. Han, J., and Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann.

    12. Witten, I.H., AND Frank, E., 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco.

    13. Kohonen, T., 1995. Self-Organizing Maps. Springer-Verlag, Berlin, Germany.

    14. Ultsch, A., and Mörchen, F., 2005. ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM. Technical Report 46, CS Department, Philipps-University Marburg, Germany.

    15. Duan KB, Rajapakse JC, Wang H, Azuaje F, 2005. Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Trans Nanobioscience 4: 228–234. URL http://view.ncbi.nlm.nih.gov/pubmed/16220686.

    16. Frei, S., Schatzmann, D., Plattner, B., and Trammell, B., 2009. Modeling the Security Ecosystem - The Dynamics of (In)Security. In Proc. of the Workshop on the Economics of Information Security (WEIS).

    17. Arora, A., Krishnan, R., Telang, R., and Yang, Y., 2010. An Empirical Analysis of Software Vendors' Patch Release Behavior: Impact of Vulnerability Disclosure. Information Systems Research, Vol. 21, No. 1, 115–132.

    18. Schryen, G., 2009. A Comprehensive and Comparative Analysis of the Patching Behavior of Open Source and Closed Source Software Vendors. Fifth International Conference on IT Security Incident Management and IT Forensics, 153-168.

    19. Joh, H.C., AND Malaiya, YK., 2009. Seasonal variation in the vulnerability discovery process. Proc. International Conference on Software Testing Verification and Validation, 191-200.

    20. United States Computer Emergency Readiness Team (US-CERT). US-CERT Vulnerability Note Field Descriptions, (last visited 2011-07-10). http://www.kb.cert.org/vuls/html/fieldhelp.

    21. SANS Institute. SANS Critical Vulnerability Analysis Archive. (last visited 2011-07-10). http://www.sans.org/newsletters/cva/.

    22. Microsoft Corporation. Microsoft Security Response Center Security Bulletin Severity Rating System. (last visited 2011-07-10). http://www.microsoft.com/technet/security/bulletin/rating.mspx.

    23. Forum of Incident Response and Security Teams (FIRST). Common Vulnerability Scoring System (CVSS). http://www.first.org/cvss/ (last visited 2011-07-10).

    24. Mell P., Scarfone K., and Romanosky S., 2007. The Common Vulnerability Scoring System (CVSS) and Its Applicability to Federal Agency Systems. NIST Interagency Report 7435.

    25. Mell, P., Scarfone, K., Romanosky, S., 2006. Common Vulnerability Scoring System. IEEE Security and Privacy 4(6). 85-89.

    26. Gallon, L., 2010, On the impact of environmental metrics on CVSS scores, IEEE International Conference on Privacy, Security, Risk and Trust, 987-992.

    27. Fruhwirth, C., and Mannisto, T., 2009, Improving CVSS-based vulnerability prioritization and response with context information, Third International Symposium on Empirical Software Engineering and Measurement, 535-544.

    28. Gallon, L., 2011, Vulnerability discrimination using CVSS framework, New Technologies, Mobility and Security (NTMS).

    29. Joh, H.C. and Malaiya, Y.K., A framework for software security risk evaluation using the vulnerability lifecycle and CVSS metrics, 430-434.

    30. Bishop M. A taxonomy of UNIX system and network vulnerabilities. Technical Report CSE-9510. Davis: Department of Computer Science, University of California; 1995.

    31. Krsul IV. Software vulnerability analysis. Available from: http://www.krsul.org/ivan/articles/main.pdf; May 1998.

    32. Venter HS, Eloff JHP. Harmonizing vulnerability categories. South African Computer Journal 2002;29. ISSN: 1015-7999:24–31. Computer Society of South Africa.

    33. Kujawski P. Why networks must be secured. Cisco Systems, Inc.; 2003.

    34. Microsoft Commerce Server 2002. The STRIDE threat model. Available from: http://msdn2.microsoft.com/en-us/library/ms954176.aspx; (last visited 2011-07-10).

    35. SAINT Corporation. Available from: http://www.saintcorporation.com/; (Last visited 2011-07-10).

    36. SFProtect. Available from: http://www.winnetmag.com/Article/ArticleID/8401/8401.html; (Last visited 2011-07-10).
