Presenting a model to identify the influencing factors and their impact factor in the profit and loss of the third party car insurance of insurance companies by means of data mining methods, a case study of Iran Insurance Company.

Number of pages: 100 File Format: word File Code: 30601
Year: 2013 University Degree: Master's degree Category: Computer Engineering
  • Summary

    Master's Thesis

    in the field of Computer - Software Engineering

    Abstract

    A review of car insurance data has shown that the following factors have affected the profit and loss of insurance companies: the type of car used, whether the driver holds a license, the type of license and whether it matches the vehicle, the amount of the insurance premium, the amount of the policy's obligations, the quality of the manufacturer's cars, the driver's age, the driver's literacy, a mismatch between the premium and the insured item, and delay in renewing the policy.

    The aim of this thesis is to identify the influencing factors and their coefficients of influence on the profit and loss of insurance companies' third party car insurance using data mining methods, and then to choose the algorithm with the best prediction accuracy for detecting these factors. The acceptable clusters obtained make it possible to provide a model that identifies the influencing factors and determines the extent of their effect on the profit and loss of third party car insurance policies.

    Keywords: data mining, third party car insurance, profit and loss

    Chapter 1

    Introduction

    Commercial companies must always focus on making profits and reducing losses in order to survive and hold their market. Customer attraction methods, as well as techniques to prevent or reduce losses, are therefore at the top of these companies' agendas.

    Insurance companies are among the companies exposed to reduced profits or increased losses for various reasons. Factors such as marketing, customer loyalty, insurance premium rates, advertising and fraud can attract or repel customers, which affects profit and loss both directly and indirectly.

    Payment of claims, as an obligation of insurance companies, reduces profit and in some cases causes an insurance company to make a loss. Damage can occur for various genuine reasons, or some other event can be falsely presented as damage [Derrig et. al 2006].

    Factors such as driving culture, having a driver's license, the type of license and whether it matches the vehicle, intercity roads and city streets (which challenge municipalities and road departments), fraud, weather conditions, the quality of manufacturers' cars, the driver's age, the driver's literacy, a mismatch between the insurance premium and the insured item [Wilson 2003], holidays, trips and many other things can cause damage and ultimately increase an insurance company's losses.

    The insurance industry is useful, necessary and effective in economic development. Because of "increasing security in various areas of life and economic activities", "increasing investment, employment and economic growth" and "promoting economic justice and reducing poverty caused by risks", this industry has an important place in the progress of a country.

    Despite the important role of insurance in laying the groundwork for favorable economic conditions, the current state of this industry in the national economy is far from ideal. The reasons for its underdevelopment in the country include the general lack of familiarity with and low demand for insurance products, limited technical knowledge in the field of insurance services, the mismatch between risk and insurance premiums, the significant difference between the risk assessment criteria of third-party insurance and those of the equivalent insurance in developed countries, and inadequacies in the management of insurance providers. Since humanity has overcome many problems and achieved many successes throughout history with the help of science and experience, a more scientific look at this industry's problems can point to a solution. Today, data mining methods are used to determine the relationships between the factors that do or do not affect a given subject. Because data mining is a useful tool for extracting knowledge from large volumes of data and revealing the hidden relationships within them, commercial companies are turning to these techniques.

    Data mining is not limited to any particular technology and uses whatever is useful to it. However, statistics and computer science are the fields most widely used in data mining.

    1-1 Definition of data mining

    Data mining is the process of discovering previously unknown and useful rules and knowledge from large volumes of data and databases [Liu et. al 2012].

    Like any other operation, data mining has its own steps, which are as follows:

    1- Separating useful data from extraneous data

    2- Integrating different data under a single format

    3- Selecting the necessary data from among the other data

    4- Transferring the data to the data mining environment to discover rules

    5- Creating the related models and patterns with data mining methods

    6- Evaluating the models and patterns created to determine their usefulness

    7- Disseminating the extracted knowledge to end users
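
    The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the thesis's implementation: the records, the field names (`policy`, `policy_no`, `vehicle`, `premium`, `paid`) and the loss-ratio "pattern" are all assumptions made up for the example, and the thesis itself works in RapidMiner rather than Python.

```python
# Hypothetical issuance and claims records (invented for illustration only).
issuance = [
    {"policy": "P1", "vehicle": "sedan", "premium": 100.0},
    {"policy": "P2", "vehicle": "truck", "premium": 300.0},
    {"policy": "P3", "vehicle": "sedan", "premium": None},  # unusable record
    {"policy": "P4", "vehicle": "truck", "premium": 250.0},
]
claims = [
    {"policy_no": "P1", "paid": 40.0},
    {"policy_no": "P2", "paid": 450.0},
    {"policy_no": "P4", "paid": 300.0},
]


def clean(rows):
    """Step 1: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(issuance_rows, claim_rows):
    """Step 2: merge both sources into one format keyed by policy number."""
    paid = {}
    for c in claim_rows:
        paid[c["policy_no"]] = paid.get(c["policy_no"], 0.0) + c["paid"]
    return [dict(r, paid=paid.get(r["policy"], 0.0)) for r in issuance_rows]


def select(rows, fields):
    """Step 3: keep only the fields needed for mining."""
    return [{f: r[f] for f in fields} for r in rows]


def mine(rows):
    """Steps 4-5: load the data and derive a simple pattern,
    here the loss ratio (claims paid / premium) per vehicle type."""
    totals = {}
    for r in rows:
        premium, paid = totals.get(r["vehicle"], (0.0, 0.0))
        totals[r["vehicle"]] = (premium + r["premium"], paid + r["paid"])
    return {v: paid / premium for v, (premium, paid) in totals.items()}


def evaluate(pattern, threshold=1.0):
    """Step 6: flag groups whose loss ratio exceeds the (assumed) threshold."""
    return {v: ratio > threshold for v, ratio in pattern.items()}


def report(pattern, flags):
    """Step 7: turn the extracted knowledge into lines for end users."""
    return [
        f"{v}: loss ratio {pattern[v]:.2f}" + (" (loss-making)" if flags[v] else "")
        for v in sorted(pattern)
    ]


rows = select(integrate(clean(issuance), claims), ["vehicle", "premium", "paid"])
pattern = mine(rows)
flags = evaluate(pattern)
for line in report(pattern, flags):
    print(line)
```

    Keeping each step as its own function mirrors the list: a step can be replaced (for example, a real classifier in place of the loss-ratio summary) without touching the others.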

     

    1-2 Definition of insurance

    Insurance is a contract by which one party undertakes, in return for payment of a sum of money by the other party, to compensate that party's loss or to pay a specified amount if an accident or damage occurs. The party that makes the undertaking is the insurer, the other party is the insured, the sum the insured pays to the insurer is the insurance premium, and what is insured is the subject of insurance [Article 1 of the Insurance Law approved on 7/2/1316].

    1-3 The purpose of the thesis

    [...] insurance companies and check their effectiveness. The algorithms used in this research included classification, clustering, decision trees and association rules.

    1-4 Stages of research

    In this thesis, data mining methods are used to model a portion of the insurance company's policy issuance and claims data for one year and to extract a pattern from it. In effect, the algorithm is trained to learn which relationships in the data lead to which outcomes. Then a portion of the data that was not used in the previous step is given to the resulting model, and the results are evaluated with scientific criteria. To test performance further, other data can be given to the model and its results compared with the actual outcomes.
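
    The train-then-test procedure described above is commonly called holdout evaluation. The following is a minimal sketch under stated assumptions: the data are synthetic, the relationship between driver age and loss is invented for the example, and the one-rule classifier (`train_stump`) is a stand-in for the algorithms the thesis actually evaluates.

```python
import random


def make_records(n, seed=0):
    """Synthetic (hypothetical) policies: driver age and whether the policy lost money."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        age = rng.randint(18, 70)
        # Assumed toy relationship: younger drivers are more often loss-making.
        p_loss = 0.7 if age < 30 else 0.2
        records.append((age, rng.random() < p_loss))
    return records


def holdout_split(records, train_fraction=0.7, seed=1):
    """Shuffle, then split into a training part and an unseen test part."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold, records):
    """Share of records where the rule 'age < threshold means loss' is correct."""
    hits = sum((age < threshold) == loss for age, loss in records)
    return hits / len(records)


def train_stump(train):
    """Learn the age threshold with the best accuracy on the training data."""
    return max(range(18, 72), key=lambda t: accuracy(t, train))


records = make_records(500)
train, test = holdout_split(records)
threshold = train_stump(train)
print("learned threshold:", threshold)
print("training accuracy:", round(accuracy(threshold, train), 3))
print("held-out accuracy:", round(accuracy(threshold, test), 3))
```

    The held-out accuracy is the figure to compare between algorithms, because the test records played no part in choosing the threshold.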

    1-5 Thesis structure

    This thesis consists of four chapters. The first chapter contains the introduction, the necessity of the research and its purpose. The second chapter presents some data mining techniques and methods and examines the research that has been done in this field. The third chapter describes in detail the research carried out and the data mining software used in this thesis; models are built with data mining techniques, the models in each group are compared with each other, and the best model among them is selected. The fourth chapter summarizes the issues raised, presents the results and suggests future work in this field.

    Chapter Two: Subject literature and previous research

    In this chapter, we first give an overview of data mining methods and then review previous research.

    2-1 Data mining and machine learning

     

    Data mining combines machine learning techniques, pattern recognition, statistics and database theory to summarize and relate interesting concepts and patterns automatically from large corporate databases. Its main purpose is to support the decision-making process by extracting knowledge from data [Alpaydin 2010]. Data mining aims to reveal previously unknown trends or patterns so that better decisions can be made. It achieves this by applying statistical methods such as logistic analysis and clustering, as well as data analysis methods taken from other disciplines (such as neural networks in artificial intelligence and decision trees in machine learning) [Koh & Gervais 2010]. Data mining is a useful tool for extracting knowledge from large data sets because it predicts future trends and behaviors by observing the organization's hidden patterns. As a procedure for discovering knowledge in databases, data mining finds meaningful information in a large amount of data; its steps are given in [Han and Kamber 2001].

  • Contents & References

    Table of contents:

    Chapter One: Introduction
    1-1 Definition of data mining
    1-2 Definition of insurance
    1-3 The purpose of the thesis
    1-4 Stages of research
    1-5 Thesis structure

    Chapter Two: Subject literature and previous research
    2-1 Data mining and machine learning
    2-2 Data mining tools and techniques
    2-3 Data mining methods
    2-3-1 Data description methods
    2-3-2 Dependency analysis methods
    2-3-3 Classification and prediction methods
    2-3-4 Decision tree
    2-3-5 Neural network
    2-3-6 Memory-based reasoning
    2-3-7 Support vector machines
    2-3-8 Clustering methods
    2-3-9 K-Means method
    2-3-10 Kohonen network
    2-3-11 Two-step method
    2-3-12 Noise analysis methods
    2-4 Imbalanced classes [Saniei Abadeh 1391]
    2-4-1 Criterion-based solution
    2-4-2 Sampling-based solution
    2-5 Research background
    2-6 Chapter summary

    Chapter Three: Research Description
    3-1 Selection of software
    3-1-1 RapidMiner
    3-1-2 Comparison of RapidMiner with other similar software
    3-2 Data
    3-2-1 Data selection
    3-2-2 Issuance data set fields
    3-2-3 Dimensionality reduction
    3-2-4 Claims data set fields
    3-2-5 Data cleaning
    3-2-6 Handling missing data
    3-2-7 Outlier detection
    3-2-8 Data aggregation
    3-2-9 Creation of the class feature
    3-2-10 Data conversion
    3-2-11 Data transfer to the data mining environment
    3-2-12 Designated data types
    3-2-13 Selecting the more effective features
    3-3 Results of applying the PCA algorithm and weighting algorithms
    3-4 Features selected for use in algorithms sensitive to the number of features
    3-5 Evaluation criteria of classification algorithms
    3-6 Confusion matrix
    3-7 AUC measure
    3-8 Evaluation methods of classification algorithms
    3-8-1 Holdout method
    3-8-2 Random Subsampling method
    3-8-3 Cross-Validation method
    3-8-4 Bootstrap method
    3-9 Classification algorithms
    3-9-1 KNN algorithm
    3-9-2 Naïve Bayes algorithm
    3-9-3 Neural Network algorithm
    3-9-4 Linear SVM algorithm
    3-9-5 Logistic regression algorithm
    3-9-6 Meta Decision Tree algorithm
    3-9-7 W-J48 tree algorithm
    3-9-8 Random Forest algorithm
    3-10 Evaluation criteria of rule-based algorithms (discovery of association rules)
    3-10-1 FP-Growth algorithm
    3-10-2 Weka Apriori algorithm
    3-11 Evaluation criteria of clustering algorithms
    3-12 Clustering algorithms
    3-12-1 K-Means algorithm
    3-12-2 Kohonen algorithm
    3-12-3 Two-step algorithm

    Chapter Four: Evaluation and Conclusion
    4-1 Comparison of results
    4-2 Classification algorithms
    4-3 Decision tree classification algorithms
    4-4 Clustering algorithms
    4-5 Rule-based algorithms
    4-6 Suggestions to insurance companies
    4-7 Suggestions for continuing the work

    Sources
    List of Persian sources
    List of English sources

     

    Source:

     

    Persian sources

     

    [Izdperest 1389] Seyed Mahmoud Izdperest, (1389), "Presenting a framework for predicting the loss of car body insurance customers using data mining method", Insurance Research Institute website, http://www.irc.ac.ir

    [Rastakhiz Paydar 1389] Nada Rastakhiz Paydar, (1389), "Segmentation of customers based on risk using data mining technique (case study: car body insurance of Mellat Insurance)", Insurance Research Institute website, http://www.irc.ac.ir

    [Saniei Abadeh 1391] Saniei Abadeh Mohammad, (1391), "Applied data mining", first edition, Niazdanash Publishing House, Tehran, Iran

    [Anbari 1389] Elham Anbari, (1389), "Risk classification of insurers in the field of car body insurance using data mining", Insurance Research Institute website, http://www.irc.ac.ir

    [Foladinia and colleagues 2013] Foladinia Babak, Kermizade Faramarz, Dastghibi Fard Gholamhossein, Sami Ashkan, (2013), "Detecting fraud in car insurance using data mining methods", 7th Iran Data Mining Conference, 19 and 20 Azar, Tehran

    [Foladinia 1392] Foladinia Babak, (2013), "Discovering Fraud in Car Insurance Using Data Mining Methods", Master's Thesis, Faculty of Electronic Education, Shiraz University

    [Murki Aliabad 2013] "Entrepreneur", Insurance Research Institute website, http://www.irc.ac.ir

     

     

     

    English sources

     

    [Allahyari Soeini et. al 2012] Allahyari Soeini R. and Vahidy Rodpysh K. (2012), "Applying Data Mining to Insurance Customer Churn Management", Third International Conference, ICICA 2012, Chengde, China, September 14-16, 2012, Proceedings, Part I (Communications in Computer and Information Science).

    [Alpaydin 2010] Alpaydin E. (2010), "Introduction to Machine Learning", The MIT Press, Cambridge, Massachusetts; London, England.

    [Bolton & Hand 2002] Bolton R. J. & Hand D. J. (2002), "Statistical fraud detection: a review", Statistical Science, Vol. 17, No. 3, pp. 235-255.

    [Brockett et. al 1998] Brockett P. L., Xia X. & Derrig R. A. (1998), "Using Kohonen's self-organizing feature map to uncover automobile bodily injury claims fraud", The Journal of Risk and Insurance, Vol. 65, No. 2, pp. 245-274.

    [Derrig et. al 2006] Derrig R., Johnston D. & Sprinkel E. (2006), "Auto Insurance Fraud: Measurements and Efforts to Combat It", Risk Management and Insurance Review, Vol. 9, No. 2, pp. 109-130.

    [Derrig & Ostaszewski 1995] Derrig R. A. & Ostaszewski K. M. (1995), "Fuzzy techniques of pattern recognition in risk and claim classification", The Journal of Risk and Insurance, Vol. 62, No. 3, pp. 447-482.

    [Gupta 2006] Gupta G. K. (2006), "Introduction to Data Mining with Case Studies", Prentice Hall of India, New Delhi.

    [Han and Kamber 2001] Han J. and Kamber M. (2001), "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, San Francisco.

    [Jiawei Han, 2010] Jiawei Han, Micheline Kamber and Jian Pei (2010), "Data Mining: Concepts and Techniques", 3rd ed., University of Illinois at Urbana-Champaign & Simon Fraser University.

    [Koh & Gervais 2010] Koh H. C. and Gervais G. (2010), "Fraud Detection Using Data Mining Techniques: Applications in the Motor Insurance Industry", Journal of Proceedings of Business and Information, Vol. 7, No. 1, pp. 49.

    [Kumar and Verma 2012] Kumar R. and Verma R. (2012), "Classification Algorithms for Data Mining: A Survey", International Journal of Innovations in Engineering and Technology (IJIET), Vol. 1, Issue 2, August 2012.

    [Lin & Yeh 2012] Lin Kuo-Chung and Yeh Ching-Long (2012), "Use of Data Mining Techniques to Detect Medical Fraud in Health Insurance", International Journal of Engineering and Technology Innovation, Vol. 2, No. 2, pp. 42-53.

    [Liu et. al 2012] Liu Jenn-Long, Chen Chien-Liang and Yang Hsing-Hui (2012), "Efficient Evolutionary Data Mining Algorithms Applied to the Insurance Fraud Prediction", International Journal of Machine Learning and Computing, Vol. 2, No. 3, pp. 308-314.

    [Osmar 1999] Osmar, R.
