Development of web mining techniques in order to personalize information in search engines

Number of pages: 190 File Format: word File Code: 31060
Year: 2014 University Degree: Master's degree Category: Computer Engineering
  • Summary of Development of web mining techniques in order to personalize information in search engines

    Computer Software Engineering Master's Thesis (M.Sc)

    Abstract

    The dynamic nature of the World Wide Web and its growing dimensions have made accurate information retrieval difficult. Incorrect answers returned by search engines, especially for query terms with different meanings, have caused the dissatisfaction of web users who need accurate answers to their information requests. Today, search engines try to find out what users are asking for by studying their search history or even involving users in the search process in order to clarify what they really need. This process is part of search engines' efforts for personalization.

    One well-defined and well-built personalized search engine is Snect [1], which relies on user participation for personalization. Building on the personalized Snect algorithm, this thesis proposes the architecture of a new personalized search engine called PSEFiL. Through user intervention and link filtering, PSEFiL enriches the answer set with answers that have little or no subject deviation. The answer set is also robust, because every link in it is either highly ranked by other search engines or has been found to have minimal subject deviation through a careful manual scanning process. In addition, each link is clearly categorized under one of the available subject meanings of the query phrase. One goal of PSEFiL is to deliver accurate answers rather than a larger set of links whose content may be less accurate or inaccurate.

    Keywords

    Search engine, search engine optimization, search engine personalization, web structure mining, web content mining

    Chapter 1

    Overview

    The web is a vast, diverse, and dynamic environment in which many users choose to publish their documents. Given the sheer volume of information and the development of information systems, data has become one of the most important resources of organizations. In recent years, therefore, methods and techniques for efficiently accessing data, sharing data, and extracting information from data have been in high demand among the information society and its users. The importance of effectively managing and classifying various types of data, so that general users as well as academic staff [2] can use and analyze them efficiently, is evident. At the same time, the nature of the web poses many challenges that make it difficult to categorize and manage data. These include the difficulty of finding the required information because of the low analytical accuracy of search engines, the lack of information privacy, the long response time perceived by the user, the user's dissatisfaction with the quality of the response received, and the variety of data available on the web.

    In a search engine [3], the user enters a keyword, and the search module searches its database and displays sites related to the topic. When the user submits a request, the output is not limited to a list of results: most search engines offer additional features alongside those results, which can be very useful in guiding the user to what he is really asking for.

    Different methods are used to retrieve information. They are mainly based on content and structure, and they use different algorithms for this purpose. Studies show that queries are short and varied, and that each user has a specific meaning in mind for the same query. As a result, the results presented are not always what the user expects: users have different tastes, yet the search engine offers the same results to all of them. If users' preferences can be used in the search, more satisfactory results can be expected. In such a structure, two users receive different results for the same query. One of the prominent topics in information retrieval is recognizing the user's behavior [4] and using his history of previously viewed web pages, so that the results from the search engine are as close as possible to the user's tastes and lead to greater user satisfaction. The personalization [5] of search engines and the improvement of users' search results remains an open research field that has attracted many researchers and continues to yield valuable results.
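    The idea that two users can receive different results for the same query can be illustrated with a minimal sketch. It assumes a very simple user profile (word counts over previously viewed pages) and re-ranks result snippets by cosine similarity to that profile. The function names, the sample snippets, and the similarity measure are illustrative assumptions, not the method used by Snect or PSEFiL.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_profile(visited_pages):
    """Hypothetical user profile: word counts over pages the user viewed."""
    profile = Counter()
    for page in visited_pages:
        profile.update(tokenize(page))
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalize(results, profile):
    """Re-rank results by similarity to the profile.

    Ties keep the original (engine) order.
    """
    scored = [
        (cosine(Counter(tokenize(r)), profile), i, r)
        for i, r in enumerate(results)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [r for _, _, r in scored]


# The same ambiguous query ("jaguar") returns the same raw results ...
results = [
    "jaguar car dealer and luxury vehicle prices",
    "jaguar big cat habitat in the rainforest",
]

# ... but two users with different histories see different orderings.
driver = build_profile(["used car prices and vehicle reviews"])
biologist = build_profile(["rainforest animals and big cat conservation"])

print(personalize(results, driver)[0])
print(personalize(results, biologist)[0])
```

    In this sketch the user whose history concerns cars sees the car result first, while the user whose history concerns wildlife sees the animal result first. A real system would need a richer profile and a better text model than raw word counts.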

    Web mining [6], a specialized sub-branch of data mining, is the process of discovering previously unknown and useful information and knowledge from web data. It is used in various fields, and in recent years, along with the development of the web, it has become the focus of many researchers. Web mining does not simply mean applying data mining techniques [7] to the data stored in web pages; its algorithms are also modified to meet users' demands of the web in terms of response time and analytical power.

    In this thesis, the web mining process, search engine personalization, and the methods and tools used in each are described first. Then, by combining web structure mining and web content mining and by examining the Snect search engine, search engine personalization is addressed in order to achieve better results.

    1-2 Statement of the problem and its importance

    The expansion of the World Wide Web produces so much data that effective access becomes impossible unless the data is properly organized and managed. For this reason, the use of web mining techniques on the World Wide Web is currently the focus of many researchers. Web mining is the process of discovering unknown information and knowledge from the data available on the web. It has turned the Internet into a practical environment in which users can find the information they need faster and more easily. The technique covers the discovery and analysis of data, documents, and multimedia from the web. Web mining uses the details and contents of documents and the structure of hyperlinks so that users can obtain the information they need. Data mining is performed offline, whereas web mining is performed online. Web mining turns data into knowledge in four steps: retrieving the desired documents from the web; selecting and pre-processing information; generalization, in which common patterns are discovered automatically within one or more websites; and analysis, in which the patterns obtained in the previous step are validated and interpreted [41].

    Web mining methods are divided into three categories according to the type of data explored:

    Web content mining [8]: The process of extracting useful information from the content of web documents. This content can include text, image, video, sound, or structured records such as lists and tables. Among the related algorithms are decision trees and neural networks.

    Web structure mining [9]: The web can be represented as a graph in which the nodes are documents and the edges are the links between them. Web structure mining is the process of extracting structural information from the web.

    Web usage mining [10]: The application of data mining techniques to discover web usage patterns, in order to better understand and meet users' needs. In effect, it is a method for predicting user behavior during interaction with the web. Web usage mining comprises pre-processing, pattern discovery, and pattern analysis [39,41].

    Search engine: A program that searches for keywords in a document or database. On the Internet, the term refers to a web-based program that searches for keywords in files; some search engines search World Wide Web documents, newsgroups, and FTP archives [11] [55].
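    The graph view of the web described under web structure mining above can be sketched in a few lines. The example below assumes a hypothetical list of (source, target) hyperlink pairs and counts incoming links per page, one of the simplest structural signals. The page names and function names are made up for illustration; this is not the ranking method of any particular search engine.

```python
from collections import defaultdict


def build_link_graph(links):
    """Build a directed graph: each page maps to the set of pages it links to."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target]  # touch the key so pages with no out-links are still nodes
    return graph


def in_degree(graph):
    """Count incoming links for every page in the graph."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical hyperlinks between three documents.
links = [
    ("a.html", "b.html"),
    ("c.html", "b.html"),
    ("b.html", "a.html"),
]

graph = build_link_graph(links)
for page, count in sorted(in_degree(graph).items()):
    print(page, count)
```

    Here b.html receives two incoming links, a.html one, and c.html none. Link-based ranking methods start from this kind of structural information, although they weight links far more carefully than a raw count.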

    Different methods are used to retrieve information. They are mainly based on content and structure, and they use different algorithms for this purpose. Studies show that queries are short and varied, and that each user has a specific meaning in mind for the same query. The results presented are therefore not always what the user expects: users have different tastes, yet the search engine provides the same results to all of them. If users' tastes can be used in the search, more satisfactory results can be expected. This thesis investigates methods of personalizing search engines using web mining methods [2].

    The importance and necessity of conducting research

    In recent years, the growth of the World Wide Web has been greater than expected and the remarkable variety of web applications has made the retrieval of useful content a difficult process.

  • Contents & References of Development of web mining techniques in order to personalize information in search engines

    List:

    Abstract
    Chapter one (General)
    Introduction
    Statement of the problem and its importance
    2-2 Web mining
    2-3 Historical evolution of web mining
    2-4 Problems of users in using the web
    2-5 Similarities and differences between web mining and data mining
    2-6 Web mining algorithms
    2-7 Classification of web mining
    2-7-1 Web content mining
    2-7-1-1 Web content mining views
    2-7-1-2 Web content mining data
    2-7-1-3 Approaches and techniques of web content mining
    2-7-1-4 Types of web content mining
    2-7-2 Web structure mining
    2-7-2-1 Web structure mining categories based on structural data type
    2-7-2-2 Web structure representation models
    2-7-2-3 Web structure analysis applications
    2-7-3 Web usage mining
    2-7-3-1 Phases of web usage mining
    2-7-3-2 Data types in web usage mining
    2-9 Challenges of web mining
    2-10 Search engine
    2-11 History of search engines
    2-12 Search engines in terms of financial support and manpower
    2-12-1 Experimental search engines
    2-12-2 Commercial search engines
    2-13 General architecture of search engines and their operation
    2-13-1 Inside crawler
    2-13-2 Control inside crawler
    2-13-3 Page storage
    2-13-4 Index module
    2-13-5 Collection analysis module
    2-13-6 Utility index
    2-13-7 Query engine
    2-13-8 Ranking module
    2-14 Importance of search engines
    2-15 Problems of search engines in providing results
    2-16 Search engine optimization
    2-17 The purpose of SEO
    2-18 The advantage of website optimization for search engines
    2-19 Search engine optimization process
    2-20 Results
    Chapter 3 (Personalization of search engines)
    3-1 Introduction
    3-2 The reason for search engine personalization
    Definition of personalization
    Personalization steps
    3-4-1 User recognition
    3-4-1-1 Methods to help users search the web
    3-4-1-2 Web-ready code clustering
    3-4-1-2-1 Flat clustering
    3-4-1-2-1-1 Single words and flat clustering
    3-4-1-2-1-2 Sentences and flat clustering
    3-4-1-2-2 Hierarchical clustering
    3-4-1-2-2-1 Single words and hierarchical clustering
    3-4-1-2-2-2 Sentences and hierarchical clustering
    3-4-1-3 Introduction of Snect
    3-4-1-4 Description of Snect architecture
    3-4-1-4-1 Sentence selection and ranking
    3-4-1-4-2 Hierarchical clustering
    3-4-1-4-3 Personalization of search results
    3-4-1-5 Browsing hierarchy documents to extract information
    3-4-1-6 Hierarchy document review to select results
    3-4-1-7 Query modification
    3-4-1-8 Personalized ranking
    3-4-1-9 Personalized web mediation
    3-4-1-10 Experimental results
    3-4-1-10-1 User surveys
    3-4-1-10-2 Snect data collection and anecdotal evidence
    3-4-1-10-3 Snect evaluation
    3-4-1-10-3-1 Advantages of using DMOZ
    3-4-1-10-3-2 Advantages of using strong text index
    3-4-1-10-3-3 Advantages of using multiple engines
    3-4-1-10-3-4 Advantages of using spaced sentences as folder tags
    3-4-1-10-3-5 Number of web-ready codes available in
    3-4-2 User modeling
    3-4-2-2-1-1-1 Personal retrieval model
    3-4-2-2-1-1-2 Personal presentation style
    3-4-2-2-1-1-3 Personal interest topic
    3-4-2-2-1-2 System implementation
    3-4-2-2-1-2-1 Ranking
    3-4-2-2-1-2-2 Hierarchical classification of retrieved web pages
    3-4-2-2-1-3 User study
    3-4-2-2-1-3-1 Test 1
    3-4-2-2-1-3-2 Test 2
    3-4-2-2-3 Personalization of page ranking algorithm
    3-4-2-2-4 LTIL algorithm
    3-4-2-2-5 Method IA
    3-4-3 Implementation of personalization system
    3-4-3-1 Deterministic method
    3-4-3-2 Fuzzy method
    3-4-3-3 Personalization of search engines using fuzzy conceptual networks and data mining tools
    3-4-3-3-1 Background
    3-4-3-3-2 Proposed method
    3-4-3-3-3 System evaluation and review of the obtained results
    3-5 Conclusion
    Chapter four (Proposed model for search engine personalization and results obtained from experiments)
    4-1 Introduction
    4-2 Description of experiments and problem analysis
    4-3 Conclusion
    Chapter 5 (Search engine user interface)
    5-4 Conclusion
    Chapter Six (Conclusion)
    6-1 Introduction
    6-2 Review of previous chapters
    6-3 PSEFiL personalized search engine
    6-4 Conclusion
    6-5 Proposals and future studies
    Articles extracted from the thesis
    List of sources
    English abstract

    Source:

    Persian sources

    [1] Arzanian, B., Moradi Dolatabadi, P., Akhlikian, F., 2018, "Personalization of search engines using fuzzy conceptual networks and data mining tools", 3rd Data Mining Conference, pp. 1-6.

    [2] Bostan, S., Qasimzadeh, M., 2013, "A review of search engine personalization algorithms using users' interests", Khavaran Institute of Higher Education, pp. 1-7.

    [3] Saniei Abadeh, M., Mahmoudi, S., Taher Paror, M., 2013, "Applied Data Mining", Niaz Danesh Publications, Chapter 1, pp. 19-42.

    [4] Kamijani, A., 1381, "Indexing structure in web search engines", Journal of Information Processing and Management, Volume 17, No. 3 and 4, p. 44.

    [5] Melkian, A., 1358, "Principles of Internet Engineering", Nass Publications, pp. 482-487.

    [6] Yaqoubi, M., Mohammadzadeh, M., 1390, "Review on the personalization of search engine results with intelligent methods", First Regional Conference on Modern Approaches in Computer Engineering and Information Technology, pp. 1-6.
