Presenting a model for ranking web documents based on user interactions

Number of pages: 99
File format: Word
File code: 30600
Year: 2013
University degree: Master's degree
Category: Computer Engineering
  • Summary of Presenting a model for ranking web documents based on user interactions

    Dissertation submitted for the M.Sc. degree in Computer Engineering

    Specialization: Software

    Abstract:

    Today, the World Wide Web is the foremost environment for developing, publishing, and accessing knowledge. Search engines are the most important tool for reaching this endless ocean of information, and ranking the results returned for a user's query is one of their main components. Helping users find the web page they want is therefore a very important problem. Given the shortcomings of text-based and link-based methods, methods based on user behavior and judgment have been proposed to bring fairness and democracy to the web. In other words, so that the web can grow in both quantity and quality, the users themselves determine which pages are the right ones. Recognizing and extracting users' judgments is therefore of particular importance. A user's behavior during a search includes the query text, the way the user clicks on the ranked list of results, the time spent on each page, and other events recorded during the search. These recorded events contain valuable information that can be used to analyze, evaluate, and model user behavior in order to improve the quality of the results. This research presents a model that, for each specific query, collects positive and negative user feedback: the number of times a site is visited, the time spent on each site, the number of downloads from each site, and the numbers of positive and negative clicks on each site in the displayed list of web pages. The model computes the rank of each page using a multi-criteria decision-making method and produces a new ranking of sites, which it updates regularly using subsequent user feedback.

    Keywords:

    Search engine, user behavior, user feedback, multi-criteria decision making

    Chapter 1

    Research overview

    1-1 Introduction

    The web consists of pages covering many different topics. To find a specific topic in this huge collection, one needs a search engine. A search engine searches its database according to the user's query and shows the results to the user. A single query may match a very large number of pages, possibly millions, while only a few of them answer the user's real need. Search engines therefore usually rank the results by page importance and display them to the user in that order.

    Because the web is so large and keeps growing, methods are needed that rank web pages by their importance and by their relevance to the searched topic. Ranking is one of the main components of an information retrieval system. In web search engines, which are information retrieval systems, the role of ranking is even more pronounced because of the particular nature of web users. A web search engine routinely finds thousands or even millions of pages for a query, while the user has neither the patience nor the time to view all of them to reach the page of interest. Most web users do not look beyond the first page of search results. It is therefore very important for a web search engine to place the results the user cares about at the top of the list; otherwise the search engine will not be effective.

    In addition, the needs of users who search the web differ from those served by traditional information retrieval systems, which only match the words in the query against the text of the pages. In a traditional information retrieval system, a page written by an ordinary user may be judged highly relevant to the query even though the searcher was looking for something else. Web users are therefore more interested in pages that are not only related to the topic but also sufficiently credible. As a result, when searching the web the focus shifts from greater relevance to greater credibility. The task of a ranking algorithm is to recognize the more credible pages in the set of web pages and assign them a higher rank.

    1-2 Information retrieval

    Information retrieval[1] covers the standards and protocols for representing, storing, organizing, and accessing information items, with the aim of retrieving all documents relevant to the user's query[2]. Information retrieval is divided into two domains, the user domain and the information domain, with an intermediate domain called the retrieval domain (Figure 1-1). In the user domain, the user's information need is expressed. In most cases it must be stated in the language of the retrieval system; alternatively, the system models the user and predicts the information need from the user's behavior, behavior history, or information the user provides directly about their expertise or personal details. In the information domain, the information and knowledge hidden in documents and data must be modeled and organized. The retrieval domain is the intersection of these two domains, where the user's information need is matched against the documents. An information need reaches the retrieval domain in one of two ways. It may be sent as an online query[3], in which case relevant items are retrieved from the existing document collection. Alternatively, a standing query is exposed to a stream of information (for example, news) and filters out the relevant documents.

    Abstract

    Today, the World Wide Web is used as the best environment for developing, disseminating, and accessing knowledge. Search engines are the main tool for accessing this endless ocean of information, and ranking the results returned for a user's search is one of their major components. Helping users find their desired web page is thus a very important issue. Because of the problems of text-based and link-based methods, methods based on user behavior and judgment have been considered as a way to restore justice and democracy on the web. In other words, for the web to grow in both quantity and quality, the fittest pages are determined by the users themselves. It is therefore important to detect and extract users' judgments. User behavior during a search includes the query text, the user's clicks on the ranked list of results, the dwell time on each page, and other events recorded during the search. These logs contain valuable information that can be used to analyze, evaluate, and model user behavior in order to improve the quality of the results.

    This thesis presents a model that, for each query, collects five kinds of positive and negative feedback: the number of user visits to each site, the dwell time on each site, the number of downloads made from each site, and the numbers of positive and negative clicks on each site in the displayed list of web pages. The model computes the rank of each page using TOPSIS, a multi-criteria decision-making method, and produces a new ranking of sites, which is updated regularly using subsequent user feedback.
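    The ranking step described in the abstract can be illustrated with a minimal TOPSIS sketch. This is not the thesis's implementation: the criteria names, the equal weights, the vector normalization, the treatment of negative clicks as the only cost criterion, and the sample numbers are all assumptions made for illustration.

```python
import math

# Assumed criteria order (names are illustrative, not taken from the thesis).
CRITERIA = ("visits", "dwell_time", "downloads", "positive_clicks", "negative_clicks")
# True = benefit criterion (larger is better); negative clicks are assumed to be a cost.
BENEFIT = (True, True, True, True, False)


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each row (alternative) of `matrix`."""
    n = len(weights)
    # Step 1: vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # Step 2: build the ideal and anti-ideal solutions column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: relative closeness = distance to anti-ideal / sum of both distances.
    scores = []
    for row in weighted:
        d_pos = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, ideal)))
        d_neg = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, anti)))
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


def rank_sites(feedback, weights=None):
    """Sort (site, score) pairs for one query by descending TOPSIS score."""
    # Equal weights are an assumption; the thesis may weight criteria differently.
    weights = weights or [1.0 / len(CRITERIA)] * len(CRITERIA)
    sites = list(feedback)
    scores = topsis([feedback[s] for s in sites], weights, BENEFIT)
    return sorted(zip(sites, scores), key=lambda pair: pair[1], reverse=True)


# Made-up feedback for a single query, one row per site, in CRITERIA order.
feedback = {
    "site-a": [120, 45.0, 10, 80, 5],
    "site-b": [60, 30.0, 2, 30, 20],
    "site-c": [200, 10.0, 0, 50, 40],
}
ranking = rank_sites(feedback)
for site, score in ranking:
    print(f"{site}: {score:.3f}")
```

    Re-running `rank_sites` after new feedback has been added to the per-query counts would yield the updated ranking, which matches the regular updating the abstract describes.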

  • Contents & References of Presenting a model for ranking web documents based on user interactions

    Table of contents:

    Abstract

    Chapter One: Research overview
    1-1 Introduction
    1-2 Information retrieval
    1-3 Motivation
    1-4 Search engines
    1-5 Indexing and query processing
    1-6 Multi-criteria decision making
    1-7 Statement of the research problem
    1-8 Importance and necessity of the research
    1-9 Specific objectives of the research
    1-10 Research hypotheses

    Chapter Two: Review of related work
    2-1 Introduction
    2-2 Text-based ranking
    2-2-1 Vector space model
    2-2-2 Probabilistic model
    2-3 Link-based ranking
    2-3-1 Query-independent ranking
    2-3-2 Query-dependent ranking
    2-3-3 Challenges of link-based ranking
    2-4 Hybrid ranking
    2-5 Learning-based ranking
    2-6 Ranking based on user behavior
    2-6-1 Document expansion
    2-6-2 NM method
    2-6-3 CVM method
    2-6-4 LA method
    2-6-5 A method for web ranking by defining popularity factors
    2-6-6 Markov model of user behavior as a predictor of a successful search

    Chapter Three: Description of the proposed method
    3-1 Analysis of a multi-criteria system
    3-2 Review of the multi-criteria decision-making process
    3-2-1 Normalization (de-scaling)
    3-2-2 Weighting of criteria
    3-3 Description of the TOPSIS method
    3-4 Proposed method

    Chapter Four: Implementation and evaluation of the proposed method
    4-1 Characteristics of the proposed method
    4-2 Description of the proposed model simulation
    4-3 Example of the proposed model simulation

    Chapter Five: Conclusion
    5-1 Discussion and conclusion
    5-2 Advantages of the proposed method
    5-3 Future work

    List of references

     

    References:

     

    [1] Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval. ACM Press / Addison-Wesley, 2005.

    [2] Bharat K, Henzinger MR. Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998. pp: 104-111.

    [3] Kleinberg JM. Authoritative sources in a hyperlinked environment. Journal of the ACM, Vol. 46, No. 5, September 1999. pp: 604-632.

    [4] Kent P. Search Engine Optimization for Dummies. John Wiley & Sons Inc., 2008.

    [5] Keyhanipour AH, Moshiri B, Piroozmand M, Lucas C. Aggregation of Multiple Search Engines Based on Users' Preferences in WebFusion. Elsevier Journal of Knowledge-Based Systems, Vol. 20, No. 4, May 2007. pp: 321-328.

    [6] Liu TY, Qin T, Xu J, Xiong W, Li H. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In SIGIR Workshop on Learning to Rank for Information Retrieval, 2007.

    [7] Castillo C. Effective Web Crawling, Ph.D. Thesis, University of Chile, Nov 2004.

    [8] Baeza-Yates R. Challenges in the interaction of information retrieval and natural language processing. In Proceedings of 5th international conference on Computational Linguistics and Intelligent Text Processing (CICLing), Lecture Notes in Computer Science Springer, Vol. 2945, February 2004. pp:445–456.

    [9] Tomasic A, Garcia-Molina H. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of the Second International Conference on Parallel and Distributed Information Systems, IEEE Computer Society Press, 1993. pp: 8-17.

    [10] Moussea V, Figueria J, Gomes silv C. Resolving Inconsistencies Among Constraints on the Parameters of an MCDA Model. European Journal of Operational Research, Vol. 147. pp: 72-93.

    [11] Zhang Y, Chen W, Wang D, Yang Q. User-click modeling for understanding and predicting search-behavior. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11). ACM, New York, NY, USA, August 2011. pp: 1388-1396.

    [12] Zhao D, Zhang M, Zhang D. A Search Ranking Algorithm Based on User Preferences. Journal of Computational Information Systems, November 2012. pp: 8969-8976.

    [13] Attenberg J, Pandey S, Suel T. Modeling and predicting user behavior in sponsored search. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). ACM, New York, NY, USA, 2009. pp: 1067-1076.

    [14] Yu J, Lu Y, Sun S, Zhang F. Search Results Evaluation Based on User Behavior. Springer-Verlag Berlin Heidelberg, Vol. 320, 2013. pp: 397-403.

    [15] Dupret G, Liao C. A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, New York, NY, USA, February 2010. pp: 181-190.

    [16] Liu C, White RW, Dumais S. Understanding web browsing behaviors through Weibull analysis of dwell time. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10). ACM, New York, NY, USA, 2010. pp: 379-386.

    [17] Yang C, Liu C, Shao-chieh H. A Hybrid Item-Based Recommendation Ranking Algorithm Based on User Access Patterns. Springer-Verlag Berlin Heidelberg, Vol. 163, 2013. pp: 225-233.

    [18] Guo F, Liu C, Wang Y. Efficient Multiple-Click Models in Web Search. In WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining. ACM, 2009.

    [19] Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Information Processing and Management, Vol. 24, No. 5, 1988. pp: 513-523.

    [20] Salton G. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, 1971.

    [21] Zhai C. A brief review of information retrieval models. Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign, 2010.

    [22] Sparck Jones K, Walker S, Robertson SE. A probabilistic model of information retrieval: development and comparative experiments, part 1 and part 2. Information Processing and Management, Vol. 36, No. 6, 2000. pp: 779-808 and 809-840.

    [23] Robertson SE, Walker S, Jones S, Gatford M. Okapi at TREC-3. In Harman, D. K., editor, The Third Text REtrieval Conference (TREC-3), pp: 109-126.

    [24] Duhan N, Sharma AK, Bhatia KK. Page Ranking Algorithms: A Survey. IEEE International Advance Computing Conference (IACC 2009), Patiala, India, March 2009. pp: 6-7.

    [25] Jain R, Purohit GN. Page Ranking Algorithms for Web Mining. International Journal of Computer Applications (0975-8887), Vol. 13, No. 5, January 2011.

    [26] Saxena PC, Gupta JP, Gupta N. Web Page Ranking Based on Text Content of Linked Pages. International Journal of Computer Theory and Engineering, Vol. 2, No. 1, February 2010. pp: 1793-1800.

    [27] Haveliwala T. Topic-sensitive PageRank. In Proceedings of the Eleventh International World Wide Web Conference, 2002.

    [28] Gyongyi Z, Garcia-Molina H, Pedersen J. Combating web spam with TrustRank. In Proceedings of the International Conference on Very Large Databases (VLDB), 2004. pp: 576-587.

    [29] Xue GR, Yang Q, Zeng HJ, Chen Z. Exploiting the hierarchical structure for link analysis. In SIGIR, August 16-19, Salvador, Brazil, 2005. pp: 186-193.

    [30] Signorini A. A Survey of Ranking Algorithms. Department of Computer Science, University of Iowa, September 11, 2005.
