Presenting a model for ranking web documents based on user interactions

Number of pages: 99 File Format: word File Code: 30600
Year: 2013 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: data recovery - Information retrieval system - Internet - Search engine - User interactions - Web - Web document ranking - Web pages

Summary of Presenting a model for ranking web documents based on user interactions

Dissertation submitted for the M.Sc. degree in Computer Engineering

Specialization: Software

Abstract:

Today, the World Wide Web serves as the best environment for developing, publishing, and accessing knowledge. Search engines are the most important tool for reaching this endless ocean of information, and one of their main components is the ranking of results returned for a user's query. Helping users find the web page they want is therefore a very important problem. Because text-based and link-based ranking methods have known problems, methods based on user behavior and judgment have been considered as a way to establish justice and democracy on the web. In other words, for the web to grow in both quantity and quality, the users themselves determine which pages are the right ones. Recognizing and extracting users' judgments is therefore of particular importance. A user's behavior during a search includes the query text, the way the user clicks on the ranked list of results, the time spent on each page, and other events recorded during the search. These recorded events contain valuable information that can be used to analyze, evaluate, and model user behavior in order to improve the quality of the results. This research presents a model that, for each specific query, receives positive and negative user feedback: the number of times a site is accessed, the time spent on each site, the number of downloads performed on each site, and the numbers of positive and negative clicks on each site in the displayed list of web pages. The model calculates the rank of each page using a multi-criteria decision-making method and produces a new ranking of sites, which it updates regularly using subsequent user feedback.

Keywords:

Search engine, user behavior, user feedback, multi-criteria decision making
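
The feedback signals listed in the abstract can be pictured as a per-query, per-site table built from logged events. The following is only a minimal sketch of that idea: the event format, the `kind` labels, and the `Feedback` field names are hypothetical, and the source does not define what makes a click positive or negative, so the sketch simply counts whatever the log has already labeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    """Five feedback criteria for one (query, site) pair (field names are assumptions)."""
    visits: int = 0
    dwell_seconds: float = 0.0
    downloads: int = 0
    positive_clicks: int = 0
    negative_clicks: int = 0


def aggregate(events):
    """Fold logged events into one Feedback record per (query, site).

    Each event is a dict with keys "query", "site", "kind" and, for visits,
    an optional "seconds" dwell time. This log format is hypothetical.
    """
    table = defaultdict(Feedback)
    for event in events:
        fb = table[(event["query"], event["site"])]
        kind = event["kind"]
        if kind == "visit":
            fb.visits += 1
            fb.dwell_seconds += event.get("seconds", 0.0)
        elif kind == "download":
            fb.downloads += 1
        elif kind == "positive_click":
            fb.positive_clicks += 1
        elif kind == "negative_click":
            fb.negative_clicks += 1
    return dict(table)


# Invented log entries, for illustration only.
log = [
    {"query": "topsis", "site": "a.example", "kind": "visit", "seconds": 40.0},
    {"query": "topsis", "site": "a.example", "kind": "visit", "seconds": 20.0},
    {"query": "topsis", "site": "a.example", "kind": "download"},
    {"query": "topsis", "site": "b.example", "kind": "negative_click"},
]
feedback = aggregate(log)
```

Each `Feedback` record then supplies one row of the decision matrix that a multi-criteria method would rank.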

Chapter 1

Research overview

1-1 Introduction

The Internet consists of web pages covering many different topics. To find a specific topic in this huge collection of pages, one needs a search engine. A search engine searches its database according to the user's query and shows the results to the user. A single query may match a very large number of pages, possibly millions, while only a few of them answer the user's real need. For this reason, search engines usually rank the results by page importance and display them to the user in that order.

Because of the vast number of pages on the web and their ever-increasing growth, methods are needed to rank web pages by their importance and their relevance to the searched topic. Ranking is one of the main components of an information retrieval system. In web search engines, which are information retrieval systems, the particular nature of web users makes the role of ranking even more pronounced. It is normal for a web search engine to find thousands or even millions of pages as search results, yet the web user has neither the patience nor the time to view all of them to reach the page of interest. Most web users pay no attention to anything beyond the first page of search results. It is therefore very important for a web search engine to place the results that interest the user at the top of the list; otherwise the search engine will not be effective.

On the other hand, the needs of users who search the web differ from those served by traditional information retrieval systems, which only match the words of the query against the text of the pages. In a traditional information retrieval system, a page written by an ordinary user may be judged highly relevant to the query even though the user's search intent was something else. Web users are therefore more interested in pages that are not only related to the topic but also sufficiently credible. As a result, the focus in web search shifts from relevance toward credibility. The task of a ranking algorithm is to recognize the more credible pages in the set of web pages and assign them a higher rank.

1-2 Information retrieval

Information retrieval[1] comprises the standards and protocols for representing, storing, organizing, and accessing information items, with the aim of retrieving all documents relevant to the user's query[2]. Information retrieval is divided into two domains, the user domain and the information domain, with an intermediate domain between them called the retrieval domain (Figure 1-1). In the user domain, the user's information need is expressed, in most cases in the language of the retrieval system; alternatively, the system models the user and predicts the information need from the user's behavior, behavior history, or information the user provides directly about their expertise or personal circumstances. In the information domain, the information and knowledge hidden in documents and data must be modeled and organized. The retrieval domain is the intersection of these two: it matches the user's information need against the documents. An information need reaches the retrieval domain in one of two ways. Either it is sent as an online query[3], in which case relevant items are retrieved from the existing document collection, or it is a persistent query that is exposed to a flow of information (for example, news) and filters out the relevant documents.
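
The two ways an information need reaches the retrieval domain can be contrasted in a short sketch. It is illustrative only: the word-overlap score, the function names `retrieve` and `filter_stream`, and the sample documents are assumptions made here, not the matching method of the thesis.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def retrieve(query, collection):
    """Online query: score every stored document once and return the matches,
    best first. The score is the number of shared words (an assumption)."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), doc) for doc in collection]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]


def filter_stream(persistent_query, stream, threshold=1):
    """Persistent query: the query stays fixed while documents flow past it;
    each incoming document is kept or dropped as it arrives."""
    q = tokens(persistent_query)
    for doc in stream:
        if len(q & tokens(doc)) >= threshold:
            yield doc


# Invented sample data, for illustration only.
collection = ["web search ranking", "cooking recipes", "ranking web pages by user clicks"]
hits = retrieve("web ranking", collection)

news = ["new search engine launched", "football results"]
alerts = list(filter_stream("search engine", news))
```

In the first function the collection is fixed and the query varies; in the second the query is fixed and the documents vary, which is the distinction the paragraph draws.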

Abstract

Today, the World Wide Web is used as the best environment for developing, disseminating, and accessing knowledge. Search engines are the main tool for accessing this endless ocean of information, and one of their major components is the ranking of results in response to a user's search. Helping users find their desired web page is therefore a very important issue. Given the problems of text-based and link-based methods, methods based on user behavior and judgment have been considered as a way to restore justice and democracy on the web. In other words, for the web to grow in quantity and quality, the fittest pages are determined by the users themselves. It is therefore important to detect and extract users' judgments. User behavior during a search includes the query text, the user's clicks on the ranked result list, the dwell time on each page, and other events recorded during the search. These logs contain valuable information that can be used to analyze, evaluate, and model user behavior in order to improve the quality of the results.

This work presents a model that, for each query, receives five kinds of positive and negative feedback: the number of user visits to each site, the dwell time on each site, the number of downloads made from each site, and the numbers of positive and negative clicks on each site in the displayed list of web pages. The model ranks each page using TOPSIS, a multi-criteria decision-making method, produces a new ranking of the sites, and keeps the ranking up to date using subsequent user feedback.

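
The TOPSIS step named in the abstract can be illustrated with a minimal sketch. The abstract does not state the normalization, the criterion weights, or which criteria count as costs, so the code below assumes vector normalization and equal weights, and treats negative clicks as the only cost criterion. The site names and numbers are invented for illustration and are not results from the thesis.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per row; higher means closer to ideal.

    matrix  : rows are alternatives (sites), columns are criteria
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    """
    n_cols = len(weights)
    # Step 1: vector-normalize each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
                for row in matrix]
    # Step 2: ideal and anti-ideal points, column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: relative closeness = distance to anti-ideal / total distance.
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


def rank_sites(names, matrix, weights, benefit):
    """Site names ordered from best to worst closeness score."""
    scores = topsis(matrix, weights, benefit)
    return [name for _, name in sorted(zip(scores, names), reverse=True)]


# Columns: visits, dwell seconds, downloads, positive clicks, negative clicks.
sites = ["a.example", "b.example", "c.example"]
feedback = [
    [10, 300.0, 5, 8, 1],
    [2, 30.0, 0, 1, 6],
    [5, 120.0, 2, 4, 3],
]
equal_weights = [0.2] * 5                    # assumption: equal weights
is_benefit = [True, True, True, True, False]  # assumption: negative clicks are a cost
ranking = rank_sites(sites, feedback, equal_weights, is_benefit)
```

Re-running `rank_sites` whenever the feedback table changes is one simple way to picture the periodic re-ranking the abstract describes.
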
Contents & References of Presenting a model for ranking web documents based on user interactions

Table of contents:

Abstract

Chapter One: Research overview

1-1 Introduction

1-2 Information retrieval

1-3 Motivation

1-4 Search engine

1-5 Indexing and query processing

1-6 Multi-attribute decision-making

1-7 Statement of the basic research problem

1-8 Importance and necessity of the research

1-9 Specific objectives of the research

1-10 Research hypotheses

Chapter Two: Overview of related work

2-1 Introduction

2-2 Text-based ranking

2-2-1 Vector space model

2-2-2 Probabilistic model

2-3 Link-based ranking

2-3-1 Query-independent ranking

2-3-2 Query-dependent ranking

2-3-3 Link-based ranking challenges

2-4 Hybrid ranking

2-5 Learning-based ranking

2-6 Ranking based on user behavior

2-6-1 Document expansion

2-6-2 NM method

2-6-3 CVM method

2-6-4 LA method

2-6-5 A method for web ranking by defining popularity factors

2-6-6 Markov model of user behavior as a predictor of a successful search

Chapter Three: Description of the proposed method

3-1 Analysis of a multi-criteria system

3-2 Review of the multi-criteria decision-making process

3-2-1 Normalization (de-scaling)

3-2-2 Weighting of indicators

3-3 Description of the TOPSIS method

3-4 Proposed method

Chapter Four: Implementation and evaluation of the proposed method

4-1 Characteristics of the proposed method

4-2 Description of the proposed model simulation

4-3 Example of the proposed model simulation

Chapter Five: Conclusion

5-1 Discussion and conclusion

5-2 Advantages of the proposed method

5-3 Future work

List of references

References:

[1] Baeza-Yates R, and Ribeiro-Neto B. Modern Information Retrieval. ACM Press / Addison-Wesley, 2005.

[2] Bharat K, Henzinger MR. Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998. pp:104-111.

[3] Kleinberg JM. Authoritative sources in a hyperlinked environment. Journal of the ACM, Vol. 46 No. 5, September 1999. pp:604–632.

[4] Kent P. Search Engine Optimization for Dummies. John Wiley & Sons Inc. 2008.

[5] Keyhanipour AH, Moshiri B, Piroozmand M, Lucas C. Aggregation of Multiple Search Engines Based on Users' Preferences in WebFusion. Elsevier Journal of Knowledge-Based Systems, Vol. 20, No. 4, May 2007. pp:321–328.

[6] Liu TY, Qin T, Xu J, Xiong W, Li H. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In SIGIR Workshop on Learning to Rank for Information Retrieval, 2007.

[7] Castillo C. Effective Web Crawling, Ph.D. Thesis, University of Chile, Nov 2004.

[8] Baeza-Yates R. Challenges in the interaction of information retrieval and natural language processing. In Proceedings of 5th international conference on Computational Linguistics and Intelligent Text Processing (CICLing), Lecture Notes in Computer Science Springer, Vol. 2945, February 2004. pp:445–456.

[9] Tomasic A, Garcia-Molina H. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of the second international conference on parallel and distributed information systems, IEEE Computer Society Press, 1993. pp: 8-17.

[10] Mousseau V, Figueira J, Gomes da Silva C. Resolving Inconsistencies Among Constraints on the Parameters of an MCDA Model. European Journal of Operational Research, Volume 147. pp: 72-93.

[11] Zhang Y, Chen W, Wang D, Yang Q. User-click modeling for understanding and predicting search-behavior. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD'11). ACM, New York, NY, USA, August 2011. pp: 1388-1396.

[12] Zhao D, Zhang M, Zhang D. A Search Ranking Algorithm Based on User Preferences. Journal of Computational Information Systems, November 2012. pp: 8969-8976.

[13] Attenberg J, Pandey S, Suel T. Modeling and predicting user behavior in sponsored search. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 2009. pp:1067-1076.

[14] Yu J, Lu Y, Sun S, Zhang F. Search Results Evaluation Based on User Behavior. Springer-Verlag Berlin Heidelberg, vol. 320, 2013. pp: 397-403.

[15] Dupret G, Liao C. A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In Proceedings of the third ACM international conference on Web search and data mining (WSDM '10). ACM, New York, NY, USA, February 2010. pp: 181-190.

[16] Liu C, White RW, Dumais S. Understanding web browsing behaviors through Weibull analysis of dwell time. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 2010. pp: 379-386.

[17] Yang C, Liu C, Shao-chieh H. A Hybrid Item-Based Recommendation Ranking Algorithm Based on User Access Patterns. Springer-Verlag Berlin Heidelberg, vol. 163, 2013. pp: 225-233.

[18] Guo F, Liu C, Wang Y. Efficient Multiple-Click Models in Web Search. The definitive version will appear in WSDM '09: Proceedings of the second ACM international conference on web search and data mining. 2008 ACM.

[19] Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal, Vol. 24 No. 5, 1988. pp:513–523.

[20] Salton G. The SMART retrieval system - experiments in automatic document processing. Prentice-Hall, 1971.

[21] Zhai C. A brief review of information retrieval models. Technical report, Department of Computer Science, University of Illinois at Urbana-Champaign, 2010.

[22] Sparck Jones K, Walker S, Robertson SE. A probabilistic model of information retrieval: development and comparative experiments - part 1 and part 2. Information Processing and Management, Vol. 36 no. 6, 2000. pp:779-808 and 809-840.

[23] Robertson SE, Walker S, Jones S, Gatford M. Okapi at TREC-3. In Harman, D. K., editor, The Third Text REtrieval Conference (TREC-3), pp: 109-126.

[24] Neelam Duhan, A. K. Sharma, Komal Kumar Bhatia, page ranking Algorithms: A Survey, IEEE International Advance Computing Conference IACC 2009 Patiala, India, March 2009. pp: 6-7.

[25] Jain R, Dr Purohit GN, page ranking Algorithms for Web Mining, International Journal of Computer Applications (0975-8887)Volume 13- No.5, January 2011.

[26] Saxena PC, Gupta JP, Gupta N. Web page ranking Based on Text Content of Linked Pages, International Journal of Computer Theory and Engineering, Vol. 2, No. 1 February, 2010. pp:1793-1800.

[27] Haveliwala T. Topic-sensitive pagerank. In Proceedings of the Eleventh Int'l World Wide Web Conf. 2002.

[28] Gyongyi Z, Garcia-Molina H, Pedersen J. Combating web spam with TrustRank. In Proceedings of the International Conference on Very Large Databases (VLDB), 2004. pp: 576-587.

[29] Xue GR, Yang Q, Zeng HJ, Chen Z. Exploiting the hierarchical structure for link analysis. In SIGIR, August 16-19, Salvador, Brazil, 2005. pp:186-193.

[30] Signorini A. A Survey of Ranking Algorithms. Department of Computer Science University of Iowa, September 11, 2005
