Contents & References of Optimizing the link importance detection method in the link database and its application in the architecture of search engines
List:
Abstract 1
Chapter One: General. 2
1-1 Introduction. 3
1-2 Statement of the problem. 4
1-3 The importance and necessity of conducting research. 5
1-4 Thesis structure. 6
Chapter Two: Fundamentals and Basic Concepts 7
2-1 Introduction. 8
2-2 Types of search engines. 13
2-2-1 Keyword Engines. 13
2-2-2 Search Engines by Subject Directory. 13
2-2-3 Crawler-based search engines 15
2-2-3-1 Difference between directory engines and crawler-based engines 16
2-2-4 Hybrid search engines. 16
2-2-5 Meta search engines 17
2-2-5-1 List of search engines. 17
2-2-5-2 Sequential search. 17
2-2-5-3 Simultaneous search. 17
2-2-6 Smart search engines. 18
2-2-7 Cost-Based Search Engines. 18
2-3 Architecture of search engines. 20
2-4 Architectural components of search engines. 22
2-5 Repository Update Strategies. 27
2-5-1 Batch-mode or steady crawler. 27
2-5-2 Partial or complete crawls. 32
2-6 The two main profiles of the indexing unit. 28
2-7 An example of how the search engine works. 31
2-8 Steps of search engines. 31
2-8-1 Data preprocessing 31
2-8-2 Prioritization of results. 32
2-9 Tags 33
2-9-1 Descriptive text tags. 33
2-9-2 The alt tag. 33
2-10 robots.txt file 34
2-11 Position and distance. 34
2-12 Crawler problems 35
2-13 Search engine optimization methods. 35
2-13-1 Indexing. 35
2-13-2 Crawl prevention and the robots exclusion standard 35
2-13-3 Increasing importance. 36
2-14 Ranking algorithms. 37
2-14-1 Ranking parameters. 37
2-14-2 Giving weight to words. 37
2-14-3 Evaluation of keywords. 37
2-14-4 Weighting parameters. 38
2-14-5 Tolerant retrieval. 38
2-14-6 General algorithm for finding spelling errors in search engines. 38
2-14-7 Spelling mistakes. 39
2-14-8 Edit distance algorithm. 39
2-14-9 K-gram proximity algorithm. 40
2-14-10 Context-sensitive error detection. 40
2-14-11 The concept of relevance. 41
2-14-11-1 Relevance from the user's point of view. 42
2-14-11-2 Relevance from the retrieval system's point of view. 42
2-14-12 Asking for the user's opinion in ranking. 43
2-14-13 Major Search Engines. 43
2-14-13-1 Google. 43
2-14-13-2 Excite. 44
2-14-13-3 Altavista. 44
2-14-13-4 Yahoo. 44
2-14-13-5 Fast 44
2-14-13-6 Lycos 44
2-14-14 News search engines. 45
2-14-15 Metacrawler. 46
2-14-16 Profitable search engines. 48
2-14-17 Payment List Search Engines. 49
2-14-18 Proprietary search engines. 49
2-14-19 Search for answers. 50
2-14-20 Children's search engines. 51
2-14-21 Regional Search Engines. 51
2-15 Conclusion. 52
Chapter Three: Web crawling architecture and crawling strategies. 53
3-1 Introduction. 54
3-2 Architecture of web crawlers. 54
3-3 Page selection. 56
3-4 The importance of the page. 57
3-5 Challenges of running a crawler 57
3-5-1 Selecting pages to download. 57
3-6 Complexities of the crawling process. 58
3-6-1 Measurement strategies for choosing pages. 58
3-6-1-1 Criteria based on users' tendencies. 58
3-6-1-2 Criteria based on page reputation. 58
3-6-1-3 Criteria based on the location of pages. 58
3-7 How to start and end the process of extracting and storing web pages. 59
3-7-1 Crawl and stop. 59
3-7-2 Crawl and stop with threshold. 59
3-8 Strategies for updating pages. 60
3-8-1 Integrated Update Policy. 60
3-8-2 Update Policy. 60
3-8-2 Relative Update Policy. 60
3-9 Minimizing the Load on Visited Websites 60
3-10 Parallelizing the Crawler Process 60
3-11 Web Structure. 61
3-12 Crawling strategies. 62
3-12-1 Uninformed search. 62
3-12-1-1 Depth-first traversal. 62
3-12-1-2 Breadth-first traversal. 63
3-12-1-3 Uniform-cost search. 65
3-12-2 Informed or heuristic search. 66
3-12-2-1 Best-first traversal. 67
3-12-2-2 A* search. 69
3-12-3 Local search. 69
3-12-3-1 Hill-climbing search. 70
3-12-3-2 Local beam search. 70
3-12-3-3 Simulated annealing search. 71
3-12-3-4 Threshold accepting algorithm. 72
3-13 Conclusion. 73
Chapter Four: Analysis of research results. 74
4-1 Introduction. 75
4-2 The first step: examining the breadth-first method. 75
4-3 The second step: examining the depth-first method. 80
4-4 The third step: examining the combined method. 86
4-4-1 First combination: traversing the first level as BFS. 86
4-4-2 Second combination: traversing the first and second levels as BFS. 86
4-4-3 Third combination: traversing the first, second and third levels as BFS. 86
4-5 The fourth step: examining the best-first method. 86
4-6 The fifth step: Examining the hill climbing method. 87
4-7 Experimental results obtained 88
4-8 The number of pages downloaded for each query. 90
4-9 Conclusion. 91
Chapter Five: Conclusion and suggestions. 97
5-1 Conclusion and final summary. 93
5-2 Suggestions and future work 100
Resources. 101
Source:
Persian sources
Aristopour, S., 1385, "Crawlers and Web Structure", Library and Information Magazine, Volume 9, Number 2, pp. 4-15.
Ismaili, M. Tawakli, Hashemi Majed, S., 2013, "Web crawlers", APA specialized laboratory in the field of information and communication technology security, document number: APA_FUM_W_WEB_0111, pp. 5-28.
Anuri, S., 2015, "Investigation of search engines and comparison of the PageRank algorithm with the HITS algorithm", The First Conference on Intelligent Computer Systems and Their Applications, pp. 2-7.
Latin sources
Ahmadi-Abkenari, F and Selamat, A, 2012, “An Architecture for a Focused Trend Parallel Web Crawler with the Application of Clickstream Analysis”, International Journal of Information Sciences, Vol. 184, pp: 266-281.
Ahmadi-Abkenari, F and Selamat, A, 2013, “Advantages of Employing LogRank Web Page Importance Metric in Domain Specific Web Search Engines”, JDCTA: International Journal of Digital Content Technology and its Applications, Vol. 7, No. 9, pp: 425-432.
Arasu, A, Cho, J, Garcia-Molina, H, Paepcke, A and Raghavan, S, 2001, “Searching the Web”, ACM Transactions on Internet Technology, Vol. 1, No. 1, pp: 2–43.
Baeza-Yates, R, Castillo, C, Marin, M and Rodriguez, A, 2005, “Crawling a country: Better strategies than breadth-first for Web page ordering”, In Proceedings of the 14th international conference on World Wide Web/ Industrial and Practical Experience Track, Chiba, Japan, ACM Press, pp: 864–872.
Baeza-Yates, R, Carlos, C and Jean, F.S, 2004, "Web Dynamics, Structure, and Page Quality", In Mark Levene and Alex Poulovassilis (editors), Web Dynamics, Springer Verlag, pp: 93-109.
Brin, S and Page, L, 1998, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", International Journal of Computer Networks, vol. 30, Issue. 1-7, pp: 107-117.
Brandman, Onn, Cho, J and Garcia-Molina, H, 2000, “Crawler Friendly Servers”, In Proceedings of the Workshop on Performance and Architecture of Web Servers (PAWS), Santa Clara, California, Vol. 28, Issue. 2, pp: 9-14.