Cluster optimization using evolutionary algorithms for web personalization

Number of pages: 79 File Format: word File Code: 31048
Year: 2014 University Degree: Master's degree Category: Computer Engineering

    Master's Thesis, Field: Computer Engineering, Focus: Software

    Abstract

    Information overload is a major problem on today's web. To deal with it, web personalization systems have been developed that adapt the content and services of a website to individual users based on their interests and browsing behavior. A fundamental component of any web personalization system is its user model. The purpose of web personalization is to provide users with the content and services they need, using knowledge obtained from their previous interactions with web pages. Several clustering methods are currently available for web personalization. The methods presented so far have shortcomings in some cases, and newer techniques have been proposed to address them. However, most of these techniques still suffer from data redundancy and poor scalability. Because a growing number of web users leads to ever larger clusters, optimizing the clusters becomes unavoidable. This research presents a cluster optimization method based on a fuzzy system. To increase the final clustering accuracy, a genetic algorithm is used to tune the parameters of the membership functions. Simulation results show that the proposed method significantly increases the accuracy of web page clustering.

    Keywords: web page personalization - clustering - web usage mining - fuzzy C-means algorithm - Yandex dataset.
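The abstract names two ingredients, fuzzy clustering and a genetic algorithm that tunes membership-function parameters, but does not give their details. The sketch below is therefore illustrative only: it implements standard fuzzy C-means and a toy genetic algorithm that tunes a single parameter, the fuzzifier `m`, scoring candidates with the Xie-Beni validity index (lower is better). The function names (`fcm`, `xie_beni`, `ga_tune_m`), the choice of `m` as the tuned parameter, the Xie-Beni fitness, and all GA settings are assumptions, not the thesis's actual encoding or accuracy measure.

```python
import math
import random


def fcm(points, c, m, iters=100, seed=0):
    """Standard fuzzy C-means; returns (centers, memberships u[i][j])."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = max(sum(w), 1e-300)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / tw
                                 for d in range(dim)))
        # Memberships are recomputed from relative distances to the centers.
        for i in range(n):
            dist = [max(math.dist(points[i], v), 1e-12) for v in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    return centers, u


def xie_beni(points, centers, u, m):
    """Xie-Beni index (needs c >= 2): compactness / separation, lower is better."""
    compact = sum(u[i][j] ** m * math.dist(p, v) ** 2
                  for i, p in enumerate(points) for j, v in enumerate(centers))
    sep = min(math.dist(a, b) ** 2
              for k, a in enumerate(centers) for b in centers[k + 1:])
    return compact / (len(points) * max(sep, 1e-12))


def ga_tune_m(points, c, pop_size=8, gens=10, lo=1.1, hi=3.0, seed=1):
    """Toy GA (hypothetical): evolve the fuzzifier m to minimise Xie-Beni."""
    rng = random.Random(seed)

    def fitness(m):
        centers, u = fcm(points, c, m, iters=30)
        return xie_beni(points, centers, u, m)

    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(gens):
        elite = sorted(pop, key=fitness)[: pop_size // 2]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = (a + b) / 2 + rng.gauss(0.0, 0.1)  # crossover + mutation
            children.append(min(hi, max(lo, child)))  # stay within [lo, hi]
        pop = elite + children
    return min(pop, key=fitness)


# Hypothetical 2-D data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, u = fcm(points, c=2, m=2.0)
best_m = ga_tune_m(points, c=2)
```

In a real system the fitness would be whatever accuracy measure the evaluation uses, and the chromosome could encode more membership-function parameters than `m` alone.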

                                        Chapter 1

    Overview of the research:

    With the development of information systems, data has become one of the most important resources of organizations. Methods and techniques are therefore needed to access data efficiently, share it, extract information from it, and use that information. With the creation and expansion of the web and the dramatic growth in the volume of information, these methods and techniques are needed more than ever. The web is a vast, diverse, and dynamic environment in which many users publish their documents. The web has been estimated to contain more than two billion pages, growing by about 7.3 million pages per day. Given this volume of information, it is almost impossible to manage with traditional tools, and new tools and methods are required. In general, web users face the following problems:

    1. Finding relevant information: Finding the required information on the web is difficult. Traditional information retrieval methods used to search databases cannot be applied directly to the web, so users usually rely on search engines, the most important and widely used tools for finding information on the web. A search engine receives a keyword query from the user and returns a list of documents related to the query, ranked by their relevance to it. However, search engines have two main problems (Baeza-Yates, 2004). First, their precision is low: they retrieve hundreds or thousands of documents in response to a query, many of which are unrelated to the user's information need (Bharat et al., 2001). Second, their recall is low: they cannot retrieve all the documents related to the user's information need, because the volume of documents on the web is so large that search engines cannot index all of them (Chakrabarti et al., 1999).
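The two search-engine shortcomings described above correspond to the standard retrieval measures precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved). A minimal sketch, using made-up document IDs purely for illustration:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 1,000 results, 50 of them relevant,
# while 200 relevant documents exist in total.
retrieved = range(1000)
relevant = list(range(50)) + list(range(5000, 5150))
p, r = precision_recall(retrieved, relevant)  # p = 0.05, r = 0.25
```

Returning many results lowers precision, while an incomplete index caps recall, which is exactly the pair of problems attributed to search engines.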

    2. Creating new knowledge from the information available on the web: The open question is how to turn the abundant data on the web into usable knowledge in which the required information is easy to find, and how to derive new information and knowledge from web data.

    3. Personalization of information: Since each user has particular preferences about the type of information and the way it is presented, information providers on the web should take this into account and customize how information is presented according to each user's wishes and preferences.

    Web mining techniques are able to solve these problems (Chakrabarti, 2000).

    1-1-Problem definition

    The web has become an integral part of the world, and web browsing is an important activity for customers who shop online (Varghese and John, 2012). As mentioned, the large amount of information on the web makes it almost impossible to manage with traditional tools, so new tools and methods are needed. One of these methods is web mining, which can broadly be viewed as data mining applied to the content, structure, and usage data of the web. The purpose of web mining is to discover models and patterns hidden in web resources; web usage mining specifically aims to discover the behavioral patterns of web users. Discovering such patterns in the huge amount of data generated by web servers has important applications (Anand and Mobasher, 2005). Among them are systems that evaluate how well a site meets its users' expectations, techniques for dynamic load balancing and web server optimization that serve users more effectively, and applications that restructure and adapt a site based on the anticipated needs of its users.
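As a deliberately simple illustration of a behavioral pattern that web usage mining can surface, the sketch below counts page-to-page transitions across user sessions; the most frequent transitions indicate common navigation paths. The session data and the function name `transition_counts` are hypothetical, and first-order transition counting is only one elementary kind of usage pattern, not the method of this thesis.

```python
from collections import Counter


def transition_counts(sessions):
    """Count consecutive page pairs (from_page, to_page) over all sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["/home", "/news", "/sport"],
    ["/home", "/news", "/weather"],
    ["/home", "/shop"],
]
top = transition_counts(sessions).most_common(1)  # [(("/home", "/news"), 2)]
```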

    Other goals of web mining include extracting useful information from web data and log files, improving access to web information, and providing technology for web applications such as personalization. For decision-making, the results of web usage mining can be used for advertising, improving web design, increasing customer satisfaction, and guiding market analysis and organizational decision strategy (Devi et al., 2012).

    In recent years, web usage mining techniques have been presented as another user-based approach to web personalization that alleviates some of the problems of collaborative filtering. In particular, web usage mining has been used to increase the scalability of traditional personalization systems based on collaborative filtering.

    Web page personalization involves clustering web pages that share similar access patterns. Web personalization uses web mining techniques to customize web pages for a specific user. This includes extracting user sessions from log files, where a user session is the sequence of web pages a user accesses within a specific period of time.

    1-2- Importance and necessity of research

    With the rapid growth of the web and of World Wide Web usage, it has become very difficult for users to access relevant information of interest effectively. The need to anticipate user needs in order to improve site usability and user retention is clear, and it can be addressed through personalization. Web personalization is the process of tailoring a site to the needs of a specific user or group of users, using knowledge obtained by analyzing users' browsing behavior. The purpose of a web personalization system is to provide users with the information they need without asking them explicitly.
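The step of extracting user sessions from log files, mentioned earlier in this section, can be sketched as follows. This minimal version assumes log records have already been parsed into `(user, timestamp_in_seconds, page)` tuples and that a new session starts after 30 minutes of inactivity. The 30-minute timeout is a common heuristic in the web usage mining literature and is an assumption here, not necessarily the rule used in this thesis; `sessionize` and `SESSION_TIMEOUT` are hypothetical names.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed heuristic


def sessionize(log, timeout=SESSION_TIMEOUT):
    """Split (user, timestamp, page) records into per-user page sequences."""
    by_user = defaultdict(list)
    for user, ts, page in log:
        by_user[user].append((ts, page))
    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:  # long gap: close the current session
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions


log = [
    ("u1", 0, "/home"), ("u1", 60, "/news"),
    ("u2", 10, "/home"), ("u1", 4000, "/sport"),
]
print(sessionize(log))
# [('u1', ['/home', '/news']), ('u1', ['/sport']), ('u2', ['/home'])]
```

Real server logs also need user identification (for example by IP address and user agent) and filtering of robot and image requests before this step.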

    Any action that adapts the information or services provided by a website to the needs of a specific user or group of users, by applying knowledge about the user's browsing behavior and specific interests together with the website's content and structure, is called web personalization (Eirinaki and Vazirgiannis, 2003).
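One common way to turn clustered usage data into personalization is shown here only as an illustrative sketch; the function names, the weighting scheme, and the data are assumptions, not this thesis's system. Each cluster of sessions is summarized as a usage profile in which a page's weight is the fraction of the cluster's sessions that visited it. The active user's session is then matched to the most similar profile by cosine similarity, and that profile's highest-weighted unvisited pages are recommended.

```python
import math


def usage_profile(cluster_sessions):
    """Weight of a page = fraction of the cluster's sessions that visited it."""
    counts = {}
    for session in cluster_sessions:
        for page in set(session):
            counts[page] = counts.get(page, 0) + 1
    return {page: c / len(cluster_sessions) for page, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse page-weight vectors (dicts)."""
    dot = sum(w * b.get(page, 0.0) for page, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(active_session, profiles, k=2):
    """Recommend up to k unvisited pages from the best-matching profile."""
    active = {page: 1.0 for page in active_session}
    best = max(profiles, key=lambda prof: cosine(active, prof))
    candidates = [(w, page) for page, w in best.items() if page not in active]
    candidates.sort(key=lambda t: (-t[0], t[1]))  # by weight, then by name
    return [page for _, page in candidates[:k]]


# Two hypothetical clusters of sessions (e.g. produced by a clustering step).
profiles = [
    usage_profile([["/home", "/news", "/sport"], ["/home", "/news"], ["/news", "/weather"]]),
    usage_profile([["/shop", "/cart"], ["/shop", "/cart", "/pay"]]),
]
print(recommend(["/news"], profiles))  # ['/home', '/sport']
```

Because only the active session is compared against a few cluster profiles rather than against every stored user, this style of matching is one way usage mining can ease the scalability issue raised for collaborative filtering.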

    In general, the goals of web personalization include the following:

    1. Personalizing the services of a website plays an important role in reducing the cost of finding information and makes the website a more user-friendly environment.

    2. By providing the information users want, in the right way and at the right time, it improves users' navigation of the website.

    3. In e-commerce, it provides a mechanism to better understand customers' needs, anticipate their future wishes, and ultimately increase customer loyalty to the service provided.

    Personalization focuses on identifying web users and gathering information about their preferences or interests.

    Table of contents:

    Abstract
    Chapter One
    1-1- Introduction
    1-2- Problem definition
    1-3- Importance and necessity of research
    1-4- Research method
    1-5- Thesis framework
    References
    Chapter Two
    2-1- Introduction
    2-2- Review of previous work
    References
    Chapter Three
    3-1- Introduction
    3-2- Stages of web mining
    3-2-1- Types of web browsing
    3-3- Web personalization
    3-3-1- Reasons why web personalization is needed
    3-3-2- Steps of web personalization
    3-3-2-1- Data collection
    3-3-2-2- Data processing
    3-3-2-3- Pattern discovery
    3-3-2-4- Knowledge analysis
    3-3-3- User modeling techniques in web personalization
    3-3-3-1- tf-idf technique
    3-3-3-2- Meta-model technique and OLAP tool
    3-3-3-3- Technique based on web content
    3-3-3-4- Technique based on providing effective data (ODP)
    3-3-3-5- Web personalization using combined methods
    3-3-3-6- Web personalization based on an inductive algorithm and tf-idf
    3-3-3-7- Web personalization using sequential pattern mining and a pattern tree
    3-4- Clustering for web personalization
    3-4-1- Fuzzy clustering
    3-4-1-1- Basic fuzzy clustering algorithm
    3-4-1-2- Fuzzy K-means algorithm
    3-4-1-3- Clustering web pages using fuzzy K-means clustering
    3-4-2- Genetic algorithm
    3-4-2-1- Optimizing fuzzy clustering with a genetic algorithm
    3-4-3- Proposed method of this research
    3-4-4- Overview of the proposed system
    3-4-5- An example of the proposed system
    3-4-6- Pseudocode of the proposed method
    3-5- Conclusion
    References
    Chapter Four
    4-1- Introduction
    4-2- Data collection
    4-2-1- YANDEX dataset
    4-2-1-1- Preprocessing applied to the raw dataset before publication
    4-3- Evaluation parameters
    4-4- Tests performed
    4-4-1- Hardware used
    4-4-2- Test results
    4-5- Conclusion
    References
    Chapter Five
    5-1- Introduction
    5-2- Results and achievements of the project
    5-3- Suggestions
    References

     

    References:

    [1].    Anand, S. S., & Mobasher, B. (2003, August). Intelligent techniques for web personalization. In Proceedings of the 2003 international conference on Intelligent Techniques for Web Personalization (pp. 1-36). Springer-Verlag.

     

    [2].    Baeza-Yates, R. (2004, January). Web mining in search engines. In Proceedings of the 27th Australasian conference on Computer science-Volume 26 (pp. 3-4). Australian Computer Society, Inc.

     

    [3].    Bharat, K., Chang, B. W., Henzinger, M., & Ruhl, M. (2001). Who links to whom: Mining linkage between web sites. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on (pp. 51-58). IEEE.

     

    [4].    Chakrabarti, S. (2000). Data mining for hypertext: A tutorial survey. ACM SIGKDD Explorations Newsletter, 1(2), 1-11.

     

    [5].    Chakrabarti, S., Dom, B. E., Kumar, S. R., Raghavan, P., Rajagopalan, S., Tomkins, A., & Kleinberg, J. (1999). Mining the Web's link structure. Computer, 32(8), 60-67.

     

    [6].    Devi, B. N., Devi, Y. R., Rani, B. P., & Rao, R. R. (2012). Design and Implementation of Web Usage Mining Intelligent System in the Field of e-commerce. Procedia Engineering, 30, 20-27.

     

    [7].    Eirinaki, M., & Vazirgiannis, M. (2003). Web mining for web personalization. ACM Transactions on Internet Technology (TOIT), 3(1), 1-27.

     

    [8].    Varghese, N. M., & John, J. (2012, October). Cluster optimization for enhanced web usage mining using fuzzy logic. In Information and Communication Technologies (WICT), 2012 World Congress on (pp. 948-952). IEEE.

     

