Development of automatic tools to identify design patterns with tag refinement and correction operations

Number of pages: 69
Year: 2013 University Degree: Master's degree Category: Computer Engineering
  • Summary of Development of automatic tools to identify design patterns with tag refinement and correction operations

    Master's Thesis in Computer Engineering (Software)

    Abstract

    Development of automatic design pattern recognition tools with tag refinement and correction operations

    Design patterns are proven, reliable solutions to problems that occur frequently in object-oriented software design. They help developers understand the code, recover the design and the designer's underlying intent, and maintain the system more easily. Because maintainability is essential, automatic pattern detection tools have been developed. Most current detection tools achieve high recall, but they produce many false positives, especially for patterns with similar structure and behavior. A refinement operator was therefore proposed, which identifies false positives and removes them. This work introduces a new operator, "label correction". The operator first identifies false positives and then, instead of removing them from the output, determines their correct identity using a new set of criteria introduced in this work. The operator is automated with data mining. The proposed method corrects the tools' output with a learning accuracy of 97.8% in the "multi-label" setting, an average of 99.3% in the "one-vs-all" setting, and an average of 99.6% in the pairwise ("one-vs-one") setting.
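    The trade-off described in the abstract, a high recovery rate (recall) combined with many false positives (low precision), follows directly from the standard definitions. The sketch below is a minimal illustration; the instance names are hypothetical placeholders, not results from this work.

```python
def evaluate(detected, actual):
    """Compare the pattern instances a tool reports with a reference set."""
    detected, actual = set(detected), set(actual)
    tp = len(detected & actual)  # reported and real (true positives)
    fp = len(detected - actual)  # reported but not real (false positives)
    fn = len(actual - detected)  # real but missed (false negatives)
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if actual else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical example: a tool reports four Strategy instances, two are real.
print(evaluate({"A", "B", "C", "D"}, {"A", "B"}))
```

    Here recall is 1.0 while precision is only 0.5; refinement aims to raise precision by discarding the two false positives without lowering recall.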

    Chapter One

    Introduction

    Although designing object-oriented software is difficult in itself, designing reusable object-oriented software is even harder. Design patterns make it easier to reuse successful designs and architectures [1]. They are proven, reliable solutions to problems that occur frequently in software design. Each design pattern has its own intent and a distinctive structure. A pattern describes the roles and responsibilities of its participants, how the classes collaborate, and sample instances of the participants in this collaboration. Therefore, by extracting design patterns from source code, we can reveal the intent and design of a software system [5]. Using design patterns correctly when developing software can significantly improve the maintainability and reusability of the source code. The main maintainability problem in software systems, especially legacy systems, is that they lack complete documentation of the system's design and goals. Identifying design patterns automatically or semi-automatically therefore eases program comprehension, maintenance and reuse. Many researchers have worked on automatic or semi-automatic design pattern detection, but none of them has provided developers with reliable output free of false positives. In general, detection methods fall into two categories: those that rely on the structural aspects of patterns and those that use their behavioral aspects [5].

    Targeting the structural aspects

    Some methods consider only the structural aspects of patterns. First, the structural characteristics of each class in the source code are compared with each role of a pattern, and candidates for each role are identified. Next, candidates for roles that can be related to each other are combined. Finally, the inter-class relations are analyzed and compared with the patterns, without regard to behavioral characteristics. Inter-class relations include inheritance, return types, definition [1], generalization [2], binding [3], and so on. For example, SPOOL [19], DP++ [18], Osprey [20] and the tool in [21] identify patterns in this structural way.
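    The three structural steps (role candidates, combination, relation checking) can be sketched for the Strategy pattern on a toy class model. This is a minimal illustration assuming that class facts have already been extracted from source code into dictionaries; the class names and fact keys are hypothetical, and real tools use far richer models.

```python
from itertools import product

# Hypothetical facts extracted from source code:
# class name -> abstractness, superclass, and the types of its fields.
CLASSES = {
    "Context": {"abstract": False, "extends": None, "fields": ["Algorithm"]},
    "Algorithm": {"abstract": True, "extends": None, "fields": []},
    "FastAlgo": {"abstract": False, "extends": "Algorithm", "fields": []},
    "SlowAlgo": {"abstract": False, "extends": "Algorithm", "fields": []},
}


def role_candidates(model):
    """Step 1: compare each class with each role of the Strategy pattern."""
    return {
        "Strategy": [n for n, c in model.items() if c["abstract"]],
        "ConcreteStrategy": [n for n, c in model.items() if c["extends"]],
        "Context": [n for n, c in model.items() if c["fields"]],
    }


def match_strategy(model):
    """Steps 2 and 3: combine role candidates and keep the combinations whose
    inter-class relations (field type, inheritance) fit the pattern."""
    cands = role_candidates(model)
    found = []
    for ctx, strat in product(cands["Context"], cands["Strategy"]):
        concretes = sorted(n for n in cands["ConcreteStrategy"]
                           if model[n]["extends"] == strat)
        if strat in model[ctx]["fields"] and concretes:
            found.append((ctx, strat, tuple(concretes)))
    return found


print(match_strategy(CLASSES))
# [('Context', 'Algorithm', ('FastAlgo', 'SlowAlgo'))]
```

    Nothing in these checks looks at behavior, so a State hierarchy with the same shape would be reported as a Strategy instance as well.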

    Balanyi and Ferenc [3] used the Columbus framework to extract abstract semantic graphs [4] and identified patterns by comparing graphs [5]. Similarly, [2] uses explicit semantics [6] to find patterns in the extracted semantic graph. In any case, pattern identification requires analyzing behavioral characteristics in addition to structural ones.

    Targeting behavioral aspects

    The methods discussed in the previous section cannot distinguish patterns that are structurally identical but behave differently, such as the "Strategy" pattern and the "State" pattern. Methods that target behavioral aspects address this problem with machine learning, dynamic analysis [7] [24] or static analysis [8] [5].
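    The Strategy/State ambiguity can be seen in a small example. The sketch below is written in Python for illustration (the cited tools analyze Java or C++), with hypothetical class names. Both hierarchies have the same structure: a context holding a reference to an abstract role that has concrete subclasses. They differ only in behavior: the State objects replace the context's current object.

```python
from abc import ABC, abstractmethod


class Behavior(ABC):
    """Abstract role; Strategy and State share exactly this structure."""

    @abstractmethod
    def handle(self, context): ...


class Context:
    def __init__(self, behavior):
        self.behavior = behavior  # field typed by the abstract role

    def request(self):
        return self.behavior.handle(self)  # delegate to the current object


# Strategy: the concrete class only computes; the client picks the object.
class Upper(Behavior):
    def handle(self, context):
        return "UPPER"


# State: the concrete classes also switch the context to another object.
class Open(Behavior):
    def handle(self, context):
        context.behavior = Closed()  # state transition
        return "open"


class Closed(Behavior):
    def handle(self, context):
        context.behavior = Open()  # state transition
        return "closed"


s = Context(Upper())
print([s.request(), s.request()])               # ['UPPER', 'UPPER']
t = Context(Open())
print([t.request(), t.request(), t.request()])  # ['open', 'closed', 'open']
```

    A purely structural matcher sees the same classes and relations in both cases; only the assignment to `context.behavior` inside `handle` separates State from Strategy.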

    Dynamic analysis

    These methods use runtime data [9] to detect the behavioral aspects of patterns. KT [22] used only dynamic analysis to detect the "Chain of Responsibility" [10] pattern, but without success, owing to an inadequate message-logging mechanism [11] and insufficient test data. Dynamic analysis needs test data with good coverage so that every possible execution path is exercised, and such test data are often unavailable. Even when test data are available, the runtime results may be misleading, because the data were not designed to expose the behavior of specific patterns [5].
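    A minimal sketch of dynamic analysis, assuming method calls are recorded by a hypothetical `traced` decorator: the recorded trace is scanned for a request forwarded from one handler to another, which hints at Chain of Responsibility. It also shows the coverage problem described above: a run in which the first handler answers the request leaves no evidence of forwarding. The handler classes are hypothetical.

```python
import functools

TRACE = []  # recorded runtime events: (receiver class name, method name)


def traced(method):
    """Hypothetical instrumentation: record each call of the method."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        TRACE.append((type(self).__name__, method.__name__))
        return method(self, *args, **kwargs)
    return wrapper


class Handler:
    def __init__(self, successor=None):
        self.successor = successor

    @traced
    def handle(self, request):
        if self.can_handle(request):
            return f"{type(self).__name__} handled {request}"
        if self.successor is not None:
            return self.successor.handle(request)  # forward along the chain
        return None

    def can_handle(self, request):
        return False


class Logger(Handler):
    def can_handle(self, request):
        return request == "log"


class Auditor(Handler):
    def can_handle(self, request):
        return request == "audit"


def forwarded_pairs(trace):
    """Heuristic: consecutive calls of the same method on objects of different
    classes suggest a request forwarded along a chain."""
    return [(a[0], b[0]) for a, b in zip(trace, trace[1:])
            if a[1] == b[1] and a[0] != b[0]]


chain = Logger(Auditor())
chain.handle("log")    # answered by the first handler: no forwarding observed
print(forwarded_pairs(TRACE))  # []
TRACE.clear()
chain.handle("audit")  # forwarded from Logger to Auditor
print(forwarded_pairs(TRACE))  # [('Logger', 'Auditor')]
```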

    Static analysis

    These methods apply static analysis to the abstract syntax tree [12] of method bodies. Reference [25] used path-insensitive "object creation" statements [13] to distinguish the "Abstract Factory" and "Factory Method" patterns [15]. To tell apart patterns with similar structure and behavior, it is essential to extract the intent of each pattern from its implementation, but most such methods cannot extract the intent of the program [5].
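    Static analysis of method bodies can be illustrated with Python's standard `ast` module. The sketch below is a simplified stand-in for the Java analyses cited above: it reports methods that return a newly created object, a cue associated with factory-style patterns. Treating any capitalized callee as a constructor is an assumption of this sketch, and the sample classes are hypothetical.

```python
import ast

SOURCE = '''
class ShapeFactory:
    def create(self):
        return Circle()

class Registry:
    def lookup(self, key):
        return self.items[key]
'''


def creating_methods(source):
    """Return (class, method, created class) for each method whose body
    contains a `return SomeClass(...)` statement."""
    found = []
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for fn in cls.body:
            if not isinstance(fn, ast.FunctionDef):
                continue
            for node in ast.walk(fn):
                if (isinstance(node, ast.Return)
                        and isinstance(node.value, ast.Call)
                        and isinstance(node.value.func, ast.Name)
                        and node.value.func.id[:1].isupper()):  # assumed constructor
                    found.append((cls.name, fn.name, node.value.func.id))
    return found


print(creating_methods(SOURCE))  # [('ShapeFactory', 'create', 'Circle')]
```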

    In this work, automatic design pattern detection tools are extended with two operations: "refinement" and "label correction". Refinement identifies most or even all of the false positives in the tools' output. Label correction then uses the criteria defined in this work together with data mining methods to determine the correct identity of each false positive, that is, which pattern it was mistaken for because of their similarity. For example, if an automatic tool labels a sample "Strategy" but the sample is not in fact a Strategy, it is first recognized as a false positive (refinement), and then its correct label is determined from the values of the criteria; in this case the label changes from Strategy to State. Besides false positives, these criteria and data mining methods also reduce false negatives in some situations: when a label is corrected from Strategy to State, the resulting State sample may be missing from the tool's results for the State pattern. Such a sample is a false negative, and identifying it removes that false negative.
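    The two operations can be sketched as follows. This is a minimal illustration, not the implementation used in this work: `accept` and `relabel` are hypothetical stand-ins for the trained models, and the criterion `changes_context_field` and the sample names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str       # hypothetical identifier of a detected class group
    label: str      # pattern label reported by the detection tool
    criteria: dict  # criterion (predictor) values computed for the sample


def correct_labels(samples, is_true_positive, predict_label):
    """Refinement plus label correction: keep accepted samples as they are,
    and relabel rejected ones instead of discarding them."""
    results = {}
    for s in samples:
        label = s.label if is_true_positive(s) else predict_label(s)
        if label != "none":  # "none": no known pattern fits the sample
            results.setdefault(label, []).append(s.name)
    return results


def accept(s):
    # Hypothetical stand-in for the trained refinement model.
    return s.criteria["changes_context_field"] == 0


def relabel(s):
    # Hypothetical stand-in for the trained label-correction model.
    return "State" if s.criteria["changes_context_field"] == 1 else "none"


tool_output = [
    Sample("SortGroup", "Strategy", {"changes_context_field": 0}),
    Sample("DoorGroup", "Strategy", {"changes_context_field": 1}),
]
print(correct_labels(tool_output, accept, relabel))
# {'Strategy': ['SortGroup'], 'State': ['DoorGroup']}
```

    Refinement alone would simply drop `DoorGroup`; relabeling it keeps the sample and, if it was absent from the tool's own State results, also recovers a false negative.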

    Previous tools and methods produce the most false positives on patterns with similar structure and behavior, so this work focuses on correcting tool output for such patterns. Label correction is therefore applied to the extracted instances of the Strategy pattern, which, according to the tools' output, yields the most false positives because of its structural and behavioral similarity to other patterns. First, following common data mining practice, a data set was built from the true-positive and false-positive Strategy instances reported by the tools. Then, based on the available documentation and manual review, two target columns were defined in the data set to predict the correct identity of each sample. The first column is "true" if the tools identified the sample correctly and "false" if it is a false positive; the second holds the correct pattern name for the sample, or "no pattern" if it is unknown. Next, the values of the criteria (predictors) extracted in this work are computed for each sample, and the data set is passed to data mining algorithms for modeling. Modeling tries to extract the knowledge in the data as a set of rules, which are then used to classify new, unseen samples.
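    The data set layout and the idea of extracting rules from it can be sketched with OneR, a very simple rule learner used here as a stand-in for C5.0, which builds much richer rule sets. The rows and criterion names below are invented for illustration and are not the data used in this work.

```python
from collections import Counter, defaultdict

# Hypothetical rows: criterion values plus the two target columns,
# "correct" (was the tool right?) and "pattern" (the true identity).
ROWS = [
    {"changes_delegate": 0, "num_concrete": 3, "correct": True, "pattern": "Strategy"},
    {"changes_delegate": 0, "num_concrete": 2, "correct": True, "pattern": "Strategy"},
    {"changes_delegate": 1, "num_concrete": 2, "correct": False, "pattern": "State"},
    {"changes_delegate": 1, "num_concrete": 3, "correct": False, "pattern": "State"},
]
CRITERIA = ["changes_delegate", "num_concrete"]


def one_rule(rows, criteria, target):
    """OneR: for each criterion, map every value to its majority class and
    keep the criterion whose rules make the fewest errors on the data."""
    best = None
    for criterion in criteria:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[criterion]][row[target]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (criterion, rules, errors)
    return best


print(one_rule(ROWS, CRITERIA, "correct"))  # refinement target
print(one_rule(ROWS, CRITERIA, "pattern"))  # label-correction target
```

    Each result reads as a rule set such as "if changes_delegate = 1 then State", which can then be applied to new samples.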

    Experiments were performed with three machine learning algorithms: C5.0, boosting and SVM. The proposed method was evaluated on three open-source software systems: JHotDraw 6 [6], JRefactory [7] and java.io [8].
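    The abstract reports results for a "multi-label" setting and for one-vs-all and pairwise ("two by two") settings. The sketch below shows the generic way a multi-class labeling task is decomposed into binary problems for the latter two, with a simple voting step for the pairwise case. It is a textbook illustration; how the modeling tool used in this work performs the decomposition internally is not described here, and the labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_all(rows, target):
    """One binary problem per class: that class (True) vs. all others (False)."""
    classes = sorted({row[target] for row in rows})
    return {c: [(row, row[target] == c) for row in rows] for c in classes}


def one_vs_one(rows, target):
    """One binary problem per pair of classes, using only rows of that pair."""
    classes = sorted({row[target] for row in rows})
    return {
        (a, b): [(row, row[target] == a) for row in rows if row[target] in (a, b)]
        for a, b in combinations(classes, 2)
    }


def vote(pair_winners):
    """Pairwise prediction: the class that wins the most pairs is chosen."""
    return Counter(pair_winners).most_common(1)[0][0]


rows = [{"pattern": p} for p in ["Strategy", "State", "Bridge", "State"]]  # hypothetical
print(sorted(one_vs_all(rows, "pattern")))  # ['Bridge', 'State', 'Strategy']
print(sorted(one_vs_one(rows, "pattern")))
print(vote(["State", "Strategy", "State"]))  # State
```

    With k classes, one-vs-all trains k binary models on all rows, while the pairwise scheme trains k(k-1)/2 smaller models, each on the rows of just two classes.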

  • Contents & References of Development of automatic tools to identify design patterns with tag refinement and correction operations

    Contents:

    Introduction

    Assumptions and limitations of the problem

    Necessity of the research

    Aim of the research

    Thesis outline

    Definitions and basic concepts

    Introduction

    Classification techniques

    Efficiency evaluation criteria

    Summary

    Overview of previous research

    Introduction

    Previous studies on automatic and semi-automatic identification of design patterns and their limitations

    Summary

    Generation of data sets

    Extracted criteria

    Analysis framework for initial identification and label correction of design patterns

    Summary

    Numerical experiments and results

    Introduction

    Learning efficiency

    Summary

    Conclusion and future work

    References

    Abstract in English

    References

    [1] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.

    [2] J. M. Smith and D. Stotts, "SPQR: Flexible Automated Design Pattern Extraction from Source Code," in Proc. 18th IEEE Int'l Conf. Automated Software Engineering (ASE '03), Oct. 2003.

    [3] Z. Balanyi and R. Ferenc, "Mining Design Patterns from C++ Source Code," in Proc. 19th Int'l Conf. Software Maintenance (ICSM 2003), pp. 305–314, IEEE Computer Society, Sept. 2003.

    [4] R. Ferenc, Á. Beszédes, L. Fülöp, and J. Lele, "Design Pattern Mining Enhanced by Machine Learning," in Proc. ICSM, pp. 295–304, 2005.

    [5] N. Shi and R. A. Olsson, "Reverse Engineering of Design Patterns from Java Source Code," in Proc. ASE '06, pp. 123–134, IEEE Computer Society, Washington, USA, 2006.

    [6] JHotDraw, http://www.javaworld.com/javaworld/jw-02-2001/jw-0216-jhotdraw.html, accessed April 2013.

    [7] JRefactory, http://jrefactory.sourceforge.net/, accessed April 2013.

    [8] http://sewiki.iai.uni-bonn.de/research/dpd/benchmarks/start, accessed April 2013.

    [9] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T. Halkidis, "Design Pattern Detection Using Similarity Scoring," IEEE Trans. Software Engineering, vol. 32, no. 11, pp. 896–909, 2006.

    [10] G. Antoniol, R. Fiutem, and L. Cristoforetti, "Using Metrics to Identify Design Patterns in Object-Oriented Software," in Proc. 5th Int'l Symp. Software Metrics (METRICS '98), pp. 23–34, IEEE Computer Society, Nov. 1998.

    [11] M. Zanoni, MARPLE: Discovering Structured Groups of Classes for Design Pattern Detection, Master's thesis, Università degli Studi di Milano-Bicocca, Milan, Italy, July 2008.

    [12] M. Zanoni, Data Mining Techniques for Design Pattern Detection, PhD thesis, Università degli Studi di Milano-Bicocca, 2012.

    [13] S. Uchiyama, H. Washizaki, Y. Fukazawa, and A. Kubo, "Design Pattern Detection Using Software Metrics and Machine Learning."

    [14] J. Cooper, Design Patterns Java Companion, Addison-Wesley, 1998.

    [15] A. Binun and G. Kniesel, "Joining Forces for Higher Precision and Recall of Design Pattern Detection," in Proc. 16th Conf. Software Maintenance and Reengineering (CSMR 2012), March 27–30, 2012, IEEE Computer Society, Washington, DC, USA.

    [16] Clementine 12, Help: Modeling Nodes, Integral Solutions Ltd., 1994–2007.

    [17] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.

    [18] J. Bansiya, "Automating Design-Pattern Identification: DP++ Is a Tool for C++ Programs," Dr. Dobb's Journal, 1998.

    [19] R. Keller, R. Schauer, S. Robitaille, and P. Pagé, "Pattern-Based Reverse-Engineering of Design Components," in Proc. 21st Int'l Conf. Software Engineering, pp. 226–235, IEEE Computer Society Press, May 1999.

    [20] A. Asencio, S. Cardman, D. Harris, and E. Laderman, "Relating Expectations to Automatically Recovered Design Patterns," in Proc. WCRE, pp. 87–96, 2002.

    [21] M. Vokáč, "An Efficient Tool for Recovering Design Patterns from C++ Code," Journal of Object Technology, vol. 5, no. 2, March–April 2006.

    [22] K. Brown, Design Reverse-Engineering and Automated Design Pattern Detection in Smalltalk, Master's thesis, North Carolina State University, 1998.

    [23] http://courses.cs.vt.edu/cs2604/spring04/Notes/C04.AlgorithmAnalysis.pdf

    [24] M. Birkner, Object-Oriented Design Pattern Detection Using Static and Dynamic Analysis in Java Software, Master's thesis, University of Applied Sciences Bonn-Rhein-Sieg, Sankt Augustin, Germany, Aug. 2007.

    [25] FUJABA, http://www.fujaba.de, accessed May 2013.

    [26] S. Alhusain, S. Coupland, R. John, and M. Kavanagh, "Towards Machine Learning Based Design Pattern Recognition," in Proc. UK Workshop on Computational Intelligence (UKCI), 2013.

    [27] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771–780, Sept. 1999.

    [28] P. Węgrzynowicz and K. Stencel, "Relaxing Queries to Detect Variants of Design Patterns," in Proc. Federated Conf. Computer Science and Information Systems (FedCSIS), 2013.

    [29] R. S. Rao and M. Gupta, "Design Pattern Detection by Sub Graph Isomorphism Technique," International Journal of Engineering and Computer Science, 2013.
