Word Files
Reference for Downloading Educational Files

Diagnosing pelagism using graphs in Persian texts

Number of pages: 79 File Format: word File Code: 31056
Year: 2014 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: Algorithm - Dependency graphs - Diagnosis of pelargism - Diagnosis of pelargism with the help of graph - Graph - Pelagarism

Part of the Content
Contents & Resources

Summary of Diagnosing pelagism using graphs in Persian texts

Master's Thesis in Computer Engineering

Trend: Software

Abstract

This thesis focuses on graph-based similarity search in natural language texts. The need for a strong method to present texts is an important issue in the field of plagarism detection. In this project, according to this need, we have introduced a powerful method to present natural language and have used it to detect plagarism. For this purpose, we have defined the concept of "graph modification distance" and used it to calculate the distance between two graphs. Sentences are represented by dependency graphs in which words are connected by their dependencies. Dependency graph extracts the grammatical structure of sentences. The graph-based similarity method has been used in the problem of pelargism detection. The main advantage of graph-based presentation is related to the ability of this method to detect similarities between words. The evaluations showed that the results obtained from the dependence graph have better results than the methods of direct comparison of the graphs. Using the graph modification distance to compare two graphs leads to the improvement of the dependency graph results and increases its efficiency. Keywords: graph modification distance, natural language processing, dependency graphs, detection of plagarism. Chapter 1. Introduction. Some people may intentionally or unintentionally use the works of researchers while not even mentioning the name of the main author of the work, this act is called plagarism. can be Plagarism is the intentional or unintentional act of copying or using the ideas and works of others without crediting the original source. Based on the investigations conducted on the issue of plagarism and the number of articles published on the web and the reflection of concern about its occurrence, it is clear that most educational and research centers in their electronic pages in all parts of the world, including in the developing countries of Asia and Africa, under the influence of the publishers with a history of research journals and also to preserve the scientific dignity of their centers, have started to comprehensively introduce this anti-ethical and anti-social phenomenon as a crime. Plagarism is a problem in the scientific community and is growing rapidly, because data and information from electronic documents and the Internet are quickly and easily obtained through copying and pasting from those sources. This problem occurs when the content of illegal documents is found without permission and without citation. This problem is known as plagarism, and plagarism can include a wide range of deliberate manipulations to accidental copying of other people's content. The main goal of this thesis is to express the graph-based method for text presentation and its use in pelarism detection. The following sections of this chapter state the reasons for using this method and describe the problem of pelargism diagnosis. In addition, they provide a solution for this problem. Finally, the structure of the introductory thesis and the topics that will be discussed in the next chapters will be reviewed.

The problem of finding similarities between two texts is a common problem in the field of natural language processing. In order to evaluate the similarities between two texts, each text needs a method of presentation. One method is simple text, where a list of words forms a sentence. Plain text is usually used because of its simplicity, but it lacks clear information about the grammatical structure.

Some aspects of language are better presented using structural presentation methods such as dependency graphs that include words connected together.

Dependency graphs extract the grammatical structure of a sentence, and are limited to the scope of the same sentence. One of the main advantages of graph-based representation is that dependency graphs are in most cases insensitive to word order. This advantage makes it possible to find similarities between sentences in which the word order is scrambled.

A stronger representation provides a better foundation for identifying similarities in complex situations. Consider the two sentences presented in the dependence graph of Figures 1-1 and 1-2. Humans must be able to determine that these sentences have the same meaning. However, the automatic recognition of the similarity of these two sentences will have problems due to the substitution of words. If plain text presentation is used, the only common words will be "to", "out", "from", "shot" and "ball". The words "shot" and "ball" are the only words that convey the meaning of the sentence.By looking closely at the dependency graphs, it is clear that there is structural similarity between the sentences.

With the increasing availability of texts on the web, plagarism has become simpler and easier. A large amount of plagarism texts in the field of teaching and learning are increasing year by year. As a result, there is a strong need for automatic detection of plagarism. 1-1 Problem description The most important part of this thesis is the implementation of an algorithm to calculate the graph modification distance, which calculates the similarity between two graphs. The algorithm is based on the calculation of the number of editing operations required to transform one graph into another [1]. Each editing action has an editing cost, which determines how much an action costs.

Automatic plagarism detection is a research field that is basically based on textual similarity. The problem of using graph-based text similarity to detect plagarism is stated in research question 1. Research question 1: Is graph-based similarity, within a certain graph modification distance, applicable and computationally feasible for plagarism detection? Experimental plagarism detection systems are often based on simpler text representations, such as ngram matching and vector space model [2 and 3]. As a result, the method can be relatively unique. Due to the uniqueness of the method, some implementation details are not defined. Research question 2 states a problem that specifies the details of the algorithm. Research question 2: What is the best method for calculating the graph editing distance between sentences, especially in terms of the cost of editing and presenting the graph in the field of plagarism detection? In order to evaluate the efficiency of the algorithm for calculating the graph editing distance, its efficiency is compared to existing test systems. Research question 3 raises the problem of comparing the algorithm with existing methods. Research question 3: How does graph-based similarity compare with other methods in the detection of plagarism, such as index-based recovery and ngram matching? 1-2 solutions The following objectives guarantee the answers to the questions raised in the previous section. It will be implemented on the test graph editing interval. The system should be able to evaluate many texts and recognize plagiarism at the sentence level.

Research objective 2: Some functions will be presented to calculate the cost of implementation editing. Different methods are compared with each other and the resulting systems are evaluated.

Research objective 3: The output of the experimental system is compared with existing pelargism detection systems. This comparison is considered an experimental evaluation of the system. 1-3 problems in the implementation of the algorithm One of the most important problems in the implementation of semantic plagarism detection software in the Persian language is related to the Persian language itself. As we know, the Persian language is a very complex language and has a grammar with many exceptions, and due to its similarities with the Arabic language, it has problems with the characters that are common in Arabic and Persian, which causes some cases to be misdiagnosed. The Persian language, which is almost similar to the Arabic language, has various forms and changes depending on the third person. These exceptions and thousands of similar problems due to the rich and complex grammar of the Persian language cause the task of detecting plagarism to face many problems and the need to consider many aspects in the implementation of the algorithm.

Another problem that may arise in the field of semantic plagarism detection is that it is not enough to only use the dictionary of synonyms of words, because it is possible that users who engage in the inhuman act of plagarism and copying other people's works, in some cases instead of replacing the synonyms instead of their originals, with a slight change in the structure and appearance of the sentences, use the opposites of the words instead of their originals. Sometimes even without replacing the words, it is possible to create similar sentences with a slight change in the structure of the sentences.
Contents & References of Diagnosing pelagism using graphs in Persian texts

List:

Introduction. 2

1-1 Explanation of the problem. 5

1-2 solutions 6

1-3 problems in algorithm implementation. 6

1-4 thesis structure. 7

Research background. 9

2-1 Diagnosis of pelarism. 9

2-2 dimensions of pelarism diagnosis. 12

2-2-1 Grammar-based method. 12

2-2-2 Meaning-based methods 13

2-2-3 Combined methods. 14

2-2-4 The method of detecting external pelargism. 14

2-3 methods for calculating the degree of similarity of graphs 15

2-3-1 The method of the largest common subgraph - the smallest common supergraph. 15

2-3-2 Method based on state space search. 17

2-3-3 Possible methods. 18

3-1 Diagnosis of pelargism. 23

3-1-1 matching n grams. 23

3-1-2 Expression weighting. 23

3-1-3 Generalization of the expression. 24 3-2 Dependency graphs. 25

3-2-1 dependencies 26

3-3 graph editing interval. 26

3-3-1 Editing operation. 26

3-3-2 The issue of attribution. 27

3-3-3 Cost matrix. 28

3-3-4 Assignment algorithms. 29

4-1 Architecture. 32

4-2 Text preprocessing. 32

4-2-1 Finding sentences. 33

4-2-2 rooting words. 34

4-2-3 Formation of dependency graph. 40

4-3 Candidate extraction 44

4-3-1 Sentence indexing. 45

4-3-2 Extraction of candidate sentences 45

4-4 Analysis of details. 45

4-4-1 Distance algorithm for editing two graphs. 48

4-4-2 Detection of plagarism based on GED provided in this project 49

5-1 Detection of plagarism of word displacement and sentence structure change. 55

5-1-1 10% structural changes. 56

5-1-2 50% structural changes. 57

5-2-2 100% structural changes. 59

5-2 Recognizing semantic pelagism. 60

5-2-1 semantic changes of 10 percent. 60

Conclusions and suggestions. 64

References. 67

Source:

Fankhauser, S., K. Riesen, and H. Bunke. Speeding up graph edit distance computation through fast bipartite matching. Graph-Based Representations in Pattern Recognition, (2011)

Suchomel, S., J. Kasprzak, and M. Brandejs (2012). Three way search engine queries with multi-feature document comparison for plagiarism detection. See Forner et al. (2012).

Grman, J. and R. Ravas Improved implementation for _nding text similarities in large sets of data - notebook for PAN at clef 2011. See Petras et al. (2011).

Asim M. El Tahir Ali, Hussam M. Dahwa Abdulla, and V´aclav Sn´a?sel Overview and Comparison of Plagiarism Detection Tools, Dateso 2011, pp. 161{172, ISBN 978-80-248-2391-1.

A. S. Bin-Habtoor and M. A. Zaher "A Survey on Plagiarism Detection Systems", International Journal of Computer Theory and Engineering Vol. 4, No. 2, April 2012

Sindhu.L, Bindu Baby Thomas, Sumam Mary Idicula A Study of Plagiarism Detection Tools and Technologies, IJART, Vol. 1 Issue 1, 2011,64-70.

Schleimer, S., Wilkerson, D. and Aiken, A. (2003) Winnowing: Local Algorithms for Document Fingerprinting. SIGMOD 2003, San Diego, 9-12 June 2003, 76-85.

J.A. Malcolm and P.C.R. Lane, Tackling the PAN'09 External Plagiarism Detection Corpus with a Desktop Plagiarism Detector, 3rd PANWORKS-HOP. UNCOVERING PLAGIARISM, AUTHORSHIP AND SOCIAL SOFTWARE MISUSE, 2009, p. 29. C. Basile, G. Cristadoro, D. Benedetto, E. Caglioti, and M. Degli Es-posti, A plagiarism detection procedure in three steps: selection, matches and "squares", 3rd pan workshop. Uncovering plagiarism, authorship and social software misuse, 2009, p. 19.

Adam Shenker Horste Bunke, Mark Last and Abraham Kandle Graph Theoretic Techniques For Web Content Mining, Published by World Scientific Publishing, USA 2005

Ahmed Hamza Osman, Naomie Salim and Mohammed Salem Binwahlan, Plagiarism Detection Using Graph-Based Representation, Journal Of Computing,

Adam Shenker Horste Bunke, Mark Last and Abraham Kandle Graph Theoretic Techniques For Web Content Mining, Published by World Scientific Publishing, USA 2005

Ahmed Hamza Osman, Naomie Salim and Mohammed Salem Binwahlan, Plagiarism Detection Using Graph-Based Representation, Journal Of Computing, Volume 2, Issue 4, Issn 2151-9617 , April 2010.

H. Bunke, On a relation between graph edit distance and maximum common subgraph, Pattern Recognition Letters (1997) H. Bunke and K. Shearer, A graph distance metric based on the maximal common subgraph, Pattern Recognition Letters, Vol. 19, 1998 J. T. L. Wang, K. Zhang, and G.-W. Chirn, Algorithms for Approximate Graph Matching, Information Sciences, Vol. 82, 1995

R. C. Wilson and E. R. Hancock, Structural Matching by Discrete Relaxation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 6, June 1997

R. Myers, R. C. Wilson, and E. R. Hancock, Bayesian Graph Edit Distance, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vo.!22, No. 6, June 2000. Papineni, K., S. Roukos, T. Ward, and W. Zhu (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pp. Association for Computational Linguistics. Stamatatos, E. Plagiarism detection using stopword n-grams. Journal of the American Society for Information Science and Technology (2011) Jones, K. A statistical interpretation of the term specificity and its application in retrieval. Journal of documentation (1972) Marcus, M., M. Marcinkiewicz, and B. Santorini Building a large annotated corpus of English: The penn treebank. Computational linguistics(1993).

Riesen, K. and H. Bunke Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing (2009).

Porter, M. F. An algorithm for suffix stripping. Program, pp. 137-130. (1980).

Megerdoomian, K. (2004). Finite-state morphological analysis of Persian. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, University of Geneva, Iran.

Sheykhzadegan, J. and M. Bijankhan (2006). The speech databases of Persian language. In Proceedings of the 2nd Workshop on Persian Language and Computing, the University of Tehran, Tehran, Iran, pp. 247-261.

Taghva, Beckley and Sadeh. A stemming algorithm for the Farsi language. IEEE ITCC, pp. 158 - 162. 2005.

Anvari, H. & Ahmadi Givi, H. (2006). Persian Language Grammar (2nd Ed.). Tehran: Fatemi Publication. A. A. Sharifloo, and M. Shamsfard, "A Bottom up Approach to Persian Stemming", Proceedings of the Third International Joint Conference on Natural Language Processing, 2008.

How To Access The File

Separation of prosthesis projections and tissue implants in spiral CT scan sinogram images using active contour methods

Number of pages: 147 Category: Electronic Engineering

Master's thesis in medical engineering, bioelectrical trend, abstract of the separation of prosthesis projections and tissue implants in sinogram images of spiral CT scan using active contour methods for metal implants such as prostheses and tooth filling materials during the reconstruction of CT images with different methods, creates an artifact that appears as light and dark ...

Presenting a method for right and left ventricle segmentation from cardiac MRI images

Number of pages: 103 Category: Electronic Engineering

Master's thesis in the field of bioelectrical-medical engineering. Abstract Presenting a method for segmenting the right and left ventricles from cardiac MRI images. Examining the structure and function of cardiac ventricles in magnetic resonance images is considered an important step in the management of many cardiac disorders. Although the manual segmentation of heart ...

The effectiveness of painting therapy on the mental health of male students of primary schools in Gachsaran city

Number of pages: 105 Category: Psychology

Dissertation for Master's Degree in Counseling and Guidance (M.A) Abstract The purpose of this study was to investigate the effect of painting therapy on the mental health of male students of primary schools in Gachsaran city. The variables of the current research were mental health with dimensions (anxiety, hostility, physicalization, interpersonal sensitivity, obsession, ...

The effectiveness of reality therapy on reducing shyness and emotional deficiency of students

Number of pages: 91 Category: Psychology

Dissertation for Master's Degree (M.A) in Clinical Psychology Abstract The main purpose of this research was to study the effectiveness of reality therapy on reducing shyness and emotional deficiency of students and its components including reality therapy, reducing shyness, and emotional deficiency. The statistical population of the current research is all students studying in ...

Investigating the effectiveness of cognitive therapy in reducing dysfunctional attitudes and drug-related beliefs in drug abusers in Ilam city.

Number of pages: 112 Category: Psychology

Dissertation for Master's Degree in Psychology (M.A) Major: Clinical Abstract Considering that addiction is one of the most common psychiatric disorders, ""Cognitive Therapy"" is a tool that therapists use to control and treat addiction by influencing thoughts and beliefs. For this purpose, the present study was conducted with the aim of investigating the effectiveness of ...

Investigating the effect of 16 sessions of hydrotherapy exercises on chronic back pain in the elderly

Number of pages: 85 Category: Physical Education - Sports

Master's Thesis Department of Pathology and Corrective Movements Introduction If we consider the last century as the era of extraordinary progress, it is not an exaggeration. In this era, humanity has opened a new door to itself with inventions and initiatives. For sure, the development of human consciousness has started with the help of science and research. Today, physical ...

Detection of non-ideal iris images based on meta-heuristic algorithms

Number of pages: 71 Category: Electronic Engineering

Dissertation for Master's Degree in Mechatronics Engineering (M.Sc) Abstract Biometric technology, based on the unique characteristics of each person, automatically recognizes people's identity. Researchers have been able to extract the iris tissue with high precision even in different conditions. As a result, our effort in this thesis is to present views and methods that ...

Survey and approach of surrealism photography in a selection of contemporary graphic works - the practical title of designing 12 posters with social themes based on the expression of surrealism photography

Number of pages: 287 Category: Art - Graphics

Master's Degree Thesis Abstract This thesis is a research on the recognition of surrealism, not as an artistic style, but in terms of philosophical and artistic thought and perspective and the entry of this thought into the field of photography and its surrealistic application in contemporary graphics. For this reason, a major part of the treatise is dedicated to examining and ...

Presenting a new method to investigate the treatment process of lesions and tooth bone loss in a period of time

Number of pages: 104 Category: Computer Engineering

Master's thesis in the field of computer engineering (artificial intelligence) abstract of various image processing techniques are widely used in the field of dentistry. Among these applications, we can mention the identification based on dental images, the detection of dental caries, as well as the identification and examination of dental lesions, which mainly occur around the ...

Development of automatic tools to identify design patterns with tag refinement and correction operations

Number of pages: 69 Category: Computer Engineering

Master's Thesis in Computer Engineering (Software) Abstract Development of automatic tools to identify design patterns with refining operations and label correction Design patterns are proven and reliable solutions that have been provided to answer some frequently occurring issues in Shigra software design. Identifying them is understood as recovering the designer's secret plan ...

Diagnosing pelagism using graphs in Persian texts

Summary of Diagnosing pelagism using graphs in Persian texts

Contents & References of Diagnosing pelagism using graphs in Persian texts