
Presenting a feature-based model to analyze the sentiment in texts

Number of pages: 74 File Format: word File Code: 31019
Year: 2014 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: artificial intelligence - data mining - information retrieval - database - machine learning - sentiment analysis - sentiment in texts


Summary of Presenting a feature-based model to analyze the sentiment in texts

Master's Thesis in Computer Engineering (Software)

Chapter One: Preface

1-1- Introduction

Some authors define data mining as a tool for finding useful information in large amounts of data. The data mining process draws on several research fields, such as databases, machine learning and statistics. Databases are essential for analyzing large amounts of data. Machine learning is an area of artificial intelligence concerned with creating techniques that allow computers to learn by analyzing data sets. The focus of these methods is on symbolic data. Statistics deals with the analysis of experimental data, and its basis is statistical theory, in which uncertainty and chance are modeled by probability theory. Today, many statistical methods are used in data mining. Text mining, in turn, uses techniques from information retrieval, information extraction and natural language processing, and relates them to the algorithms and methods of data mining, machine learning and statistics. Depending on the research area, text mining can be defined in different ways. Some of these definitions are given below:

Text mining = information extraction: In this definition, text mining is treated as equivalent to information extraction (extracting facts from the text).

Text mining = textual data discovery: Text mining can be defined as the application of methods and algorithms from machine learning and statistics to texts, with the aim of finding useful patterns. For this purpose, the texts must first be pre-processed. Many approaches use information extraction methods, natural language processing or some simple pre-processing to extract data from the texts; data mining algorithms can then be applied to the extracted data.

Text mining = knowledge extraction process: This view is explained in full in the previous section and is not repeated here.

In this research, we mostly treat text mining as textual data discovery and focus on methods for extracting useful patterns from text, in order to categorize text collections or extract useful information. In today's world, the problem is not a lack of information but a lack of the knowledge that can be obtained from it. Millions of web pages, millions of words in digital libraries and thousands of pages of information in every company are only a few of these sources of information. Yet none of them can be singled out as a source of knowledge in its own right. Knowledge is a summary of information, as well as a conclusion and the result of thinking about and analyzing information.
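The textual data discovery view described above (pre-process the texts, extract data from them, then apply a mining step) can be sketched as follows. This is only an illustrative sketch, not the method of the thesis: the stopword list, the sample documents and the function names are assumptions, and the "mining" step is reduced to finding terms that occur in at least a given number of documents.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def document_frequencies(docs):
    """Count in how many documents each term occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(preprocess(doc)))
    return df

def frequent_terms(docs, min_support):
    """Return the terms that occur in at least min_support documents."""
    df = document_frequencies(docs)
    return sorted(t for t, n in df.items() if n >= min_support)

# Hypothetical sample collection.
docs = [
    "The battery of the phone is great",
    "Great screen and great battery",
    "The screen is too dark",
]
print(frequent_terms(docs, min_support=2))  # ['battery', 'great', 'screen']
```

In a fuller pipeline, the term counts produced by the pre-processing step would be passed to a classifier or clustering algorithm instead of a simple support threshold.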

Data mining is a very efficient way to discover information in structured data stored in tables. It extracts patterns from transactions and groups and categorizes data. Through data mining, we can find the relationships between the data items that fill a database. Data mining has one limitation, however: it is not universally applicable. Most of our knowledge is completely unstructured, if not non-digital. Digital libraries, news, e-books, many financial documents, scientific articles and almost anything else found on the web are unstructured. As a result, data mining techniques cannot be applied to them directly. There are, however, three basic methods for dealing with this vast amount of unstructured information: information retrieval, information extraction and natural language processing.

Information retrieval: This is essentially concerned with retrieving documents. The usual task in information retrieval is to pull out, from among the other documents of a collection, the texts and documents (or, in effect, the words) most relevant to the need expressed by the user. It does not find knowledge; it simply hands over the pieces of text that it judges most relevant to the searcher's information need. This method does not really give us knowledge, or even information.

Natural language processing: The general goal of natural language processing is to enable computers to understand natural language better. Robust and simple techniques are used for fast text processing, and linguistic analysis techniques are also used to process the text.
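The information retrieval task described above, returning the documents most relevant to a user's query, can be illustrated with a small ranking sketch. It assumes a plain TF-IDF score (term frequency multiplied by the logarithm of inverse document frequency); the text does not prescribe any particular scoring function, and the sample documents and names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def rank(query, docs):
    """Return document indices ordered by descending TF-IDF score for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        # Only terms present in the document contribute, so df[t] >= 1 here.
        score = sum(tf[t] * math.log(n / df[t]) for t in query_terms if t in tf)
        scores.append((score, i))
    # Highest score first; ties keep the original document order.
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1]))]

# Hypothetical sample collection.
docs = [
    "insurance fraud detection in medical claims",
    "text mining for sentiment analysis",
    "sentiment of movie reviews",
]
print(rank("sentiment analysis", docs))  # [1, 2, 0]
```

As the text notes, the result is only an ordering of existing documents: nothing new is learned from the collection.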

Information extraction: The goal of information extraction methods is to extract specific information from text documents. Information extraction can be used as a pre-processing phase in text mining. It involves mapping natural language texts (e.g. reports, journal articles, newspapers, e-mails, web pages, any text database, etc.) to a predefined structured representation, or to templates that, when filled, present a selection of the key information in the original text. Once the information has been extracted, it can be stored in a database for future use.

Today, given the large amount of textual information available, text mining is a method of particular importance in both research and commerce. Commercial companies, producers of goods, service providers and politicians can all use text mining to obtain useful knowledge as feedback on their goods, services and performance. The applications of text mining include the following:

1. Spam identification: The title and content of a received e-mail are analyzed to determine whether the e-mail is spam.

2. Surveillance: This means secretly monitoring the behavior of a person or a group of people. A project called ENCODA monitors telephone, internet and other means of communication to identify terrorism.

3. Pseudonym (alias) detection: Aliases in medical care records are analyzed to identify fraud. For example, an invoice may be submitted under the names John Smith, J. Smith and Smith, John. In this way, or by other methods, claimants can abuse the system and collect many insurance claims under different aliases. Using text mining to detect these aliases can greatly help insurance companies find fraud.

4. Summarization: Summarization is the process of extracting a set of basic concepts from a text and presenting them in just a few lines. This makes it easier for users to review the contents of documents and helps them reach what they need faster.

5. Relationships between concepts: Among the facts that can be obtained from a set of texts are the connections and dependencies between concepts. These facts can show, for example, that the appearance of some words depends on the appearance of other words, so that whenever we see the first set of words we can expect to see the second set as well. This idea is borrowed from data mining in databases.

6. Behavior discovery and analysis: To illustrate this application, assume that you are the manager of a commercial company. Obviously, you must always monitor the activities of your competitors. The relevant information may come from the news, from stock market transactions or from documents produced by the competitor itself. Because information is now growing exponentially, managing all these data sources by eye alone is not possible. Text mining allows you to find new behaviors and changes automatically. What should be expected of text mining is that it tells you which items in a stream of news are related to what you want, which are new, what developments are taking place in your field of work, and what the current interests and behaviors are and how they are changing. Managers can use the discovered information to assess a competitor's situation and profit from it.

7. Sentiment analysis: In this application, the purpose of text mining is to identify the author's feelings, that is, to recognize how satisfied, happy or unhappy the author is. Since this thesis examines text mining for the purpose of analyzing sentiment in texts, sentiment analysis is described in more detail below.
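As a rough illustration of the sentiment analysis application described in item 7, the sketch below labels a text as positive, negative or neutral by summing word polarities. It is a deliberately simple lexicon-based approach with a one-word negation rule, not the feature-based model proposed in the thesis; the polarity lexicon, the negation list and the function names are all hypothetical.

```python
import re

# Tiny hand-made polarity lexicon (hypothetical, for illustration only).
POLARITY = {"good": 1, "great": 1, "happy": 1, "bad": -1, "poor": -1, "unhappy": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum word polarities; a negation flips only the word right after it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        if token in POLARITY:
            value = POLARITY[token]
            score += -value if negate else value
        # The negation applies to the next token only, so reset it here.
        negate = False
    return score

def classify(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("The service was great and I am happy"))  # positive
print(classify("The screen is not good"))                # negative
print(classify("It arrived on Monday"))                  # neutral
```

A fixed lexicon of this kind misses context and domain-specific wording, which is one reason to learn features from labeled text instead.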

All textual information can be classified into two categories: facts and opinions. Facts are objective statements about entities, events and their characteristics that truly exist or have happened in the outside world. Opinions are subjective expressions that convey people's views, evaluations or feelings about entities, events and their characteristics [23].
Contents & References of Presenting a feature-based model to analyze the sentiment in texts

Chapter One: Preface

1-1- Introduction

1-3- Analyzing sentiment in text

1-4- Objectives of the thesis

1-5- Methodology

1-6- Thesis structure

Chapter Two: Related work

2-1- Introduction

2-2- Definition of the problem

2-3- The first step of analyzing sentiment in text

2-4- Methods based on N-gram features

2-5- Feature selection algorithms

Chapter Three: The proposed method

3-1- Preface

3-2- Required resources

3-3- The first proposed method

3-3-1. Pre-processing of documents

3-3-2. Part-of-speech tagging

3-3-3. Feature vector extraction and feature combination

3-3-4. Applying the feature selection algorithm

3-4- The second proposed method

3-5- The third proposed method

3-5-1. Word polarity extraction and feature vector filtering

Chapter Four: Implementation and results

4-1- Introduction

4-2- Data collection

4-3- Data classification

4-4- Results of the first method

4-5- Results of the second method

4-6- Results of the third method

4-7- Comparison of the proposed method with previous methods

4-8- Results of applying the proposed method to the Persian language

4-9- Future work

References and sources

Source:

[1] A. Abbasi, S. France, Z. Zhang, H. Chen; "Selecting Attributes for Sentiment Classification Using Feature Relation Networks," IEEE Trans. Knowledge and Data Eng., vol. 23, pp. 447-462, 2011.

[2] A. Abbasi, H. Chen, A. Salem; "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums," ACM Trans. Information Systems, vol. 26, no. 3, article no. 12, 2008.

[3] A. Abbasi, H. Chen, S. Thoms, T. Fu; "Affect Analysis of Web Forums and Blogs Using Correlation Ensembles," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 9, pp. 1168-1180, Sept. 2008.

[4] B. Pang, L. Lee, S. Vaithyanathan; "Thumbs up? Sentiment classification using machine learning techniques," Empirical Methods in Natural Language Processing (EMNLP), pp. 79-86, 2002.

[5] B. Agarwal, N. Mittal; "Optimal Feature Selection Methods for Sentiment Analysis," 14th International Conference on Intelligent Text Processing and Computational Linguistics, vol. 7817, pp. 13-24, 2013.

[6] C.E. Shannon; "A Mathematical Theory of Communication," Bell Systems Technical J., vol. 27, no. 10, pp. 379-423, 1948.

[7] C. Priyanka, G. Deepa; "Identifying the Best Feature Combination for Sentiment Analysis of Customer Reviews," International Conference on Advances in Computing, Communications and Informatics (ICACCI), India, pp. 102-108, Aug. 2013.

[8] C.E. Shannon; "A Mathematical Theory of Communication," Bell Systems Technical J., vol. 27, no. 10, pp. 379-423, 1948.

[9] A. Esuli, F. Sebastiani; "SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining," Proc. 5th Conf. Language Resources and Evaluation (LREC'06), pp. 417-422, 2006.

[10] E. Riloff, S. Patwardhan, J. Wiebe; "Feature Subsumption for Opinion Analysis," Proc. Conf. Empirical Methods in Natural Language Processing, pp. 440-448, 2006.

[11] J.R. Quinlan; "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.

[12] J. Wiebe, T. Wilson, R. Bruce, M. Bell, M. Martin; "Learning Subjective Language," Computational Linguistics, vol. 30, no. 3, pp. 277-308, 2004.

[13] J. Blitzer, M. Dredze, F. Pereira; "Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification," Proceedings of the Association for Computational Linguistics (ACL), pp. 440-447, 2007.

[14] J. Yi, T. Nasukawa, R. Bunescu, W. Niblack; "Sentiment Analyzer: Extracting Sentiments about a Given Topic Using Natural Language Processing Techniques," Proc. Third IEEE Int'l Conf. Data Mining, pp. 427-434, 2003.

[15] J.R. Quinlan; "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.

[16] K. Tsutsumi, K. Shimada, T. Endo; "Movie Review Classification Based on Multiple Classifier," Proc. 21st Pacific Asia Conf. Language, Information, and Computation, pp. 481-488, 2007.

[17] B. Liu, L. Zhang; "Mining Text Data," Springer, USA, 2012.

[18] L. Yu, H. Liu; "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," Proc. 20th Int'l Conf. Machine Learning, pp. 856-863, 2003.

[19] L. Yu, H. Liu; "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Machine Learning Research, vol. 5, pp. 1205-1224, 2004.

[20] M. Gamon; "Sentiment Classification on Customer Feedback Data: Noisy Data, Large Feature Vectors, and the Role of Linguistic Analysis," Proc. 20th Int'l Conf. Computational Linguistics, pp. 841-847, 2004.

[21] M. Hall, L.A. Smith; "Feature Subset Selection: A Correlation Based Filter Approach," Proc. Fourth Int'l Conf. Neural Information Processing and Intelligent Information Systems, pp. 855-858, 1997.

[22] M. Ghiassi, J. Skinner, D. Zimbra; "Twitter brand sentiment analysis: A hybrid system using N-gram analysis and dynamic artificial neural network," Expert Systems with Applications, vol. 40, pp. 6266-6282, 2013.

[23] B. Pang, L. Lee; "Opinion Mining and Sentiment Analysis," Foundations and Trends in Information Retrieval, vol. 2, nos. 1-2, pp. 1-135, 2008.

[24] T. Zhang, D. Tao, X. Li, J. Yang; "Patch Alignment for Dimensionality Reduction," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 9, pp. 1299-1313, Sept. 2009.

[25] V. Ng, S. Dasgupta, S.M.N. Arifin; "Examining the Role of Linguistic Knowledge Sources in the Automatic Identification and Classification of Reviews," Conf. Computational Linguistics, Assoc. for Computational Linguistics, pp. 611-618, 2006.

[26] Z. Fei, J. Liu, G. Wu; "Sentiment Classification Using Phrase Patterns," Proc. Fourth IEEE Int'l Conf. Computer Information Technology, pp. 1147-1152, 2004.

[27] WEKA. Open Source Machine Learning Software Weka, http://www.cs.waikato.ac.
