Presenting an efficient model based on the subcombinations extracted from the feature to recognize human physical activities

Number of pages: 140 | File format: Word | File code: 31009
Year: 2014 | Degree: Master's | Category: Computer Engineering
  • Summary

    Doctoral thesis in the field of computer engineering (artificial intelligence)

    Abstract

    Understanding and extracting information from images and videos is the common thread of most machine vision problems. Finding the main, informative parts of a video and modeling the relationships between those parts is one of the central goals of video analysis. Over the last decade, human activity recognition from video has emerged as a challenging topic in machine vision, with applications in surveillance and security, medicine, and human-computer interaction. Extracting the main components of an activity and summarizing it is difficult and complex because of the great diversity in how an activity can be performed. If we take the starting point of video analysis to be the brightness of image pixels in successive frames, and the final goal to be recognizing human activity, there is a wide gap between the level of analysis and that goal, and an urgent need for meaningful, higher-level features is felt. The main challenge, in fact, is to bridge the deep gap between low-level descriptors and an expression of the type of activity and its summary. In recent decades, researchers have not been very successful in providing effective summarization methods using vision and machine learning techniques, even at the level of still images. In this regard, discriminative methods [1] have been proposed, which model the decision boundary between different classes. Despite their success, these models require a great deal of labeled data and are limited to a specific context; in addition, the risk of overfitting [2] threatens them. Generative models [3], on the other hand, mitigate this problem by adding extra constraints to the model using the large amount of available unlabeled data. As an example, we can point to unsupervised feature learning methods, which reduce the distance between low-level descriptors and the final model by injecting some basic knowledge about the overall structure of the data.
In this thesis, the problem of human activity recognition is addressed by presenting five different frameworks, with the approach of summarizing the video and extracting higher-level features. The main steps of the work fall into three parts: 1- feature extraction, 2- quantization of the features, and 3- classification. In this research, shape and motion features are extracted from the two-dimensional images of the video frames. In the second part, which is essentially the core of this research, instead of common methods such as K-means we use sparse coding methods, and some improved versions of them, which count as unsupervised feature learning methods; the aim is to reduce the quantization error, raise the level of the features (by exploiting the basic knowledge hidden in the data), and simplify classification in the later stages. In such methods, the goal is to find higher-level basis functions and describe the video as a linear combination of them. We have also used the very useful method of group sparse coding to extract the useful information in the temporal sequence. Then, to avoid overfitting the model, spatial and temporal pooling of the coefficients is proposed. Finally, activity recognition is completed using two different algorithms from the general families of generative and discriminative classifiers.
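To make the second step concrete, the sketch below shows, in plain Python with a hypothetical toy dictionary and made-up frame descriptors (not the thesis code), the general idea of replacing K-means quantization with greedy sparse coding: each frame descriptor is expressed as a sparse linear combination of dictionary atoms, and the per-frame coefficients are then pooled over time into a single video descriptor.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(x, dictionary, n_nonzero=2):
    """Greedy sparse coding: approximate x as a sparse linear
    combination of (assumed unit-norm) dictionary atoms."""
    residual = list(x)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        scores = [dot(residual, d) for d in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        residual = [r - scores[k] * d for r, d in zip(residual, dictionary[k])]
    return coeffs


def max_pool(codes):
    """Temporal pooling: keep the strongest activation of each atom
    across all frames, giving one fixed-length video descriptor."""
    return [max(abs(c[j]) for c in codes) for j in range(len(codes[0]))]


# toy 3-D descriptors for three video frames (hypothetical values)
frames = [(1.0, 0.1, 0.0), (0.0, 0.9, 0.1), (0.1, 0.0, 1.0)]
# hand-built unit-norm dictionary; in the real pipeline this would be
# learned from unlabeled data by an unsupervised feature learning method
D = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

codes = [matching_pursuit(f, D) for f in frames]
video_descriptor = max_pool(codes)
```

In the actual pipeline the descriptors would be shape and motion features of the frames, and pooling over space and time (rather than simple max pooling over an identity dictionary, as here) is what reduces the dimensionality and the risk of overfitting.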

    Among the highlights of this thesis are the combination of several features with different modalities; the extraction of the meaningful components of an activity and the modeling of their relationships while taking the temporal structure of the data into account; the reduction of the quantization error; and a significant reduction in spatial and temporal complexity. The presented methods have been evaluated on several activity recognition databases, consisting of synthetic and real data with different challenges, and good results have been obtained.

    Keywords: human activity recognition, basic knowledge, data structure, multi-class system, sparse coding, group sparse coding, unsupervised feature learning.

    Chapter 1

    Introduction

    Introduction

    Understanding and analyzing images is the common thread of most machine vision problems. In this regard, and with the advancement of various machine vision techniques, scene analysis has risen above the level of single images and now analyzes video (a sequence of frames) while taking the temporal relationships between frames into account. This provides a better and more accurate understanding of the scene. Today, human activity recognition is one of the most important and interesting research topics in machine vision. The purpose of this recognition is to analyze the activities of the humans in an unknown video. In general, the analysis of human movement can be divided into three categories: 1- human activity recognition [1], 2- tracking of human movements [2], and 3- analysis of the movements of the different parts of the human body [3]. Each of these analyses can be performed on two- or three-dimensional frames. In many practical problems, after finding people in the images and tracking them, we seek to categorize their activities. Activity recognition is the process of labeling human activities, and it can be done using various sensors such as vision and sound. In this research, we use only visual observations, which can be taken from one or more cameras. The label of a specific activity is a name that most ordinary people, upon hearing it, associate with the same activity and could perform in the same way. In other words, the activity label is the best descriptor of an instance of an activity performed by different people under different conditions.

    Looking more deeply at the problem of activity recognition, it can be considered similar, from different perspectives, to some other fields of artificial intelligence such as natural language processing, text processing, and speech recognition, and it is useful to analyze it from these perspectives. For example, we can use the concepts of natural language and human speech to define and recognize an activity more precisely. Humans use sentences in their daily conversations, and each simple sentence consists of a subject, an object, and a verb. There is almost the same structure for expressing the visual concepts in a video. From this point of view, the subject, or performer of the activity, is usually a human. The object can be other people, objects, or the environment on which the subject acts. Finally, the verb indicates the type of activity or interaction between the subject and the objects. From the point of view of speech processing, just as components such as phonemes, letters, and words form a sentence, the sequence and ordering of movements together form a meaningful activity. Given these similarities, it seems that by examining the methods used in the fields mentioned, we can reach a more efficient solution to our problem.

    There are different types of human activities. Following [1], we divide activities into four levels according to their complexity:

    1- Gestures [4]: the atomic, basic movements of body parts, used to describe meaningful human motion; for example, extending the arm from the elbow, folding it, or making a fist.

    2- Human activities [5]: simple activities that may string several first-level movements together in time; in other words, a combination of atomic human movements constitutes an activity. Examples are walking or waving.

    3- Interactions [6]: in this category, two or more people, or people and objects, are involved; for example, two people fighting, or one person stealing another person's bag, which is an example of two people interacting with the same object.

    4- Group activities [7]: activities carried out by a group of people with each other or with objects; for example, a group of soldiers marching, or a group holding a meeting.

    For example, a tennis match is a human interaction. This interaction includes several activities such as serving, returning the ball, or calling a time-out. Each of these activities in turn consists of basic movements; serving, for instance, includes tossing the ball upward, drawing the racket back, swinging the racket, and striking the ball. It should be noted that the choice of primitive movements is an important and influential issue for the rest of the recognition process. For example, arm movement may not be an informative enough primitive for part of the activity of playing tennis, while it may be sufficient for the activity of drinking. Therefore, the extraction of the basic movements of an activity depends to some extent on the type of activity, and a precise definition is not completely possible.

    Applications

    The ability to recognize complex human activities has various applications, including automatic monitoring systems in public places such as airports and highways, which require detecting abnormal and suspicious movements and activities against a background of normal activities [1]. For example, in airports, detecting activities such as a person leaving a bag behind, or throwing a handbag into the trash, can be flagged as suspicious.
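The tennis example above can be sketched as a small data structure. The Python snippet below is purely illustrative (the labels and dictionary layout are hypothetical, not from the thesis): a list of gestures composes an activity, and a list of activities composes an interaction, mirroring the four-level taxonomy.

```python
# Hypothetical encoding of the four-level taxonomy: gestures compose
# activities, which in turn compose interactions and group activities.
LEVELS = ["gesture", "activity", "interaction", "group activity"]

serve = {
    "level": "activity",
    "label": "serve",
    # atomic gestures making up the serve, as described in the text
    "parts": ["toss ball upward", "draw racket back",
              "swing racket", "strike ball"],
}

tennis_match = {
    "level": "interaction",
    "label": "tennis match",
    # activities making up the interaction between the two players
    "parts": ["serve", "return ball", "time-out"],
}


def complexity(unit):
    """Rank of a unit in the taxonomy: gesture=1, ..., group activity=4."""
    return LEVELS.index(unit["level"]) + 1
```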

  • Contents & References

    List:

    1- Introduction
    1-1- Introduction
    1-2- Applications
    1-3- Challenges and features of the environment
    1-4- General definition of the problem
    2- Review of past research
    2-1- Introduction
    2-2- Single-layer methods
    2-2-1- Introduction of various space-time methods
    2-2-2- Summary and comparison of space-time methods
    2-2-3- Sequential methods
    2-2-4- Summary and comparison of sequential methods
    2-3- Multilayer (hierarchical) methods
    2-3-1- Statistical methods
    2-3-2- Syntactic methods
    2-3-3- Descriptive methods
    2-3-4- Summary and comparison of hierarchical methods
    3- Study of the tools used
    3-1- Introduction
    3-2- Tools used in feature extraction
    3-2-1- Histogram of oriented gradients
    3-2-2- Optical flow
    3-3- Tools used in learning higher-level features
    3-3-1- General pattern in unsupervised feature learning
    3-3-2- Common methods in unsupervised feature learning
    3-3-3- Empirical analysis
    3-4- Tools used in classification
    3-4-1- Hidden Markov model
    3-4-2- Support vector machine
    4- Proposed method
    4-1- Introduction
    4-2- Defining the main framework
    4-3- Steps of the work
    4-3-1- Video representation
    4-3-2- Feature extraction
    4-3-3- Quantizing words and creating a dictionary
    4-3-4- Pooling
    4-3-5- Classification
    4-4- Proposed frameworks
    4-4-1- First framework
    4-4-2- Second framework
    4-4-3- Third framework
    4-4-4- Fourth framework
    4-4-5- Fifth framework
    5- Results
    5-1- Available databases
    5-2- Setting the parameters of the problem
    5-3- Results
    6- Discussion
    6-1- Innovations and their advantages and disadvantages
    6-2- Comparison of the proposed frameworks
    6-3- Proposed future works
    6-4- Summary
    7- List of sources

     

    Source:

     

    1. J. K. Aggarwal, and M. S. Ryoo, "Human Activity Analysis: A Review", ACM Computing Surveys (CSUR), Vol. 43, No. 3, pp. 1-47, 2011.

    2. R. Poppe, "A survey on vision-based human action recognition", Image and Vision Computing, Vol. 28, pp. 976-990, 2010.

    3. M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, No. 12, pp. 2247-2253, 2007.

    4. A. Bobick, and J. Davis, "The recognition of human movement using temporal templates", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 23, No. 3, pp. 257-267, 2001.

    5. E. Shechtman, and M. Irani, "Space-time behavior based correlation", CVPR, 2005.

    6. Y. Ke, R. Sukthankar, and M. Hebert, "Spatio-temporal shape and flow correlation for action recognition", CVPR, 2007.

    7. M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition", CVPR, 2008.

    8. Z. Li, Y. Fu, T. Huang, and S. Yan, "Real-time human action recognition by luminance field trajectory analysis", ACM International Conference on Multimedia, 2008.

    9. Y. Sheikh, M. Sheikh, and M. Shah, "Exploring the space of a human action", ICCV, 2005.

    10. A. Yilmaz, and M. Shah, "Recognizing human actions in videos acquired by uncalibrated moving cameras", ICCV, 2005.

    11. G. Johansson, "Visual perception of biological motion and a model for its analysis", Perception & Psychophysics, Vol. 14, pp. 201-211, 1973.

    12. I. Laptev, and T. Lindeberg, "On Space-Time Interest Points", International Journal of Computer Vision, Vol. 64, pp. 107-123, 2005.

    13. P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior Recognition via Sparse Spatio-Temporal Features", IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), 2005.

    14. A. Oikonomopoulos, I. Patras, and M. Pantic, "Spatiotemporal salient points for visual recognition of human actions", IEEE Trans. on Systems, Man and Cybernetics (SMC) - Part B: Cybernetics, Vol. 36, No. 3, pp. 710-719, 2006.

    15. S. F. Wong, and R. Cipolla, "Extracting spatiotemporal interest points using global information", ICCV, 2007.

    16. T. K. Kim, S. F. Wong, and R. Cipolla, "Tensor canonical correlation analysis for action classification", CVPR, 2007.

    17. G. Willems, T. Tuytelaars, and L. Van Gool, "An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector", ECCV, 2008.

    18. I. Laptev, and P. Perez, "Retrieving actions in movies", ICCV, 2007.

    19. W. L. Lu, and J. J. Little, "Simultaneous tracking and action recognition using the PCA-HOG descriptor", Canadian Conference on Computer and Robot Vision, 2006.

    20. P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition", ACM International Conference on Multimedia, 2007.

    21. J. Yamato, J. Ohya, and K. Ishii, "Recognizing human action in time-sequential images using hidden Markov model", CVPR, 1992.

    22. A. Veeraraghavan, R. Chellappa, and A. Roy-Chowdhury, "The function space of an activity", CVPR, 2006.

    23. R. Lublinerman, N. Ozay, D. Zarpalas, and O. Camps, "Activity recognition from silhouettes using linear systems and model (in)validation techniques", ICPR, 2006.

    24. F. Lv, and R. Nevatia, "Recognition and segmentation of 3-D human action using HMM and multi-class AdaBoost", ECCV, 2006.

    25. B. Chakraborty, O. Rudovic, and J. Gonzalez, "View-invariant human-body detection with extension to human action recognition using component-wise HMM of body parts", International Conference on Automatic Face and Gesture Recognition, 2008.

    26. N. M. Oliver, B. Rosario, and A. P. Pentland, "A Bayesian computer vision system for modeling human interactions", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 831-843, 2000.

    27. S. Park, and J. K. Aggarwal, "A hierarchical Bayesian network for event recognition of human actions and interactions", Multimedia Systems, Vol. 10, No. 2, pp. 164-179, 2004.

    28. E. Yu, and J. K. Aggarwal, "Detection of fence climbing from monocular video", ICPR, 2006.

    29. Y. Shi, Y. Huang, D. Minnen, A. F. Bobick, and I. A. Essa, "Propagation networks for recognition of partially ordered sequential action", CVPR, 2006.

    30. Y. A. Ivanov, and A. F. Bobick, "Recognition of visual activities and interactions by stochastic parsing", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 852-872, 2000.

    31. D. Moore, and I. Essa, "Recognizing multitasked activities from video using stochastic context-free grammar", AAAI, 2002.

    32. M. S. Ryoo, and J. K. Aggarwal, "Recognition of composite human activities through context-free grammar based representation", CVPR, 2006.

    33. A. Gupta, P. Srinivasan, J. Shi, and L. S. Davis, "Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos", CVPR, 2009.

    34. T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping", ECCV, 2004.

    35. A. Coates, "Demystifying Unsupervised Feature Learning", PhD thesis, Stanford University, 2012.

    36. F. Bach, "Consistency of the group Lasso and multiple kernel learning", Journal of Machine Learning Research, Vol. 9, pp. 1179-1225, 2008.

    37. G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints", Workshop on Statistical Learning in Computer Vision, ECCV, 2004.
