Contents and References of "Extraction of Time-Frequency Features for Visual Identification of Persian Vowels"
Contents:
Chapter One: Introduction 1
1-1 Introduction 2
1-2 Structure of Thesis 4
Chapter Two: Review of Research 5
2-1 Introduction 6
2-2 Active Contour Models 6
2-2-1 Energy function 7
2-2-2 Energy minimization 9
2-3 Active shape models 12
2-4 Flexible models 16
2-4-1 Lip model 16
2-4-2 Cost function formulation 17
2-4-3 Optimizing the model parameters
2-7 Principal component analysis 23
2-7-1 Mathematical background of EM-PCA 24
2-7-2 Manifold generation from the input image 24
2-8 Discrete cosine transform 26
2-8-1 Modeling based on 3-D DCT 26
2-8-1-1 Lip movement feature extraction 27
2-8-1-2 Network-based movement feature extraction 27
2-8-1-3 Contour-based movement feature extraction 28
2-8-2 Feature extraction from the target area 29
2-8-2-1 Visual feature extraction 30
2-8-3 Cosine transform and LSDA 31
2-8-3-1 Preprocessing 31
2-8-3-2 DCT + LDA method 32
Chapter Three: Detection 39
3-1 Introduction 40
3-2 Detection of the lip area 41
3-2-1 Lip and skin color combination analysis 41
3-2-2 Hue, saturation, and value (HSV) 42
3-2-3 Removing the red component 43
3-2-4 K-Means algorithm 43
3-2-4-1 Algorithm implementation 44
3-2-5 Illumination intensity and binarization 45
3-2-6 Combined methods 45
3-3 Classification and identification methods 47
3-3-1 Neural network 47
3-3-1-1 Feedforward networks 48
3-3-1-2 Error backpropagation algorithm 48
3-3-2 Hidden Markov model 51
4-1 Database
Removing red color 56
4-3-3 Analysis of lip and skin color combination
4-4-2 Windowing 62
4-4-5 Calculating the Mel frequency coefficients 65
4-5 Finding the center of the lip and extracting an area around it 66
4-5-1 Zigzag scan 67
4-5-2 Feature reduction with LSDA
4-5-2-1 Using the log-sigmoid function and changing the training algorithm 70
4-5-2-2 Using the tan-sigmoid function and the momentum algorithm 70
4-6 Extracting features from different images
4-6-1 Extracting features from new images
4-7 Reducing the number of frames and reducing the size of images 73
4-7-3 Reducing the number of frames and reducing the size of images with the resize command 73
List of Tables:
Table 1-1 Grouping of idioms in English. 3
Table 1-2 Grouping of idioms in Persian. 3
Table 4-1 Monosyllabic words in the database. 52
Table 4-2 Results before adjusting the endpoints. 71
Table 4-3 Results after adjusting the endpoints. 71
Table 4-4 Results of features extracted from the original images with 20 frames. 74
Table 4-5 Results of features extracted from the images normalized with relation (4-7), with 20 frames. 74
Table 4-6 Results of features extracted from the reduced images with 20 frames. 75
Table 4-7 Results of the first 10 DCT coefficients of the original images with 20 frames. 75
Table 4-8 Results of the first 10 DCT coefficients of the normalized images with 20 frames. 76
Table 4-9 Results of the first 10 DCT coefficients of the reduced images with 20 frames. 76
List of Figures:
Figure 2-3 Point distribution model; each mode is drawn at ±2σ around the mean. 14
Figure 2-4 geometric model of the lip. 16
Figure 2-5 lip pattern. 19
Figure 2-6 Manifold generation process. 25
Figure 2-7 (a) Manifold interpolation result (b) Re-sampling of the interpolated manifold with 20 key points. 26
Figure 2-8 Block diagram for network-based motion feature extraction. 28
Figure 2-9 Contour-based motion feature extraction. 29
Figure 2-10 The original image and four regions processed for feature extraction. 30
Figure 2-11 (a) Points with similar color and shape are placed in a class. (b) An intraclass graph connects points with the same label. (c) An interclass graph connects points with different labels. (d) After applying LSDA, the distance between different classes has been maximized. 33
Figure 2-12 The left side of the Bezier curve and the right side of the lip model. 36
Figure 2-13 The horizontal opening angle θ2 and the vertical opening angle θ1. 38
Figure 3-1 The result of the analysis of skin and lip color combination and the lip corner points. 42
Figure 3-2 Algorithm for separating the lip region
Figure 4-1 Thresholding with threshold 0.4. 55
Figure 4-2 Thresholding with threshold 0.5. 55
Figure 4-3 Using the red color removal algorithm with α = 0.5. 56
Figure 4-4 Images of the speakers. 57
Figure 4-5 The extracted lip shape after applying the algorithm. 58
Figure 4-6 The extracted lip shape after labeling. 59
Figure 4-7 The rectangle surrounding the lip. 60
Figure 4-8 Steps for calculating the Mel coefficients. 61
Figure 4-9 Triangular filter bank. 63
Figure 4-10 The target area around the lip. 66
Figure 4-11 The 25 frames of the word "bear" after finding the target area. 67
Figure 4-12 Zigzag scanning of the matrix. 68
Figure 4-13 The results of features + LSDA. 70
Figure 4-14 The results of reduced images with a scale of 0.5 and the number of 25 frames. 77
Figure 4-15 The results of reduced images with a scale of 0.7 and the number of 25 frames. 78
Figure 4-16 The results of different DCT coefficients with a scale of 0.5. 79
Figure 4-17 The results of different DCT coefficients with a scale of 0.7. 80
References:
[1] T. Chen, "Audiovisual speech processing," IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 9–21, 2001.
[2] V. Sadeghi, "Vowel Recognition in Persian Monosyllabic and Bisyllabic Words," Master's thesis, Semnan University, 1385.
[3] E. D. Petajan, "Automatic Lipreading to Enhance Speech Recognition," PhD thesis, University of Illinois at Urbana-Champaign, 1984.
[4] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988.
[5] C. Bregler and Y. Konig, "Eigenlips For Robust Speech Recognition," in Proc. IEEE Conf.