Contents & References of Extracting time-frequency features for visual identification of Persian vowels
Table of contents:
Chapter 1: Introduction
1-1 Introduction
1-2 Thesis structure
Chapter 2: Overview of previous research
2-1 Introduction
2-2 Active contour models
2-2-1 Energy function
2-2-2 Energy minimization
2-3 Active shape models
2-4 Flexible models
2-4-1 Lip model
2-6-2 Color transformation
2-8 Discrete cosine transform
2-8-1 Modeling based on 3-D DCT
2-8-1-1 Extraction of lip motion features
2-8-2 Feature extraction from the target area
2-8-2-1 Extraction of visual features
2-8-3 Cosine transform and LSDA
2-8-3-1 Preprocessing
2-8-3-2 DCT method
2-8-3-3 DCT + PCA
2-8-3-4 DCT + LDA
Bezier curve
2-10 Separation of the lip area with K-means
Chapter 3: Mouth area extraction methods and detection systems
3-1 Introduction
3-2 Lip area detection
3-2-1 Lip and skin color composition analysis
3-2-2 Hue, saturation, and value (HSV)
3-2-3 Removal of the red component
3-2-4 K-means algorithm
3-2-4-1 Algorithm implementation
3-2-5 Brightness and binarization
3-2-6 Combined methods
3-3 Classification and identification methods
3-3-1 Neural network
3-3-1-1 Feedforward networks
3-3-1-2 Error backpropagation algorithm
3-3-2 Hidden Markov model
Chapter 4: Feature extraction, implementation of the proposed method, and introduction of the database
4-1 Database
4-1-1 Separation of recorded videos
4-2 Extracted features
4-4-1 Framing
4-4-2 Windowing
4-4-3 Discrete Fourier transform
4-4-4 Mel scale
4-4-5 Discrete cosine transform
4-4-5-1 Calculating cosine and wavelet coefficients
4-4-5-2 Calculating Mel-frequency coefficients
4-5 Finding the center of the lip and extracting an area around the lip
4-5-1 Zigzag scan
4-5-2-1 Using the log-sigmoid function and changing the training algorithm
4-5-2-2 Using the tan-sigmoid function and the momentum algorithm
4-6 Extracting features from different images
4-6-1 Extracting features from new images
4-6-2 Mel-frequency coefficients and cosine coefficients
4-7 Reducing the number of frames and reducing the size of images
4-7-1 Calculating MFCC coefficients
4-7-3 Reducing the number of frames and reducing the size of images with the resize command
[1] T. Chen, "Audiovisual speech processing," IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 9-21, 2001.
[2] Vahida Al-Sadat Sadeghi, "Vowel Recognition in Persian Monosyllabic and Bisyllabic Words," Master's thesis, Semnan University, 1385.
[3] E. D. Petajan, "Automatic Lipreading to Enhance Speech Recognition," PhD thesis, University of Illinois at Urbana-Champaign, 1984.
[4] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models," International Journal of Computer Vision, pp. 321-331, 1988.
[5] C. Bregler and Y. Konig, "Eigenlips For Robust Speech Recognition," in Proc. IEEE Conf. Acoustics, Speech and Signal Processing, pp. 669-672, 1994.
[6] Takeshi Saitoh and Ryosuke Konishi, "Word Recognition based on Two Dimensional Lip Motion Trajectory," International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS2006), pp. 287-290, 12-15 Dec. 2006.
[7] Mir Hadi Seyed Arabi, Ali Agha Golzadeh, Sohrab Khan Mohammadi, "Automatic tracking of lip movements and its special points using active contour," 14th Iranian Conference on Electrical Engineering (ICEE), 2006.
[8] T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, "Active Shape Models-Their Training and Application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, Jan. 1995
[9] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey, "Extraction of visual features for lipreading," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, pp. 198-213, Feb. 2002.
, S. H. Leung, et al., "A real-time automatic lipreading system," International Symposium on Circuits and Systems, no. 2, pp. 101-104, IEEE, Vancouver, Canada, May 2004.
[12] D. Thambiratnam, T. Wark, S. Sridharan, and V. Chandran, "Speech Recognition in Adverse Environments using Lip Information," Speech and Image Technologies for Computing and Telecommunications, IEEE TENCON 1997, vol. 1, pp. 149-152, 4 Dec. 1997.
[13] Tanveer A. Faruquie, Abhik Majumdar, Nitendra Rajput, L. V. Subramaniam, "Large Vocabulary Audio-Visual Speech Recognition Using Active Shape Models," Pattern Recognition, 2000, 15th International Conference, vol. 3, pp. 106-109, 2000.
[14] A. L. Liew, et al., "Lip contour extraction from color images using a deformable model," The Journal of the Pattern Recognition Society, no. 35, pp. 2949-2962, 2002.
[15] Stefan Horbelt, Jean-Luc Dugelay, "Active Contours For Lipreading Combining With Templates," 15th GRETSI Symposium on Signal and Image Processing, pp. 18-22, September 1995, France.
[16] Mohammad Mehdi Hosseini, Abdorreza Alavi Gharahbagh and Sedigheh Ghofrani, "Vowel Recognition by Using the Combination of Haar Wavelet and Neural Network," KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems, Part I, pp.331-339, 2010.
[17] M. M. Hosseini, S. Ghofrani, "Automatic Lip Extraction Based On Wavelet Transform," IEEE GCIS, pp. 393-396, 2009, China.
[18] Dahai Yu, Ovidiu Ghita, Alistair Sutherland, Paul F. Whelan, "A PCA based Manifold Representation for Visual Speech Recognition," In: CIICT 2007, Proceedings of the China-Ireland International Conference on Information and Communication Technologies, 28-29 August 2007, Dublin, Ireland.
[19] Y. L. Tian and T. Kanade, "Robust Lip Tracking by Combining Shape, Color and Motion," Proc. of the Asian Conference on Computer Vision, pp. 1040-1045, 2000.