Contents & References of Speaker recognition in multi-speaker environment using support vector machine
Table of Contents
Chapter One: Introduction to speaker recognition systems
1-1- Introduction
1-2- Working stages of speaker recognition systems
1-2-1- Acoustic segmentation
1-2-2- Discriminating speech from non-speech
1-2-3- Speaker gender detection
1-2-4- Speaker change detection
1-3- Speaker segmentation and clustering methods
[…] model
1-3-3- Hybrid or combined methods
1-4- Clustering
1-5- Summary
Chapter Two: Discriminating speech from non-speech regions
2-1- Introduction
2-2- Structure of a speech/non-speech detector
2-2-1- Preprocessing
2-2-2- Feature extraction
2-2-2-1- Energy
2-2-2-2- Zero crossing rate
2-2-2-3- Feature extraction using mel-frequency cepstral coefficients (MFCC)
2-2-2-4- LPC coefficients
2-2-2-5- Entropy
2-2-2-6- Periodicity measure
2-2-2-7- […] band
2-2-2-8- Other parameters
2-2-3- Threshold calculation
2-2-4- VAD decision
2-2-4-1- Decision based on hidden Markov models
2-2-4-2- Decision based on neural networks
2-2-5- Correction of VAD results
2-3- Block diagrams of several VAD standards
2-3-1- ETSI AMR standard
2-4- Summary
Chapter Three: Speaker change detection
3-1- Introduction
3-2- Speaker segmentation
3-2-1- Distance-based segmentation
3-2-2- Model-based segmentation
3-2-3- Hybrid segmentation
3-3- Comparison of segmentation methods
3-4- Common speaker change detection methods
3-4-2- Combination of the T² statistic and BIC
3-4-2-1- Speed gains and benefits of T²-BIC segmentation
3-4-3- Generalized likelihood ratio (GLR) distance
3-4-4- KL2 distance
3-4-5- Speaker change detection using DSD
3-4-6- Cross-BIC (XBIC)
3-4-7- Estimation of Gaussian mixture model (GMM-L)
3-5- Summary
Chapter Four: Classification methods
4-1- Introduction
4-2- Clustering system components
4-3- Clustering methods
4-3-1- Hierarchical clustering methods
4-3-1-1- Ascending clustering techniques
4-3-1-2- Descending clustering techniques
4-3-2- Ascending clustering methods
4-4- Common clustering methods in speaker clustering systems
4-5- Support vector machine classifier
4-5-1- Linear support vector machine classifier
4-5-1-1- Classification of separable classes
4-5-2- Non-linear support vector machines
4-6- Summary
Chapter Five: Implementation and observations of the proposed hybrid system
5-1- Introduction
5-2- Structure of the implemented system
5-3- Database
5-4- Feature extraction
5-5- Evaluation criteria of speaker recognition systems
5-6- Test results
5-6-1- Effect of applying VAD to the speech signal
5-6-2- Effect of changing the VAD window length on system accuracy
5-6-3- Effect of changing the BIC window length on the segmentation results
[…] feature vector on the accuracy of the segmentation stage
5-6-6- Comparison of segmentation-stage results using different feature vectors
5-6-7- Effect of speaker gender on correct detection of segmentation boundaries
5-6-8- Accuracy of the support vector machine (SVM) clustering step using the MFCC feature vector
5-6-9- Accuracy of the support vector machine clustering step using the root-MFCC feature vector
5-6-10- Effect of changing the support vector machine kernel function type on the accuracy of the clustering step
5-7- Summary
Chapter Six: Conclusions and suggestions
6-1- Summary of results
6-2- Suggestions
References
References:
[1] X. Anguera Miró, "Robust Speaker Diarization for Meetings", PhD thesis, 2006.
[2] L. Docio, C. Garcia, "Speaker Segmentation, Detection and Tracking in Multi-Speaker Long Audio Recordings", Third COST275 Workshop: Biometrics on the Internet, 2005.
[3] J. Žibert, B. Vesnicer, F. Mihelič, "A System for Speaker Detection and Tracking in Audio Broadcast News", IEEE proceeding, pp. 51-61, 2008.
[4] A.F. Martin, M.A. Przybocki, "Speaker Recognition in a Multi-Speaker Environment", Eurospeech 2001 Scandinavia, Conference on Speech Communication and Technology, 2001.
[5] R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", John Wiley and Sons, 2nd edition, 2007.
[6] C.M. Bishop, "Pattern Recognition and Machine Learning", pp. 738, Springer, 2006.
[7] M.A. Siegler, U. Jain, B. Raj, M. Stern, "Automatic Segmentation, Classification and Clustering of Broadcast News Audio", Proc. DARPA Speech Recognition Workshop, Chantilly, Virginia, pp. 97-99, 1997.
[8] S. Chen, P. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, USA, pp. 127-132, 1998.
[9] T. Hain, S.E. Johnson, A. Tuerk, P.C. Woodland, S.J. Young, "Segment Generation and Clustering in the HTK Broadcast News Transcription System", Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, pp. 133-137, 1998.
[10] J. Ajmera, C. Wooters, "A Robust Speaker Clustering Algorithm", Proc. ASRU (Automatic Speech Recognition and Understanding) Workshop, U.S. Virgin Islands, pp. 411-416, 2003.
[11] B. Zhou, J.H.L. Hansen, "Unsupervised Audio Stream Segmentation and Clustering via the Bayesian Information Criterion", Proc. ICSLP, Beijing, China, pp. 714-717, 2000.
[12] K. Sönmez, L. Heck, M. Weintraub, "Speaker Tracking and Detection with Multiple Speakers", Proc. EUROSPEECH, Budapest, Vol. 5, pp. 2219-2222, 1999.
[13] P.C. Woodland, T. Hain, S. Johnson, T. Niesler, A. Tuerk, S.J. Young, "Experiments in Broadcast News Transcription", Proc. ICASSP, Seattle, Washington, pp. 909 ff, 1998.
[14] L. Wilcox, F. Chen, D. Kimber, V. Balasubramanian, "Segmentation of Speech Using Speaker Identification", Proc. ICASSP, Adelaide, Australia, Vol, pp. 161-164, 1994.
[15] H. Kim, D. Ertelt, T. Sikora, "Hybrid Speaker-Based Segmentation System Using Model-Level Clustering", Proc. ICASSP, Philadelphia, USA, Vol, pp. 745-748, 2005.
[16] H. Kim, T. Sikora, "Automatic Segmentation of Speakers in Broadcast Audio Material", Proc. SPIE, Vol. 5307, pp. 429-438, 2003.
[17] P. Yu, F. Seide, C. Ma, E. Chang, "An Improved Model-Based Speaker Segmentation System", Proc. EUROSPEECH, Geneva, Switzerland, pp. 2025-2028, 2003.
[18] D. Vlaj, Z. Kačič, B. Horvat, "Usage of Frame Dropping and Frame Attenuation Algorithms in Automatic Speech Recognition System", IEEE proceeding, pp. 149-152, 2003.
[19] J. Faneuff, "Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-Free Microphones in a Car", Master's thesis, Electrical and Computer Engineering, 2002.
[20]. L. Karray, C. Mokbel, J.