Contents & References of Speaker recognition in multi-speaker environment using support vector machine
List:
Chapter One: Introduction to speaker recognition systems
1-1- Introduction
1-2- Different working stages of speaker recognition systems
1-2-3- Speaker gender detection
1-2-4- Speaker change detection
1-3- Speaker segmentation and clustering methods
1-3-1- Distance-based methods
1-3-2- Model-based methods
1-3-3- Hybrid or combined methods
1-4- Clustering
1-5- Summary
Chapter Two: Distinguishing speech from non-speech regions
2-1- Introduction
2-2- Structure of speech/non-speech detection
2-2-1- Pre-processing
2-2-2- Feature extraction
2-2-2-1- Energy
2-2-2-2- Zero-crossing rate
2-2-2-3- Feature extraction using Mel-frequency cepstral coefficients (MFCC)
2-2-2-4- LPC coefficients
2-2-2-5- Entropy
2-2-2-8- Other parameters
2-2-3- Threshold calculation
2-2-4- VAD decisions
2-2-4-1- Decision-making based on the hidden Markov model
2-2-5- Correction of VAD results
2-3- Block diagrams of several VAD standards
2-3-1- ETSI AMR standard
2-3-2- GSM algorithm
Chapter Three: Speaker change detection
3-1- Introduction
3-2- Speaker segmentation
3-3- Comparison of segmentation methods
3-4- Common methods of speaker detection
3-4-1- Bayesian information criterion (BIC)
3-4-1-2- Segmentation using the statistical model of the speaker
BIC
3-4-2-1- Increasing segmentation speed and efficiency with T2-BIC
3-4-3- Generalized likelihood ratio (GLR) distance
3-4-4- KL2 distance
3-4-5- Speaker change detection using DSD
3-4-6- Cross-BIC (XBIC)
4-1- Introduction
4-2- Components of a clustering system
4-3- Clustering methods
4-3-1- Hierarchical clustering methods
4-3-1-1- Bottom-up clustering techniques
4-3-1-2- Top-down clustering techniques
4-3-2- Upward clustering methods
4-4- Common clustering methods in speaker clustering systems
4-5- Support vector machine classifier
4-5-1- Linear support vector machine classifier
4-5-1-1- Classification of separable classes
4-5-1-2- Classification of non-separable classes
4-5-1-3- Classification of multi-class data with support vector machines
4-5-2- Nonlinear support vector machines
4-6- Summary
Chapter Five: Implementation and observations of the proposed hybrid system
5-1- Introduction
5-2- Structure of the implemented system
5-3- Database
5-4- Feature extraction
5-5- Evaluation criteria of speaker recognition systems
5-6- Test results
5-6-1- The effect of applying VAD to the speech signal
5-6-2- The effect of changing the length of the VAD window on the accuracy of the system
5-6-3- The effect of changing the length of the BIC window on the segmentation results
5-6-4- Accuracy resulting from segmentation on two types of data
5-6-6- Comparison of the results of the segmentation stage using different feature vectors
5-6-7- The effect of speaker gender on the correct identification of segmentation boundaries
5-6-8- Accuracy of the clustering stage using the support vector machine (SVM) with the MFCC feature vector
5-6-9- Accuracy of the support vector machine clustering stage using the root-MFCC feature vector
5-6-10- The effect of changing the type of support vector machine kernel function on the accuracy of the clustering stage
5-7- Summary
Chapter Six: Summary and suggestions
6-1- Summary of results
6-2- Recommendations
References:
[1]. X. Anguera Miró, "Robust Speaker Diarization for Meetings", PhD thesis, 2006.
[2]. L. Docio, C. Garcia, "Speaker Segmentation, Detection and Tracking in Multi-speaker Long Audio Recordings", Third COST275 Workshop: Biometrics on the Internet, 2005.
[3]. J. Zibert, B. Vesnicer, F. Mihelic, "A System for Speaker Detection and Tracking in Audio Broadcast News", IEEE proceedings, pp. 51-61, 2008.
[4]. A.F. Martin, M.A. Przybocki, "Speaker Recognition in a Multi-speaker Environment", Eurospeech 2001 Scandinavia, Conference on Speech Communication and Technology, 2001.
[5]. R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", John Wiley and Sons, 2nd edition, 2007.
[6]. C.M. Bishop, "Pattern Recognition and Machine Learning", pp. 738, Springer, 2006.
[7]. M.A. Siegler, U. Jain, B. Raj, M. Stern, "Automatic Segmentation, Classification and Clustering of Broadcast News Audio", Proc. DARPA Speech Recognition Workshop, Chantilly, Virginia, pp. 97-99, 1997.
[8]. S. Chen, P. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, USA, pp. 127-132, 1998.
[9]. T. Hain, S.E. Johnson, A. Tuerk, P.C. Woodland, S.J. Young, "Segment Generation and Clustering in the HTK Broadcast News Transcription System", Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, pp. 133-137, 1998.
[10]. J. Ajmera, C. Wooters, "A Robust Speaker Clustering Algorithm", Proc. ASRU (Automatic Speech Recognition and Understanding) Workshop, U.S. Virgin Islands, pp. 411-416, 2003.
[11]. B. Zhou, J.H.L. Hansen, "Unsupervised Audio Stream Segmentation and Clustering via the Bayesian Information Criterion", Proc. ICSLP, Beijing, China, pp. 714-717, 2000.
[12]. K. Sonmez, L. Heck, M. Weintraub, "Speaker Tracking and Detection with Multiple Speakers", Proc. EUROSPEECH, Budapest, Vol. 5, pp. 2219-2222, 1999.
[13]. P.C. Woodland, T. Hain, S. Johnson, T. Niesler, A. Tuerk, S.J. Young, "Experiments in Broadcast News Transcription", Proc. ICASSP, Seattle, Washington, pp. 909 ff, 1998.
[14]. L. Wilcox, F. Chen, D. Kimber, V. Balasubramanian, "Segmentation of Speech Using Speaker Identification", Proc. ICASSP, Adelaide, Australia, Vol, pp. 161-164, 1994.
[15]. H. Kim, D. Ertelt, T. Sikora, "Hybrid Speaker-based Segmentation System Using Model-level Clustering", Proc. ICASSP, Philadelphia, USA, Vol, pp. 745-748, 2005.
[16]. H. Kim, T. Sikora, "Automatic Segmentation of Speakers in Broadcast Audio Material", Proc. SPIE, Vol. 5307, pp. 429-438, 2003.
[17]. P. Yu, F. Seide, C. Ma, E. Chang, "An Improved Model-based Speaker Segmentation System", Proc. EUROSPEECH, Geneva, Switzerland, pp. 2025-2028, 2003.
[18]. D. Valj, B. Kacic, B. Horvat, "Usage of Frame Dropping and Frame Attenuation Algorithms in Automatic Speech Recognition System", IEEE proceedings, pp. 149-152, 2003.
[19]. J. Faneuff, "Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car", Master's thesis, Electrical and Computer Engineering, 2002.
[20]. L. Karray, C. Mokbel, J.