
Speaker recognition in multi-speaker environment using support vector machine

Number of pages: 118 File Format: word File Code: 32227
Year: 2011 University Degree: Master's degree Category: Electronic Engineering

Tags/Keywords: Audio segments - Speaker identification - MFCC - Segmentation system - Speaker clustering - Statistical speaker segmentation - Speaker


Summary of Speaker recognition in multi-speaker environment using support vector machine

Electronics Group

Master's Thesis

Abstract:

Speaker identification is one of the topics studied in speech processing. It is the process of using the speech signal to determine who is speaking and when. The goal is to design a system that can detect speaker changes and label each speaker's speech, that is, specify which speaker spoke in which intervals. Today this task is known by a newer name, speaker diarization, which covers both segmentation and labeling. The purpose of segmentation is to divide the speech signal into parts that each contain the speech of only one speaker, and the purpose of clustering is to identify the speech parts belonging to the same speaker and assign them a single label. The purpose of this thesis is to design and implement a speaker segmentation and clustering system using new algorithms and to improve the results of these algorithms for this task. The system must correctly detect speaker change points without any prior information about the speakers and finally place all audio parts belonging to one speaker in a single cluster. In the first step, non-speech parts are removed from the audio file to increase the accuracy and speed of the following steps. Then the speech is divided into homogeneous parts, each containing only one speaker's speech. In the third step, an appropriate clustering method places the parts from the previous step that belong to the same speaker in one cluster. To implement the system, four types of feature vectors (MFCC, root-MFCC, TDC, and root-TDC) and three databases were used; the accuracy of the segmentation stage was 80%, and the accuracy of the clustering stage using the support vector machine was 59%.

Keywords:

speaker segmentation

audio segments recognition

speaker clustering
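The first step described in the abstract, removing non-speech parts before segmentation, is often done by thresholding frame energy (energy is one of the VAD features listed in Chapter Two). The sketch below is a minimal illustration only, assuming 25 ms frames, a 10 ms hop and a fixed threshold of -30 dB relative to the loudest frame; these values and the function names `frame_signal` and `energy_vad` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def energy_vad(x, sr, frame_ms=25.0, hop_ms=10.0, threshold_db=-30.0):
    """Return one boolean speech/non-speech decision per frame.

    Assumed rule: a frame counts as speech when its log energy lies within
    |threshold_db| of the most energetic frame in the file.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1) + 1e-12   # small floor avoids log10(0)
    log_energy = 10.0 * np.log10(energy)
    return log_energy > log_energy.max() + threshold_db
```

Frames marked `True` would be passed on to segmentation. A threshold relative to the loudest frame adapts to the recording level but is easily fooled by loud background noise, which is why richer features (zero-crossing rate, entropy) and HMM-based decisions, as listed in Chapter Two, are also used.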

Introduction

Today, multimedia data make up a significant part of human knowledge. The number of multimedia files archived in various institutions has increased significantly in recent years, and making these files accessible and well organized can greatly help people who are looking for information. Searching and retrieving information at this volume requires a computer system; as a result, the structuring of multimedia files has recently become an active research area. Among these data, audio is particularly important, because most archives contain audio from TV and radio reports as well as telephone conversations. In recent years, extensive research has been carried out in this field and acceptable results have been obtained. Other applications of this field include identifying suspects and extracting the important statements of a witness or defendant in court.

In audio applications, the main information in the files is the speech of a number of speakers, and the final system aims to answer the question "Who spoke when?" Different parts of this research field have been given different names, such as speaker segmentation [1], speaker detection [2], robust transcription [3], and speaker indexing [4]. Such systems make it easy to navigate long audio files that contain several speakers (such as news broadcasts and company meetings). Long radio conversations are typical environments in which several speakers are present and talk to each other. The ultimate goal of such systems is to divide audio files into the regions in which a particular speaker has spoken, giving easy access to the parts of each speaker's speech.

As the number of text documents available on the Internet grew, so did the need for techniques such as text indexing to facilitate access to and search within these documents. A similar need arose as the number of audio documents such as lectures, interviews and gatherings increased. Accessing audio documents is much more difficult than accessing text: listening to a recording takes more time than reading text, and indexing audio documents manually is harder than indexing text. The proposed solution to this problem is the automatic cataloging of audio documents [5]. In 2001, Pelkan, Sidharun and their group improved the system's results by reducing the effect of noise on the signal, which led to better speaker separation. In 2005, Boulian and Kenny obtained different results by using other feature vectors (or combining earlier methods) together with Gaussian models. Also in 2005, Yamashita and Matsunaga improved the speaker segmentation results by using audio signal features such as pitch frequency, energy, the maximum frequencies of the signal, and three other features [1]. In the following years, these systems have been refined and their results improved by applying different methods to their various parts.

The purpose of this thesis is to design and implement a system that can detect speaker changes in an audio file containing the speech of several speakers and, as far as possible, group the speech of each speaker without any prior information about them. The system has two basic parts:

- Speaker segmentation
- Speaker clustering

The segmentation part [6] divides the speech signal into segments that each contain the speech of only one speaker. In the clustering stage [7], the speech segments belonging to the same speaker are identified and grouped, and a single label is assigned to them. This task is used in many speech applications related to speech recognition or indexing [8] in environments where several speakers may talk, such as meetings, conferences and news broadcasts. It can help advanced speech recognition systems not only to improve recognition results for groups of speakers but also to identify and transcribe conversations, since the information conveyed varies depending on who utters the words. Within speech technologies, the broad topic of acoustic indexing studies the classification of sounds into different classes or sources. Algorithms used for acoustic indexing are concerned with classifying sounds correctly, but not necessarily with separating them correctly when more than one is present in the same audio segment. These purely classification techniques have sometimes been called audio clustering, which draws on the broad topic of clustering, well studied in many areas. When multiple sounds appear in the same audio signal, one must turn to techniques called audio diarization to process them. These sounds can include particular speakers, music, and background noise sources.
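One common way to implement the segmentation part is the Bayesian information criterion (BIC) test of Chen and Gopalakrishnan [8], one of the methods listed under Chapter Three. A minimal sketch, assuming each analysis window of feature vectors (for example MFCC frames) is modeled by a single full-covariance Gaussian: for a window X of N frames of dimension d and a candidate change point i, ΔBIC(i) = (N/2)log|Σ| - (i/2)log|Σ1| - ((N-i)/2)log|Σ2| - λP, with P = ½(d + d(d+1)/2)log N, and a change is declared at the maximum if it is positive. The function names, λ = 1 and the edge margin are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np

def logdet_cov(X):
    """Log-determinant of the maximum-likelihood (biased) covariance of the rows of X."""
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(X, i, lam=1.0):
    """Delta-BIC for splitting window X (N frames x d features) at frame i.

    Positive values favour two Gaussians (a speaker change) over one.
    """
    n, d = X.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(X)
            - 0.5 * i * logdet_cov(X[:i])
            - 0.5 * (n - i) * logdet_cov(X[i:])
            - lam * penalty)

def detect_change(X, lam=1.0, margin=None):
    """Return (frame, score) of the best change point, or (None, score) if max Delta-BIC <= 0."""
    n, d = X.shape
    if margin is None:
        margin = max(2 * d, 10)  # enough frames on each side for a usable covariance
    if n <= 2 * margin:
        raise ValueError("window too short for the chosen margin")
    candidates = range(margin, n - margin)
    scores = np.array([delta_bic(X, i, lam) for i in candidates])
    k = int(np.argmax(scores))
    best, score = candidates[k], float(scores[k])
    return (best, score) if score > 0 else (None, score)
```

In a full system this test would be applied to a window that slides or grows along the file; the window length is one of the parameters the thesis examines in section 5-6-3, and λ trades missed changes against false alarms.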

When the possible classes correspond to the different speakers in a recording, these techniques are called speaker diarization. Speaker diarization can be defined as a subtype of audio diarization in which the speech segments of the signal are attributed to the different speakers. It aims to answer the question "Who spoke when?" for a given audio signal. Speaker diarization algorithms need to locate each speaker turn and assign it to the appropriate speaker cluster. The output of the system is a set of segments, each tagged with the unique ID assigned to the person who speaks in it.
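The output described above, a set of segments each carrying a speaker ID, can be produced from per-frame cluster labels by merging runs of identical labels. A minimal sketch, assuming a 10 ms frame hop, integer cluster labels, and the value -1 for frames removed by the VAD; the `Segment` type, the `labels_to_segments` function and the `spk<N>` naming are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # hypothetical ID format, e.g. "spk0"

def labels_to_segments(labels: Sequence[int], hop_s: float = 0.01,
                       nonspeech: int = -1) -> List[Segment]:
    """Merge consecutive frames with the same cluster label into 'who spoke when' segments."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the input or where the label changes
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != nonspeech:  # drop runs the VAD marked as non-speech
                segments.append(Segment(round(start * hop_s, 3),
                                        round(i * hop_s, 3),
                                        f"spk{labels[start]}"))
            start = i
    return segments

for seg in labels_to_segments([-1, -1, 0, 0, 0, 1, 1, -1, 0]):
    print(f"{seg.start:.2f}-{seg.end:.2f}s {seg.speaker}")
```

For the example labels this prints three segments: 0.02-0.05 s for spk0, 0.05-0.07 s for spk1 and 0.08-0.09 s for spk0. The same speaker ID appears twice, which is exactly the "single label per speaker" property the clustering stage is meant to provide.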
Contents & References of Speaker recognition in multi-speaker environment using support vector machine

List:

Chapter One: Introduction to speaker recognition systems

1-1- Introduction

1-2- Different working stages of speaker recognition systems

1-2-3- Speaker gender detection

1-2-4- Speaker change detection

1-3- Speaker segmentation and clustering methods

1-3-1- Distance-based methods

1-3-2- Model-based methods

1-3-3- Hybrid methods

1-4- Clustering

1-5- Summary

Chapter Two: Distinguishing speech from non-speech regions

2-1- Introduction

2-2- Structure of speech/non-speech detection

2-2-1- Pre-processing

2-2-2- Feature extraction

2-2-2-1- Energy

2-2-2-2- Zero-crossing rate

2-2-2-3- Feature extraction with mel-frequency cepstral coefficients (MFCC)

2-2-2-4- LPC coefficients

2-2-2-5- Entropy

2-2-2-8- Other parameters

2-2-3- Threshold calculation

2-2-4- VAD decisions

2-2-4-1- Decision-making based on the hidden Markov model

2-2-5- Correction of VAD results

2-3- Block diagrams of several VAD standards

2-3-1- ETSI AMR standard

2-3-2- GSM algorithm

Chapter Three: Speaker change detection

3-1- Introduction

3-2- Speaker segmentation

3-3- Comparison of segmentation methods

3-4- Common methods of speaker detection

3-4-1- Bayesian information criterion (BIC)

3-4-1-2- Segmentation using the statistical model of the speaker

BIC

3-4-2-1- Higher speed and gain in segmentation: T2-BIC

3-4-3- Generalized likelihood ratio (GLR) distance

3-4-4- KL2 distance

3-4-5- Speaker change detection using DSD

3-4-6- Cross-BIC (XBIC)
4-1- Introduction

4-2- Components of a clustering system

4-3- Clustering methods

4-3-1- Hierarchical clustering methods

4-3-1-1- Bottom-up clustering techniques

4-3-1-2- Top-down clustering techniques

4-3-2- Upward clustering methods

4-4- Common clustering methods in speaker clustering systems

4-5- Support vector machine classifier

4-5-1- Linear support vector machine classifier

4-5-1-1- Classification of separable classes

4-5-1-2- Classification of non-separable classes

4-5-1-3- Multi-class classification with support vector machines

4-5-2- Nonlinear support vector machines

4-6- Summary

Chapter Five: Implementation and observations of the proposed hybrid system

5-1- Introduction

5-2- Structure of the implemented system

5-3- Database

5-4- Feature extraction

5-5- Evaluation criteria for speaker recognition systems

5-6- Test results

5-6-1- Effect of applying VAD to the speech signal

5-6-2- Effect of the VAD window length on system accuracy

5-6-3- Effect of the BIC window length on segmentation results

5-6-4- Segmentation accuracy on two types of data

5-6-6- Comparison of segmentation results using different feature vectors

5-6-7- Effect of speaker gender on correct identification of segment boundaries

5-6-8- Accuracy of the support vector machine (SVM) clustering stage with MFCC feature vectors

5-6-9- Accuracy of the SVM clustering stage with root-MFCC feature vectors

5-6-10- Effect of the SVM kernel function type on the accuracy of the clustering stage

5-7- Summary

Chapter Six: Summary and suggestions

6-1- Summary of results

6-2- Recommendations

References

References:

[1] X. Anguera Miró, "Robust Speaker Diarization for Meetings", PhD thesis, 2006.

[2] L. Docio, C. Garcia, "Speaker Segmentation, detection and tracking in multi-speaker long audio recordings", Third COST 275 Workshop: Biometrics on the Internet, 2005.

[3] J. Zibert, B. Vesnicer, F. Mihelic, "A System for speaker detection and tracking in audio broadcast news", IEEE proceedings, pp. 51-61, 2008.

[4] A. F. Martin, M. A. Przybocki, "Speaker recognition in a multi-speaker environment", Proc. Eurospeech 2001 Scandinavia, Conference on Speech Communication and Technology, 2001.

[5] R. O. Duda, P. E. Hart, D. G. Stork, "Pattern Classification", 2nd ed., John Wiley and Sons, 2007.

[6] C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, pp. 738, 2006.

[7] M. A. Siegler, U. Jain, B. Raj, M. Stern, "Automatic Segmentation, Classification and Clustering of Broadcast News Audio", Proc. DARPA Speech Recognition Workshop, Chantilly, Virginia, pp. 97-99, 1997.

[8] S. Chen, P. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, USA, pp. 127-132, 1998.

[9] T. Hain, S. E. Johnson, A. Tuerk, P. C. Woodland, S. J. Young, "Segment generation and clustering in the HTK broadcast news transcription system", Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, pp. 133-137, 1998.

[10] J. Ajmera, C. Wooters, "A Robust speaker clustering algorithm", Proc. ASRU (Automatic Speech Recognition and Understanding) Workshop, U.S. Virgin Islands, pp. 411-416, 2003.

[11] B. Zhou, J. H. L. Hansen, "Unsupervised Audio Stream Segmentation and clustering via the Bayesian Information Criterion", Proc. ICSLP, Beijing, China, pp. 714-717, 2000.

[12] K. Sönmez, L. Heck, M. Weintraub, "Speaker Tracking and Detection with Multiple Speakers", Proc. EUROSPEECH, Budapest, vol. 5, pp. 2219-2222, 1999.

[13] P. C. Woodland, T. Hain, S. Johnson, T. Niesler, A. Tuerk, S. B. Young, "Experiments in Broadcast News Transcription", Proc. ICASSP, Seattle, Washington, pp. 909 ff, 1998.

[14] L. Wilcox, F. Chen, D. Kimber, V. Balasubramanian, "Segmentation of Speech Using Speaker Identification", Proc. ICASSP, Adelaide, Australia, pp. 161-164, 1994.

[15] H. Kim, D. Ertelt, T. Sikora, "Hybrid speaker-based segmentation system using model-level clustering", Proc. ICASSP, Philadelphia, USA, pp. 745-748, 2005.

[16] H. Kim, T. Sikora, "Automatic Segmentation of Speakers in Broadcast Audio Material", Proc. SPIE, vol. 5307, pp. 429-438, 2003.

[17] P. Yu, F. Seide, C. Ma, E. Chang, "An Improved Model-based Speaker Segmentation System", Proc. EUROSPEECH, Geneva, Switzerland, pp. 2025-2028, 2003.

[18] D. Valj, B. Kacic, B. Horvat, "Usage of frame dropping and frame attenuation algorithms in automatic speech recognition system", IEEE proceedings, pp. 149-152, 2003.

[19] J. Faneuff, "Spatial, spectral, and perceptual nonlinear noise reduction for hands-free microphones in a car", Master's thesis, Electrical and Computer Engineering, 2002.

[20] L. Karray, C. Mokbel, J.
