Speech signal enhancement in the time-frequency domain

Number of pages: 117 File Format: word File Code: 32182
Year: 2013 University Degree: Master's degree Category: Electrical Engineering
  • Summary of Speech signal enhancement in the time-frequency domain

    Thesis submitted for the master's degree

    Field: Electronic engineering

    Abstract

    One of the important topics in signal processing (for example, in communication systems, audio signal coding, and voice recognition) is reducing or removing unwanted noise from the original signal in order to improve it. For this purpose, extensive research on speech enhancement has been carried out over the past decades. Speech enhancement can be approached in different ways depending on the problem, the noise characteristics, and the available resources. Accordingly, speech enhancement systems are commonly classified into single-channel and multi-channel methods. Single-channel methods, which have only one input microphone, are the most common type of real-time algorithm because they are relatively easy to implement and less costly than systems with multiple input channels. Within the family of single-channel methods, we can point to spectral subtraction, the Wiener filter, speech enhancement using statistical models, the wavelet transform, and so on. Each of these methods has disadvantages such as musical noise, distortion, and complexity. One way to mitigate these defects is to use hybrid systems that combine several methods.

    In this thesis, two new hybrid single-channel methods are proposed for speech enhancement. In addition, non-speech (silence) segments, usually taken from the first window of the signal, are generally used to measure the noise characteristics. The severe weakness of this approach is revealed when the first frame is not a silent frame, and a new noise estimation method is proposed in this thesis to overcome this problem. The proposed methods are as follows:

    a) Proposed noise estimation method: noise is estimated using LPC analysis, and both of the proposed enhancement methods below use this estimate.

    b) The first proposed method: improving audio signals using a genetic algorithm and LPC analysis in the wavelet subtraction method. Wavelet subtraction is obtained by combining the spectral subtraction and wavelet transform methods, and its noise estimate, which is produced by LPC analysis, is refined with a genetic algorithm.

    c) The second proposed method: improving audio signals using the mean square error method in the wavelet space. It is obtained by combining the wavelet transform with the logarithmic minimum mean square error estimator (Log-MMSE). In this method, a logarithmic estimator based on the minimum mean square error is proposed that uses the Fourier transform of the wavelet coefficients of the noisy speech signal and of the noise signal. According to the results, these methods perform better both qualitatively and quantitatively: they improve SNR and MOS and reduce distortion and musical noise.
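    The conventional approach mentioned above, in which the noise is measured from the first frame and then removed by spectral subtraction, can be sketched as follows. This is a minimal illustration and not the method proposed in the thesis: it uses plain magnitude spectral subtraction on non-overlapping rectangular frames (practical systems use overlapping windows), and the names `spectral_subtract`, `frame_len`, `noise_frames`, and `floor` are hypothetical. The small demonstration at the end shows the weakness described in the abstract: when the first frame already contains the signal, the signal itself is treated as noise and removed.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_subtract(noisy, frame_len=64, noise_frames=1, floor=0.01):
    """Magnitude spectral subtraction on non-overlapping frames.

    The noise magnitude spectrum is the average over the first
    `noise_frames` frames, which are ASSUMED to contain no speech.
    """
    starts = range(0, len(noisy) - frame_len + 1, frame_len)
    spectra = [dft(noisy[i:i + frame_len]) for i in starts]
    noise_mag = [
        sum(abs(s[k]) for s in spectra[:noise_frames]) / noise_frames
        for k in range(frame_len)
    ]
    enhanced = []
    for s in spectra:
        cleaned = []
        for k in range(frame_len):
            mag = abs(s[k])
            # Subtract the noise estimate, but never go below a small spectral floor.
            new_mag = max(mag - noise_mag[k], floor * mag)
            # Keep the noisy phase, as classical spectral subtraction does.
            cleaned.append(cmath.rect(new_mag, cmath.phase(s[k])))
        enhanced.extend(idft_real(cleaned))
    return enhanced


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Failure case: a steady tone that is already present in the first frame.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
wiped = spectral_subtract(tone)
# The tone is mistaken for noise, so only the spectral floor survives.
ratio = energy(wiped) / energy(tone)
```

    In this failure case `ratio` is close to `floor ** 2`, meaning that almost all of the wanted signal has been subtracted along with the presumed noise. This is the motivation for estimating the noise in a way that does not rely on the first frame being silent.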

    Keywords: audio signal enhancement, wavelet transform, spectral subtraction, Log-MMSE estimator, LPC analysis, genetic algorithm

    Preface

    With the growing use of speech systems in practical, everyday applications, maintaining speech quality has become an unavoidable requirement. The ideal, noise-free conditions assumed in laboratory work and simulations are seriously violated in many real applications and cannot be taken for granted. For example, the use of mobile phones, hearing aids, speech recognition systems, or any speech communication device in a noisy environment are all cases in which maintaining speech quality and keeping system performance high are of particular importance; without speech enhancement methods, the performance of these systems is severely degraded and may become unacceptable. Speech enhancement, as a practical necessity, has therefore been an active field of research in recent years. In the rest of this chapter, we briefly examine the issues and topics in the field of speech enhancement.

    1-2 Noisy speech enhancement: goals, applications, concepts

    As a general definition, speech enhancement (also called speech improvement) is the attempt to improve the performance of speech communication systems in cases where the signal is degraded by noise, reflections, and other destructive factors.

    The need for speech enhancement arises because the speech signal:

    Either: is produced by a source located in a noisy environment,

    Or: is affected by the transmission channel and degraded by noise or reflections,

    Or: is contaminated by noise at the receiver.

    (Of course, the origin of destructive factors entering into the problem can be a combination of these three situations.)

    The meaning of improvement in the above definition can be explained by referring to the corrective function of the speech improvement process in the following practical examples:

    Telephone systems: in which the original speech is degraded by background noise or noise in the transmission path, and also by sound reflection on both sides of the conversation.

    Public telephones: which are located in noisy and crowded environments.

    Air-to-ground communication systems: in which cockpit noise corrupts the message sent by the pilot.

    Hearing aids: which, acting as amplifiers, amplify both the speech signal and the ambient noise and so cause discomfort to the user.

    Speech or speaker recognition systems: which are usually trained on clean (noise-free) signals and suffer a sharp drop in performance under noisy conditions.

    and other applications.

    Different applications of speech enhancement pursue different goals for this process. While in some cases the aim is to increase speech intelligibility, in other applications reducing listener fatigue is the ultimate goal. Perhaps "improving the overall quality of speech" is the most comprehensive way to express the purpose of speech enhancement, provided that a suitable definition of "quality" is given for each application, a task that is somewhat complicated in practice.

    It is noteworthy that many sources and references on speech enhancement consider noise to be the main factor degrading the speech signal and investigate ways to clean noisy speech. In this thesis too, among the various factors that degrade the speech signal, our attention is focused on noise. Of course, as we will see, some of the methods investigated (adaptive filters) can be used equally well for both noise removal and reflection removal.

    Speech enhancement problems form a wide family of topics that are determined by:

    The type of noise source

    How the noise interacts with the main signal

    The number of available channels or microphones

    The interfering noise or signal may come from background noise, music, wind, traffic, and so on; more precisely, it can take different forms in terms of its power spectrum. Each type of noise can degrade the speech signal in a different way. For example, while noise with frequency content from 400-500 Hz to 4-5 kHz strongly reduces speech intelligibility, noise with energy at frequencies above 4-5 kHz causes listener fatigue but has a negligible effect on intelligibility. In fact, although many simulations and research works assume a white spectrum for the noise affecting the speech signal, real noise rarely has a white spectrum in practice. In addition, the noise can enter the system at the source of the main signal, in the transmission path, or at the receiver. Noise may be added to, multiplied by, or convolved with the original signal. The noise may also be independent of or dependent on the original signal.
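    The three ways in which noise can combine with the original signal (addition, multiplication, convolution) can be made concrete with a small sketch. The function names below are hypothetical and chosen only for illustration, and the SNR formula is the usual ratio of clean-signal power to error power in decibels, assumed here rather than taken from the thesis.

```python
import math


def additive(signal, noise):
    """Additive model: y[n] = x[n] + d[n]."""
    return [s + n for s, n in zip(signal, noise)]


def multiplicative(signal, noise):
    """Multiplicative model: y[n] = x[n] * d[n]."""
    return [s * n for s, n in zip(signal, noise)]


def convolved(signal, impulse_response):
    """Convolutive model: y = x * h (full linear convolution),
    e.g. a transmission channel or a room reflection path h."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def snr_db(clean, degraded):
    """Signal-to-noise ratio in dB between a clean signal and its degraded version."""
    signal_power = sum(c * c for c in clean)
    error_power = sum((d - c) ** 2 for c, d in zip(clean, degraded))
    return 10 * math.log10(signal_power / error_power)


clean = [1.0, 1.0, 1.0, 1.0]
noisy = additive(clean, [0.1, -0.1, 0.1, -0.1])
snr = snr_db(clean, noisy)  # power ratio of about 100, i.e. about 20 dB
```

    Most single-channel methods, including spectral subtraction, assume the additive model with noise that is independent of the speech; the convolutive case corresponds to the reflection (echo) problem mentioned earlier.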

  • Contents & References of Speech signal enhancement in the time-frequency domain

    Table of contents:

    Acknowledgment and thanks

    Abstract

    List of figures

    List of tables

    Chapter One: Introduction

    1-1 Preface

    1-2 Enhancement of noisy speech: goals, applications, concepts

    1-3 Problem definition and classification of methods

    1-4 Research innovation

    1-5 Thesis structure

    Chapter Two: Background of speech signal processing

    2-1 Speech production in humans

    2-2 Introduction to noise and its types

    2-2-1 White noise

    2-2-2 Pink noise

    2-2-3 Brown noise

    2-2-4 Industrial noise

    2-3 Time-frequency analysis of the speech signal

    2-3-1 Fourier transform

    2-3-2 Short-time Fourier transform

    2-3-3 Multiresolution time-frequency analysis

    2-3-4 One-dimensional wavelet transform

    2-3-4-1 Continuous wavelet transform

    2-3-4-1-1 Time and frequency resolution

    2-3-4-1-2 Mathematical relations of the wavelet transform

    2-3-4-1-3 Illustration of the wavelet transform

    2-3-4-2 Discrete wavelet transform

    2-4 Genetic optimization algorithm

    2-4-1 About the science of genetics

    2-4-2 History of genetics

    2-4-3 Natural evolution (Darwin's law of natural selection) and its relationship with artificial intelligence methods

    2-4-4 Genetic algorithm

    2-4-5 Genetic algorithm mechanism

    2-4-6 Genetic algorithm operations

    2-4-6-1 Encoding

    2-4-6-2 Evaluation

    2-4-6-3 Crossover

    2-4-6-4 Mutation

    2-4-6-5 Decoding

    2-4-7 Algorithm flowchart and pseudocode

    2-4-7-1 Pseudocode and its explanation

    2-4-7-2 Genetic algorithm flowchart

    2-4-8 Objective function

    2-4-9 Encoding methods

    2-4-9-1 Binary encoding

    2-4-9-2 Permutation encoding

    2-4-9-3 Value encoding

    2-4-9-4 Tree encoding

    2-4-10 Representation of strings

    2-4-11 Population

    2-4-11-1 Creating the initial population

    2-4-11-2 Population size

    2-4-12 Fitness calculation (value function)

    2-4-13 Types of selection methods

    2-4-13-1 Roulette wheel selection

    2-4-13-2 Steady-state selection

    2-4-13-3 Elitist selection

    2-4-13-4 Competitive selection

    2-4-13-5 Truncation selection

    2-4-13-6 Brindle's deterministic selection

    2-4-13-7 Modified generation replacement selection

    2-4-13-8 Match selection

    2-4-13-9 Random match selection

    2-4-14 Types of crossover methods

    2-4-14-1 Binary crossover

    2-4-14-2 Real-valued crossover

    2-4-14-3 Single-point crossover

    2-4-14-4 Two-point crossover

    2-4-14-5 n-point crossover

    2-4-14-6 Uniform crossover

    2-4-14-7 Arithmetic crossover

    2-4-14-8 Order crossover

    2-4-14-9 Cycle crossover

    2-4-15 Crossover probability

    2-4-16 Analysis of the crossover mechanism

    2-4-17 Mutation

    2-4-17-1 Binary mutation

    2-4-17-2 Real-valued mutation

    2-4-17-3 Bit inversion

    2-4-17-4 Changing the placement order

    2-4-17-5 Inversion

    2-4-17-6 Value change

    2-4-18 Termination criterion of the genetic algorithm

    2-4-19 Strengths of genetic algorithms

    2-4-20 Limitations of genetic algorithms

    2-5 Linear prediction coefficient (LPC) analysis

    2-5-1 Calculation of LPC coefficients

    Chapter Three: A review of major speech enhancement methods

    3-1 Introduction

    3-2 Spectral subtraction method

    3-3 Wiener filter method

    3-4 Speech enhancement using statistical models

    3-4-1 Logarithmic estimator based on minimizing the mean square error (Log-MMSE)

    3-4-2 Using the hidden Markov model (HMM) for speech enhancement

    3-5 Signal subspace methods

    3-6 Speech enhancement using the wavelet transform

    3-7 Comparison of methods and examination of strengths and weaknesses

    3-7-1 Review

    3-7-1 Comparative studies conducted between some speech enhancement methods

    2-3-2 A summary of the characteristics, strengths, and weaknesses of different methods

    3-8 Important points and considerations in the design of a speech enhancement system

    3-8-1 Use of combined systems

    3-8-2 Use of sub-band processing and its benefits

    3-8-3 Using a second microphone

    Chapter Four: Proposed methods

    4-1 Introduction

    4-2 Proposed methods

    4-2-1 Improving audio signals using genetic algorithm and LPC analysis in the wavelet subtraction method

    4-2-1-1 Wavelet coefficients spectral subtraction method (WSS)

    4-2-1-2 Modification of the wavelet coefficients spectral subtraction method (IWSS)

    4-2-1-3 Noise estimation

    4-2-1-4 Genetic algorithm

    4-2-1-4-1 Selection operator

    4-2-1-4-2 Crossover operator

    4-2-1-4-3 Mutation operator

    4-2-1-4-4 Initial population

    4-2-1-4-5 Objective function

    4-2-2 Improvement of audio signals using the mean square error method in wavelet space

    4-2-2-1 Log-MMSE estimator in wavelet space

    4-2-2-2 Noise estimation

    Chapter Five: Results and experiments

    5-1 Introduction

    5-2 Implementation details

    5-3 Results of audio signal improvement using genetic algorithm and LPC analysis in the wavelet subtraction method

    5-4 Results of audio signal improvement using the mean square error method in wavelet space

    Chapter Six: Conclusions and suggestions

    6-1 Conclusion

    6-2 Suggestions for future work

    References

