Methods and equipment for adaptive filtering of speech signal.
V. I. Zolotarev,
Ph.D., senior researcher.
General information on interference and adaptive filtering.
Recording, analyzing and processing audio information is among the most important tasks in information-security work. It is often necessary to process an audio signal in order to improve its quality and intelligibility. During auditory monitoring, or when obtaining tape recordings of speech in real conditions, the signal is affected by various kinds of interference that degrade the quality of the useful (speech) signal, including its intelligibility, up to and including complete communication failure. Reducing the interference level so that the meaning of the message can be restored is therefore highly relevant in a number of practical situations.
The effect of interference on the useful signal can be described, in simplified form, by the following models. Fig. 1a shows a model of the effect of additive noise on the speech signal, i.e. the noise is added to the useful signal. This model corresponds to a recording made in an open space, where the noise can be wind noise, street and construction noise, etc.
Fig. 1b shows a model of the effect of additive and multiplicative interference. In this case, before information reaches the receiver (the human ear), the additive mixture (speech signal plus acoustic noise) passes through a transmission path that has a frequency-dependent transfer characteristic. Thus, the additive mixture undergoes additional multiplicative distortions: the mixture is multiplied by the resonances of the transmission characteristic of the path (convolved with the impulse response of the path «H»). This model corresponds to signal recording indoors or signal transmission via radio and telephone paths.
Fig. 1a. Additive noise model. Fig. 1b. Additive and multiplicative interference model.
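The two models of Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the noise level and the three-tap path impulse response h are hypothetical stand-ins, not data from the article.

```python
import random

def convolve(x, h):
    """Causal discrete convolution of x with h, truncated to len(x) samples."""
    y = []
    for j in range(len(x)):
        acc = 0.0
        for i, hi in enumerate(h):
            if j - i >= 0:
                acc += hi * x[j - i]
        y.append(acc)
    return y

def additive_model(s, n):
    """Fig. 1a: the noise n is added to the speech signal s."""
    return [sj + nj for sj, nj in zip(s, n)]

def additive_multiplicative_model(s, n, h):
    """Fig. 1b: the additive mixture passes through a path with impulse response h."""
    return convolve(additive_model(s, n), h)

random.seed(0)
s = [random.uniform(-1, 1) for _ in range(8)]        # stand-in for speech samples
n = [0.1 * random.uniform(-1, 1) for _ in range(8)]  # stand-in for acoustic noise
h = [1.0, 0.5, 0.25]                                 # hypothetical path impulse response

x_a = additive_model(s, n)                    # open-space recording
x_b = additive_multiplicative_model(s, n, h)  # indoor recording or telephone path
```

With h = [1.0] the second model reduces to the first, which is one way to see that Fig. 1a is a special case of Fig. 1b.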
The task of eliminating or reducing additive and multiplicative interference is complicated by the variability of the characteristics of the acoustic interference (noise of wind, foliage, passing transport, music, etc.) and of the transmission paths (the speaker walks around the room, turns his head, etc.). To eliminate speech signal distortions effectively, the device performing this function must therefore continuously track changes in the interference characteristics over time and continuously adjust its impulse response to match them. Devices that use adaptive filtering have this capability: they isolate the interference, or more precisely estimate it, and then compensate for it in the mixture of the useful signal and interference.
The distorted signal can be presented as a single-channel signal, i.e. as a mixture of the useful signal and interference (a noisy speech signal, ZRS), or as a two-channel signal, in which the main channel carrying the ZRS is accompanied by a reference channel whose signal is as close as possible to the interference present in the ZRS. Depending on how the input signal is presented, single-channel and two-channel adaptive filtering devices are distinguished. Simplified block diagrams of single- and two-channel devices are shown in Fig. 2 and 3, respectively. Each shows an adaptive filter (processor) consisting of two blocks, a transversal filter (which calculates the noise estimate «n^») and an LPC processor (which calculates the filter impulse response, i.e. the vector of linear prediction coefficients «W»), together with a separate adder that calculates the compensation result «e».
In the LPC processor, the values of W are calculated so that the value n^(j) predicted for time j compensates for the noise component n(j) with a minimum remainder. The values of W, n^ and «e» are calculated at every sampling period. Full compensation of the noise component is not reached instantly but over a certain time (the adaptation time), which is regulated by the adaptation coefficient m.
Fig. 2. Single-channel adaptive filtering device.
If only a single-channel ZRS is available, compensation is performed according to the scheme in Fig. 2, and the reference signal is formed from the ZRS itself. This scheme can reduce additive noise with periodic components (for example, noise from motors and engines, music, etc.) as well as the influence of multiplicative interference, including reverberation distortions.
To compensate for noise in a two-channel ZRS, the adaptive filtering scheme shown in Fig. 3 is used. The ZRS is received through the main channel, and only the noise (n1), correlated with the noise (n) in the ZRS, is received through the reference channel. The adjustable delay compensates for the acoustic signal delay that occurs in one of the channels (Fig. 3 shows compensation of the delay in the main channel). Provided a suitable signal is present in the reference channel, this scheme can compensate, with varying efficiency, for virtually any additive noise.
Fig. 3. Two-channel adaptive filtering device.
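The article does not say how the adjustable delay is set. One common way, shown here purely as an assumption, is to pick the lag at which the cross-correlation between the two channels peaks and then delay the channel that leads. The helper names and the toy data below are hypothetical.

```python
import random

def cross_correlation(a, b, lag):
    """Sum of a[j] * b[j - lag] over all j where both indices are valid."""
    total = 0.0
    for j in range(len(a)):
        k = j - lag
        if 0 <= k < len(b):
            total += a[j] * b[k]
    return total

def estimate_lag(main, ref, max_lag):
    """Lag in samples at which |cross-correlation| peaks.

    Positive: the noise appears in the main channel later than in the
    reference channel. Negative: it appears in the main channel earlier.
    """
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: abs(cross_correlation(main, ref, lag)))

def delay(x, d):
    """Delay x by d >= 0 samples, padding with zeros and keeping the length."""
    return [0.0] * d + x[:len(x) - d]

# Toy data: the reference noise arrives 3 samples later than the noise
# in the main channel (speech is omitted to keep the example short).
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
main = noise
ref = delay(noise, 3)

lag = estimate_lag(main, ref, max_lag=10)
# A causal filter cannot advance the reference, so the main channel is delayed.
main_aligned = delay(main, -lag) if lag < 0 else main
```

In this toy case the main channel leads, so it is the main channel that gets delayed, which matches the arrangement described for Fig. 3.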
In both variants of the input signal representation, adaptive filtering is performed according to the same procedure. In the digital adaptive filter, at each sampling period, the p projections w(i) of the vector W are calculated and the convolution of W with the input signal is calculated. As a result, at the j-th moment in time, for the original signal x(j), the value of the output signal e(j) is determined, where the interference component is compensated.
The adjustment (adaptation to external conditions) of the vector W is based on optimization according to the criterion of the minimum mean absolute value (modulus) of the output signal. The computational adaptation algorithm is derived using the mathematical apparatus of optimal filtering. The algorithm converges by the steepest descent method and, to simplify the calculations, uses the Widrow-Hoff stochastic approximation of the gradient.
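As a brief sketch of how this criterion leads to a sign-based update, assume the single-channel form in which the filter taps are the delayed input samples x(j-i). Replacing the mean value by its instantaneous value (the stochastic gradient approximation) and taking one steepest-descent step with step size m gives:

```latex
\begin{align*}
e(j) &= x(j) - \sum_{i=1}^{p} w(i)\, x(j-i), \\
J(W) &= \mathrm{E}\,\lvert e(j) \rvert \;\approx\; \lvert e(j) \rvert, \\
\frac{\partial \lvert e(j) \rvert}{\partial w(i)} &= -\operatorname{sgn}\bigl(e(j)\bigr)\, x(j-i) \qquad (e(j) \neq 0), \\
w(i) &\leftarrow w(i) - m\, \frac{\partial \lvert e(j) \rvert}{\partial w(i)}
      = w(i) + m\, x(j-i)\, \operatorname{sgn} e(j).
\end{align*}
```

The correction to each coefficient therefore depends only on the sign of the output sample, not on its magnitude, which is what keeps the per-sample arithmetic simple.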
In the single-channel version, the adaptive decorrelation algorithm is used to process the ZRS, and in the two-channel version, adaptive compensation. The fundamental difference between the two options lies in how the input signals used in the subsequent computational procedure are formed. In the single-channel version, both input signals (main and reference) are formed from one input signal: the original input signal serves as the main one, and the reference signal is obtained from it by a unit delay. In the two-channel version, the main and reference signals both physically exist and are used directly in the subsequent computational procedure. The computational procedure itself is the same for both options and has the form:
w(j,i) = w(j-1,i) + m x(j-1-i) sgn e(j),    (1)
where j = 1, 2, 3, … is the current discrete time (each moment in time is separated from the next by the sampling period Td) and i = 1, 2, 3, …, p is the ordinal number of the projection of the vector W.
In accordance with this algorithm, the LPC processor (see Fig. 2 and 3) calculates (predicts), in each sampling period Td, the p linear prediction coefficients (the p projections of W) for the next, j-th, moment of discrete time. The adaptation coefficient m regulates the convergence rate of the algorithm and, ultimately, how quickly changes in the noise characteristics are tracked. The predicted value W(j) is used in the transversal filter to calculate the noise estimate n^(j) at the j-th moment of time and the value of the compensated (output) signal e(j):
n^(j) = w(j,1) x(j-1) + … + w(j,i) x(j-i) + … + w(j,p) x(j-p)    (2)
e(j) = x(j) - n^(j) = s(j) + n(j) - n^(j)    (3)
Expression (2) represents a discrete convolution of the input signal with the vector of linear prediction coefficients. As adaptation proceeds, the noise estimate becomes increasingly close to the noise itself, and its compensation in the input signal becomes more complete.
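Expressions (1) to (3) can be tried out with a short Python sketch. It is a minimal illustration, not the device's implementation: the function name, the test tone, and the values p = 8 and m = 0.002 are assumptions, and the update uses the error just computed at step j with the corrected coefficients taking effect from the next sample, which is one reading of the time indices in (1).

```python
import math

def sign(v):
    """Return -1, 0 or +1."""
    return (v > 0) - (v < 0)

def adaptive_filter(main, ref, p, m):
    """Sign-based adaptive filter following expressions (1)-(3).

    main : samples x(j) of the main channel
    ref  : reference samples; tap i reads ref[j - i], i = 1..p
    p    : number of linear prediction coefficients
    m    : adaptation coefficient
    Returns the compensated output e and the final coefficients w[1..p].
    """
    w = [0.0] * (p + 1)   # w[0] is unused so that indices match the text
    e = []
    for j in range(len(main)):
        # Past reference samples; zero before the start of the record.
        taps = [ref[j - i] if j - i >= 0 else 0.0 for i in range(p + 1)]
        n_hat = sum(w[i] * taps[i] for i in range(1, p + 1))  # expression (2)
        e_j = main[j] - n_hat                                  # expression (3)
        step = m * sign(e_j)
        for i in range(1, p + 1):                              # expression (1)
            w[i] += step * taps[i]
        e.append(e_j)
    return e, w

# Single-channel use: the reference is the input itself, delayed by one
# sample (the taps start at i = 1). The "interference" is a pure tone.
x = [math.sin(2 * math.pi * 0.05 * j) for j in range(4000)]
e, w = adaptive_filter(x, x, p=8, m=0.002)

power_in = sum(v * v for v in x[-500:]) / 500
power_out = sum(v * v for v in e[-500:]) / 500
```

After adaptation the residual power of the tone at the output should be far below its input power. For the two-channel scheme the same function can be called with separate main and reference sequences.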
Some points that are useful for practical work with a single-channel signal are worth noting. In the limit, the adaptation of W continues until the input signal is fully decorrelated, i.e. until white noise is obtained at the output. It does not matter what kind of interference causes the irregularities in the spectral envelope of the input signal: additive interference with a colored spectrum or convolution with the resonances of the transmission path. The same applies to the speech signal itself, which is the convolution of the voiced and noise excitation sources with the impulse response of the articulatory tract. That is, if the adaptation rate, which is regulated by the adaptation coefficient m, is chosen incorrectly, the filter may not only compensate for the interference but also significantly distort the speech signal.
In a real situation, the decorrelation of interference can never be complete (assuming the adaptation rate is chosen sensibly and the speech signal in the ZRS suffers no additional distortion from adaptive processing); its depth is limited by the dead zone of the device. This zone in turn has a constant component (determined by the finite word length of the ADC and of the processor's arithmetic units, and by the resolution of the filter «k», k = p Td) and a variable component (determined by how constant the interference statistics are and by the numerical value of the adaptation coefficient). In the limit, with stationary or periodic interference and the adaptation coefficient tending to zero (minimum adaptation rate), the dead zone is minimal and is determined only by its constant component.
Non-stationary interference, for example music, can be regarded as a frequency-modulated signal whose spectrum is wider than that of an ordinary signal. To decorrelate it, the operating frequency band of the device must be widened, i.e. Td reduced, and the filter resolution additionally lowered by reducing the number of LPCs (the value of «p»), since wideband filters have a smaller time constant and respond faster to a changing input signal. The adaptation rate must also be increased so that the calculation of the vector W tracks the changing characteristics of the interference.
Multiplicative interference in the form of «steady-state» reverberation can be compensated for by increasing the filter's resolving power through both factors (the number of LPCs p and the sampling period Td) and by choosing a medium adaptation rate.
Based on the above, we can formulate general requirements for an adaptive filtering device designed to reduce effectively the level of various classes of interference in a single-channel ZRS. The device must have an adjustable operating frequency band, an adjustable number of LPCs and an adjustable adaptation rate, with an upper limit on that rate to reduce the impact of adaptive filtering on the speech signal.
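The relation k = p Td makes the trade-off between these settings easy to check numerically. The figures below are hypothetical settings chosen only to illustrate the arithmetic; they are not recommendations from the article.

```python
def filter_resolution(p, sampling_rate_hz):
    """Return k = p * Td in seconds, where Td = 1 / sampling rate."""
    td = 1.0 / sampling_rate_hz
    return p * td

# Hypothetical settings, for illustration only.
k_reverb = filter_resolution(p=256, sampling_rate_hz=8000)   # long filter for reverberation
k_music = filter_resolution(p=16, sampling_rate_hz=16000)    # short, wideband filter for music
```

Here the first setting spans about 32 ms and the second about 1 ms, so halving Td and reducing p together shorten k much more than either change alone.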