Methods and equipment for adaptive sound filtering.

V. I. Zolotarev,
Head of Department «Business Security»,
PhD, Senior Researcher

General information about interference and adaptive filtering
Recording, analysis and processing of audio information is one of the most important parts of information security work. It is often necessary to process an audio signal in order to improve its quality and intelligibility.

During live listening, or when tape recordings of speech are made in real conditions, the signal is affected by various kinds of interference, which reduce the quality of the useful (speech) signal, including its intelligibility, up to complete loss of communication. Reducing the level of interference in order to restore the meaning of the message is therefore highly relevant in a number of practical situations.

The effect of interference on the useful signal can be simply represented by the following models. Figure 1a shows a model of the effect of additive noise on the speech signal, i.e. the noise is added to the useful signal. This model corresponds to the situation when the recording is made in an open space and the interference can be wind noise, street and construction noise, etc. Figure 1b shows a model of the effect of additive and multiplicative interference. In this case, before the information arrives at the receiver (the human ear), the additive mixture (speech signal plus acoustic noise) passes through a transmission path with a frequency-dependent transfer characteristic.
Thus, the additive mixture undergoes additional multiplicative distortion: in the frequency domain the spectrum of the mixture is multiplied by the resonances of the transfer characteristic of the path, which in the time domain is a convolution with the impulse response of the path «H». This model corresponds to recording a signal indoors or transmitting signals over radio and telephone paths.

Fig. 1a     Fig. 1b
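The two models can be sketched numerically. The following is a minimal illustration, assuming NumPy; the sine wave standing in for speech, the white noise, and the impulse response `h` (direct sound plus one echo) are hypothetical test data, not taken from the article.

```python
import numpy as np

def additive_model(s, n):
    """Fig. 1a: the noise n is simply added to the useful signal s."""
    return s + n

def additive_multiplicative_model(s, n, h):
    """Fig. 1b: the mixture s + n is convolved with the path impulse response h."""
    mixture = s + n
    # Truncate the convolution tail so the output has the same length as the input.
    return np.convolve(mixture, h)[: len(mixture)]

rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 0.01 * np.arange(1000))  # stand-in for the speech signal
n = 0.3 * rng.standard_normal(1000)             # stand-in for acoustic noise
h = np.array([1.0, 0.0, 0.0, 0.5])              # hypothetical path: direct sound plus one echo

x_a = additive_model(s, n)
x_b = additive_multiplicative_model(s, n, h)
```

With this particular `h`, every sample of `x_b` is the corresponding sample of `x_a` plus half of the sample three periods earlier, which is the simplest form of the reverberation-type distortion described above.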

The task of eliminating or reducing the level of additive and multiplicative interference is complicated by the variability of the characteristics of acoustic interference (wind noise, foliage, passing vehicles, music, etc.) and transmission paths (the speaker walking around the room, turning his head, etc.).
Thus, for effective elimination of speech signal distortions, it is necessary for the device performing this function to constantly monitor changes in the characteristics of interference over time and constantly adjust its impulse response in accordance with these changes. Such capabilities are possessed by devices using adaptive filtering for the purpose of isolating interference, or rather, evaluating it, with its subsequent compensation in a mixture of the useful signal and interference.

The distorted signal can be presented as a single-channel signal, i.e. as a mixture of the useful signal and interference (noisy speech signal — ZRS), or as a two-channel signal, when in addition to the main channel — ZRS, there is also a reference channel, the signal in which is as close as possible to the interference present in the ZRS.

Depending on the type of input signal representation, single-channel and two-channel adaptive filtering devices are distinguished. Simplified block diagrams of single- and two-channel devices are shown in Fig. 2 and 3, respectively. Each shows an adaptive filter (processor) consisting of two blocks, a transversal filter that calculates the noise estimate «n^» and an LPC processor that calculates the filter impulse response, i.e. the vector of linear prediction coefficients «W», together with a separate adder that calculates the compensation result «e».

In the LPC processor, the values of W are calculated in such a way that the value n^(j) predicted for time j compensates for the noise component n(j) with a minimum remainder. The values of W, «n^» and «e» are calculated at each sampling period. Full compensation of the noise component is not reached instantly but over a certain time (the adaptation time), which is regulated by the adaptation coefficient m.

Fig. 2

If there is only a single-channel ZRS, compensation is performed according to the scheme in Fig. 2. In this case, the reference signal is formed from the ZRS. According to this scheme, additive noises with periodic components (for example, noises of various motors, engines, music, etc.) can be reduced, and the influence of multiplicative interference, including reverberation distortions, can also be reduced.

To perform noise compensation with a two-channel ZRS, the adaptive filtering scheme shown in Fig. 3 is used: the ZRS arrives through the main channel, and the reference channel carries only the noise «n1», which is correlated with the noise «n» in the ZRS. The adjustable delay compensates for the acoustic signal delay that occurs in one of the channels (Fig. 3 shows delay compensation in the main channel). Provided a suitable signal is present in the reference channel, this scheme can compensate, with varying efficiency, for almost any additive noise.

Fig. 3
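The article does not say how the adjustable delay is set. One common way to choose it, shown here purely as an assumption, is to look for the peak of the cross-correlation between the two channels. The helper names `estimate_delay` and `delay_signal` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_delay(main, ref, max_lag):
    """Lag in samples by which `main` lags `ref` (negative if `main` leads).

    Both signals must have the same length; lags in [-max_lag, max_lag] are searched.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = main[lag:], ref[: len(ref) - lag]
        else:
            a, b = main[: len(main) + lag], ref[-lag:]
        scores.append(np.dot(a, b))  # correlation of the overlapping parts
    return int(lags[int(np.argmax(scores))])

def delay_signal(x, d):
    """Delay x by d >= 0 samples, padding the start with zeros."""
    return np.concatenate([np.zeros(d), x[: len(x) - d]])

rng = np.random.default_rng(1)
noise = rng.standard_normal(2000)
main = noise + 0.1 * rng.standard_normal(2000)  # noise as heard in the main channel
ref = delay_signal(noise, 5)                    # reference channel receives it 5 samples later

lag = estimate_delay(main, ref, max_lag=20)
if lag < 0:
    # The main channel leads, so it is the one to delay (the case drawn in Fig. 3).
    main_aligned, ref_aligned = delay_signal(main, -lag), ref
else:
    main_aligned, ref_aligned = main, delay_signal(ref, lag)
```

After alignment the noise components of the two channels coincide in time, so the adaptive filter no longer has to model the delay itself.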

In both variants of representing the input signal, adaptive filtering is carried out according to the same procedure. In the digital adaptive filter, at each sampling period the «p» projections w(i) of the vector W are calculated, and the convolution of W with the input signal is computed. As a result, at the j-th moment in time, the output value e(j), in which the noise component is compensated, is determined for the original signal x(j).

The vector W is adjusted (adapted to external conditions) by optimization according to the criterion of the minimum mean absolute value (modulus) of the output signal. The computational adaptation algorithm is derived using the mathematical apparatus of optimal filtering. The algorithm converges by the method of steepest descent and, to simplify the calculations, the Widrow-Hoff stochastic approximation of the gradient is used.

In the single-channel version, the adaptive deconvolution («scan») algorithm is used to process the ZRS, and in the two-channel version, adaptive compensation. The fundamental difference between the two options lies in how the input signals for the subsequent computational procedure are formed. In the single-channel option, both input signals (main and reference) are formed from one input signal: the original input signal serves as the main one, and the reference is formed from it by a unit (one-sample) delay. In the two-channel option, the main and reference signals exist physically and are used directly. The computational procedure itself is the same for both options and has the form:

w(j,i) = w(j-1,i) + m·x(j-1-i)·Sgn e(j),     (1)

where j=1,2,3… — current discrete time (each moment of time is separated from the next moment by the sampling period Td); i=1,2,3,…,p — ordinal number of the projection of the vector W.

In accordance with this algorithm, at each sampling period Td the LPC processor (see Fig. 2 and 3) calculates (predicts), for the next, j-th, moment of discrete time, the «p» linear prediction coefficients (the «p» projections of W). The adaptation coefficient m regulates the convergence rate of the algorithm and, ultimately, the speed with which changes in the interference characteristics are tracked.

The predicted value of W(j) is used in the transversal filter to calculate the interference estimate n^(j) at the j-th moment in time and the value of the compensated (output) signal e(j):

n^(j) = w(j,1)x(j-1)+…+ w(j,i)x(j-i)+…+ w(j,p)x(j-p)     (2)
e(j) = x(j) - n^(j) = s(j) + n(j) - n^(j)     (3)

Expression (2) is a discrete convolution of the input signal with the vector of linear prediction coefficients. As adaptation proceeds, the noise estimate comes closer and closer to the noise itself, and its compensation in the input signal becomes more complete.
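The procedure in expressions (1)-(3) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the firmware of any device. It uses the usual sign-error form of the LMS update, in which the sign of e(j) is paired with the same delay-line contents that produced it; that pairing is an assumption about the time indexing intended in (1). The function name `sign_lms`, the test signals and the acoustic path from n1 to n are hypothetical.

```python
import numpy as np

def sign_lms(main, ref, p, m):
    """Adaptive compensation in the spirit of expressions (1)-(3).

    main : x(j) in (3), the noisy speech signal
    ref  : signal fed to the transversal filter (pass `main` again for single-channel mode)
    p    : number of linear prediction coefficients
    m    : adaptation coefficient
    Returns the compensated output e and the final coefficient vector W.
    """
    main = np.asarray(main, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(p)
    e = np.zeros(len(main))
    for j in range(len(main)):
        # Delay line ref(j-1), ..., ref(j-p); zeros before the start of the record.
        past = np.zeros(p)
        k = min(p, j)
        past[:k] = ref[j - k:j][::-1]
        n_hat = np.dot(w, past)        # noise estimate, expression (2)
        e[j] = main[j] - n_hat         # compensated output, expression (3)
        w += m * past * np.sign(e[j])  # sign-error coefficient update, cf. (1)
    return e, w

# Two-channel demonstration on synthetic data.
rng = np.random.default_rng(0)
N = 20000
s = np.sin(2 * np.pi * 0.01 * np.arange(N))        # stand-in for the speech signal s(j)
n1 = rng.standard_normal(N)                         # noise in the reference channel
n = np.convolve(n1, [0.0, 0.8, 0.0, -0.4])[:N]     # hypothetical path from n1 to n
x = s + n                                           # noisy speech in the main channel

e, w = sign_lms(x, n1, p=8, m=0.001)
```

In this example the coefficients drift toward the hypothetical path (about 0.8 for the first and -0.4 for the third), and the residual noise in `e` shrinks as adaptation proceeds. For single-channel processing the same function can be called as `sign_lms(x, x, p, m)`, since the delay line already starts one sample in the past.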

It is worth noting some points that are useful for practical work with a single-channel signal. In the limit, the adaptation of W occurs until the input signal is fully decorrelated, i.e. until white noise is obtained at the output. In this case, it does not matter what kind of interference causes the spectral envelope of the input signal to have irregularities: additive interference with a colored spectrum or convolution with resonances of the transmission path.
This also applies to the speech signal itself, which is a product of the convolution of the voice and noise excitation sources with the impulse response of the articulatory tract, i.e. if the adaptation speed, which is regulated by the adaptation coefficient m, is chosen poorly, it is possible not only to compensate for the interference, but also to significantly distort the speech signal.

In a real situation, the decorrelation of interference can never be complete (assuming that the adaptation rate is chosen sensibly and the speech signal in the ZRS suffers no additional distortion from adaptive processing), and its depth is limited by the dead zone of the device. This zone in turn has a constant component, determined by the finite word length of the ADC and of the processor's arithmetic units and by the filter resolution «k», k = p Td, and a variable component, determined by how stationary the interference statistics are and by the numerical value of the adaptation coefficient. In the limit, with stationary or periodic interference and the adaptation coefficient tending to zero (minimal adaptation rate), the dead zone is minimal and is determined only by its constant component.
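The quantity k = p Td has the dimension of time: it is the length of the transversal filter's delay line. A small worked example follows; the sampling rate of 8 kHz and the values of «p» are assumptions chosen only for illustration, since the article gives no sampling rate.

```python
def filter_resolution(p, td):
    """Filter resolution k = p * Td, in seconds."""
    return p * td

fs = 8000.0    # assumed sampling rate in Hz (not given in the article)
td = 1.0 / fs  # sampling period Td

for p in (100, 500, 1000):
    k_ms = filter_resolution(p, td) * 1e3
    print(f"p = {p:5d}: k = {k_ms:.1f} ms")
```

Under these assumptions, 500 coefficients span 62.5 ms of signal. Halving Td (doubling the processed bandwidth) halves k for the same «p», which is why the two parameters have to be chosen together.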

Non-stationary interference, for example music, can be regarded as a frequency-modulated signal whose spectrum is wider than that of an ordinary signal. To decorrelate it, it is necessary to widen the operating frequency band of the device (i.e. to reduce Td); to reduce the filter resolution further by reducing the number of LPCs (the value of «p»), since wideband filters have a smaller time constant and respond faster to a changing input signal; and to increase the adaptation speed, so that the calculation of the vector W can track the changing noise characteristics.

In the presence of multiplicative interference in the form of «established» reverberation, its effect can be compensated by increasing the filter's resolution through both factors (Td and «p») and by selecting a medium adaptation rate.

Based on the above, we can formulate general requirements for an adaptive filtering device designed to effectively reduce the level of various classes of interference in a single-channel ZRS. The device must have an adjustable operating frequency band, an adjustable number of LPCs, and an adjustable adaptation rate with an upper limit that reduces the impact of adaptive filtering on the speech signal.

Digital adaptive filters DAF-P
The developers of «Business Security», who have 20 years of experience in developing digital adaptive filtering devices for various purposes, have been producing two new models of digital adaptive filters since autumn 1998; these fully meet the above requirements.
These are the two-channel digital adaptive filter DAF-P, model 3413, which can also operate in single-channel mode, and its single-channel version DAF-P, model 3414, in two modifications: DAF-P-500 and DAF-P-1000, with a maximum number of linear prediction coefficients of 670 and 1340, respectively.

The devices are portable, operate in real time and can be used both for direct auditory monitoring and for processing magnetic recordings. They are designed to improve the intelligibility of speech signals distorted by radio and telephone lines, musical and background noise (bars, restaurants, network interference), some types of transport interference, reverberation (various rooms), low-frequency noise of the tape recorder mechanism. In two-channel mode, the intelligibility of speech signals contaminated with any interference is increased, in the presence of the corresponding interference in the reference channel. The efficiency (interference suppression depth) can reach 20 dB on such interference as the speaker's speech or music. The intelligibility is increased due to automatic noise suppression in the noisy speech signal. The devices are extremely easy to use and do not require special skills and knowledge from the operator.

Unlike the previously released products of the KORS, PAKORS, Signal-AP series and the portable models 3412 and 3451, the products under consideration have unique capabilities for matching variable parameters (parameters and their characteristics that can be changed by the operator) of these products with the frequency-time characteristics of noise and distortion present in noisy speech signals.
These capabilities are achieved not only by changing the number of LPCs, but also by changing the bandwidth of the processed signal, which allows for a sharp increase in the efficiency of compensation for boundary interference such as fast music or «established» reverberation distortions characterized by a high degree of «jaggedness» of the frequency response. In the first case, the combination of maximum bandwidth/minimum number of LPCs is used, in the second — minimum bandwidth/maximum number of LPCs.

The products under consideration also offer functions that further improve the quality of processed signals, for example pseudo-stereophonic listening and correction of the frequency response of the playback path.

Below are the technical characteristics of the DAF-P, model 3413. The characteristics of single-channel modifications correspond to those presented above, with the exception of the two-channel mode.

The devices have small dimensions, weight and power consumption. The design allows them to be used both for solving independent problems and built into specialized complexes.

Developing digital technologies are quickly penetrating the field of technical security equipment. They are successfully used, for example, in multi-channel digital sound recording systems or digital adaptive filters used to process and improve the intelligibility of speech signals.