New solution for protecting confidential negotiations.


Valery Ivanovich Zolotarev, PhD in Engineering

 


In the information age, a well-known principle applies: whoever owns the information owns the world. There is no shortage of people who want to take over the world this way, so there is steady demand for illegally obtained information. For the owner of the information, reliable protection therefore becomes a constant concern. In other words, the information field sees an eternal struggle between projectile and armor, between the attacking side and the defending side.

Attackers pay considerable attention to information carried by a speech signal, that is, speech information. In general, speech information comprises semantic, personal, behavioral and other components. As a rule, semantic information is of greatest interest, and its loss can be estimated indirectly from the loss of speech intelligibility. In what follows, speech information means semantic information only.

There are many methods of protecting speech information and many technical means that implement them, and they are constantly being improved. As science and technology develop, the attacking side acquires increasingly sophisticated technical means that not only improve the quantitative characteristics of known technical channels of information leakage but also create new channels. The largest number of technical leakage channels can be organized to intercept acoustic speech information, for example during confidential meetings and negotiations. The most effective protection in this case is acoustic protection of the premises in which confidential negotiations are conducted and, at best, acoustic protection of the acoustic speech signal itself. The problems that arise in organizing such protection, and their technical solutions, are the subject of this article.

 

PROTECTION METHODS. Terms, definitions, brief description.

 

Acoustic protection of the acoustic speech signal (speech) here means reliable masking of speech by an acoustic masking signal (noise) acting in the speech frequency band and having a “smooth” spectral characteristic.

Reliable masking is achieved when the resulting additive acoustic mixture of speech and noise (noisy speech) has a verbal intelligibility of no more than 20% at any point of the controlled volume; in practice, this corresponds to perceiving isolated exclamations and isolated “familiar words”. At the same time, the negotiators should be given the most comfortable acoustic conditions possible under the circumstances. Note that the concept of “comfort” as applied to the perception of speech information does not yet have an unambiguous interpretation among specialists, and no scale for assessing comfort has been developed. Comfort is traditionally associated with fatigue, a human reaction to unnatural conditions that can in theory be assessed. To date, however, because of the large volume of research required, correlations between the types and depth of speech signal distortion and the degree of fatigue have not been determined. Therefore, comfort is assessed here using exclusively qualitative, intuitive judgments that rely on common sense.

There are several types of acoustic speech protection equipment, which can be divided into two groups: acoustic room protection equipment and acoustic speech protection equipment.

Equipment in the first group generates a barrier of acoustic interference along the enclosing structures and, as a rule, is used together with vibration protection equipment. This creates relatively comfortable acoustic conditions for the personnel, but the full volume of the room is not acoustically protected, with all the ensuing consequences.

The second group comprises acoustic noise generators that are placed near the negotiators and mask their speech with noise. The negotiators themselves are not protected from the acoustic noise, so both comfort and the level of masking leave much to be desired. A higher level of masking and comfort is provided by equipment that, in addition to an acoustic noise generator, uses intercom headsets designed for operation in high noise. This type includes the TF-011D equipment, which uses telephone-microphone headsets, and the OKP-6, which uses telephone-laryngophone headsets. With these devices, the ear pads of the headphones protect the negotiators' hearing from the acoustic noise, and the partners' speech is delivered through the headphones. Each negotiator's speech is picked up by a microphone near the mouth or by a laryngophone at the throat. Speech masking reliability is high, especially for the OKP-6, but the need to wear headsets is not always convenient for users.

The task set by the group of developers represented by the author of this article was to maintain high speech masking reliability while getting rid of headsets and replacing them with headphones.

 

SOLVING THE PROBLEM

 

In solving this task, the developers faced the following problem. It is known that during a telephone conversation the average speech level at the microphone membrane of the handset is 97.5 dB in the 100 to 10000 Hz frequency band. A headset microphone is located at approximately the same distance from the speaker's mouth as a handset microphone, so the speech level at the headset microphone is approximately the same. For acoustic masking to be sufficiently reliable on the one hand and to give an acceptable level of comfort on the other (satisfactory speech transmission quality, see Table 1), a noise field with a level of 86 dB must be created around the speakers. In this case, the speech/noise ratio at the headset microphone is plus 10 to 12 dB.

 

Table 1.

 

Speech transmission quality   W, %          S, %   A, %   Bsh, dB   Speech/noise, dB
ideal                         100-99
excellent                     99-98
good                          98-93 (95)    58     35     81        +16
satisfactory                  93-87 (90)    47     25     86        +11
marginally acceptable         87-77 (82)    33     18     92        +5
communication breakdown       77-60 (68)    18     12     97        +0
reliable masking              15            4      5      104       -17

 

Explanations to Table 1. The table contains results derived from the data of M. A. Sapozhkov and N. B. Pokrovsky for a speech level of 97.5 dB and “white” noise. Here W is verbal intelligibility (average values for the given range are shown in brackets), S is syllabic intelligibility, A is formant intelligibility, and Bsh is the “white” noise level.

 

At a distance of 1.2 to 1.5 m, the speech signal level falls to 72 to 78 dB (measured with a traditional test phrase in rooms of 600 and 50 cubic meters). If the noise level remains at 86 dB throughout the protected room, then at a radius of approximately 1.3 m from the speaker's mouth the speech/noise ratio will already average about minus 10 dB, and it will deteriorate further with distance. From the data in Table 1 we can conclude that at more than 1 m from the speaker's mouth the speech quality is below the “communication breakdown” level, and at more than 2 m reliable masking is guaranteed. Two clarifications are in order.

1. Speech quality at the “communication breakdown” level is characterized by complete incomprehensibility of the main text and, according to various sources, corresponds to W values from 77% to 60%; in some publications the lower limit of the range is 50% of words.

2. The data in Table 1 correspond to a speech level of 97.5 dB, and it is not entirely correct to apply them to other levels, but for illustrative purposes this is quite acceptable.
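The level arithmetic used above can be sketched in a few lines of Python. This is only an illustration: the thresholds are the speech/noise values from the last column of Table 1, the function names are hypothetical, and the bucket "below communication breakdown" is an assumed label for ratios that fall between the last two rows of the table. As noted in clarification 2, applying the table to speech levels other than 97.5 dB is itself an approximation.

```python
# Speech/noise thresholds in dB, taken from the last column of Table 1.
# The table gives no ratios for "ideal" and "excellent", so anything at or
# above +16 dB is reported here simply as "good" (a simplification).
THRESHOLDS = [
    (16, "good"),
    (11, "satisfactory"),
    (5, "marginally acceptable"),
    (0, "communication breakdown"),
]
RELIABLE_MASKING_DB = -17  # last row of Table 1


def speech_to_noise_db(speech_level_db, noise_level_db):
    """Speech/noise ratio as the difference of two levels in dB."""
    return speech_level_db - noise_level_db


def quality(ratio_db):
    """Map a speech/noise ratio to the nearest Table 1 row it still reaches."""
    for threshold, label in THRESHOLDS:
        if ratio_db >= threshold:
            return label
    if ratio_db <= RELIABLE_MASKING_DB:
        return "reliable masking"
    return "below communication breakdown"  # hypothetical label for the gap


# Headset microphone: 97.5 dB speech against 86 dB noise.
print(quality(speech_to_noise_db(97.5, 86)))
# About 1.3 m away: roughly 75 dB speech (middle of 72 to 78 dB) against 86 dB noise.
print(quality(speech_to_noise_db(75, 86)))
```

The first call corresponds to the headset case and the second to a microphone moved about 1.3 m away from the speaker.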

It follows from the above that moving the microphone away from the speaker's mouth, that is, giving up the headset, reduces speech intelligibility to the point of communication breakdown at distances of more than one meter from the speech source. In other words, a microphone placed far from the speaker's mouth is in the same situation as an eavesdropping microphone, which is exactly what acoustic masking is directed against. White noise is used as the masking noise for this purpose. The point is that no algorithms or hardware and software implementations exist that give a real increase in the intelligibility of speech contaminated with white noise at a negative speech/noise ratio. The McAulay algorithm and other modifications of the spectral subtraction algorithm aimed at combating white noise improve listening comfort at positive speech/noise ratios, but they do not improve speech intelligibility. Thus, noisy speech intercepted by acoustic monitoring means cannot be cleaned of noise.

Under certain conditions, however, any stationary noise, including “white” noise, can be compensated with a high degree of suppression. Such compensation can be implemented with a digital two-channel adaptive filter (the author considers this possibility in the article “Adaptive Filtration Equipment”, “Confidential”, N1-2, 1999). For the problem under consideration, a two-channel adaptive filter (DAF) was used in the equipment developed for acoustic protection of confidential negotiations, the Confidential Negotiations Digital System (CNDS)*.

Photo 1. Equipment for acoustic protection of confidential negotiations — Confidential Negotiations Digital System (CNDS)

Note

 

* Certificate for a utility model with priority from 05.05.99.

 

OPERATING PRINCIPLE OF THE EQUIPMENT FOR PROTECTION OF CONFIDENTIAL NEGOTIATIONS (CNDS)

 

The basic principle is that the generated masking noise n is fed not only to the electroacoustic emitter, but also to the reference input of the DAF (the structural diagram of the CNDS is shown in Fig. 1). The second, main, input of the DAF receives a signal x from the output of the receiving microphone, which plays the role of the headset microphone. This signal is an additive mixture of the speech of the negotiating participants s and noise n1, which is noise n, but has undergone changes during conversion to an acoustic signal and due to the acoustics of the room where the negotiations are taking place.

Fig. 1. Structural diagram of the CNDS

If these changes are linear (the power amplifier and the emitter do not clip the noise signal), then n and n1 are correlated. The farther the receiving microphone is from the speaker's mouth, the worse the speech/noise ratio in this mixture. Note that this scheme uses a single microphone for all participants in the negotiations, that is, headsets have been abandoned. Using the reference channel signal, the DAF compensates the noise component of the speech and noise mixture in accordance with the adaptive algorithm, and the speech purified in this way is delivered to the participants through headphones. The algorithm converges by the steepest descent method, with the Widrow-Hoff stochastic approximation of the gradient used to simplify the calculations. To accelerate convergence, the minimum of the absolute value of the filter error is used as the optimality criterion.

 

CNDS IMPLEMENTATION. Description of the algorithm, some operational features, test results.

 

The basis of the CNDS equipment is a specialized digital processor that implements the functions of a generator, the digital two-channel adaptive filter (DAF) and the control functions.

The DAF compensates the noise component in the speech and noise mixture. Its operation can be represented by the following expressions:

 

$$s(j) = x(j) - y(j) \qquad (1)$$

$$y(j) = \sum_{i=1}^{p} v(j,i)\, n(j-r-i) \qquad (2)$$

$$v(j+1,i) = v(j,i) + m\,\operatorname{sgn}[s(j)]\, n(j-r-i), \qquad (3)$$

where:

x(j) is the current sample of the main signal;
n(j) is the current sample of the reference signal;
y(j) is the current sample of the noise estimate;
s(j) is the current sample of the output signal (error signal);
v(j,i) is the current value of the i-th filter weighting coefficient;
m is the coefficient determining the adaptation speed;
p is the number of filter weighting coefficients;
r is the acoustic signal delay;
j is the discrete time value, j = 1, 2, 3, …;
i is the number of the filter weighting coefficient, i = 1, 2, 3, …, p;
sgn[.] is the sign of the argument [.].
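Expressions (1) to (3) describe a sign-error variant of the LMS noise canceller, and a minimal pure-Python sketch of that scheme is given here for illustration. It is not the CNDS firmware. The noise estimate (2) is taken as the sum over i = 1..p of v(j,i)·n(j-r-i), as the text describes it (a convolution of the reference signal with the weights). The test signal, the four-tap "room" response, the sine tone standing in for speech, and the values of p, r and m are all assumptions chosen only to make the example small.

```python
import math
import random


def sign_error_lms(x, n, p, r, m):
    """Sketch of expressions (1)-(3); returns (cleaned signal, final weights)."""
    v = [0.0] * p                       # weights v(j, i), i = 1..p
    s = []
    for j in range(len(x)):
        # window[i-1] = n(j-r-i); samples before the start are taken as zero
        window = [n[j - r - i] if j - r - i >= 0 else 0.0
                  for i in range(1, p + 1)]
        y = sum(vi * ni for vi, ni in zip(v, window))         # (2) noise estimate
        e = x[j] - y                                          # (1) error = cleaned speech
        sgn = (e > 0) - (e < 0)                               # sign of the error
        v = [vi + m * sgn * ni for vi, ni in zip(v, window)]  # (3) weight update
        s.append(e)
    return s, v


def demo_suppression_db(num_samples=6000, r=3):
    """Synthetic check: noise suppression in dB over the second half of the run."""
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    room = [0.5, -0.3, 0.2, 0.1]        # hypothetical room response after the delay r
    n1 = [sum(h * (noise[j - r - 1 - k] if j - r - 1 - k >= 0 else 0.0)
              for k, h in enumerate(room))
          for j in range(num_samples)]
    # A quiet sine tone stands in for speech (assumption for the demo only).
    speech = [0.05 * math.sin(2 * math.pi * 0.01 * j) for j in range(num_samples)]
    x = [a + b for a, b in zip(speech, n1)]
    s, _ = sign_error_lms(x, noise, p=8, r=r, m=0.001)
    tail = range(num_samples // 2, num_samples)
    residual = sum((s[j] - speech[j]) ** 2 for j in tail) / len(tail)
    noise_power = sum(n1[j] ** 2 for j in tail) / len(tail)
    return 10 * math.log10(noise_power / residual)


print(round(demo_suppression_db(), 1))
```

The printed figure is the ratio of noise power before and after the filter for this toy signal only; it says nothing about the suppression achieved by the real equipment.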

In accordance with (1), the purified speech (the filter error signal) is the difference between the main channel signal and the predicted noise value (noise estimate). In accordance with (2), the noise estimate is calculated as the convolution of the reference channel signal (noise) with the weight coefficients of the transversal filter. The impulse response of this filter (the vector of weight coefficients of dimension p) is updated at each discrete moment of time j in accordance with (3).

Adaptation (automatic tuning) of the weight vector proceeds at a certain rate (the adaptation rate) until the absolute value of the error (1) reaches its minimum, that is, until the masking noise in the signal arriving at the headphones is almost completely suppressed. During the adaptation time, the masking noise in the headphones therefore decreases in level and the speech of the negotiators emerges against its background. After that, if the acoustic environment in the room does not change, the weighting coefficients stabilize and the tracking process begins, which is characterized by an alternating sign of the gradient (the second term in (3)) and its minimum absolute value. During this period, almost “clean” speech is present in the headphones.

If the acoustic environment changes, for example because the negotiators allow themselves sharp gestures, the DAF switches back to adaptation mode and noise is heard in the headphones again. To reduce this effect, the adaptation rate, which is set by the coefficient m, is chosen as high as possible (it is bounded from above by the convergence condition of the algorithm). Of course, if all the negotiators gesticulate intensely, maximizing the adaptation rate will not give the desired effect and will force them to cool their ardor. This is an unplanned but useful limitation.

The planned restrictions (protections), which rule out situations in which improperly masked speech information could be intercepted, are as follows.

1. Protection against unauthorized reduction of the acoustic noise level.

This protection is implemented as follows. The device itself, with its built-in microphone, is placed 1 to 1.5 meters from the negotiators (midway between them). To the left and right of the negotiators, at a certain distance, are two loudspeakers that emit masking noise at a level such that the speech/noise ratio at the microphone is about minus 15 to minus 19 dB. If for some reason the masking noise level falls so far that the speech/noise ratio improves to approximately minus 10 to minus 12 dB, adaptation switches off and masking noise appears in the headphones. For the negotiators, this signals an abnormal situation.

2. Protection against exceeding the specified upper limit of the speech level.

When a participant speaks in a raised voice, a crackling sound is heard in the negotiators' headphones, which also signals an abnormal situation.

3. Protection against violation of the equipment placement topology.

When the equipment is deployed or during its operation, the distance between the loudspeakers, or between the loudspeakers and the main device, may be set smaller than the specified limit. In that case the space behind the loudspeakers may be poorly masked. To prevent this, the parameter r in (2) and (3), which sets the limiting value of the acoustic delay of the noise signal, is introduced into the calculation. If the acoustic delay (that is, the distance) of the masking noise on its way from the loudspeaker to the microphone is less than the specified value, there is no adaptation and only noise is heard in the headphones.
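The link between loudspeaker distance and the delay parameter r can be sketched as follows. This is a simplified illustration, not the CNDS implementation: the speed of sound (343 m/s), the 10 kHz sample rate (assumed here from the 5 kHz working band) and the value of r are assumptions, the function names are hypothetical, and only the direct sound path is considered.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def delay_in_samples(distance_m, sample_rate_hz):
    """Whole number of sample periods that sound needs to travel distance_m."""
    return int(distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def direct_path_is_covered(distance_m, r, p, sample_rate_hz):
    """True if the direct-path delay falls inside the lags r+1 .. r+p
    that the filter taps n(j-r-i), i = 1..p, are able to model."""
    delay = delay_in_samples(distance_m, sample_rate_hz)
    return r < delay <= r + p


FS = 10_000   # assumed sample rate, Hz
R = 20        # hypothetical delay parameter, in samples
P = 1300      # number of weighting coefficients quoted in the test results

print(direct_path_is_covered(1.0, R, P, FS))  # loudspeaker 1 m from the microphone
print(direct_path_is_covered(0.5, R, P, FS))  # loudspeaker moved too close
```

With these assumed numbers, a loudspeaker moved too close to the microphone produces a delay shorter than r, so no tap of the filter lines up with the arriving noise and adaptation cannot succeed.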

The measurements and tests showed the following results.

1. The measured formant intelligibility of the noisy speech signal in the working frequency band (5 kHz) at the microphone is 3 to 5%. Note that because the masking noise is “white” noise and undergoes only insignificant spectral changes when the loudspeakers form the acoustic field, it is entirely acceptable to measure formant intelligibility and to draw conclusions from these measurements.

2. The depth of suppression of the masking noise in the signal presented to the headphones is 26 — 30 dB with the number of weighting factors equal to 1300.

3. The speech of the negotiators, recorded on a dictaphone located in the breast pocket of one of the negotiators, is completely unintelligible.
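As a rough consistency check on these figures, the speech/noise ratio in the headphones can be estimated by adding the suppression depth to the ratio at the microphone. This assumes that the filter leaves the speech component essentially unchanged, which is an assumption and not a measured result; the symbol D for the suppression depth is introduced here only for illustration.

$$\mathrm{SNR}_{\text{headphones}} \approx \mathrm{SNR}_{\text{microphone}} + D, \qquad -19 + 26 = +7\ \text{dB} \;\le\; \mathrm{SNR}_{\text{headphones}} \;\le\; -15 + 30 = +15\ \text{dB}$$

Against Table 1 this range lies between the +5 dB and +16 dB rows, bearing in mind that the table strictly applies to a speech level of 97.5 dB and “white” noise.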

In conclusion, since the speech signal delivered to the negotiators via headphones is practically cleared of masking noise in the CNDS equipment, the comfort of their working conditions is determined only by how well the ear pads of the headphones attenuate external noise.
