Problems and solutions in detecting modern voice recorders.

Gudkov Sergey Alexandrovich
Special Equipment, 2001, No. 3

The complexity of the task of detecting modern voice recorders is that, on the one hand, it is necessary to register very weak electromagnetic radiation of the working voice recorder. For this, a sensitive electromagnetic field meter is needed. On the other hand, it is necessary not to react to industrial interference and to radiation of other devices, which can be very strong. Moreover, the frequency range, nature and form of electromagnetic oscillations from the voice recorder and from interfering sources are the same.

Terminology:

Analog signal — an electrical signal whose amplitude at any given moment can take any value within a certain range of levels, called the dynamic range.

Digital signal — an electrical signal whose amplitude at any given moment can take on two certain values, one of which is the logical level «1», and the other is the logical level «0». Hence, the digital form of recording a signal is a sequence of zeros and ones recorded in a memory chip, on a magnetic or optical medium.

Signal spectrum — a representation of an electrical signal taken over a certain period of time as a set of amplitudes obtained at the outputs of a group of bandpass filters through which the signal is passed. The bandpass filters have equal passbands, the bands do not overlap, and the distance between adjacent center frequencies is equal to the passband. The number of filters determines the number of harmonics in the spectrum.
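As an illustration of this definition, the sketch below uses a discrete Fourier transform as the bank of equal-width, non-overlapping bandpass filters. The sample rate, record length and test tone are assumptions chosen for the example and are not parameters of any detector described here.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Amplitude at the output of each 'filter' (DFT bin 0..N/2), naive O(N^2) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        # Scale so that a unit-amplitude sinusoid centered on a bin reads 1.0
        scale = 1.0 / n if k in (0, n // 2) else 2.0 / n
        mags.append(abs(acc) * scale)
    return mags

SAMPLE_RATE_HZ = 1024.0              # assumed sample rate
N = 256                              # assumed record length (number of samples)
BIN_WIDTH_HZ = SAMPLE_RATE_HZ / N    # passband of each filter = spacing of centers

TONE_HZ = 108.0                      # assumed test tone, centered on a bin
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE_HZ) for t in range(N)]

spectrum = dft_magnitudes(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(f"filter width {BIN_WIDTH_HZ} Hz, peak at {peak_bin * BIN_WIDTH_HZ} Hz")
```

Each bin plays the role of one bandpass filter whose width equals the spacing between neighboring center frequencies, as in the definition above.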

Interference, interference spectrum — in this article, all signals and their spectral components that do not result from the voice recorder's operation.

From the user's point of view, a modern voice recorder detector must solve three problems:

provide an acceptable detection range for most voice recorders;
minimize the probability of missing a signal;
minimize the probability of false alarms.

To assess the scope of work needed to create such a detector, all groups of modern voice recorders must be examined for the electromagnetic radiation they generate, since this radiation may be the only telltale sign of a voice recorder that is recording.

Voice recorders can be divided into two groups based on the electromagnetic radiation they generate: those with an electric motor in their design and those with memory chips for recording information.

The first group includes the following devices:

1. devices built on the classical principle of recording electrical signals on magnetic tape in analog form, with a tape transport mechanism but without an erasure and magnetization generator (EMG);
2. the same as item 1, but with an EMG;
3. devices that record electrical signals on magnetic tape in digital form on a DAT cassette and have a more complex tape transport mechanism, similar to that of a video recorder;
4. devices that record electrical signals in digital form on magnetic or optical disk media, for example on a minidisk developed by SONY (magnetic media) or on a rewritable laser disc (optical media). These also have an electric motor.

Hereafter, this group is referred to as «kinematic» voice recorders.

The nature of the electromagnetic radiation generated by this group of voice recorders is the same. The sources of maximum radiation are the electric motor and the EMG (subgroup 2 only). The signal from the electric motor is pulsed, with the main harmonic in the range from 80 to 300 Hz. Other harmonic components of this signal fall into this range with smaller amplitudes. Radiation from the EMG is close to sinusoidal and lies in the range from 20 to 60 kHz.

Another group of voice recorders is based on the principle of recording electrical signals in a digital memory chip. Non-volatile memory (flash memory) can be used or, less commonly, dynamic or static memory, which requires a constantly connected power source. Hereafter, this group of voice recorders is called «digital» [4].

Structurally, «digital» voice recorders can be made in two versions:

the voice recorder function is the main one;
the voice recorder function is additional.

The second subgroup includes the following devices:

some models of cell phones;
most «pocket» minicomputers, such as PocketPC;
MP3 players with recording capability.

Strictly speaking, the term «digital» voice recorder means any device that records speech on a medium in digital form, and the medium may be a disk or tape. Devices of that kind have a kinematic mechanism and are referred to in this article as «kinematic» voice recorders.

What is the source of radiation in «digital» voice recorders? According to the nature of the radiation, «digital» voice recorders can be divided into subgroups:

having a pulse voltage converter, for example, if a single 1.5 volt battery is used as a power source;
having a removable flash memory design;
carrying out compression of speech information by means of a specialized signal processor;
having a liquid crystal display;
having various connected accessories, such as a remote microphone, remote control, etc.;
having a housing capable of shielding the radiation of the voice recorder.

Research has shown that the maximum radiation level of «digital» voice recorders for all subgroups, as a rule, lies in the range from 20 to 120 kHz. For voice recorders with a pulse voltage converter, the strongest level is observed at the conversion frequency. Such voice recorders can be detected at a maximum range of more than a meter.

In voice recorders with removable flash memory, there is inevitably a cable of several dozen conductors, several centimeters long, which carries the address and data signals for writing to memory. These signals are digital, which means they have steep edges and an amplitude equal to the supply voltage (usually 3 volts). So many long conductors carrying such signals produce noise-like bursts in some frequency ranges. If a signal processor is used, which is typical for equipment from Western manufacturers, the spectral bursts are stronger, since such a processor consumes more than 50% of the energy the voice recorder needs to operate. Voice recorders of these two subgroups can be detected at a distance of 50 cm to 1 meter.

In voice recorders with a liquid crystal display, the display is also a source of electromagnetic field. Its energy increases with the size of the display, and is higher still if the display is graphic and especially if it is color. Such displays are more typical of devices in which the voice recorder function is additional: cell phones, minicomputers, etc. The detection range of such devices can exceed 1 meter.

For voice recorders with a connected external microphone or remote control, the connecting cable is an additional relatively powerful source of radiation.

For voice recorders in metal cases, the detection range drops sharply because the case shields the radiation; depending on the quality of the shielding, it ranges from a few centimeters to 30 cm. However, low-frequency subharmonics may form, against whose radiation such shielding is ineffective. In any case, voice recorders in metal cases belong to the class of special equipment and are specially developed to minimize radiation.

From the point of view of electrical engineering, a voice recorder consists of a set of closed electric circuits, some of which have significant inductance. This produces electromagnetic radiation with a certain directional pattern and intensity around the working voice recorder. It follows that any voice recorder can be detected by some electronic device at a certain distance.

Let us consider the problem of measuring the level of the magnetic component of the electromagnetic field created by a voice recorder, assuming for now that there are no other sources of the field. The simplest solution to this problem is shown as a structural diagram in Fig. 1.

The magnetic antenna (MA) has an amplitude-frequency characteristic that selects the required frequency range. The signal amplifier (SA) fed by the antenna must have minimal intrinsic noise, since this noise determines the sensitivity of the entire system and, consequently, the detection range. In theory, the response level of the threshold detector (TD) can be set at the maximum intrinsic noise of the amplifier. Exceeding this level then signals the presence of a field source on the indicator device (ID). The possible distance to the voice recorder for such a detector is determined by its intrinsic noise and ranges from tens of centimeters to 2 meters, depending on the type of voice recorder.

In real conditions, at any point in space there is always some integral level of electromagnetic radiation created by many other sources, near and distant. This level can significantly exceed the intrinsic noise of the detection device. Moreover, some sources (for example, alternating current in a 220 V network) create a very high field level and effectively make it impossible to measure other fields. These conditions make it necessary to use as the magnetic antenna (MA) not one coil but two coils spaced some distance apart and connected differentially. Such a magnetic antenna becomes a gradiometer [2]. In this case, a significant weakening of the influence of a remote source is achieved, especially with an increase in the distance between the coils. Unfortunately, the signal level from a nearby source (the voice recorder) also drops, but this is the price of being able to measure the field of a nearby source at all. Because of these «parasitic» electromagnetic fields, registering the radiation of a voice recorder requires a signal level measurement unit (SLMU), which sets the threshold detector (TD) level to the measured value on a command from the control device (CD). The command is issued by the operator performing the detection.

Clearly, such a device can register voice recorder radiation only if that radiation exceeds the background level at the given location. The actual detection range therefore depends strongly on the background level and can drop several-fold. This is a physical limitation of broadband detectors. The channel for detecting audio and video recording equipment in the ST 041 device [7], manufactured by «Smersh Technics» (St. Petersburg), is built on this principle.
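The effect of the differential coil connection can be sketched with a very simple model. The code below assumes that every source behaves as a small magnetic dipole on the axis of the two coils, so that its field falls off as 1/r^3. The base length and the distances are hypothetical values chosen only for illustration.

```python
def dipole_field(distance_m, moment=1.0):
    """Relative on-axis field of a small magnetic dipole (assumed 1/r^3 law)."""
    return moment / distance_m ** 3

def gradiometer_output(distance_m, base_m, moment=1.0):
    """Signal of two differentially connected coils: near coil minus far coil."""
    return dipole_field(distance_m, moment) - dipole_field(distance_m + base_m, moment)

BASE_M = 0.3      # assumed distance between the two coils
NEAR_M = 0.3      # assumed distance to a nearby voice recorder
FAR_M = 5.0       # assumed distance to a remote interference source

# Fraction of the single-coil signal that survives the subtraction
near_kept = gradiometer_output(NEAR_M, BASE_M) / dipole_field(NEAR_M)
far_kept = gradiometer_output(FAR_M, BASE_M) / dipole_field(FAR_M)

print(f"nearby source keeps {near_kept:.1%} of its signal")
print(f"remote source keeps {far_kept:.1%} of its signal")
```

Under these assumptions the nearby source loses only a small part of its signal, while the remote source is attenuated several times more strongly. Real sources are neither ideal dipoles nor aligned with the coil axis, so the numbers are indicative only.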

Let us consider ways of increasing the efficiency of this device. At least two problems must be solved: reducing the device's intrinsic noise and distinguishing electromagnetic field sources by frequency. The intrinsic noise of the device considered above is determined by the noise characteristics of the amplifier microcircuit and by the width of the measurement frequency range, so narrowing the frequency band reduces the detector's intrinsic noise. This is achieved with a group of bandpass filters covering the frequency range of interest. Increasing the number of filters improves the signal-to-noise ratio. The same filters also solve the second problem: they allow the signal to be localized by frequency. As a result, the device acquires the ability to «see» very weak sources of electromagnetic radiation against the background of very strong ones, which is impossible for a broadband detector. Another device for detecting voice recorders, the ST 0110 [6], manufactured by Smersh Technics, is built on this basis. It uses algorithmic models [1,2] previously used in the PTRD-018 device.
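The gain from narrowing the band can be estimated with a short calculation. The sketch assumes white amplifier noise (constant spectral density) and a narrowband signal that fits entirely inside one filter. The 100 kHz band corresponds to the 20-120 kHz range quoted in this article, and the filter count of 256 matches the number of harmonics mentioned for the ST 0110. The function names are illustrative.

```python
import math

def noise_rms(noise_density, bandwidth_hz):
    """RMS noise voltage of white noise with the given density (V per sqrt(Hz))."""
    return noise_density * math.sqrt(bandwidth_hz)

def snr_gain_db(full_band_hz, num_filters):
    """SNR improvement when one broadband channel is split into equal filters.

    Assumes the signal of interest falls entirely inside a single filter,
    so only the noise in that filter competes with it.
    """
    wide = noise_rms(1.0, full_band_hz)
    narrow = noise_rms(1.0, full_band_hz / num_filters)
    return 20.0 * math.log10(wide / narrow)

gain_db = snr_gain_db(100_000.0, 256)
print(f"SNR gain for 256 filters: {gain_db:.1f} dB")  # about 24.1 dB
```

Doubling the number of filters adds about 3 dB under the same assumptions, which is one way to read the statement that more filters improve the signal-to-noise ratio.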

The structural diagram of the ST 0110 device is shown in Fig. 2.

A further problem is the pronounced non-stationarity of the electromagnetic field created by «parasitic» remote sources. These sources can appear and disappear, and can change the intensity of their radiation in a completely chaotic manner. To the detector, this looks just like a voice recorder being switched on and off. The problem can be resolved by using two synchronously operating calculation paths whose results are mutually subtracted. In the ideal case, this completely suppresses the influence of remote field sources. The device then responds only to the appearance and disappearance of a source that is in the near zone and not equidistant from the magnetic antennas (Fig. 3). The radius of the near zone increases with the distance («base») between the magnetic antennas (MA 1, MA 2). This distance should be commensurate with the distance to the detected object.

The solution to the problem of non-stationary remote sources is hampered by the imperfect match between the amplitude-frequency characteristics (AFC) of the two synchronously operating paths, by the imperfect match of the angle between the source field vector and the orientation of each magnetic antenna, and by the interference and reflection of electromagnetic waves. Because of these discrepancies, the sensitivity of the device has to be reduced somewhat to lower the probability of false alarms.

Let's return to Fig. 2. In order to suppress the influence of remote sources (monitors, TVs, office equipment and other equipment) as much as possible, parallel and synchronously operating paths for the first and second channels are introduced into the device. These channels operate independently of each other until the stage of subtraction of averaged spectra. Mutual subtraction and subsequent processing allow us to determine which of the antennas the source is located closer to, i.e. not to lose the ability for spatial localization.
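A minimal sketch of how mutual subtraction can preserve spatial localization is given below. It is not the ST 0110 algorithm: the one-sided differences, the summing of excess energy and the threshold value are all assumptions made for illustration, and the spectra are invented four-bin examples.

```python
def locate_source(spectrum_1, spectrum_2, threshold):
    """Decide which antenna a near source is closer to (hypothetical rule).

    spectrum_1 and spectrum_2 are averaged amplitude spectra of the two
    channels. Components equal in both channels (remote sources) cancel.
    """
    # One-sided differences: excess seen only by antenna 1 or only by antenna 2
    excess_1 = sum(max(a - b, 0.0) for a, b in zip(spectrum_1, spectrum_2))
    excess_2 = sum(max(b - a, 0.0) for a, b in zip(spectrum_1, spectrum_2))
    if max(excess_1, excess_2) < threshold:
        return "none"
    return "antenna 1" if excess_1 > excess_2 else "antenna 2"

# Remote interference looks the same to both antennas (invented values)
remote = [5.0, 0.0, 3.0, 0.0]

# A near source adds a component in bin 1, stronger at the closer antenna
channel_1 = [5.0, 0.1, 3.0, 0.0]
channel_2 = [5.0, 0.4, 3.0, 0.0]

print(locate_source(remote, remote, threshold=0.2))        # none
print(locate_source(channel_1, channel_2, threshold=0.2))  # antenna 2
```

In this toy case the strong remote components cancel exactly. In practice the two paths never match perfectly, so some residue remains and the threshold has to be raised accordingly.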

The signal processing in each channel is performed as follows. Two independent magnetic antennas (MA LF and MA HF) convert the magnetic component of the electromagnetic field into an electric signal, which is fed to the signal amplifier (SA). The bandwidth of the low-frequency magnetic antenna and amplifier (MA LF and SA) is 50-400 Hz, which is sufficient to detect «kinematic» voice recorders. A gradiometer is used as a magnetic antenna for this frequency band. The bandwidth of the high-frequency magnetic antenna and amplifier (MA HF and SA) is 20-120 kHz, which is aimed at detecting «digital» voice recorders. Then the amplified signals are fed to the analog-to-digital converter (ADC), converted to digital form and all further operations are performed by the computer.

Studies have shown that the resulting dynamic range of the detector must exceed 120 dB, mainly because of radiation from the 220 V network. The dynamic range of signals from voice recorders does not exceed 70 dB; from below, the range is limited by the detector's intrinsic noise. It is therefore necessary to minimize the influence of 220 V network harmonics at the initial stage of processing, which reduces the dynamic range of the calculations to 70 dB and simplifies the implementation of the device. For this purpose, a comb rejection filter (CRF) is implemented in software and hardware; it suppresses the signal at frequencies that are multiples of the fundamental harmonic of the supply network voltage. The suppression level is up to 60 dB. Because the network frequency is unstable, a tracking frequency meter (FM) is also required, which measures the nominal 50 Hz network frequency with the accuracy needed to tune the rejection filter to the fundamental harmonic. As a result, it became possible to use an ADC with a dynamic range of 70 dB and to perform all further calculations in sixteen-bit arithmetic.

The maximum signal amplitude is monitored by the signal dynamic range control unit (SDRCU), which on overload raises an overload flag and blocks further calculations. If the digitized signal stays within the dynamic range of the ADC, a fast Fourier transform (FFT) is performed, which decomposes the signal into harmonic components and is equivalent to using a group of bandpass filters. The number of such filters was selected from a study of the time instability and frequency band of the signals of a number of typical voice recorders. The optimal number was 256 harmonics for the low-frequency and high-frequency bands. A disadvantage of the FFT is the so-called Gibbs effect, which shows up as broadening of the spectral peaks in the lower part of the amplitude range and the appearance of many side lobes [3].

To reduce the side lobes, a Kaiser-Bessel weighting function (window) is applied to the signal before the FFT is calculated. This window suppresses side lobes most strongly and makes it possible to resolve closely spaced large and small signals by frequency. The price is some broadening of the spectral peaks in the upper part of the amplitude range, which can easily be compensated by increasing the number of Fourier transform points.
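The side-lobe effect described above and its reduction by windowing can be reproduced with a small sketch. The record length, the tone frequency, the window parameter beta = 8.6 and the guard width are assumptions made for the example. The article does not state the parameters actually used in the device.

```python
import cmath
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order zero (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kaiser_window(n, beta):
    """Kaiser-Bessel window of length n with shape parameter beta."""
    denom = bessel_i0(beta)
    window = []
    for i in range(n):
        ratio = 2.0 * i / (n - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))) / denom)
    return window

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform)."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples)))
            for k in range(n // 2 + 1)]

def far_leakage_db(samples, guard_bins=6):
    """Largest component more than guard_bins from the peak, relative to the peak."""
    mags = dft_magnitudes(samples)
    peak = max(range(len(mags)), key=mags.__getitem__)
    far = max(m for k, m in enumerate(mags) if abs(k - peak) > guard_bins)
    return 20.0 * math.log10(max(far, 1e-300) / mags[peak])

N = 64                 # assumed record length
CYCLES = 10.5          # tone deliberately placed between two bins
tone = [math.sin(2 * math.pi * CYCLES * t / N) for t in range(N)]
window = kaiser_window(N, beta=8.6)   # beta is an assumed value

rect_db = far_leakage_db(tone)
kaiser_db = far_leakage_db([s * w for s, w in zip(tone, window)])
print(f"side lobes without window: {rect_db:.0f} dB, with window: {kaiser_db:.0f} dB")
```

With the window applied, components far from the peak drop by tens of decibels, which is what allows a weak signal to be seen next to a strong one. The cost is a wider main lobe, as the text notes.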

After the FFT, the signal spectrum enters the averaging unit (AU), which suppresses noise components in the spectrum and isolates stable spectral components. Exponential ensemble averaging is used [3]. The averaging coefficient is selected experimentally, based on the instability of the spectral components of test voice recorder signals. The averaging process is controlled by the spectral energy control unit (SECU), which blocks further calculations when the integral spectral bursts of the instantaneous (unaveraged) spectrum exceed a specified threshold. This prevents the device from responding to impulse noise, vibrations and other short-term disturbances of the electromagnetic field.

The averaged signal spectra of the first and second channels are then mutually subtracted in absolute value in the MOD(2-1) and MOD(1-2) blocks, which removes harmonics that are identical in frequency and amplitude. The remaining spectrum harmonics are fed to the spectrum comparison unit (SCU), where each harmonic is compared with the harmonics of the spectrum coming from the interference spectrum generator (ISG). Besides the modulus of the harmonic difference, the comparison result is affected by the behavior of neighboring spectrum harmonics. The ISG unit operates at certain moments on commands from the control unit (CU), for example during adaptation to the surrounding electromagnetic environment. The spectrum comparison result is fed to the threshold detector (TD), whose response threshold determines the sensitivity of the entire system.

At the final stage of analysis, the threshold detection result is subject to time selection: only events whose duration exceeds a specified time interval are selected. This takes place in the time selection unit (TSU), which makes it possible to ignore relatively short signals, treated here as false ones. The time selection interval is chosen in the range from 30 seconds to two minutes. From the output of the TSU, the voice recorder detection flag is fed to the indication device (ID). The period of a single analysis of a pair of channels is determined mainly by the time needed to collect a block of samples for the low-frequency path (kinematic voice recorders) and is approximately one second. If only digital voice recorders are to be detected, the channel polling rate increases fourfold.
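Two of the steps described above, exponential averaging of spectra and time selection, can be sketched as follows. The averaging coefficient, the two-bin spectra and the mapping of a 30-second interval to 30 analysis cycles (at roughly one second per cycle) are assumptions made for illustration. The function names are hypothetical.

```python
def exponential_average(previous, current, alpha):
    """One step of exponential averaging applied to every spectral bin."""
    return [(1.0 - alpha) * p + alpha * c for p, c in zip(previous, current)]

def time_selected(detections, min_cycles):
    """True once the threshold was exceeded for min_cycles consecutive cycles."""
    run = 0
    for hit in detections:
        run = run + 1 if hit else 0
        if run >= min_cycles:
            return True
    return False

ALPHA = 0.1            # assumed averaging coefficient
FRAMES = 40            # number of analysis cycles in the example

# Bin 0: a stable component present in every frame.
# Bin 1: a single strong burst in frame 3, absent otherwise.
averaged = [0.0, 0.0]
for frame in range(FRAMES):
    instantaneous = [1.0, 8.0 if frame == 3 else 0.0]
    averaged = exponential_average(averaged, instantaneous, ALPHA)

print(f"stable bin: {averaged[0]:.3f}, burst bin: {averaged[1]:.3f}")

# Time selection: about one second per cycle, 30 s interval -> 30 cycles
short_event = [True] * 5 + [False] * 40
long_event = [False] * 3 + [True] * 30
print(time_selected(short_event, 30), time_selected(long_event, 30))
```

The stable component converges toward its true level while the single burst decays, and only the long event passes time selection. The actual device additionally blocks averaging during strong bursts, which this sketch does not model.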

The ST 0110 device works together with a PocketPC minicomputer or with any IBM-compatible computer, desktop or laptop. The maximum number of channels (detection zones) is 16; with a desktop computer it can be expanded to 32 or more.

To illustrate the differences in the radiation of different voice recorders, below are the spectral characteristics obtained using the ST 0110 device, at the stage of completing the signal processing cycle — after the threshold detector.

Fig. 4 and 5 show the frequency characteristics of typical background radiation in office conditions. In the low-frequency part (up to 300 Hz), harmonics that are multiples of 50 Hz are observed, as well as spectral components from the frame scan of monitors. It should be emphasized that the gradiometer and the rejection filter weaken the influence of these sources of «parasitic» radiation by more than 60 dB, and the graph reflects only what could not be suppressed. In the high-frequency part (from 10 to 110 kHz), harmonics from the line scan of monitors, a TV and a laptop are visible, even though the listed equipment is located 3 to 10 meters from the magnetic antenna.

Fig. 6 shows the frequency characteristics of the radiation of the Olympus V-90 digital voice recorder at a distance of 40 and 100 cm from the magnetic antenna. In this device, the radiation is created by a voltage converter, the frequency of which slowly changes as the battery discharges.

Fig. 7 shows the frequency characteristics of the radiation of the Olympus D1000 digital voice recorder at a distance of 25 and 50 cm from the magnetic antenna. The radiation spectrum of this voice recorder is unstable, and its main part lies in the band from 30 to 50 kHz.

Fig. 8 and 9 show the frequency characteristics of the radiation of a «kinematic» voice recorder with an EMG, the Sony M-909. The distance to the magnetic antenna is 25 and 70 cm. Here the signal from the EMG (40 kHz) is more powerful than the signal from the electric motor (108 Hz).

Fig. 10 and 11 show the frequency characteristics of the radiation of the «kinematic» voice recorder Olympus S724 at a distance of 30 and 90 cm from the magnetic antenna for different tape speeds. For this device, only spectral components caused by the rotation of the electric motor are observed.

What ways of further increasing the efficiency of the voice recorder detector can be identified? One is to improve the noise characteristics of the analog path by using modern, specialized signal amplification microcircuits. Another is to move the analysis of the spectral characteristics of the electromagnetic field into the domain of neural networks. The mathematical basis of neural networks is a development of the theory of adaptive filtering. Detection of voice recorders is a task that is hard to express as an algorithm. Such tasks require either the constant work of qualified experts or adaptive automation systems, of which neural networks are an example. Neural networks can generate a nonlinear model of the process from the results of adaptive network training.

Also, the detection range can be increased by converting the analyzed frequency range to a higher frequency region up to 300 MHz [5] and, additionally, registering the electric component of the emitted electromagnetic field. But the implementation of all these methods will lead to a significant increase in the cost of the device.

Literature.

1. A.A. Zharov, M.B. Stolbov, S.A. Gudkov, V.M. Danilov, «Device for detecting signals» // Patent for invention No. 2140656. Registered in the State Register of Inventions on 10/27/99.
2. A.A. Zharov, M.B. Stolbov, «It's hard to look for a voice recorder in a dark pocket, especially if you don't have PTRD 018» // Confidential, 1997, No. 1, pp. 53-58.
3. A.P. Kulaichev, «Computer Process Control and Signal Analysis». Moscow: NPO «Informatics and Computers», 1999, pp. 7-127.
4. V.S. Ukov, «Special Equipment with Solid-State Memory: Versatility, Quality, Reliability» // Special Equipment, 2000, No. 4, pp. 21-28.
5. «Comparative Analysis of Digital Voice Recorders». http://ess.ru/dbtexts/analmat/dmanal/dmanal.htm