Digital processing of images of dynamic spectrograms of audio signals in voice communication security problems
Dvoryankin Sergey Vladimirovich,
Candidate of Technical Sciences
Introduction
According to estimates by domestic and foreign experts, a significant portion of information transmitted via public telecommunication channels is voice messages. This state of affairs will definitely remain in the future, since it is difficult to find any equivalent replacement in many communication and information transfer systems for such a universal tool of human communication as speech, which has unique features of the presence effect, emotional coloring, authentication, information redundancy and others, inherent only to this communication (negotiation) process. That is why the tasks of protecting speech information occupy one of the leading places in solving the general problem of information security.
At present, there is an urgent need to create new special-purpose software and hardware and speech information protection systems based on standard computing devices, which can yield significant savings in the time and material resources spent today on developing traditional special equipment. In addition, the service life of such equipment can be extended by updating its software and hardware components. Today, a lag is observed only in general methods of digital audio signal processing as applied to the various problems of speech communication security. Here, more than anywhere else, new computer technologies for describing and processing the speech signal (RS) are needed.
Frequency-time descriptions of audio signals and speech
As in most other studies on this topic, to make the subsequent discussion easier to follow we introduce the notion of a phonoobject: here and below, a real object that generates and emits signals in the audio frequency range which, once converted into digital form, can be recorded and stored in computer memory as separate files for subsequent processing and/or transmission. Note that the category of phonoobjects includes not only human speech but also sounds of a different nature, including various types of noise and interference that hinder correct, high-quality auditory perception of the speech signal and impair or distort its understanding. By traces of a phonoobject we mean a parametric description that allows either its sound to be recreated completely, or a «new» audio signal to be restored and voiced from properties changed and specified in that description.
It should also be noted that a complex phonoobject, understood as a set of simple sounds occurring simultaneously, can be represented as the sum of its constituent phonoobjects. Thus, a voiced section of speech with quasi-harmonic interference can be represented as a superposition of the interference and a speech signal, which in turn can be considered a set of individual overtones that are also part of the sound fragment under study. In such an example, all these sound components are conveniently treated as a set of narrow-band signals, meaning that the spectral components of each elementary sound are grouped in a band that is narrow relative to some central frequency. Sometimes, however, the complex phonoobject itself can also be conveniently treated as a narrow-band process.
An analysis of numerous publications shows that the main concepts needed when discussing most issues of speech communication security based on computer technologies are speech intelligibility and unintelligibility, together with the closely related concepts of detection, restoration and reconstruction of the parameters of the narrow-band signals (traces of phonoobjects) that together make up the original audio or speech signal under study (the phonoobject). By modifying, changing or removing these parameters, a specific task can be solved. Therefore, the development and improvement of computer technologies for speech communication security will depend, first of all, on the quantitative measures adopted for assessing the narrow-band signals that make up the audio signals and speech transmitted and received over public communication channels and/or stored on various physical media.
It follows that, to understand audio transformations carried out through digital processing of images of dynamic spectrograms, it is desirable to choose an analytical model of the sound signal that will be convenient to work with later. One suitable model is the analytical description of the sound signal as a sum of narrow-band signals according to Hilbert.
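As a minimal sketch of what such a Hilbert-based description provides, the example below builds the analytic signal of a single narrow-band component and reads off its instantaneous amplitude and phase. The test signal, the sampling rate and the helper name `analytic_signal` are illustrative assumptions and do not come from the original work; the FFT-based construction is the standard textbook one, not necessarily the author's implementation.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H[x], built by zeroing negative frequencies in the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC term as is
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist term as is
        gain[1 : n // 2] = 2.0         # double the positive frequencies
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

# Hypothetical narrow-band component: a 440 Hz carrier with a slow 3 Hz envelope.
fs = 8000
t = np.arange(fs) / fs                 # one second of samples
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
x = envelope * np.cos(2 * np.pi * 440 * t)

z = analytic_signal(x)
amplitude = np.abs(z)                              # instantaneous amplitude
phase = np.unwrap(np.angle(z))                     # instantaneous (full) phase
inst_freq = np.diff(phase) * fs / (2 * np.pi)      # instantaneous frequency, Hz
```

For this test signal the recovered amplitude follows the envelope and the instantaneous frequency stays at the carrier. A sum of several such components would describe a complex phonoobject in the sense used above.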
The results of recent studies have shown that the data required to calculate the parameters (amplitudes and phases) of traces of phonoobjects can be contained in dynamic spectral scans of a speech signal, that is, amplitude-phase, frequency-time descriptions of instantaneous speech spectra with a given observation (analysis) step in time and frequency, and above all in images of narrow-band amplitude sonograms. Such scans, often called matrices of dynamic spectral states (MDSS), can be obtained in the course of dynamic spectral analysis-synthesis (DSAS) of speech: a selected analysis window slides over the original signal, and the samples weighted by it are mapped to their frequency image in the adopted orthogonal basis. An example of this kind of technology is short-time Fourier analysis-synthesis of sound signals, often used in digital speech conversion systems.
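The sliding-window analysis and the reverse synthesis can be sketched with a short-time Fourier transform. This is a minimal illustration under assumed parameters (a Hann window of 256 samples, a step of 64 samples, an 8 kHz test signal); the function names `stft` and `istft` and all numbers are assumptions, not details of the software described in this work.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Slide a Hann window over x; return complex spectra, shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]               # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add synthesis with the same Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    n = n_fft + hop * (len(spec) - 1)
    out = np.zeros(n)
    norm = np.zeros(n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-12)           # guard the zero-weight edge sample

# Hypothetical test signal: a steady tone plus a rising chirp.
fs = 8000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 200 * t) + 0.5 * np.cos(2 * np.pi * (400 * t + 300 * t ** 2))

mdss = stft(x)                                     # matrix of dynamic spectral states
amplitudes, phases = np.abs(mdss), np.angle(mdss)  # the two "images" of the signal
restored = istft(amplitudes * np.exp(1j * phases))
```

The amplitude matrix is what is displayed as a sonogram. When both amplitudes and phases are kept unchanged, the synthesis step returns the original samples (apart from the outermost window edges), which is what makes editing in the image domain followed by resynthesis possible.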
Traces of phonoobjects of various nature, in the form of the amplitude and phase parameters of their narrow-band component signals, appear on images of dynamic spectrograms, as will be shown below, as a set of contours (lines) of brightness differences or as tracks (chains) of local and global extremes of saturation in levels of one color. Here and below, image processing means performing various operations on data that are fundamentally two-dimensional in nature and do not always take non-negative values.
At present, there are many good software analyzers and editors of audio signals designed for visual analysis in the time domain (oscillograms, signal power level graphs, etc.) and, of course, in the frequency domain (sonograms, cepstra, etc.). Among imported products of this kind, Cool Edit Pro 1.2, Dart Pro, Sound Forge, Wave Lab, Wave Studio, etc. should be noted; among domestic ones, SIS 5.2, Win-Audio, Lazur, Signal Quick Viewer 2 (SQV2), Signal Viewer (SV), etc. A number of sound editors can perform some types of audio signal processing that are also usable for a limited number of speech communication security tasks. These tasks include, first of all, filtering the speech signal (RS) and removing simple harmonic, impulse and noise interference from a speech message received from a communication channel. Such simple processing is in most cases performed in the time domain, with the results possibly evaluated in the frequency domain by analyzing sonograms. Only a few professional software products, specially designed for the most serious problems of RS protection, allow complex types of processing, including processing in the frequency domain based on analysis of images of dynamic sonograms. Thus, the new version of one such product, «Lazur», promoted on the special equipment market by OJSC «Novo», makes it possible to select a section of interest directly on the spectrogram image of the phonoobject under study and to apply to it either the built-in digital image processing methods or the powerful tools of well-known graphic editors such as Adobe Photoshop, to which the image area selected in «Lazur» is transferred, with the possibility of then inserting the modified image back and synthesizing sound from it.
All sonogram (spectrogram) figures presented in this work were created using this software product. In the sonogram (spectrogram) images, time is plotted along the abscissa and frequency along the ordinate, starting from the lower left corner of the image. The maximum power of the signal under study at a node of the frequency-time grid is shown in black, the minimum in white, and intermediate values in gray levels.
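The display convention just described can be sketched as a mapping from a matrix of spectral magnitudes to gray levels. The decibel scale, the 60 dB dynamic range and the function name `to_gray_image` are assumptions made for illustration; the actual scaling used by the software is not stated in this work.

```python
import numpy as np

def to_gray_image(magnitude, dynamic_range_db=60.0):
    """Map a (frames, bins) magnitude matrix to an 8-bit gray image.

    Output rows are frequency and columns are time; 0 (black) marks the
    strongest cell and 255 (white) the weakest within the dynamic range.
    """
    level_db = 20.0 * np.log10(np.maximum(magnitude, 1e-12))
    distance_from_peak = (level_db.max() - level_db) / dynamic_range_db
    gray = np.rint(255.0 * np.clip(distance_from_peak, 0.0, 1.0))
    # Transpose to (bins, frames) and flip so the lowest frequency is the bottom row.
    return np.flipud(gray.T).astype(np.uint8)

# Tiny hypothetical example: 2 frames by 2 frequency bins.
demo = np.array([[1.0, 0.001],
                 [0.1, 0.000001]])
image = to_gray_image(demo)
```

In `image`, the bottom-left pixel corresponds to the first frame and the lowest frequency bin, matching the axis origin in the lower left corner.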
The main approach to the analysis and processing of audio signals in the tasks of protecting speech messages
In this paper, the author proposes a new approach to building special software and hardware-software tools for audio and speech conversion on standard computing equipment. It combines the idea of converting an audio signal into graphic images (images of spectrograms and phasegrams) and back from an image into an audio signal or speech, without loss of information content or intelligibility, with the capabilities of known and promising methods and software products for digital image processing. The core of this approach is the development and application of methods for analyzing, restoring, reconstructing and synthesizing the traces of the narrow-band signals that make up the original sound (the phonoobject) and are present on the frequency-time grid of these images. Below we give only a small number of examples of the practical implementation of the proposed approach to the analysis-synthesis and processing of audio signals by restoring and reconstructing the traces of the narrow-band audio signals that make up phonoobjects on the presented spectrogram images.
Information analysis of traces of phonoobjects
Very often additional information, and sometimes even key data, about the phonoobject under investigation can be obtained by an information analysis of its traces, or of the traces of the phonoobjects contained in the given audio signal, using appropriately calculated spectrogram images.
Fig. 1. Traces of phonoobjects of various natures, shown on sonograms
At the top — traces of a conversation between a man and a woman;
In the center — traces of acoustic barrier equipment and speech;
Below is an example of computer editing of speech by a given speaker's voice.
Thus, the upper panel of Fig. 1 shows, on a narrow-band spectrogram of an audio signal received from a telephone channel, a fragment of a conversation between two subscribers, a man and a woman. Traces of three types of phonoobjects are clearly visible in this image: overtones of the male voice and of the female voice in the vocalized, non-pause sections of the sonogram, and noise inclusions in the pauses. The trajectories (contours) of maximum contrast, or chains (tracks) of local maxima of gray levels, are precisely the traces of the narrow-band components of phonoobjects that we are studying. They are especially noticeable in the highlighted central section of the sonogram image in Fig. 1 as light lines running through the centers of the gray and black stripes. Note that the sonogram in this figure closely resembles the narrow-band sonograms, the so-called «visible speech» prints, that were previously widely used to analyze speech signals and, above all, to identify a speaker by voice. Such «visible speech» sonograms are still used for these purposes, but only with the described approach to speech processing has it become possible to reconstruct the audio signal directly from the traces of phonoobjects identified in sonogram images, using specially calculated and constructed sonograms that are indeed very similar to «visible speech» images.
The traces of the male voice in the left and right parts of the spectrogram on the upper panel are noticeably different from the female ones (the middle part of the spectrogram) in that they have a lower fundamental frequency in the vocalized sections than the female ones, i.e. the harmonic lines of the fundamental tone of the male voice in these sections are more closely spaced, while the female overtones are much further apart in frequency. By analyzing this spectrogram, it is possible to determine the time boundaries of the phrases spoken by each of the interlocutors, and to more successfully conduct subsequent identification of the person by voice. Some signs in this spectrogram (traces of the disconnected phone call in the right part) indicate a certain acoustic environment around one of the subscribers, from which it can be concluded whether he is speaking from home or from a street payphone.
In the middle panel of Fig. 1, the spectrogram of a sound signal received from the acoustic channel (air environment) shows traces of the sound of special acoustic barrier equipment: columns of horizontal lines similar to speech overtones (speech-like interference) alternating with columns of traces of powerful noise. Such rapid alternation of sections with different types of interference is intended to complicate the operation of adaptive filtering equipment, should it be used to clean the received speech message of noise and interference and restore its intelligibility. In the center of the spectrogram, traces of the suppressed speech signal are visible; preventing its leakage through acoustic and vibroacoustic technical channels is the purpose of this equipment, which protects speech information in rooms for confidential negotiations. By analyzing this spectrogram, one can judge how far the actual efficiency with which this acoustic barrier equipment suppresses speech leakage channels matches the efficiency declared in its technical documentation. One can then either take measures to increase, if necessary, the degree of suppression of the confidential RS, or assess whether such a distorted speech message could be restored by the technical specialists of an «intruder» carrying out unauthorized access (NSD) to confidential speech information.
The lower panel of Fig. 1 shows a sonogram of artificial speech obtained by computer editing, that is, by gluing together graphic images of individual phonemes and sounds from a previously accumulated dictionary of standard phrases of the «parodied speaker». Although the gluing places are retouched quite well, so that the speech of the given speaker synthesized from the new, modified graphic images sounds quite good, traces of editing are still clearly noticeable on this sonogram, especially at the ends of each new phrase and where sections with different numbers of overtones are joined.
Compression of speech messages
The tasks of speech signal compression can also be solved by processing sonogram images. The processing scheme is as follows. First, the speech signal is converted into its graphic image, the sonogram, within the boundaries of the selected analysis window during the DSAS. This sonogram image is then compressed using one of the image compression methods, and the compression coefficients are transmitted over the communication channel. At the receiving end, the image of the original sonogram is reconstructed from the received coefficients, and a new speech signal is synthesized from this image. The advantage of this method of speech encoding is that only one initial description of the speech signal is used, a sonogram with traces of phonoobjects, from which virtually any required encoding rate can be obtained, as determined by the bandwidth of the communication channel at a given time, while maintaining the highest possible intelligibility and sound quality of the reconstructed speech. The results of a number of recent studies have shown that by applying fractal or special wavelet-based compression methods to sonogram images, it is possible to achieve a minimum encoding rate of 800 bits/s while maintaining verbal intelligibility of about 80%.
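The image-side step of this scheme can be illustrated with a deliberately simple stand-in: one level of a two-dimensional Haar wavelet transform followed by keeping only the largest coefficients. This is not the fractal or special wavelet-based method referred to above, it makes no claim about bit rates, and the names `haar2d`, `ihaar2d` and `compress` as well as the random test matrix are assumptions introduced for illustration.

```python
import numpy as np

def haar2d(a):
    """One level of an orthonormal 2-D Haar transform (both dimensions even)."""
    lo = (a[0::2] + a[1::2]) / np.sqrt(2)          # pairs of rows: averages
    hi = (a[0::2] - a[1::2]) / np.sqrt(2)          # pairs of rows: differences
    rows = np.vstack([lo, hi])
    lo = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2)
    hi = (rows[:, 0::2] - rows[:, 1::2]) / np.sqrt(2)
    return np.hstack([lo, hi])

def ihaar2d(c):
    """Inverse of haar2d."""
    half = c.shape[1] // 2
    lo, hi = c[:, :half], c[:, half:]
    rows = np.empty_like(c)
    rows[:, 0::2] = (lo + hi) / np.sqrt(2)
    rows[:, 1::2] = (lo - hi) / np.sqrt(2)
    half = rows.shape[0] // 2
    lo, hi = rows[:half], rows[half:]
    out = np.empty_like(rows)
    out[0::2] = (lo + hi) / np.sqrt(2)
    out[1::2] = (lo - hi) / np.sqrt(2)
    return out

def compress(image, keep=0.25):
    """Keep the largest-magnitude fraction of Haar coefficients, zero the rest."""
    coeffs = haar2d(image)
    k = max(1, int(keep * coeffs.size))
    threshold = np.sort(np.abs(coeffs), axis=None)[-k]
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

# Hypothetical 8 x 8 "sonogram" patch.
rng = np.random.default_rng(0)
patch = rng.normal(size=(8, 8))
kept = compress(patch, keep=0.25)      # what would be sent over the channel
approx = ihaar2d(kept)                 # image reconstructed at the receiving end
```

In a complete system the retained coefficients would still have to be quantized and entropy coded, and the reconstructed image would be passed to the synthesis stage to obtain the new speech signal.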
Fig. 2. Examples of speech compression
At the top is a sonogram of the original, studied fragment of speech;
In the center is a sonogram of the speech signal, restored after compression at a rate of 1000 bits/s, using one of the algorithms for compressing images of the original sonogram;
At the bottom is a sonogram of the speech signal, restored after compression to 800 bits/s with the exclusion of information about the fundamental tone.
The sonogram of the original speech section, the image of which will be used for compression by digital image processing methods and in other speech processing methods presented in the work, is shown in the upper panel of Fig. 2. Above the sonogram, a rough oscillogram of the entire studied RS is drawn with the location of the selected fragment indicated on it.
The middle and bottom panels of Fig. 2 show, respectively, the sonogram of the same speech section restored after compression by the proposed method to a rate of 1000 bits/s, and the sonogram of the same section of the RS restored after compression to 800 bits/s, achieved by removing information about the melody of the fundamental tone using digital image processing methods.
It can be seen that the sonogram of speech restored after compression at 1000 bits/s is more similar to the sonogram of the original RS than the sonogram of the signal restored after compression of the image with the fundamental tone equalized. That is why the first restored RS sounds better and more natural than the second, although both are equally highly intelligible.
Improving the comfort of speech signal perception
Very often the perception of speech messages received from communication channels leaves much to be desired only because the frequency band of the RS is shifted from its true position. The RS spectrum can be returned to its original frequency boundaries by shifting the sonogram image of the received speech along the frequency axis by the required amount and then synthesizing a new RS from the changed values of the matrix of dynamic spectral states (MDSS).
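A minimal sketch of such a shift, assuming a short-time Fourier representation with a Hann window: every column of the matrix is moved by a whole number of frequency bins, the phases are rotated so that successive frames stay consistent, and a new signal is synthesized. The function names, window parameters and the test tone are assumptions for illustration only.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Slide a Hann window over x; return complex spectra, shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add synthesis with the same Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    n = n_fft + hop * (len(spec) - 1)
    out, norm = np.zeros(n), np.zeros(n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-12)

def shift_bins(spec, k, n_fft=256, hop=64):
    """Move the spectrum by k bins (k > 0 is upward) and keep frame phases coherent."""
    shifted = np.zeros_like(spec)
    if k >= 0:
        shifted[:, k:] = spec[:, : spec.shape[1] - k]
    else:
        shifted[:, :k] = spec[:, -k:]
    frame_index = np.arange(spec.shape[0])[:, None]
    return shifted * np.exp(2j * np.pi * k * hop * frame_index / n_fft)

# Hypothetical check: a 500 Hz tone moved up by 8 bins of 31.25 Hz, i.e. by 250 Hz.
fs = 8000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 500 * t)
y = istft(shift_bins(stft(x), 8))
peak_hz = np.argmax(np.abs(np.fft.rfft(y))) * fs / len(y)
```

The shift here is limited to whole bins; a finer shift would need interpolation along the frequency axis of the image.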
In some applications, it is very important to listen to the recorded speech at a fast or slow speed without changing the timbre of the speech. This can be achieved by performing the necessary temporal scaling of the sonogram of the original speech, either by stretching it in time or compressing it, but without going beyond the frequency band of the original speech. Having synthesized the resulting modified sonograms, we obtain the speech of the same speaker, but reproduced either at a fast or slow speed, while preserving all the features inherent in this speaker.
Fig. 3. Changing the tempo and timbre of speech
Above — sonogram of speech synthesized with a tempo acceleration of 30% in relation to the signal in the upper panel of Fig. 2;
In the center is a sonogram of speech synthesized with a tempo slowed down by 30% in relation to the signal in the upper panel of Fig. 2;
At the bottom is a change in the timbre of speech by reducing the frequency scale of the original sonogram in the upper panel of Fig. 2 by 30%.
In Fig. 3, the top and middle panels show, respectively, a sonogram of the RS synthesized in the original frequency band but compressed in time, at a tempo accelerated by 30% relative to the original speech in the top panel of Fig. 2, and a sonogram of the RS synthesized in the original frequency band but stretched in time, at a tempo slowed by 30% relative to the same original speech. The bottom panel of Fig. 3 shows a sonogram of the speech signal with a modified timbre, synthesized without taking the phase components into account in the «Lazur» software product after the vertical frequency scale was changed in Adobe Photoshop, i.e. with the sonogram of the original RS compressed in frequency by approximately 30% while the normal speech tempo is preserved during playback.
The speech messages, the sonograms of which are presented in the upper and middle panels of Fig. 3, sound as natural and intelligible as if they were spoken by the same person, but at a faster or slower tempo. Their sonograms are very reminiscent of the sonogram of a fragment of the original speech (upper panel of Fig. 2), but reproduced in the appropriate scale. When compressing the image of the speech spectrum by frequency (lower panel of Fig. 3) or when stretching it, we arrive at the same effects of the sound of the new signal, which are usually observed when voicing the original speech at changed sampling frequencies.
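Tempo change without a change of timbre can be sketched by resampling the sonogram along the time axis and regenerating consistent phases. The example below uses the well-known phase-vocoder technique as a stand-in; the source does not state how phases are handled in its tempo examples, so this is an assumption, as are the function names, the window parameters and the test tone.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Slide a Hann window over x; return complex spectra, shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add synthesis with the same Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    n = n_fft + hop * (len(spec) - 1)
    out, norm = np.zeros(n), np.zeros(n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-12)

def time_stretch(spec, rate, n_fft=256, hop=64):
    """Resample the frame axis; rate > 1 speeds speech up, rate < 1 slows it down."""
    n_frames, n_bins = spec.shape
    positions = np.arange(0, n_frames - 1, rate)
    expected = 2 * np.pi * hop * np.arange(n_bins) / n_fft   # nominal advance per hop
    phase = np.angle(spec[0])
    out = np.empty((len(positions), n_bins), dtype=complex)
    for i, pos in enumerate(positions):
        j = int(pos)
        frac = pos - j
        # Interpolate magnitudes between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(spec[j]) + frac * np.abs(spec[j + 1])
        out[i] = mag * np.exp(1j * phase)
        # Measured phase advance between those frames, deviation wrapped to [-pi, pi].
        delta = np.angle(spec[j + 1]) - np.angle(spec[j]) - expected
        delta -= 2 * np.pi * np.round(delta / (2 * np.pi))
        phase = phase + expected + delta
    return out

def dominant_hz(y, fs):
    """Frequency of the strongest spectral line of y."""
    return np.argmax(np.abs(np.fft.rfft(y))) * fs / len(y)

# Hypothetical check on a 500 Hz tone: tempo +30% and -30%.
fs = 8000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 500 * t)
fast = istft(time_stretch(stft(x), 1.3))
slow = istft(time_stretch(stft(x), 0.7))
```

The accelerated signal is shorter and the slowed one longer than the original, while the tone stays at its original frequency, which is the property described for the top and middle panels of Fig. 3.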
Cleaning speech signals from noise and interference
This is the most common group of tasks in the field of speech communication security. The tasks of speech spectrum correction and of noise and interference removal arise both when a speech signal is transmitted over a poor-quality communication channel and in cases of deliberate interference. Currently there are many technical means on the market, including hardware-software and purely software tools, designed for various kinds of speech signal cleaning, with which a number of tasks in this group are solved more or less successfully. However, computer technologies based on the proposed approach, in which speech is processed through images of its sonograms in the DSAS mode, make it possible to clean speech in the most difficult interference conditions with the best results in terms of time and financial cost. This is because the approach gives flexibility and versatility in eliminating various factors that hinder intelligible, high-quality auditory perception: traces of the corresponding phonoobjects are identified, stratified and eliminated in the spectrogram or sonogram images of the entire original signal.
Fig. 4. Removing traces of strong interference from the speech signal
Above — a sonogram of speech with interference significantly exceeding the speech level;
In the center is a sonogram of the purified speech section after processing using the quadratic dependence of the nonlinear component of the full phase in a simplified model of the audio signal;
At the bottom is a sonogram of the purified speech section after processing using a refined model of speech as a set of narrow-band signals according to Hilbert.
The results of cleaning when implementing this approach can be controlled not only by ear, but also visually through the analysis and modification of sonogram images of the original and restored speech signal at each successive stage of processing. Thus, it is possible to implement such digital processing algorithms that could not be implemented before. Moreover, the entire process of cleaning the RS from interference can be reduced to an easily understandable process for the user of eliminating or erasing traces of interference on the sonogram image with subsequent retouching of the remaining traces of the RS, similar to editing images in common graphic editors.
The most effective application of digital sonogram image processing methods is when it is necessary to eliminate stationary or slowly changing quasi-harmonic interference present in the speech signal. This is the example of removing traces of strong interference from the RS shown in Fig. 4.
The upper panel shows a sonogram, already known from various publications, of the original RS with superimposed interference that exceeds the speech by almost 25 dB. Traces of the interference are clearly visible on the spectrogram image as thick wavy black lines. When such a signal is played back, only the interference is heard, and the speech message is not heard at all. This message was kindly provided to the author as a file of digitized data by his colleague and classmate Yu. Romashkin for similar experiments on his voice. Since speech masked by such powerful interference is completely inaudible and therefore unintelligible, an RS distorted in this way can also be regarded as a speech message subjected to technical closure through the introduction of interference that prevents its correct auditory perception.
The middle and lower panels of Fig. 4 show sonograms of the RS reconstructed on the same previously selected section by first detecting and stratifying the interference traces, then synthesizing the interference and subtracting it from the original distorted speech. After this subtraction, the overtones of speech that the interference had previously concealed are clearly visible on the reconstructed sections of the sonogram where the interference traces used to be. Yu. Romashkin's speech, reconstructed in this way on the selected sections, is audible, understandable and intelligible.
Note, however, that the reconstruction variant in the middle panel of Fig. 4 synthesized the interference, which is part of the complex original phonoobject described by a simplified model, using the quadratic dependence of the nonlinear phase component on time that researchers use most often. Therefore, in the restored sections, traces of the interference, although greatly weakened, are still faintly visible, and the interference is barely audible against the background of intelligible speech.
If we more correctly calculate the phase and amplitude components of the narrow-band components of the interference included in the original signal, as well as the function of the weighting window during the DSAS operations, then we can achieve even better results in restoring speech distorted by interference. This version of synthesizing interference based on its traces with its subsequent subtraction from the original signal is shown in the lower panel of Fig. 4, where traces of interference interfering with auditory perception are practically absent in the same selected areas. Naturally, speech restored in this way will sound even more natural.
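The detect, synthesize and subtract sequence described above can be sketched for the simplest case of a stationary tone. The example is an illustration under strong assumptions: the «speech» is a single weak tone, the interference is a steady tone 25 dB stronger, and its trace is found as the frequency bin with the largest average magnitude. The function names and all parameters are hypothetical; the refined amplitude and phase models discussed in the text are not reproduced here.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Slide a Hann window over x; return complex spectra, shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add synthesis with the same Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    n = n_fft + hop * (len(spec) - 1)
    out, norm = np.zeros(n), np.zeros(n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-12)

fs = 8000
t = np.arange(fs) / fs
speech = 0.1 * np.cos(2 * np.pi * 500 * t)                    # stand-in for a speech overtone
interference = 0.1 * 10 ** (25 / 20) * np.cos(2 * np.pi * 1000 * t)
mixture = speech + interference

spec = stft(mixture)

# Detect the trace: for a stationary tone, the bin with the largest mean magnitude.
trace_bin = int(np.argmax(np.abs(spec).mean(axis=0)))

# Stratify: keep only the cells that belong to the trace (the bin and two neighbours each side).
mask = np.zeros(spec.shape[1], dtype=bool)
mask[max(trace_bin - 2, 0) : trace_bin + 3] = True

# Synthesize the interference from its trace and subtract it from the mixture.
interference_estimate = istft(spec * mask)
cleaned = mixture - interference_estimate
```

For a slowly changing quasi-harmonic interference the mask would have to follow the trace from frame to frame, which corresponds to tracing the wavy lines on the sonogram image.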
Once again, we note that using the capabilities of powerful graphic editors such as Adobe Photoshop for modifying images of selected sections of sonograms of distorted speech signals with subsequent synthesis, it is possible to achieve even more impressive results in their noise cleaning and restoration of intelligibility. In this regard, approaches to single- and dual-channel asynchronous cleaning of the RS in the presence or even absence of a reference signal are especially interesting. Also encouraging are the experimentally verified results of studies on the analytical continuation of the harmonic structure of the speech signal and the restoration of upper formants in the affected sections of images, i.e. in those places of the frequency-time grid where these parameters were weakened, distorted or absent either due to poor conditions for receiving the acoustic signal or due to a malfunction and/or incorrect choice of characteristics of the sound recording equipment used. This is all the more interesting since the theoretical foundations for restoring the matrix of distorted images due to the conditions of their limited size and the non-negativity of the values included in them have already been developed by mathematicians.
Computer steganophony, speech signature and technical speech closure
Fig. 5 shows examples of the use of the proposed approach in tasks of computer steganophony, an integral part of computer steganography, a new and rapidly developing area of information security. Thus, the upper panel of Fig. 5 demonstrates that an image of any content can be converted into an audio file whose dynamic spectrogram visually matches the source image quite well. As an example, a spectrogram of sound synthesized from a scanned photograph of the author of this work is shown.
Fig. 5. Examples of computer steganophony
Above – oscillogram and spectrogram of the sound signal synthesized from a photograph;
Below is a spectrogram of steganophonic markers in the form of printed and handwritten text inscriptions converted into audio form.
In addition, it is possible to implement such a method of placing steganophonic markers, which consists of drawing conventional signs, text or inscriptions on top of or instead of the sonogram of the original audio signal, with subsequent synthesis for transmission to a public communication channel. After such transformations, a new audio signal is obtained, the spectrogram of one of the variants of which is shown in the lower panel of Fig. 5.
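The conversion of a picture into sound can be sketched by treating the picture as a matrix of spectral magnitudes, attaching phases, and running the synthesis stage. The example uses random phases and a simple rectangular «marker»; both choices, like the function names and window parameters, are assumptions for illustration and not the method used to produce Fig. 5.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Slide a Hann window over x; return complex spectra, shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add synthesis with the same Hann window."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    n = n_fft + hop * (len(spec) - 1)
    out, norm = np.zeros(n), np.zeros(n)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-12)

# Hypothetical marker: a bright rectangle on an empty time-frequency grid.
# The matrix is laid out as (frames, bins); a scanned picture would first have
# to be transposed and flipped into this orientation.
marker = np.zeros((60, 129))
marker[20:40, 30:60] = 1.0

rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, marker.shape)
sound = istft(marker * np.exp(1j * phases))       # audio carrying the marker

# Check on the receiving side: the sonogram of the sound shows the rectangle again.
received = np.abs(stft(sound))
inside = received[20:40, 30:60].mean()
outside = received[marker == 0].mean()
```

Drawing a marker over an existing sonogram instead of an empty grid amounts to editing the magnitude matrix of the original audio before the synthesis step.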
Using the approaches described here for converting sound into an image and back, we can propose a new additional measure for protecting confidential documents from falsification and counterfeiting: a «speech signature» (SS). With this technology, at the end of a document, for example a contract, each contracting party places, along with the usual signature and seal, a sonogram with an SS in which the responsible person voices the most important points of the document that are closely tied to its meaning: the subject, amount and terms of the contract. These items can still be changed in the document, but the SS can no longer be changed to match. Note that an ordinary laser printer can print 2-4 minutes of continuous telephone-quality speech on an A4 sheet of paper.
As an experimental test of this idea, pre-scanned sonograms from articles in the magazines «Spetstekhnika», «Konfident» and other printed and electronic publications were voiced. As a result of synthesis of these sonograms obtained with the help of various software products, the meaning and even individual features of the sound of the speech messages contained in them were completely restored. Moreover, after listening to a number of sonograms given as pictorial examples of the implementation of some noise reduction methods in completely different works, the author was surprised to recognize his own voice.
It is clear that in order to represent sounds as graphic images in some applications, it is possible and necessary to use other pictorial representations of audio signals. However, to facilitate understanding of the applied processes of digital speech processing, we will rely on traditional dynamic sonograms. In this vein, we will consider the following examples of technical closure of the RS.
One of the methods of technical speech closure, when the RS is simply masked by powerful interference, has already been considered by us in the context of the discussion of the issue of cleaning the distorted RS from quasi-harmonic interference. The sonogram of such a sound mixture is shown in the upper panel of Fig. 4.
Fig. 6. Examples of technical speech closure
At the top is a spectrogram of a signal synthesized from the sonogram in the upper panel of Fig. 2 with rotations of the three time-frequency elements selected on it by 180° (time-frequency inversion), 90° (clockwise) and 180° (time inversion);
At the bottom is a spectrogram of a variant of technical speech closure in the form of twisting the sonogram image in the upper panel of Fig. 2 into a spiral with subsequent synthesis.
The upper panel of Fig. 6 shows a sonogram of a new sound signal obtained on the basis of synthesis from the original sonogram image in the upper panel of Fig. 2, modified by various rotations of the selected frequency-time elements. For the first time, the possibility of obtaining a new sound signal, the spectrum of which is rotated relative to the original by 90°, and not only by 180° as in the cases of frequency and/or time inversion, has been realized.
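On the image side, such rotations are plain matrix operations on a selected block of the sonogram. The sketch below shows only this step; the function name `rotate_block`, the toy matrix and the block position are assumptions, and which sign of `quarter_turns` looks clockwise depends on how the matrix is displayed. Producing the closed sound would still require a synthesis stage with suitably regenerated phases.

```python
import numpy as np

def rotate_block(image, rows, cols, quarter_turns):
    """Return a copy of image with the block image[rows, cols] rotated by 90-degree steps."""
    out = image.copy()
    block = image[rows, cols]
    rotated = np.rot90(block, k=quarter_turns)
    if rotated.shape != block.shape:
        raise ValueError("a 90-degree turn needs a square block")
    out[rows, cols] = rotated
    return out

# Toy 6 x 6 "sonogram" with a 4 x 4 time-frequency element selected in its center.
sonogram = np.arange(36).reshape(6, 6)
element = (slice(1, 5), slice(1, 5))

closed_180 = rotate_block(sonogram, *element, 2)    # time-frequency inversion of the element
closed_90 = rotate_block(sonogram, *element, 1)     # quarter turn of the element

# The receiving side undoes the closure by applying the opposite rotation.
opened_180 = rotate_block(closed_180, *element, 2)
opened_90 = rotate_block(closed_90, *element, 3)
```

Because a quarter turn exchanges the time and frequency axes, the selected element must contain the same number of cells in both directions.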
The lower panel of Fig. 6 shows another possible method of technically closing speech, as a result of which we obtain a speech signal synthesized from a spiral image of the sonogram of the original speech (upper panel of Fig. 2). The new RS is absolutely unintelligible and when it is voiced, sounds similar to the whistle of a dolphin are heard due to the cyclic movement of the lower most powerful harmonics of the «old» speech signal across the entire frequency band of the selected communication channel. It is possible to restore technically closed speech in this way by performing the reverse «unwinding» of the sonogram of unintelligible speech.
Based on the proposed approach, other options for solving the problems of computer steganophony, speech signature and technical speech closure are also possible. It should be noted that the above methods of installing and identifying steganophonic markers and introducing unintelligibility into the original RS with its subsequent restoration do not always require synchronization of processing processes, as a result of which they can be used in communication channels not only during reception and transmission, but also in RS storage modes. Therefore, they can find their application in a wide range of speech-transforming and sound-processing devices, as well as in the transfer and storage of processed RS on audio cassettes and diskettes. It is clear that the combined use of FAPSI-certified cryptographic closure algorithms in the proposed methods of computer steganophony and/or technical speech closure will reliably increase the resistance of such systems to attempts by an intruder to obtain protected confidential speech information.
Conclusion
In modern voice communication security systems, computer technologies for digital signal and image processing are increasingly used. The main requirements of today for such systems are the speed and efficiency of performing various speech signal processing procedures using standard inexpensive computer telephony hardware, namely: a personal computer, a sound card, a telephone line interface device and/or a modem. These requirements can be met by using digital methods of dynamic spectral analysis-synthesis (DSAS) of speech and audio signals.
Here, a new approach to the construction of special software and hardware for sound and speech conversion based on standard computing technology was considered, combining the idea of converting an audio signal in the DSAS process into the form of graphic images (images of spectrograms and phasegrams) and back from an image into an audio signal or speech without loss of information content or intelligibility with the capabilities of known and promising methods and software products for digital image processing. It was shown that the main core of this approach is the development and use of methods for restoring and reconstructing traces of narrow-band components of phonoobjects present in the calculated dynamic images of spectrograms and sonograms.
The examples given, which apply the proposed approach to the most common problems of securing speech messages, show its high potential for implementing various audio signal processing algorithms, including very complex and previously unrealizable ones, that can already be used today to build computer systems for protecting speech messages in public communication channels. This approach can become the basis for designing new speech communication security systems and for assessing the effectiveness of the voice message protection devices already available on the special equipment market.