Expert examination of the GSM format.
Examination of phonograms for the presence/absence of signs of editing or changes made during or after recording is one of the main diagnostic tasks of forensic phonography.
For example, we will use a simple scheme for presenting a phonogram to an expert:
1. The recording is made not by operatives but by a private individual who has no access to the encrypted traffic of the service provider. In this case a voice recorder held up to the cell phone is most often used. (In one real examination, from 2003, operatives monitored and recorded GSM conversations onto an audio cassette using stationary equipment, probably the provider's, and the packet transmission of the cell phone was absent from the soundtrack.)
2. Computer editing of the soundtrack is performed. After editing, the soundtrack is re-recorded via an acoustic channel, via flash memory, or via the input of a voice recorder equipped for this.
3. The audio cassette is presented as material evidence and examined by an expert.
4. Typical questions:
- Is the soundtrack original?
- Are there any signs of changes made during or after recording on the soundtrack?
- Are there any signs of computer editing on the soundtrack?
A little about the GSM format itself:
School of Phonographers October 13-25 in Moscow on the basis of the LSE of the Ministry of Justice of Russia:
=============================================================
GSM is a digital communication system in which the subscriber's speech is converted to digital form by a device in the telephone set itself. The speech is divided into portions of 0.02 sec each. For each portion a special algorithm determines the main parameters of the signal (the parameters of a model of the speaker's vocal tract), which are coded and sent in compressed form over the communication channel. The speech coding algorithm is described in the recommendations of the GSM standard (RPE-LTP coding: regular pulse excitation with linear predictive coding and long-term prediction). (See ETSI-GSM; M. Mouly, M.-B. Pautet. The GSM System for Mobile Communications. 1992, 701 p.; A. Mehrotra. Cellular Radio Performance Engineering. Artech House, Boston-London, 1994, 536 p.; P. Vary. GSM Speech Codec. Conference Proceedings DCRC, 12-14 October 1988.) At the receiving end the speech signal is calculated («synthesized», as they say) from the transmitted parameters. The structure of the reconstructed signal is greatly simplified relative to the original audio signal (the volume of speech data is reduced approximately 5-10 times). Both the overall quality of speech and the recognizability of the speaker are worse in the GSM channel than in a standard telephone channel. Any user of such a connection can notice this, and objective measurements confirm it.
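The frame arithmetic behind these figures can be sketched as follows. The 0.02 sec portion comes from the text; the 8 kHz sampling rate and the 260-bit coded frame are the values of the full-rate GSM 06.10 codec; the two PCM word sizes are assumptions chosen only to give a reference point for the "5-10 times" reduction.

```python
# Frame and bit-rate arithmetic for GSM 06.10 full-rate speech coding.
SAMPLE_RATE_HZ = 8000        # narrowband telephone sampling rate
FRAME_MS = 20                # one speech "portion" (0.02 sec) from the text
BITS_PER_CODED_FRAME = 260   # size of one full-rate coded frame
LINEAR_PCM_BITS = 13         # assumed reference: 13-bit uniform PCM
COMPANDED_PCM_BITS = 8       # assumed reference: 8-bit A-law/mu-law PCM

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
frames_per_second = 1000 // FRAME_MS
coded_bitrate = BITS_PER_CODED_FRAME * frames_per_second

# Reduction factor relative to each uncompressed reference format.
ratio_vs_linear = SAMPLE_RATE_HZ * LINEAR_PCM_BITS / coded_bitrate
ratio_vs_companded = SAMPLE_RATE_HZ * COMPANDED_PCM_BITS / coded_bitrate

print(samples_per_frame, coded_bitrate)
print(ratio_vs_linear, round(ratio_vs_companded, 2))
```

Depending on which uncompressed format is taken as the reference, the reduction falls between roughly 5 and 8 times, which sits inside the range quoted above.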
The compression algorithm includes a «tone/noise» detector. All currently known tone detectors share a pronounced defect: false tone detection in intense noise signals. As a result, the hissing sounds of speech and many noise-like acoustic signals come out «voiced». The algorithm also uses so-called «post-filtering»: a special filter at the output smooths all the defects of the reconstructed (synthesized) speech signal. It follows that speaker identification by physical speech characteristics, diagnostics of the acoustic environment and linguistic analysis of phonetic characteristics are all significantly harder in such a signal. The main problem is a fundamental methodological one: what is studied is a synthesized speech signal from which the coding algorithm has removed many essential features identifying the speaker, the environment and the communication channel. Until the questions of how faithful the reproduced signal is, and how much distortion is permissible specifically for a GSM channel signal, are resolved, the expert study cannot be complete.
To increase the number of free communication channels, the GSM standard uses so-called discontinuous transmission (DTX) of the speech signal. Discontinuous transmission relies on the fact that over a whole conversation a person speaks less than 40% of the time. In every GSM telephone a special unit, the Voice Activity Detector, operates throughout any conversation. In the pauses between remarks this unit switches off data transmission from the subscriber's device, and so that empty pauses do not cause auditory discomfort, they are filled with so-called «comfort noise» produced by a dedicated generator. The spectral composition of this noise is close to white; to the ear it is vaguely reminiscent of pouring water or of interference on an analog tone telephony signal. Thus every phonogram in the GSM channel is «assembled» from the subscribers' remarks, with an artificial «comfort noise» signal inserted between them. The joints of this assembled signal, that is, the transitions between the transmitted speech signal and the synthesized noise, are smoothed by a special so-called «postfilter».
Thus the speech signal at the output of GSM encoding/decoding always contains sections where transmission of the speech signal is interrupted in the pauses between the subscribers' remarks, and these sections are filled with a homogeneous artificial «comfort noise» signal. This property of GSM conversations creates new problems for detecting traces of phonogram editing. If someone assembles a new phonogram from one or more recordings of GSM conversations and places the editing transitions in the pauses between remarks, detecting those transitions is a complex expert task requiring special research methods. (Timko E.V., Uskov K.Yu. Problems of forensic examination of digital phonograms, Proceedings of the Kyiv Research Institute of Forensic Expertise, 2001; the text of the article is available on the Internet: http://expert.ua) Moreover, such editing is not difficult to perform, either with computer systems for digital phonogram editing or with modern high-quality analog tape recorders using the temporary recording stop (pause) mode. The task becomes harder still if the edited phonogram is passed through the telephone network again, which can add the natural continuous noise of the telephone channel.
The point is that, by the very nature of GSM digital coding, the speech pauses contain not a real sound signal but artificial «comfort noise». When the phonograms used for editing come from conversations held in the same relatively quiet acoustic environment and on the same telephone sets, it is usually impossible to detect «simple» signs of editing at the editing transitions. It is quite difficult to distinguish noise sections inserted by the GSM coding algorithm itself during transmission from noise sections inserted artificially, together with the remarks that follow them, when the phonogram was edited. In any case, such transitions show no clicks, no jumps in noise level or frequency range, no switching impulses from the recording equipment, no fragments of words or phrases, and no breaks in the logical unity of the conversation. Let us quote well-known experts in phonogram editing from the Kyiv Research Institute of Forensic Expertise (Timko E.V., Uskov K.Yu. Problems of Forensic Research of Digital Phonograms, Proceedings of the Kyiv Research Institute of Forensic Expertise, 2001; the text of the article is available on the Internet: http://expert.ua): “Traditional research methods for phonogram editing are of little use for the specified technique. First of all, this is due to the fact that when restoring a phonogram, post-filtering of the restored signal is performed for the purpose of smoothing. For this reason, as well as due to the inadequacy of the transmission of pulse signals, interference in phonogram files manifests itself only at the context-dependent (linguistic – S.K.) level.”
So how does the GSM 6.10 format distort information, and can it destroy all traces of editing?
To find out, we will conduct an experiment.
1. Let's create a file of 20 sec of white noise at a level of -6 dB (i.e. an amplitude of 16345 counts). In two places we insert pauses of 5 sec each.
2. Separately we generate a 75 Hz tone with an amplitude of 200 counts, 30 seconds long, and overlay it on the first file. We will treat the 75 Hz tone as the signal. The signal-to-noise ratio by amplitude is 20·log10(200/16345) ≈ -38 dB, and by spectral density the signal is completely absent, see Fig. 1:
Fig. 1.
The spectrum was computed with a window of 32767 points and averaged over a 5-second noise section. Our signal is practically invisible, i.e. its spectral density lies below that of the noise.
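A minimal sketch of how such a test file could be generated is given below. The sampling rate, the uniform noise distribution and the exact placement of the two pauses are assumptions (the article does not state them); only the durations and the two amplitudes are taken from the text.

```python
import math
import random

SAMPLE_RATE_HZ = 8000   # assumed; the article does not state the rate
NOISE_PEAK = 16345      # noise amplitude from step 1, in 16-bit counts
TONE_PEAK = 200         # 75 Hz tone amplitude from step 2, in counts
TONE_HZ = 75.0

def build_test_signal(seed=0):
    """20 s of noise split by two 5 s pauses, with a 30 s tone on top."""
    rng = random.Random(seed)
    # Assumed layout: (has_noise, seconds); the pause positions are arbitrary.
    layout = [(True, 5), (False, 5), (True, 10), (False, 5), (True, 5)]
    samples = []
    for has_noise, seconds in layout:
        for _ in range(seconds * SAMPLE_RATE_HZ):
            n = len(samples)
            tone = TONE_PEAK * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ)
            noise = rng.uniform(-NOISE_PEAK, NOISE_PEAK) if has_noise else 0.0
            samples.append(tone + noise)
    return samples

signal = build_test_signal()
duration_s = len(signal) / SAMPLE_RATE_HZ

# Amplitude signal-to-noise ratio, computed the same way as in step 2.
snr_db = 20 * math.log10(TONE_PEAK / NOISE_PEAK)
print(duration_s, round(snr_db, 1))
```

In the pauses only the tone remains, which is what makes it possible to compare the harmonic before and after compression on the same interval.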
We examine the phase of the 75 Hz signal, see Fig. 2:
Fig. 2.
The phase is quite linear and shows up clearly.
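One common way to obtain such a phase track is to correlate the recording, block by block, with a reference oscillator that runs continuously from the first sample. The sketch below is an illustration of that idea, not the tool used for the figures; the function name `track_phase`, the sampling rate and the 1-second block length are assumptions.

```python
import cmath
import math

def track_phase(samples, sample_rate, freq_hz, block_seconds=1.0):
    """Phase (radians) of freq_hz in consecutive blocks, measured against
    a reference oscillator that starts at sample 0 and never restarts."""
    block = int(block_seconds * sample_rate)
    step = -2j * math.pi * freq_hz / sample_rate
    phases = []
    for start in range(0, len(samples) - block + 1, block):
        acc = sum(samples[n] * cmath.exp(step * n)
                  for n in range(start, start + block))
        phases.append(cmath.phase(acc))
    return phases

SAMPLE_RATE_HZ = 8000  # assumed sampling rate
tone = [200 * math.sin(2 * math.pi * 75.0 * n / SAMPLE_RATE_HZ)
        for n in range(4 * SAMPLE_RATE_HZ)]

phases = track_phase(tone, SAMPLE_RATE_HZ, 75.0)
phase_spread = max(phases) - min(phases)
print(phases, phase_spread)
```

For an undisturbed tone whose frequency matches the reference, the measured phase stays constant from block to block; a constant frequency offset would appear as a straight sloping line, and an edit as a step.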
Let's save the file in GSM 6.10 format and examine it on the same interval:
Fig. 3.
Noise has appeared in the sections that previously contained only our signal, but the 75 Hz signal itself is unchanged.
The phase characteristic of our signal (see Fig. 4):
Fig. 4.
has remained exactly the same (a complete copy) as in Fig. 2, before conversion.
Thus the GSM 6.10 format itself gave us no surprises. It did nothing terrible to our harmonic; it only increased its amplitude by one and a half times. So if an insertion, deletion or similar edit is made at any stage, we will notice it immediately as a phase break.
For this, then, we need continuous harmonics that entered the recording channel while the original phonogram was being recorded.
The same applies to the sections where actual comfort noise fills the pauses.
So far we have studied only the speech compression format itself.
2. And now a real examination (March 2004)
Fig. 5.
The soundtrack is a recording of a conversation between two men over a cellular channel, made on a voice recorder held up to the handset. By ear, the two speakers clearly differ: one was recorded directly through the acoustic channel, while the other was recorded after his speech had been reproduced by the cellular handset. The spectrum study revealed the following features:
- Spectral bands starting at approximately 650 Hz and spaced 212.019 Hz apart. A study of the phase components showed that all of them are harmonics of 212.019 Hz, with the first two harmonics completely absent. The phase of the 636 Hz harmonic proved sufficiently linear to be used for detecting breaks (cuts, deletions) or superposition (interference).
- A feature of the transition to pause mode: after one participant stops talking, the signal from his side continues to be transmitted for 0.42, 0.83 or 1.25 sec, i.e. the transmission tail is a multiple of 0.42 sec (perhaps this delay is needed to determine the signal-to-noise ratio precisely on the transmitting side once speech has stopped).
- The pause study showed strict time intervals for the transmission of synchronization packets (let us call this in-pause transmission synchronization) and a complete binding of all packets, synchronization packets included, to a time interval of 4.71 ms (212.019 Hz). Is this standard for all GSM channels? In my opinion it is not, but this needs further checking; the values may vary within some acceptable limits around these numbers. What is certain is that they are quartz-stabilized: the phase is visibly linear, and the discharge of the recorder's battery shows up on that linearity (the phase drifts smoothly as the tape speed changes). Synchronization packets are transmitted strictly every 12 ms, and after every three single packets comes a packet consisting of 9 single packets, see Fig. 6:
Fig. 6.
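The arithmetic linking these measurements can be checked with a few lines. All input numbers are the ones measured on this tape; the variable names and the idea of taking one third of the longest tail as the common step are assumptions made for illustration.

```python
BASE_HZ = 212.019                 # packet repetition rate measured on this tape

# Period between packet fronts, in milliseconds (quoted as 4.71 ms in the text).
period_ms = 1000.0 / BASE_HZ

# First three harmonics; the text reports the first two as absent
# and studies the phase of the third (636 Hz).
harmonics_hz = [k * BASE_HZ for k in range(1, 4)]

# Observed transmission tails before a pause, in seconds.
tails_s = [0.42, 0.83, 1.25]
tail_unit_s = tails_s[-1] / 3     # assumed common step if the tails are 1x, 2x, 3x
tail_multiples = [round(t / tail_unit_s, 2) for t in tails_s]

print(round(period_ms, 4))
print([round(h, 3) for h in harmonics_hz])
print(round(tail_unit_s, 4), tail_multiples)
```

The tails come out very close to one, two and three steps of about 0.417 sec, which agrees with the observation that they are multiples of a single interval.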
Let's summarize the intermediate synthesizing part:
1. In the GSM channel everything is quartz-stabilized and tied to strict time counts (there is a quartz generator of frequencies and time intervals).
2. The main packet transmission frequency is 212.019 Hz, and the synchronization packets are locked to this frequency (every 12 ms during a pause).
3. The delay in switching to pause mode is 0.42, 0.83 or 1.25 sec (the transmission tails before a pause are multiples of 0.42 sec).
4. Synchronization packets are transmitted strictly every 12 ms, and after every three single packets comes a packet consisting of 9 single packets. If the channel has only just switched to pause mode and a speech signal appears immediately, the exit from the pause may not fall on a multiple of 12 ms, but its front still coincides with the 212.019 Hz frequency.
Thus, we have collected part of the bouquet of features that an expert can study.
Let's continue our research and take a closer look at the frequency of 864 Hz (the fourth harmonic of the fundamental).
Fig. 7.
There is secondary phase modulation with a periodicity of 0.56 Hz (the grid in the figure is set at 0.56 Hz intervals). The phase of the frequency under study is clearly visible in the pause (synchronization) periods as well. Thus we begin to collect the characteristics of the dictaphone on which the initial recording was made: one of the supply or take-up assemblies rotates at this rate, or the tape in the cassette runs unevenly (rubbing against the edge of the shell with this periodicity). This characteristic changes slowly over the playback time and only in one direction (as the tape winds, the reel diameter changes), without interference. That is one of the significant characteristics of a single continuous recording, i.e. of an original, and it gives us good scope for fine-grained analysis of the entire phonogram. In further research this secondary phase modulation will let us reconstruct the speed of the magnetic tape and possibly track the discharge of the dictaphone battery.
Yes, but we forgot to look at the components we always check: 24, 45, 50, 74, 75, 80, 85, 90, 100, 150, 200, 250 and 300 Hz. Logically they should not be there (the recording was made on a dictaphone).
Fig. 8.
It turns out that 50 Hz is present, and at a huge level, 87 dB. We know for certain that it is not ours, whereas 219 Hz is our own harmonic (the voice recorder used for digitization at the expert's workplace produces exactly this frequency), so we do not study it. Where the 50 Hz component came from we will figure out later.
We measure the linearity of the 50 Hz phase, Fig. 9:
Fig. 9.
1. The amplitude is simply enormous: 1.1 counts (see the figure above).
2. The phase is smooth, linear, without breaks.
3. There is no interference (as there would be in case of overlap).
4. An attempt to find a nearby component within 50 Hz ± 3 Hz was unsuccessful (also a good sign).
5. There is no way this interference could have come from the GSM channel, yet it is on the tape.
6. The question arises of asking the investigator for details of how the recording was made (it seems the cell phone was powered from the mains, a voice recorder was placed next to it, and the recording was made). If this is confirmed, the recording is probably the original, but it is too early to say so «categorically»: there have been cases where 50 Hz was present on the original, and cases where it appeared or disappeared after editing.
7. Let's look to the left on the averaged spectrum of Fig. 8; there are some bursts there too.
Fig. 10.
8. We find 2.6, 4.5, 5.86, 8.9, 10.8, 14.1 and 17.9 Hz. Now we will deal with them as well: at what moment they appeared, how they relate to the smoothly changing rotation frequency of the tape reels, and how they relate to the change in phase of 50 Hz.
Fig. 11.
4.4 and 8.8 Hz are phase-locked to each other (see Fig. 11). Their frequency increases during playback (the phase runs upward).
Fig. 12
5.86, 10.856, 14.47423 and 18.09234 Hz are also phase-locked to each other. Their frequency remains fixed (the phase is strictly linear).
A comparison with the behavior of 50 Hz revealed a complete lack of synchronicity between the frequency and phase behavior of the harmonics under study.
A study of the amplitude characteristics of all the harmonics showed that they are uniform and constant throughout the entire phonogram.
The first impression is that something has gone wrong. Why are some harmonics independent of playback time while the others are clearly tied to tape tension? Moreover, 5.86, 10.856, 14.47423 and 18.09234 Hz are not multiples of each other. The hypothesis is as follows: the last four harmonics result from the interference of two or perhaps three frequencies that share a common frequency reference (reference generator), and their difference, or multiples of it, leaked into the recording channel. They had never been observed during digitization before. (Thinking aloud: «I bought an uninterruptible power supply and replaced my switch; I'll try the old-fashioned way, without the UPS».) I digitize the phonogram again and the result is different: 5.86, 10.856, 14.47423 and 18.09234 Hz disappear at once (we must remember to switch the UPS off during digitization in future). Now only two remain, 4.4 and 8.8 Hz: these are native to the device that made the recording (and in the course of studying them we must make sure once more that they are native to the recording channel of the original soundtrack).
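The "not multiples of each other" observation is easy to verify numerically. The sketch below is only an arithmetic check on the frequencies quoted above; the helper `is_integer_multiple` and its tolerance are hypothetical, introduced for illustration.

```python
def is_integer_multiple(freq_hz, base_hz, tolerance=0.01):
    """True if freq_hz is (within tolerance) 2x, 3x, ... of base_hz."""
    ratio = freq_hz / base_hz
    return round(ratio) >= 2 and abs(ratio - round(ratio)) < tolerance

tape_pair = [4.4, 8.8]                        # components tied to the tape
stray = [5.86, 10.856, 14.47423, 18.09234]    # components with fixed frequency

tape_pair_is_harmonic = is_integer_multiple(tape_pair[1], tape_pair[0])
stray_multiples = [is_integer_multiple(f, stray[0]) for f in stray[1:]]

# Spacing between neighbouring stray components, in Hz.
stray_spacings = [round(hi - lo, 5) for lo, hi in zip(stray, stray[1:])]

print(tape_pair_is_harmonic, stray_multiples)
print(stray_spacings)
```

The 4.4/8.8 Hz pair behaves as a fundamental and its second harmonic, while none of the other components is an integer multiple of 5.86 Hz. The spacings printed by the last line show that the three upper components are almost equally spaced; this is an arithmetic observation only, but it would be compatible with the difference-frequency hypothesis.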
To study the synchronicity of packet transmission, the phonogram was copied in full and superimposed on itself with a periodically changing shift, in order to reveal signs of obvious insertions or deletions. The onset of the packet fronts was examined throughout the phonogram:
Fig. 13.
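The same question, whether every packet front sits on one common time grid, can also be asked numerically once the onset times have been extracted. The sketch below is not the overlay procedure described above but a simpler stand-in for it; the function `grid_residuals_ms`, the onset lists and the 5 ms deletion are all hypothetical.

```python
def grid_residuals_ms(onsets_ms, period_ms=12.0):
    """Distance of each onset from the nearest grid line; the grid has the
    given period and is anchored at the first onset."""
    origin = onsets_ms[0]
    residuals = []
    for t in onsets_ms:
        offset = (t - origin) % period_ms
        residuals.append(min(offset, period_ms - offset))
    return residuals

# Ten synchronization packet fronts on a strict 12 ms grid.
intact = [12.0 * k for k in range(10)]

# Hypothetical edit: 5 ms of audio removed after the sixth packet front,
# so every later front arrives 5 ms early.
edited = [t if i < 6 else t - 5.0 for i, t in enumerate(intact)]

intact_residuals = grid_residuals_ms(intact)
edited_residuals = grid_residuals_ms(edited)
print(intact_residuals)
print(edited_residuals)
```

On an unedited recording all residuals stay at zero (in practice, within measurement error); after a deletion whose length is not a multiple of the grid period, every later front is displaced by the same amount.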
Only now does the full study of the behavior of these three harmonics begin: comparing them with one another and matching each burst against the phonogram by listening. We study 50 Hz and look for interference with another identical, nearby or multiple frequency. Let's not forget to look closely at the noise spectrum of the entire phonogram and at the averaged frequency response of both speakers, fragment by fragment.
After completing the study:
1. Once additional information about how the recording was made had been received, the presence of 50 Hz proved acceptable.
2. The study of 50 Hz found no interference with another identical or nearby frequency, and a detailed study of the aliasing tails likewise gave good signs that the phonogram was never digitized or held in digitized form.
3. No monitor frame-scan frequencies were detected, another good sign that the phonogram was never digitized or held in digitized form.
4. The speed of the magnetic tape during recording was fully reconstructed (a feature that I cannot imagine anyone deliberately faking).
5. A study of the packet front onsets throughout the phonogram showed that the fronts are perfectly synchronous and fall on multiples of the 12 ms interval.
6. A detailed study of the 864 Hz frequency showed that during pauses, when only short synchronization is transmitted and the 864 Hz frequency is phase-locked to it, this harmonic was recovered perfectly EVEN IN PAUSES (recall that in a pause every three single packets are followed by one packet of nine singles, all tied and phase-locked to the reference; the same pattern is present in the pauses and is clearly visible).
The synthesizing part is now much easier for the expert to write (the confidence is backed by the study performed).
As an example, let us make an edit and observe it on three harmonics simultaneously in the same phonogram: we cut out a piece at transmission pauses (as described at the beginning of the article) and insert it at a transmission pause elsewhere:
Fig. 14.
A piece was cut out at 3 min 13 sec and inserted at 4 min 53 sec. At the first location we find a break in all three harmonics; at the place of insertion there are double breaks (see the vertical marks). If you look closely, then even with the first mark removed, the place of the cut can still be located approximately from the asymptotic behavior.
Now I suggest re-reading the passage on the problems of detecting editing quoted at the beginning of the article.
Feedback, suggestions, disagreements, additions, corrections, your personal experience and interesting cases are all welcome. I will be glad to hear from you. E-mail: illidiy@orel.ru
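The pattern of one break at the cut and a double break around the insertion can be reproduced on a synthetic tone. The sketch below is a scaled-down illustration under stated assumptions: a single 50 Hz tone stands in for a continuous harmonic, the sampling rate, block length, threshold and edit positions are arbitrary, and `find_phase_breaks` is a hypothetical helper, not the tool used for Fig. 14.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # assumed sampling rate
TONE_HZ = 50.0          # stand-in for any continuous harmonic on the tape
BLOCK = 800             # 0.1 s analysis blocks (5 full cycles of 50 Hz)

def block_phases(samples):
    """Phase of the tone in each block, against a never-restarted reference."""
    step = -2j * math.pi * TONE_HZ / SAMPLE_RATE_HZ
    phases = []
    for start in range(0, len(samples) - BLOCK + 1, BLOCK):
        acc = sum(samples[n] * cmath.exp(step * n)
                  for n in range(start, start + BLOCK))
        phases.append(cmath.phase(acc))
    return phases

def find_phase_breaks(samples, threshold_rad=0.2):
    """Times (s) of blocks whose phase jumps away from the previous block."""
    phases = block_phases(samples)
    breaks = []
    for i in range(1, len(phases)):
        # Wrap the block-to-block difference into [-pi, pi).
        jump = (phases[i] - phases[i - 1] + math.pi) % (2 * math.pi) - math.pi
        if abs(jump) > threshold_rad:
            breaks.append(i * BLOCK / SAMPLE_RATE_HZ)
    return breaks

original = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ)
            for n in range(10 * SAMPLE_RATE_HZ)]

# Cut the piece [a, b) out and re-insert it just before sample c of the original.
a, b, c = 24000, 28024, 56050
edited = original[:a] + original[b:c] + original[a:b] + original[c:]

breaks_original = find_phase_breaks(original)
breaks_edited = find_phase_breaks(edited)
print(breaks_original)
print(breaks_edited)
```

The cut leaves one step in the phase track, while the inserted piece is bounded by a step on each side, because the material before it, inside it and after it each carries its own phase offset relative to the reference.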