Kaganov A.Sh.
Russian Federal Center for Forensic Examinations

1 The concept of forensic video-phonographic examination (FVE)

1.1 The concept and methods of FVE

In modern investigative and judicial practice, situations increasingly arise in which an expert examination of video and sound recordings is needed. They are especially common in the investigation of criminal cases involving extortion, blackmail and corruption. The subject of such an examination is the establishment of factual data related to the image recorded on a videogram or the sound signals recorded on a phonogram (for example, human speech).
The objects of study in this type of examination are recordings of images (videograms) and (or) sound (phonograms). Phonograms can capture sounds of various origins: human speech, singing, music, the noise of a working mechanism, etc., and data on these sounds can be used in analyzing particular investigative situations. Accordingly, this type of examination is divided into two kinds: videographic and phonographic [1].
The basic principles of video-phonographic examination are built on scientific data borrowed from the natural, technical, humanitarian and legal sciences. These data are adapted for the purposes of justice in accordance with the nature of forensic science. On the basis of the adapted data, a parent expert science is created: a new branch of forensic knowledge which, by analogy with the parent sciences of other forensic examinations (forensic handwriting analysis, forensic ballistics, etc.), can be called forensic videophonography. It covers two areas: the study of image recordings (forensic videography) and the study of sound recordings (forensic phonography).
As regards forensic videographic examination, the adoption of new legislation, the expansion of the rights of operational services to use technical means, and the availability of video recording equipment have opened up broad prospects for using video recording in solving and investigating crimes. Equipping law enforcement agencies with such equipment has multiplied the number of cases in which it is used in operational activities and investigative actions. Video recordings are often attached to criminal case files as judicial evidence, namely as documents and as appendices to the protocols of investigative actions [2].
The saturation of the domestic market with household, semi-professional and professional video recording and video editing equipment capable of altering previously recorded footage makes it necessary to check the authenticity of video materials included as evidence in criminal cases, including by means of forensic examination of video recordings.
Identifying the fullest possible set of features that characterize both the video recording itself and the equipment that produced it enables the expert to detect changes made to the recording and to identify the specific unit of video recording equipment. Identifying a specific unit from the available recording is all the more necessary because in most cases the expert has no opportunity to examine the recording device itself. Existing methods for revealing the identification features of video equipment and video media are quite effective and have been properly tested in expert institutions.
Turning to forensic phonographic examination, recall that the use of phonograms for forensic purposes was legally permitted in the 1990s [3]. The work of an expert examining a phonogram has certain features determined by the nature of the object of study and by the questions posed by the investigator (court). Solving expert problems in most cases requires a heuristic search for solutions and a situational approach. Phonographic examination is the only type of expert activity whose main tool is hearing rather than sight. An expert phonographer must not only generalize the complex information contained in a phonogram but also break it down so as to judge simultaneously the content of what was said, the linguistic features of the speaker's speech, the speaker's personality and emotional state, the conditions in which the phonogram was recorded, etc.
To understand the properties of a video image, voice or speech, it is necessary to construct a mental, ideal model of them. The model is formed using concepts called attributes. The relationship between a property and an attribute is the relationship between the material and the ideal that is built on the basis of the material. In the course of an expert examination, the system of attributes of each human voice, speech, or video or sound recording device must correspond to the system of attributes of the mental model with which the expert operates.

1.2 Scientific foundations of FVE

The study of a video signal to reveal the identification features of video equipment and video media is a complex type of study that requires the expert to have special knowledge in such areas of science and technology as physics, mathematics, cybernetics, information transmission, the theory of television measurements, magnetic recording of signals, and the theory of radio engineering.
Similarly, the scientific development of methods and techniques for studying spoken speech rests on applying the provisions of physiology, biomechanics, medicine, mathematics, physics, problem-solving theory, linguistics, psychology, electronics, cybernetics and other sciences to the study of its nature and properties. The contribution of each basic science to the theory of forensic phonography lies in explaining the patterns that determine the structure and stability of the properties of speech as the most important means of accumulating and transmitting information, thoughts, moods and feelings. Speech is a specific form of human activity that serves communication between people. It is inextricably linked with consciousness, thinking and the human psyche, and it is formed on the basis of the language of the speaker's environment. Human speech is studied by many sciences. Forensic phonography draws on provisions borrowed:
from physiology:
- the doctrine of human speech activity as a reflex function of the central nervous system, and the presence of coordination and feedback in reflex processes;
- the doctrine of higher nervous activity, which treats the sound formation of speech as conditioned reflex activity of the brain, and the presence of functional systems (dynamic stereotypes) in the work of the brain during the generation of a speech utterance;
- the mechanism of the signaling system that forms speech, with the word as a specific signaling stimulus and an essential element of the second signaling system;
from psychology:
- the main provisions of the psychology of speech;
- the doctrine of the general nature, functions and mechanism of speech;
- the psychological features of speech development in children;
- pathological speech disorders and the features of speech development in children with anomalies;
- the doctrine of complex mental formations (intelligence and its connection with the individuality of the speech process) and of personality traits (character, emotionality, temperament);
from linguistics:
- the sections of phonetics that describe the acoustic and articulatory properties of speech sounds;
- the study of the formation of sound units, that is, of the pronunciation activity of a speaker in its articulatory or anatomical-physiological aspect [4];
- the study of the sounds produced by the organs of pronunciation (the acoustic aspects of speech sounds).
Of great importance are the data from experimental phonetic studies of oral speech accumulated in experimental phonetics (applied linguistics). Many of the methods and technical means used in experimental phonetics can be used by forensic experts;
from anatomy:
- the structure and functional features of the vocal apparatus, that is, of the organs involved in speech production (mouth, nose, pharynx, larynx, trachea, etc.).
Information about pathological speech changes is of significant importance for working out the regularities of identification research by voice and speech.
Cybernetics, or more precisely forensic cybernetics, with its ideas, concepts, methods, technical means and mathematical apparatus, enables forensic videophonography to create mathematically based methods and techniques for solving expert problems and to develop algorithms and computer programs for solving problems of forensic videography and phonography.
The human voice and speech are the subject of research in many sciences, including forensic science. Research on identifying a person by voice and speech, begun at the All-Union Research Institute of the Ministry of Internal Affairs of the USSR [5] and continued at the All-Union Research Institute of Forensic Science of the Ministry of Justice of the USSR and at the Georgian Central Scientific Research Laboratory of Forensic Science [6], created the prerequisites for developing the methodological and theoretical foundations of this type of forensic examination.
One peculiarity of identifying a person from a magnetic recording of his image or of his voice and speech is that the information about the sought object, as perceived by an expert or by technical means, can be separated from its source, the display. It can be transferred in space, preserved in time, and passed to another cognitive subject or to a technical device (for example, a computer). This peculiarity is closely connected with the concept of the identification period in forensic video-phonographic examination. (Recall that the identification period is the interval between display and examination during which identification remains possible, taking into account how the identified object changes over time.)
Another important feature of forensic phonographic identification is that the methods used to identify a person by voice and speech are complex. They organically intertwine the methods and techniques of various basic (parent) sciences that underlie forensic phonography; this determines the need for the participation of various specialists in the research process at different stages of the identification study.

1.3 Investigative-expert situations and tasks of FVE. Questions resolved by examination

Investigative situations that require the use of special video and (or) phonographic knowledge and are encountered in practice can be divided into the following categories:
1. Search situations, in which information about the identity of a wanted person must be obtained from a display of his appearance, voice and spoken language.
2. Identification situations, in which a person must be identified and both the person being checked and the samples or traces (audio and video recordings) needed for comparative study are available.
3. Situational situations, in which the setting, the acoustic environment and other circumstances of the recorded event must be established from videograms and phonograms.
4. Informational and evaluative situations, in which the reliability of the recorded information must be assessed in view of possible falsification or subsequent changes to the original recording.
In accordance with the specified situations, the following questions can be resolved within the framework of forensic video-phonographic examination:
Questions related to search investigative situations
1. What information characterizing the speaker can be obtained from the phonogram of his speech?
2. Is the speech recorded on the phonogram male or female?
Questions related to identification investigative situations
3. Do the voice and speech recorded on the magnetic phonogram belong to citizen A?
4. Were the speech fragments recorded on the presented phonogram spoken by citizen A, citizen B or someone else? Which fragments were spoken by citizen A and which by citizen B?
5. What type of tape recorder was used to produce the sound recording?
6. Was this phonogram made on the tape recorder presented for expert examination?
7. Do the sound signals on the phonogram come from a specific source (indicate the source)?
8. In what format, color television system (CTV), recording mode was the video recording made on the presented video cassette?
9. What is the serial number of the copy of the presented video recording?
10. Was the presented video recording made by this or another video camera?
Questions related to situational investigative situations
11. Does the presented magnetic phonogram contain fragments with recordings of spoken words?
12. Is the speech recorded on the phonogram unprepared or prepared (i.e. memorized in advance and recited by heart, or read)?
13. How many people took part in the conversation recorded on the presented phonogram?
14. Was the speech recorded on the phonograms delivered by the same or different persons?
15. Is the presented phonogram a continuous recording?
16. Is this magnetic phonogram an original or a copy?
17. Were the oral speech recordings presented on the phonogram made on one or more tape recorders?
18. What is the nature of the sound signals recorded on the presented phonogram?
19. Could this phonogram have been recorded in the specified location?
20. What are the main characteristics of the room in which this phonogram was recorded?
21. Does the operator who made the recording have the necessary technical skills (judging by the quality of the recording)?
Questions related to information and evaluation investigative situations
22. What text is recorded verbatim on this phonogram (or on individual fragments of it)?
23. Does the presented phonogram contain signs of editing, selective recording, or superimposition of one recording on another?
24. Are there signs of mechanical editing on the presented video recording?
25. Are there signs of interruption on the presented video recording?
26. Is the presented video recording an original or a copy?
27. Are there any signs of electronic editing on the presented video recording?
28. Are there any signs of erasure of part of the presented video recording?
29. Were all fragments of the presented video recording made by the same video recording device?
30. Are there any signs of discrepancy between the video recording on video cassette No. 1 and the video recording on video cassette No. 2 in terms of recording duration, quality, shooting angle, serial number of the copy, etc.?
31. Are there any signs of a typical design of video recording equipment on the presented video recording?
32. What are the dimensions of the objects inside the video frame?

2. Structure of information sources

2.1 Subject of the FVE study

The subject of forensic examination is both a scientific and a practical concept. It is closely related to the concept of forensic examination as a science: in the theory of forensic examination the concept has a scientific character, whereas in the practice of forensic examination it relates directly to expert activity. The subject of a forensic examination is the resolution of the tasks of the examination, that is, the establishment, by the methodological means of expert research, of factual data reflected in the material carriers of information about them [7].
For forensic video-phonographic examination, these provisions can be specified as follows. The subject of forensic video-phonographic examination is factual data about the phonogram (videogram) submitted for examination, about the sound or video recording device with which it was made, about the voice and speech of the speaker recorded on it, about the circumstances in which the phonogram (videogram) was recorded, etc.
Thus, the subject of each forensic video-phonographic examination is characterized by the corresponding objects, tasks, methods (techniques), the theoretical and practical aspects of which are provided for by forensic theory in general and the theory of this type of examination in particular.

2.2 List-map of information fields of the forensic video-phonographic examination

Legal facts are established by a set of evidence. The forensic expert's opinion is included in the said set and is a part of it. Such an opinion reflects the connection between the questions posed to the expert, the sources of information, the information field, the objectives of the study, the subject of the study, the methods of study and the proof of legal facts. This connection is well illustrated by the list-map of the information fields of forensic video-phonographic examination, which is given below.
1. Search investigative situations
Questions posed to the expert:
1. What are the personality characteristics of the speaker whose voice and speech are recorded on the original phonogram?
2. Is the speech recorded on the phonogram male or female?
Source: magnetic media of various types;
Information field: traces of speech sound sources recorded on media of various types;
Research objective: establishing personality traits by voice and speech;
Research method: analysis of the features constituting the speech portrait of the speaker (i.e. establishing personality traits with varying degrees of completeness);
Evidence: data characterizing the personality of the performer;
Legal fact: the identity of the performer of the oral text recorded on the phonogram.

2. Identification investigative situations
Question posed to the expert:
Do the voice and speech recorded on the original phonogram belong to the person(s) whose voice and speech sample(s) are presented for comparative identification research?
Source: magnetic media of various types;
Information field: traces of speech sound sources recorded on media of various types;
Research objective: identification of a person by voice and speech;
Research method: comparative identification study of voice and speech;
Evidence: identity (difference) of the voice and speech of the speaker on the original phonogram and on the sample phonogram;
Legal fact: establishing the performer of the oral text recorded on the phonogram.

Question posed to the expert:
Was the videogram (phonogram) presented for examination recorded on the video or sound recording device (video recorder, tape recorder, voice recorder, etc.) submitted for research?
Source: magnetic media of various types;
Information field: traces of video and sound recording devices and of channels for transmitting video and sound information (telephone channel, listening device, hidden video recording devices);
Research objective: identification of video and (or) sound recording devices;
Research method: comparative identification study of video and sound recording devices;
Evidence: identity (difference) of video and (or) sound recording devices;
Legal fact: establishing the video and (or) sound recording device with which the videogram or phonogram was produced.

Question posed to the expert:
Does the non-speech signal recorded on the phonogram submitted for examination belong to a specific sound source?
Source: magnetic media of various types;
Information field: traces of non-speech sound sources recorded on media of various types;
Research objective: identification of the source from the sound recorded on the phonogram;
Research method: comparative identification study of sounds of non-speech origin;
Evidence: identity (difference) of the sounds on the original phonogram and on the sample;
Legal fact: establishing the source of the sound recorded on the phonogram.

Question posed to the expert:
Was the videogram (phonogram) submitted for examination recorded on a specific type of video or sound recording device (VCR, tape recorder, dictaphone, etc.)?
Source: magnetic media of various types; sound and video recording devices;
Information field: traces of video and sound recording devices and of channels for transmitting video and sound information on magnetic media (phonogram); results of signal processing using mathematical methods (oscillogram, spectrogram, cepstrogram, intonogram, etc.); characteristics of sound and video recording equipment;
Research objective: establishing the properties of the means or material of video and (or) sound recording;
Research method: comparative diagnostic study of video and (or) sound recording equipment; comparative diagnostic study of the features characterizing the copy and the original of a videogram or phonogram;
Evidence: identity (difference) of the features characterizing a particular type of recording device and the features revealed on the phonogram under study; identity (difference) of the features characterizing the phonogram under study and those characterizing the copy phonogram;
Legal fact: establishing the type of means or material of video and (or) sound recording; establishing the method by which the videogram or phonogram was made; establishing whether the videogram and (or) phonogram is an original or a copy.

3. Situational investigative situations
Questions posed to the expert:
1. Is the oral speech recorded on the phonogram presented for examination unprepared, memorized in advance or read?
2. Under what conditions was the conversation on the original phonogram recorded: indoors, outdoors, in a car?
Source: magnetic media of various types;
Information field: traces of speech and non-speech sound sources recorded on media of various types;
Research objective: analysis of the conditions under which the recorded conversation took place;
Research method: study of the signs of the speaker's condition; study of the signs of speech impairment in the speaker under analysis; comparative diagnostic study of the non-speech sounds accompanying the conversation recorded on the phonogram;
Evidence: identity (difference) of the sounds on the original phonogram and on the sample or on a phonogram from the expert's reference and information fund (RIF);
Legal fact: the conditions under which the oral speech recorded on the phonogram under examination was produced; the conditions under which the recorded conversation took place.

Question posed to the expert:
Was the person whose voice and speech are recorded on the original phonogram in a state of alcoholic or narcotic intoxication?
Source: magnetic media of various types;
Information field: traces of speech sound sources recorded on media of various types;
Research objective: establishing the state of the speaker from voice and speech;
Research method: analysis of the features characterizing the psycholinguistic state of the speaker (i.e. analysis of personality traits with varying degrees of completeness);
Evidence: data characterizing the personality of the performer of the oral text;
Legal fact: the state of the performer of the oral text recorded on the phonogram.

Question posed to the expert:
Is the phonogram (videogram) submitted for examination an original or a copy?
Source: magnetic media of various types; audio and video recording devices;
Information field: traces of video and sound recording devices and of channels for transmitting video and sound information on magnetic media (phonogram); results of signal processing using mathematical methods (oscillogram, spectrogram, cepstrogram, intonogram, etc.); characteristics of sound and video recording media;
Research objective: establishing the properties of video and (or) sound recording material;
Research method: comparative diagnostic study of video and (or) sound recording media; comparative diagnostic study of the features characterizing the copy and the original of a videogram or phonogram;
Evidence: identity (difference) of the features characterizing the phonogram under study and those characterizing the copy phonogram;
Legal fact: establishing the type of means or material of video and (or) sound recording; establishing the method by which the videogram or phonogram was made; establishing whether the videogram and (or) phonogram is an original or a copy.

Question posed to the expert:
How many people took part in the conversation recorded on the phonogram submitted for examination (or in its individual fragments)?
Source: magnetic media of various types;
Information field: traces of speech sound sources recorded on media of various types;
Research objective: establishing the number of participants in the conversation under examination;
Research method: perceptual (auditory) methods; instrumental methods of voice and speech research;
Evidence: identity (difference) of the voices and speech of the speakers on the original phonogram under examination;
Legal fact: the established number of participants in the conversation recorded on the phonogram.

4. Information and evaluation investigative situations
Question posed to the expert:
What text is recorded verbatim on the phonogram presented for examination (or on its individual fragments)?
Source: magnetic media of various types;
Information field: traces of speech sound sources recorded on media of various types;
Research objective: establishing the verbatim content of the text and its properties;
Research method: perceptual (auditory) methods; instrumental methods of voice and speech research;
Evidence: identity of the oral text recorded on the original phonogram and the resulting transcript;
Legal fact: the established content of the conversation recorded on the original phonogram.

Questions posed to the expert:
1. Are there any signs of editing, selective recording or other changes on the phonogram submitted for examination?
2. Are there any signs of electronic editing or of erasure of part of the video recording submitted for examination?
Source: magnetic media of various types;
Information field: traces of video and sound recording devices and of channels for transmitting video and sound information on magnetic media; sound and video recording devices;
Research objective: establishing the properties of video and (or) sound recording materials;
Research method: comparative diagnostic study of the traces of video and (or) sound recording; comparative diagnostic study of the features characterizing a video or sound recording device;
Evidence: identity (difference) of the traces on the original phonogram and on the sample phonogram or on a phonogram from the expert's reference and information fund (RIF);
Legal fact: establishing a change in a magnetic recording (videogram and (or) phonogram).
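
Every entry of the list-map above has the same shape: one or more questions, a source, an information field, a research objective, a research method, the evidence obtained and the legal fact it supports. As a minimal sketch, such an entry could be held in a small record type. The class and field names below are illustrative assumptions, not part of any official methodology; the sample values are transcribed from the voice identification entry.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Situation(Enum):
    """The four categories of investigative situations named in section 1.3."""
    SEARCH = "search"
    IDENTIFICATION = "identification"
    SITUATIONAL = "situational"
    INFORMATION_EVALUATION = "information and evaluation"


@dataclass(frozen=True)
class ListMapEntry:
    """One entry of the list-map (hypothetical field names)."""
    situation: Situation
    questions: tuple
    source: str
    information_field: str
    research_objective: str
    research_method: str
    evidence: str
    legal_fact: str


# Sample entry transcribed from the voice identification item of the list-map.
voice_identification = ListMapEntry(
    situation=Situation.IDENTIFICATION,
    questions=(
        "Do the voice and speech recorded on the original phonogram belong to "
        "the person(s) whose voice and speech sample(s) are presented?",
    ),
    source="magnetic media of various types",
    information_field="traces of speech sound sources recorded on media of various types",
    research_objective="identification of a person by voice and speech",
    research_method="comparative identification study of voice and speech",
    evidence="identity (difference) of the voice and speech on the original and sample phonograms",
    legal_fact="establishing the performer of the oral text recorded on the phonogram",
)


def entries_for(entries, situation):
    """Select the entries that belong to one category of investigative situation."""
    return [entry for entry in entries if entry.situation is situation]


FIELD_NAMES = [f.name for f in fields(ListMapEntry)]
```

Keeping the category as an enumeration makes it straightforward to group entries by investigative situation, which is how the list-map itself is organized.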

2.3 Equipment used in the work of a video and phonographic expert

Magnetic video and sound recording technology has a number of specific features that can distort the information field and affect the accuracy and reliability of the display of traces of video and sound recording media or channels for transmitting video and sound information on magnetic media. These features should be taken into account when assessing the quality of the recording.
For example, the quality of the sound recording is considered satisfactory when the naturalness of the sound is preserved and the maximum correspondence of the sound to the original is ensured.
Technical progress in electroacoustics has produced such technical and technological improvements as Dolby noise reduction, high-quality cross-bias magnetization, monocrystalline ferrite magnetic heads, heads with low wear of the working surface, magnetic tapes based on chromium dioxide, etc. All this has made it possible to improve the quality of magnetic recording devices significantly. Suffice it to say that whereas the optimal tape speed in studio tape recorders used to be 38.1 cm/s, it is now 9.53 cm/s. In modern mass-market tape recorders and voice recorders, the speed is set at 4.76 cm/s, 2.38 cm/s and even 1.2 cm/s.
Studio tape recorders, of course, provide the highest quality of sound recording, therefore the phonograms obtained with their help clearly reflect the characteristics of the voice and oral speech.
First-class tape recorders of domestic and foreign production are only slightly inferior in quality to studio ones, therefore distortions of the acoustic parameters of the speech signal are very insignificant, and they can be neglected in the process of identification studies.
At the same time, fourth-class household tape recorders have significantly lower quality indicators: the operating frequency range at the line output is 80 to 6300 Hz, the relative level of interference is 43 dB in the playback channel and 40 dB in the recording-playback channel, and the total distortion coefficient at the line output is 5%.
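
To make these figures concrete, the short sketch below converts a decibel level into an amplitude ratio and lists which harmonics of a voice fall inside the quoted 80 to 6300 Hz band. It is only an illustration: the constant and function names are hypothetical, and the interference figures are read here as levels below the signal, which is an assumption about how the values are specified.

```python
# Figures quoted in the text for fourth-class household tape recorders.
BAND_LOW_HZ = 80.0
BAND_HIGH_HZ = 6300.0
PLAYBACK_INTERFERENCE_DB = 43.0  # assumed to mean 43 dB below the signal


def db_to_amplitude_ratio(level_db):
    """Interference-to-signal amplitude ratio for a level `level_db` dB below the signal."""
    return 10 ** (-level_db / 20)


def in_passband(frequency_hz, low=BAND_LOW_HZ, high=BAND_HIGH_HZ):
    """True if the frequency lies inside the recorder's operating range."""
    return low <= frequency_hz <= high


def retained_harmonics(f0_hz, max_harmonic=100):
    """Harmonic numbers of a voice with fundamental f0_hz that fall inside the band."""
    return [k for k in range(1, max_harmonic + 1) if in_passband(k * f0_hz)]


# A low voice with a 70 Hz fundamental loses the fundamental itself (70 Hz < 80 Hz)
# and keeps harmonics 2 through 90 (140 Hz through 6300 Hz).
low_voice = retained_harmonics(70.0)
interference_ratio = db_to_amplitude_ratio(PLAYBACK_INTERFERENCE_DB)
```

Under this reading, interference at 43 dB below the signal has roughly 0.7% of the signal amplitude.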
The incomplete display of the identification features of voice and speech on phonograms, caused by distortions introduced by low-class tape recorders, complicates the identification process but, as the recent expert practice of the laboratory of forensic video-phonographic examinations shows, does not rule out solving expert problems.
Phonograms submitted for forensic examination may have been recorded on devices whose parameters, as a result of wear in operation, non-standard manufacture, amateur modification, repairs, etc., do not meet the requirements of GOST. The electroacoustic parameters of such devices can take many random values and can therefore affect the expression of the identification features of oral speech in a variety of ways. The diversity of technical means leads to various distortions of speech signals, which in turn requires that the signals be analyzed individually in each case.
It is therefore necessary, at the stage of preliminary examination, to determine and evaluate the deviations in the values of the features of oral speech that arose from instrumental distortions and, if possible, to take measures to restore their expression. To this end, it is advisable to conduct a preliminary examination that establishes the main electroacoustic parameters of the technical means with which the phonogram was recorded. In the course of this examination, all characteristics of the magnetic recording apparatus that can affect the reliability of the feature values and the degree of their expression are determined. These electroacoustic characteristics are established using known standard methods of equipment testing described in the technical literature.
Conducting tests of technical means of sound recording is not mandatory for all phonograms submitted for examination. It is required only when the quality of the phonograms, according to the results of the auditory assessment, is unsatisfactory.
One of the most effective methods of conducting the instrumental part of the comparative identification study of phonograms is the electroacoustic analysis of speech fragments. This method is implemented in many modern hardware and software systems of domestic and foreign production: SIS (developed by the Center for Speech Technologies in St. Petersburg), Cool Edit and Cool Edit Pro (Syntrillium Software Corp., USA), CSL (Kay Elemetrics, USA), MEDAV (Germany), etc.
An example of such a complex is a measuring and computing complex that includes a spectrum analyzer of the sonograph type (for example, DSP SONA-GRAPH model 4300) and a personal computer (IBM PC compatible). The operation of the sonograph is based, as is known, on electrical filters to which the sound signals, first converted into electrical oscillations, are fed. The filters allow the sound signal to be examined both as a whole and in separate frequency bands.
Using this method, the expert, during preliminary listening to the original audio recording, selects those fragments of speech that contain characteristic features of the speaker. These features may indicate both a deviation from the norm, speech defects, etc., and variants of the norm used by the speaker. Then the expert studies the audio recordings of the suspect's speech samples and selects fragments that contain features similar to those found in the original recording.
The sonograph makes it possible to present the studied fragments in the form of a three-dimensional spectrogram (i.e. in the form of a three-dimensional model in the coordinates «time-frequency-intensity»). For comparative studies, spectrograms of two types are mainly used: broadband and narrowband. Broadband spectrograms allow one to identify formant features, while narrowband ones record the harmonic structure of speech sounds.
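The difference between the two spectrogram types can be illustrated with a minimal sketch in Python with NumPy. It is not the algorithm of any particular sonograph. It assumes that the analysis bandwidth is set by the window length (about 5 ms for broadband and 40 ms for narrowband); these values, the function name `spectrogram` and the synthetic test signal are assumptions made for illustration only.

```python
import numpy as np

def spectrogram(signal, fs, window_ms, hop_ms=2.0):
    """Return (times, freqs, level_db) of a Hann-windowed short-time spectrum."""
    win_len = int(round(fs * window_ms / 1000.0))
    hop = max(1, int(round(fs * hop_ms / 1000.0)))
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Cut the signal into overlapping frames and apply the window to each one
    frames = np.stack([signal[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    level_db = 10.0 * np.log10(power + 1e-12)   # intensity axis, in dB
    times = (np.arange(n_frames) * hop + win_len / 2.0) / fs
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return times, freqs, level_db

# Synthetic "voiced" sound: ten harmonics of a 120 Hz fundamental tone
fs = 8000
t = np.arange(fs) / fs
voiced = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 11))

# Short window: wide analysis band, individual harmonics are smeared together
_, f_wide, wide = spectrogram(voiced, fs, window_ms=5.0)
# Long window: narrow analysis band, individual harmonics are resolved
_, f_narrow, narrow = spectrogram(voiced, fs, window_ms=40.0)

narrow_mean = narrow.mean(axis=0)   # average level per frequency bin
```

With these settings the frequency step of the broadband analysis is 200 Hz, which is wider than the spacing of the harmonics, while the narrowband step is 25 Hz, so the harmonic structure becomes visible.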
The main identification features of the human voice examined when analyzing spectrograms are formants (i.e. spectral maxima of the speech signal), their movement, and changes in the fundamental tone of the voice.
The presence of a personal computer in the complex makes it possible not only to automate the process of comparing phonograms, but also to build special algorithms for processing information, the input to which are the sonograph readings presented in digital form, and the output — the above-described acoustic features characterizing the speaker.
In the process of comparison, there is usually an incomplete match of the values of the features due to their natural variability, differences in recording conditions and some measurement error. According to I.A. Zimnyaya, if the deviation does not exceed 15%, the features can be considered to be the same [8].
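The 15% criterion can be sketched as a simple check. The source does not say how the deviation is normalized, so dividing by the mean of the two values is an assumption here, as are the function names and the formant values in the example.

```python
def relative_deviation(a, b):
    """Deviation between two feature values as a fraction of their mean.

    Normalizing by the mean is an assumption; the source gives only the 15% limit.
    """
    return abs(a - b) / ((a + b) / 2.0)

def features_match(questioned, sample, threshold=0.15):
    """Compare features present in both recordings against the threshold."""
    return {name: relative_deviation(questioned[name], sample[name]) <= threshold
            for name in questioned if name in sample}

# Hypothetical formant frequencies in Hz, for illustration only
questioned = {"F1": 700.0, "F2": 1200.0}
sample = {"F1": 730.0, "F2": 1450.0}
result = features_match(questioned, sample)
```

Here F1 differs by about 4% and counts as matching, while F2 differs by about 19% and does not.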
The measuring and computing complex intended for the extraction and analysis of speech fragments should include: a studio digital tape recorder, a spectrum analyzer (sonograph or its hardware and software analogue), a personal computer (IBM PC compatible), and peripheral devices of a personal computer.
Thus, when analyzing speech fragments using the electroacoustic method with the help of the described complex, the following main stages can be distinguished:
— selection of speech fragments suitable for conducting a comparative electroacoustic identification study;
— obtaining spectrograms of the selected fragments using a sonograph in the coordinates «time-frequency-intensity»;
— measuring the characteristic features marked on the spectrograms and transferring the digital arrays to the computer memory;
— determining features that cannot be measured directly using information processing algorithms implemented in the computer;
— automated comparison using a computer of features identified on the original and comparative spectrograms;
— visualization of the obtained results using a sonograph and peripheral devices of a personal computer.
The proposed algorithm for the electroacoustic part of the expert study not only significantly simplifies the processing of spectrograms to isolate the information field needed for the study (which requires a great deal of time and manual labor without a computer), but also allows the results of the expert study to be visualized extensively, so that the expert's work can be presented in a simple and clear form.
It should be borne in mind, however, that a sonograph or its digital analogues can be used only with speech material that contains little noise, since spectral analysis in general, and analysis using a sonograph in particular, is very sensitive to noise and interference in the phonogram. Therefore, hardware systems based on sonogram (spectrogram) analysis cannot be considered a universal means of conducting the instrumental part of an expert study.
For noisy phonograms, it is more appropriate to use an approach based on the prosodic characteristics of speech. These include the fundamental frequency of the voice (F0), duration and intensity. The set of statistical features used in the identification study of phonograms by this method requires the analysis of longer speech fragments.
This method is implemented, for example, in the SIS hardware and software complex (developed by the Center for Speech Technologies in St. Petersburg). SIS has a set of algorithms for extracting the fundamental frequency (F0), and the expert chooses a specific algorithm taking into account the quality of the signal and the acoustic conditions in which the phonogram under study was recorded. The expert can check the correctness of the extracted F0 frequencies or periods using cepstral analysis algorithms.
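Cepstral checking of the fundamental frequency can be illustrated with a minimal sketch. This is a generic textbook cepstrum estimate written with NumPy, not the algorithm used in SIS; the function name, the search range of 70 to 400 Hz and the synthetic test frame are assumptions.

```python
import numpy as np

def cepstral_f0(frame, fs, f_min=70.0, f_max=400.0):
    """Estimate the fundamental frequency of one voiced frame from its cepstrum."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.fft(windowed))
    # Real cepstrum: inverse transform of the log magnitude spectrum
    cepstrum = np.fft.ifft(np.log(spectrum + 1e-12)).real
    # Search only the quefrencies (lags in samples) of plausible pitch periods
    q_min = int(fs / f_max)
    q_max = int(fs / f_min)
    period = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / period

# Synthetic voiced frame: 20 harmonics of a 125 Hz fundamental tone
fs = 8000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 125.0 * k * t) / k for k in range(1, 21))
f0_estimate = cepstral_f0(frame, fs)
```

The harmonics of a voiced sound form a regular ripple in the log spectrum, and the cepstrum turns that ripple into a single peak at the pitch period. This makes it a convenient independent check on an F0 value obtained by another algorithm.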
The informative features of spoken speech mentioned above can also be isolated and analyzed with a mathematical (software) version of the voice analyzer based on the MFA 104 modular system for Fourier analysis, developed in the late 1980s at the Academy of Sciences of the GDR. The advantage of this complex is that it is equipped with a special functional unit for isolating features that characterize the fundamental frequency of the speech signal.
During speech signal processing, the system produces a graphical display of the change in the fundamental tone value, marking pauses in speech and sections with «breaks» in the fundamental tone. It can also produce a graphical display of the continuous change in the maximum and minimum fundamental tone values, and a synchronized display of the fundamental tone change and the intensity of the sound signal over a period of 80 s (after which the synchronization process is repeated cyclically).
The algorithm included in the mathematical analogue of MFA 104 allows a number of values to be calculated in parallel with the extraction and graphic display of information: the average fundamental frequency (F0); the maximum and minimum F0; the ratio of the duration of unvoiced sections of speech to the duration of voiced ones; the ratio of the length of voiced sections to the total length of the analyzed fragment, etc. Some of these values are themselves features by which the emotional state of the speaker is determined; others serve as initial data for calculating «indirect», that is, mathematical, features. Both kinds of features, as is known, are successfully used in phonographic examinations.
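The values listed above can be computed from a fundamental tone contour as in the following sketch. It assumes the contour is given as one F0 value per analysis frame, with 0 marking unvoiced frames; this representation, the function name and the sample contour are illustrative assumptions, not the MFA 104 data format.

```python
import numpy as np

def f0_statistics(f0_track):
    """Summary values of an F0 contour (Hz per frame, 0 = unvoiced frame)."""
    f0 = np.asarray(f0_track, dtype=float)
    voiced = f0 > 0
    n_voiced = int(voiced.sum())
    if n_voiced == 0:
        raise ValueError("the contour contains no voiced frames")
    n_unvoiced = f0.size - n_voiced
    return {
        "mean_f0": float(f0[voiced].mean()),
        "max_f0": float(f0[voiced].max()),
        "min_f0": float(f0[voiced].min()),
        # Frames have equal duration, so frame counts stand in for durations
        "unvoiced_to_voiced": n_unvoiced / n_voiced,
        "voiced_to_total": n_voiced / f0.size,
    }

# Hypothetical contour: eight frames, three of them unvoiced
stats = f0_statistics([0, 0, 110, 120, 130, 0, 100, 140])
```

For this contour the average F0 is 120 Hz, the extremes are 100 and 140 Hz, the unvoiced-to-voiced ratio is 0.6 and the voiced share of the fragment is 0.625.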
The analyzer, built on the principle of MFA 104, and the features extracted with its help can be successfully applied for forensic identification of the speaker by voice and speech.
To resolve issues related to forensic examination of video recordings, a visual analysis of the image of the presented videogram, an instrumental analysis of the video signal corresponding to the image and listening to the audio channel are carried out. To form the information field of the task of analyzing a video image, the expert must have the following minimum hardware complex at his disposal:
— a multi-system (PAL, SECAM, NTSC) video recorder with the ability to work in SP, LP recording modes (for example, Panasonic NV-G500EM);
— two video recorders of the S-VHS recording format with the ability to operate in frame-by-frame mode, with a frame counter and with tracking and audio signal adjustment scales;
— two monitors with the ability to work:
in the «Cross» and «Frame» modes;
with a resolution of at least 500 TV lines and the ability to switch over two lines;
— Panasonic WJ-MX12 mixing console;
— video printer;
— oscilloscope with the ability to isolate a television line;
— PC/AT 486 class personal computer with the ability to input and process a video frame;
— video camera.

3 Methods of studying video and phonograms

3.1 Study of voice and speech

The study of magnetic recordings of spoken speech is within the competence of phonographic examination, one of the types of video-phonographic examination, which is a new type of examination in the class of forensic examinations.
Identification research of a person by voice and speech involves the use of three types of analysis — auditory, linguistic, instrumental [1].
During the auditory part of the study, the expert analyzes mainly the first group of internal factors that influence the variability of spoken speech, i.e. factors associated with: the individual anatomical features of the hearing organs and speech-forming tract, which depend on the gender and age of the speaker; the articulatory characteristics of the voice and speech, namely intensity level (loudness), timbre, melody, rhythm and speech tempo; the nationality (native language) of the speaker, that is, the presence or absence of an accent; and individual speech skills, namely emotional coloring, expressiveness and dynamism of speech. The auditory analysis mainly covers those features that are usually called general in forensic theory. The result of this part of the study is the establishment of a match (or difference) between the features of the auditory group identified in the voice and speech of the person to be identified in the original recording and the corresponding features in the sample of the subject's voice and speech (similarity of the materials in the general auditory perception of voice and speech, timbre of the voice, manner of speaking, some specific speech features).
The linguistic part of the expert study is carried out in order to identify the linguistic identification features present in the speech material. For this purpose, various methods of auditory and linguistic analysis of the text are used.
Separate analysis of each of the magnetic phonograms submitted for examination involves the identification, study, recording and description of the linguistic features of oral speech. The objects of this part of the expert examination are oral speech and the units (elements) that make it up. The linguistic identification features are the features of the way a specific person realizes units of oral speech. Thus, the linguistic part of the expert examination mainly analyzes those features that are usually called specific in the theory of forensic examination.
The instrumental part of the study of phonograms submitted for examination can be carried out in various ways depending on the quality of the material being examined. This part of the study is aimed at identifying the features of the instrumental group. For this purpose, the prosodic and spectral characteristics of the speech signals recorded on the phonograms are measured and analyzed.

3.2 Study of the sound environment, conditions, means and materials of magnetic sound and video recordings

Initially, some issues of diagnostics of the sound environment recorded on a phonogram were considered in the works of A.A. Levi, for example in [9]. In these works, it is rightly noted that the sounds of the environment surrounding us can be successfully used to establish objective truth, since, recorded on a phonogram, they can reflect the dynamics of the phenomena that caused them, and sometimes have greater significance for the investigation than such traditional objects of forensic investigation as fingerprints, burglary tools, etc., which represent only the final result of trace formation.
The importance of diagnosing the sound environment can be judged from cases in which the circumstances of an incident were clarified only through analysis of the sound texture of phonograms, that is, of the acoustic phenomena that accompanied the sound recording. It is this fact, in our opinion, that best shows the great operational and investigative informational potential of the sound environment.
Magnetic recording materials, which include magnetic tape as the recording medium, also quite often become objects of forensic video-phonographic examination. The study of magnetic sound and video recording materials consists of three parts (traceological, auditory and instrumental) and is carried out in accordance with GOST 13699-91.
Magnetic recording devices include tape recorders, dictaphones, video recorders, etc. These devices quite often become objects of forensic video-phonographic examination.
Identification of magnetic recording devices, in particular sound recording devices, is one of the tasks that form the subject of forensic examination of video and phonograms, i.e. forensic video-phonographic examination. Solving these tasks requires special knowledge in the field of traceology, as well as knowledge of cybernetics, mathematics, physics, electroacoustics and electronics. Therefore, studies of this kind should be carried out within the framework of a comprehensive examination. Specialists such as forensic scientists, mathematicians, electroacousticians, radio engineers and chemists can be involved in such examinations.
For the identification study of tape recorders, the Russian Federal Center for Forensic Examinations under the Ministry of Justice of the Russian Federation has developed methods [10] that make it possible to identify features recorded on magnetic tape that are generally of a «parasitic» nature. The features are recorded and analyzed using a specialized software and hardware signal analyzer manufactured by the Center for Speech Technologies (St. Petersburg).

3.3 Study of video recordings

The study of magnetic recordings of images is within the competence of videographic examination. At present, two directions in videographic examination should be distinguished: the study of video images and the study of the conditions, means, materials and traces of magnetic video recordings.
The need to study the conditions, means, materials and traces of magnetic video recordings most often arises in informational-evaluative investigative situations, when the reliability of the recorded information must be assessed because it may have been falsified or the original recording may have been altered afterwards. The questions most often put to the expert are:
— Are there any signs of mechanical editing of the presented video recording?
— Are there any signs of interruption of the presented video recording?
— Are there any signs of electronic editing of the presented video recording?
— Are there any signs of erasure of part of the presented video recording?
To answer them, it is necessary to conduct, firstly, a visual analysis of the image of the videogram submitted for examination; secondly, an instrumental analysis of the video signals corresponding to the image of the videogram submitted for examination [2].

4 Evidentiary value of the results obtained by the expert and the establishment of legal facts

4.1 Identification features used in FVE

As in any forensic examination, identification features in FVE can be divided into general and specific. This division applies to identifying the speaker, to identifying video and sound recording devices and the sound environment, and to identifying the sound source.

Identification features of spoken speech
Forensic examination of spoken speech is based primarily on the analysis of its identification features, which, as noted above, can be divided into general and specific. General features reflect the properties inherent in a person's speech as a whole, while specific features reflect individual aspects of its elements and speech skills. The combination of values of general and specific features individualizes a person. General and specific features are interrelated, but the general features are decisive. General features include the level of oral speech proficiency and the compliance of speech with the norms of grammar and orthoepy. Tempo, rhythm, fluency of speech and other specific features of spoken speech, in turn, are divided into three groups: ordinary, pathological and sporadic.
Ordinary: features that arise in the normal functioning of the speech-forming tract and are determined by articulatory-vocal skills. They form the basis of the expert examination of spoken speech.
Pathological (features of speech disorders): features that appear as a result of congenital or acquired anomalies of speech, voice and hearing.
Sporadic: single, inconstant features of voice or speech that arise under the influence of random factors (pain, alcohol or drug intoxication, colds, etc.).
Based on the method by which they are identified, the identification features of voice and speech can be divided into two main groups: perceptual and instrumental.
Perceptual: features identified on the basis of human perception. They include auditory features, which involve an auditory assessment of voice quality, and linguistic features, identified by ear and by linguistic analysis of the spoken text presented in written form.
Instrumental: features identified using technical means. They include acoustic features (any parameter of the sound wave of the voice can be represented by an electrical analogue in the form of a digital or graphic model) and mathematical features (obtained using special mathematical models of speech production).
The features selected for identification research are assessed from the point of view of their identification significance, stability, and mutual independence.

Identification features of sound recording media
Traces remaining on magnetic recording media and used to identify tape recorders contain various features.
For ease of consideration, all identification features, depending on their origin and the methods by which they are selected, are divided into three groups:
1. Traditional trace features.
2. Traceological-electromagnetic features.
3. Electroacoustic features.

Features that indicate possible editing or any changes to phonograms
The fact that oral speech recorded on magnetic tape actually belongs to a specific person does not exclude that the said phonogram may be composed of various individual fragments of one or more phonograms and combined (edited) using technical means.
According to GOST 13699-91, editing in this case is the deliberate selection and combination in a certain order of fragments of one or more phonograms of oral speech with the purpose of deliberately changing the original content or meaning of the statements of one or more persons participating in the recorded conversation.
Editing is performed in several ways [5]:
The simplest is mechanical editing, i.e. cutting out individual fragments of phonograms and gluing these sections of magnetic tape overlapping or end-to-end (using special glue or adhesive tape). Here, however, it should be noted that at present this type of editing is practically not encountered in expert practice.
Mechanical editing is also possible with subsequent re-recording of the edited soundtrack. In this case, the research is significantly complicated by the fact that the editing locations cannot be detected visually. However, as research has established, splice locations can be detected from characteristic pulses on the signalogram (chart recorder trace) not only on the spliced magnetic tape, but also on a magnetic soundtrack copied from it.
Editing by erasing individual words, phrases, fragments of oral speech using a permanent magnet or high-frequency current from the tape recorder's erase head is also possible. Such editing can be done either without subsequent recording on the erased areas or with recording new fragments of oral speech on them that are similar in duration to the erased ones. If no new recording was made on the erased section of the magnetic tape, the noise level on it will be lower than the noise level in the speech pause, which is clearly visible on the spectrograms. Possible erasure of individual fragments of the recording without subsequent recording on these sections is also determined by visual examination of the magnetic tape (using, for example, a magneto-optical crystal). In this case, the group affiliation of the tape recorder on which the erasure was made can be determined by the erased areas.
If the magnetic soundtrack does not show a drop in the level of speech pauses to the level of demagnetized sections of the tape, it can be stated that the tape does not show signs of editing by erasing recorded and adding new elements of oral speech. This corresponds to the known principles of magnetic sound recording and is explained by the fact that at the moment of a pause when recording through a microphone, the minimum level of the pause is higher than the level of demagnetized sections of the tape, since it is determined by the acoustic noise of the room and the electrical noise of the recording amplifier.
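The level comparison described above can be sketched as follows. This is only an illustration of the principle, not an expert method: it assumes the non-speech intervals have already been marked, and the 6 dB margin, the function names and the synthetic signal are hypothetical.

```python
import numpy as np

def level_db(segment):
    """RMS level of a signal segment in dB relative to full scale 1.0."""
    rms = np.sqrt(np.mean(np.square(np.asarray(segment, dtype=float))))
    return 20.0 * np.log10(rms + 1e-12)

def suspicious_pauses(signal, pauses, margin_db=6.0):
    """Return the pauses whose level lies well below the typical pause level.

    pauses: list of (start, end) sample indices of non-speech intervals.
    The median pause level is taken as the room and amplifier noise floor.
    """
    levels = np.array([level_db(signal[s:e]) for s, e in pauses])
    reference = np.median(levels)
    return [pauses[i] for i, lv in enumerate(levels)
            if lv < reference - margin_db]

# Synthetic example: background noise, with one interval 20 dB quieter
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.01, 5000)
signal[3000:4000] *= 0.1            # imitates an erased section of tape
pauses = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]
flagged = suspicious_pauses(signal, pauses)
```

An interval flagged in this way is only a candidate for closer study on spectrograms and by examination of the tape itself.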

4.2 Identification features of video recording equipment

Signs that indicate possible editing or any changes to videograms
Let us consider the parameters that can serve as identification features of video recording equipment:
1. The color tone of the sections of the composite video signal reflects the amplitude characteristics of the corresponding pulses. On the monitor in «Cross» mode, the color tone of each section of the video signal corresponds to a certain level and fill of the video signal pulses.
2. The location and width of the sections of the composite video signal characterize the repetition rate and duration of the corresponding pulses.
Any change in these parameters indicates an interruption of the recording process and may point to editing of the videogram or to the use of another video recording device during the recording of the videogram under study.

4.3 Structure of the expert's report

The report of the video-phonographic expert (as well as the act of video-phonographic examination) consists of three parts: introductory, research and conclusions.
The introductory part shall indicate: the name of the examination, its number, and whether it is additional, repeated, comprehensive or commission; the name of the body that appointed the examination; information about the expert and the persons present during the examination (investigator, etc.); the date of receipt of materials for examination by the expert institution, the date of commencement of expert proceedings and the date of signing the conclusion (act); the basis for conducting the examination (resolution or determination, when and by whom it was issued); the name of the materials received for examination, the method of delivery and type of packaging of the objects under examination; petitions for submission of additional materials filed by the expert and the results of their consideration; the circumstances of the case that are essential for giving an opinion; and the questions put to the expert for resolution, in the wording in which they are given in the resolution (determination) on the appointment of the examination. If a question is raised on the initiative of the expert, it is also set out in the introductory part.
The research section of the report describes the research process and its results, and provides a scientific explanation of the established facts. The research section sets out: the state of the object of the expert examination (in the case of a recording of spoken speech, for example, the characteristics of the speech signal, the length of the phonograms, the signal-to-noise ratio, etc. are indicated); the methods of the expert examination applied and the features selected for analysis (for example, spectral features, prosodic features, etc. of spoken speech); analysis of the phonograms containing samples of the voice and speech of the subject of the examination; references to reference and regulatory materials (resolutions, orders, instructions, guidelines, GOSTs) by which the expert was guided in resolving the issues raised; the results of investigative actions (interrogations, inspections, experiments, etc.), if they are important for substantiating the conclusions; references to illustrations, appendices and the necessary explanations for them; expert assessment of individual stages of the study (for example, in an examination for the presence or absence of signs of editing, assessment of the trace, auditory and instrumental parts of the study) and of all the results obtained as a whole as the basis for formulating the corresponding conclusions. If it was not possible to answer some of the questions posed, the reasons for this are indicated in the research section.
In the research section of the conclusion (act) of the comprehensive examination, the studies of each of the experts are presented separately. In the research section of the conclusion (act) of the repeated examination, the reasons for the discrepancy between the results of the studies and the results of the primary examination, if any, are indicated.
The synthesizing part of the video-phonographic expert's report completes the research part and includes a summary assessment of the set of features identified by the expert during the analysis, in terms of their sufficiency for formulating a conclusion.
The expert's conclusions are presented in the form of answers to the questions put to him and in the order in which the questions are set out in the introductory part of the report (act), or the impossibility of resolving the issue is indicated.
Conclusions on the circumstances about which the expert was not asked questions, but which were established by him during the study, are set out at the end.
The conclusions are set out in clear and distinct language that does not allow for different interpretations. In cases where the conclusion cannot be formulated without a detailed description of the results of the study, set out in the research part and containing an exhaustive answer to the question posed, references to the research part of the report are allowed.

5 Preparation of materials necessary for the appointment and successful conduct of a forensic video-phonographic examination, and evaluation of the expert’s opinion

5.1 Presentation of the necessary samples and technical means to the expert

In order to conduct a forensic identification study of a magnetic recording of spoken speech, the expert must be provided with comparative materials — phonograms with recordings of the voice and speech pronounced by the person being tested. The most important requirement for the quality of samples is their comparability with the object being studied. In [11] recommendations are given for the production of phonograms — samples of voice and speech, but the generally accepted rules for obtaining comparative material for expert identification of a person by the properties of his voice and speech have so far been formulated only in a fairly general form. Developing such rules suitable for all cases is a difficult and responsible task, since when obtaining a sample it is necessary to take into account the specific circumstances of the case and the investigative situation in which the recording presented for examination, which is material evidence in the case, was made. It is best if a specialist in forensic phonography or a forensic prosecutor can participate in the production of samples. An investigator who carries out this work independently must first become thoroughly familiar with the technique of making phonograms.
When making a phonogram with a recording of the speech of the person being tested, it is necessary to first listen very carefully to the original (main) phonogram, establish the situation and conditions of its production in order to develop a plan for the production of expert samples.
In order to ensure the suitability of speech samples for comparative research and their comparability with the main recording, it is necessary to strive to obtain comparative materials in similar technical and acoustic conditions. The properties of the speech signal recording depend on the location of the microphone relative to the sound source, the technical characteristics of the microphone and tape recorder, the acoustics of the room and the acoustic noise accompanying the recording. The recording must be made on the device that was used to record the main soundtrack, and if this is not possible, then using equipment of the same type. If the soundtrack is a recording of a telephone conversation, then it is also desirable to produce comparative materials using a telephone and a telephone adapter. Failure to comply with the recording conditions leads to distortion of the spectral and prosodic composition of the features being studied and, consequently, to a decrease in their identification significance.
Along with technical characteristics, the comparability of the original soundtrack and voice and speech samples from the point of view of the communication situation plays an important role.
In the process of preparing to obtain comparative samples, the investigator must determine which phrases and words should be recorded, what psychological conditions should be ensured and in what way (establishing psychological contact with the person being tested, precise formulation of questions, planning the content of the conversation).
In the process of experimental recording, various forms and methods of selecting speech samples are used:
monologue — a story on a topic related to the content of the main phonogram. Using this form of obtaining samples, they achieve their comparability in meaning, theme, intonation structure of individual phrases and statements;
dialogue — a recording of the investigator's questions and the person being tested's answers. The investigator must think through the content of the questions in advance in order to obtain comparative materials comparable to the main soundtrack. During the recording, the person being tested should not be interrupted; he must be given the opportunity to speak freely, even if he deviates from the topic;
reading a text similar or close in content to the main recording. In order to smooth out the difference between reading and spontaneous speech, it is recommended to repeatedly (5-10 times) pronounce the text fragments prepared by the investigator (judge);
repetition — the text prepared by the investigator is recorded on magnetic tape and then serves as a guide for the persons whose speech is recorded for comparison. Moreover, at the direction of the person taking the samples, the same text can be pronounced by the person being tested in different variations, for example, slowly, quickly, with a hand placed on the mouth, etc.
When using any method of obtaining comparative materials, it is important that the sample has the maximum similarity with the speech recording on the original phonogram. The investigator (judge) must strive to ensure that the speech of the person being tested corresponds to the original material in tempo of pronunciation, expressiveness, manifestation of certain emotions and tonality. This is achieved by skillfully composing the text for the recording and by creating adequate psychological conditions during the conversation between the investigator (judge) and the person being tested.
An essential factor ensuring the comparability of comparative materials is the absence of deliberate distortions in them, which the person being tested often introduces in order to disorient the expert. Therefore, the investigator must establish rules that will help neutralize such attempts.
The literature indicates that distortion based not on external manipulation (covering the mouth with a hand, etc.) but on functional modification of the voice and manner of speaking usually achieves only an insignificant result, even for a fairly educated person with a good command of speech.
The chosen methods of phonetic distortion vary; they depend on knowledge of the language, the level of speech practice and the preparedness of the person being tested. The dynamics and style of speech and the form in which thoughts are expressed (choice of words and construction of sentences) are the easiest to modify. Identification features of the auditory group, such as articulation, accentuation, evenness of speech and its tempo, are also quite easy to distort. The hardest to distort for any length of time are the properties of the voice: the pitch of the fundamental tone, shades of sound, fullness of the voice and rhythm (phase, usual degree of tension).
If the investigator, when selecting comparative materials, notices that the person being tested is trying to distort his voice and speak in a manner that is not typical for him, it is necessary to create conditions that will make it difficult to carry out this intention. Long, casual conversations that are recorded on magnetic tape can lead to the desired result. Sometimes, witnesses who know the voice and manner of speaking of the person being tested help to establish an attempt at distortion. Attempts to distort the voice when selecting samples of voice and speech must be reflected in the protocol of this investigative action.
The investigator may not always notice the shortcomings of the samples received. Often, it is only during the expert examination that it is possible to establish that the samples sent do not meet the requirements of comparability and to determine what additional samples are needed to successfully resolve the questions posed to the expert.
In complex cases, when preparing materials for examination, as indicated above, it is necessary to involve specialists in the field of forensic phonography. Such specialists are able not only to assist the investigator in selecting for subsequent seizure from among the available sound documents those that can serve as free samples of the voice and speech of the person being examined, but also (after listening to the recording under examination) to give specific advice on the conditions for making samples, and, if necessary, to directly help obtain experimental samples of voice and speech.
In order for the magnetic recording not to undergo changes during storage and transportation, it is necessary to observe certain rules:
phonograms should be stored in cardboard or plastic cases, which are placed in polyethylene bags and kept in a vertical position;
the permissible temperature in the storage location is not lower than -10°C and not higher than +35°C, and the relative humidity should be about 45-75%; the magnetic recording must not be exposed to significant heat (direct sunlight, etc.); if the magnetic tape has been stored in a very dry place, its elasticity must be restored by keeping it for some time in the recommended humidity conditions before it is used for recording or reproducing sound;
magnetic tape should not be left in a tape recorder (VCR): the device heats up during operation and does not cool down immediately, and the heat dries out and deforms the tape;
during storage and transportation, magnetic recordings must be protected from constant and variable magnetic fields (devices, electric motors, transformers, metal detection points at airports, etc.);
for transportation, the magnetic tape should be placed in a box, which is then wrapped in a thin layer of foil (made of aluminum or another metal) to protect the recording from the effects of a magnetic field.
It is also necessary to prevent the possibility of replacing the phonogram during storage and shipment, and to avoid cases of changing the magnetic recording, for which purpose it is advisable to keep the cassette sealed.
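The temperature and humidity limits listed above lend themselves to a simple automated check, for example when the storage location is monitored by a data logger. The following Python sketch is only an illustration: the `StorageReading` record and the `storage_violations` function are hypothetical names, and treating "about 45-75%" as hard limits is an assumption made for the example.

```python
from dataclasses import dataclass

# Temperature limits quoted in the storage rules above.
MIN_TEMP_C = -10.0
MAX_TEMP_C = 35.0
# The text says "about 45-75%"; using hard limits is an assumption of this sketch.
MIN_HUMIDITY_PCT = 45.0
MAX_HUMIDITY_PCT = 75.0


@dataclass
class StorageReading:
    """One measurement taken at the storage location (hypothetical type)."""
    temperature_c: float
    relative_humidity_pct: float


def storage_violations(reading):
    """Return the list of violated rules; an empty list means the reading is within limits."""
    problems = []
    if not MIN_TEMP_C <= reading.temperature_c <= MAX_TEMP_C:
        problems.append("temperature outside -10..+35 C")
    if not MIN_HUMIDITY_PCT <= reading.relative_humidity_pct <= MAX_HUMIDITY_PCT:
        problems.append("relative humidity outside 45..75 %")
    return problems
```

A reading of 20°C at 60% humidity yields an empty list, while 40°C at 30% humidity reports both violations.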

5.2 Some features of the evaluation of the conclusion of a forensic video-phonographic examination by the person or body that ordered this examination

The evaluation of the conducted phonographic research requires checking the correctness of the selection and procedural documentation of voice and speech samples, since the reliability of the expert's conclusions depends on the quality and completeness of the comparative materials. In addition, it is of great importance to check the initial data and scientific provisions used by the expert, as well as the correctness of his choice of research methods and technical means.
The mathematization of expert research and the use of various cybernetic methods, without which video-phonographic research is impossible, require a certain level of training from the investigator. Lacking it, an investigator who has doubts must consult specialists proficient in these methods. Such consultations do not require procedural registration; however, if an error is found in the expert's conclusion, the investigator must, as a rule, appoint a repeat examination.
The validity of the expert's conclusion presupposes scientific, technical, logical and methodological literacy of the conducted research and the presentation of its results, as well as confirmation of the expert's conclusions with relevant facts, arguments and scientific provisions. In video-phonographic studies, such arguments of an objective nature are a detailed description of the identification features, as well as a detailed description of the course and results of the expert experiments; visual aids attached to the conclusion — graphs, spectrograms, histograms, tables, etc. — must be studied no less carefully than the text part of the conclusion.
When assessing the expert's conclusion, the investigator and the court first check whether all the questions posed have been answered, whether the conclusion contradicts the initial data, and whether there is a logical connection between the conclusion and the results of the study. The assessment of the reliability and scientific validity of the conclusions is closely linked to the assessment of the applied research methodology.
From the above it is clear that the evaluation of the conclusion of a forensic phonographic examination requires knowledge of the fundamentals, theoretical concepts, and basic methods of forensic videophonography, as well as an understanding of the sciences underlying forensic videophonography and the capabilities of modern spectral-analytical and computing technology.
Methods of video-phonographic research are based on quantitative methods. The role of computers and automated image and sound analyzers is constantly increasing, which introduces a number of additional difficulties into the evaluation of the conclusion. Forensic experts try to eliminate them by solving the problems of language, content and form of the expert's conclusion.
The language used to describe video-phonographic examinations is largely subordinated to the terminology of those sciences whose apparatus is used in the type of examination under consideration. This language contains mathematical and linguistic symbols, formulas, and various graphic constructions. Most of the methods and results of using a computer to examine, for example, a voice are not visually demonstrative in the usual sense. The expert receives information about the object under examination after he has programmatically isolated and compared the voice characteristics using a computer. Therefore, when evaluating such conclusions, the investigator and the court must check the data entered into the computer (which the expert must fully and clearly describe in the conclusion).
The results of the analysis and comparison of speech features perceived by the expert by ear during the auditory analysis are even less clear. Their assessment can only be facilitated by transcribing the spoken texts and annotating the transcripts appropriately.
One of the features of forensic video-phonographic examination is the use of programs in expert research that use various probabilistic-statistical methods for assessing the coincidences and differences of identification features (for example, voice and speech features). The capabilities of such methods are currently far from unlimited, so the expert cannot always come to a categorical conclusion and is sometimes forced to resort to a probable form of conclusions.
The practical usefulness of probable conclusions is beyond doubt. They can be used to build versions, determine the direction of the investigation, discover other evidence and thus help solve the crime.

5.3 Special terms and keywords of FVE

Acoustics — the study of sound, sound vibration processes; sound conditions of any room.
Speech acoustics — a section of general acoustics that studies the structure of the speech signal, speech production processes and speech perception.
Acoustic-phonetic (phonetic) features of oral speech — features that reflect the acoustic properties of the speech-forming tract and human articulatory skills and their situational manifestation at the phonetic level. This group of features is perceived by ear and is detected using technical means, serves as the basis for instrumental analysis of oral speech phonograms; in some cases, the features can be assessed quantitatively.
Acoustic — related to auditory perception, representing oral speech as a whole and its elements, as a physical phenomenon, as a sound oscillatory process.
Amplitude — the value of the maximum deviation during an oscillatory process from some average position (equilibrium position); characterizes the magnitude of the oscillatory movement (size of displacement).
When studying harmonic sound vibrations, amplitude is understood to mean sound pressure in a signal, expressed by the amplitude of current, voltage or other electrical quantity at the output of a sound-converting device (microphone).
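For a digitized signal, the definition of amplitude given above can be illustrated with a few lines of Python. This is a minimal sketch, assuming the signal is available as a list of numeric samples; the function names `peak_amplitude` and `sine_wave` are hypothetical, and the test tone is generated only to exercise the calculation.

```python
import math


def peak_amplitude(samples):
    """Maximum deviation of the samples from their mean (equilibrium) value."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean = sum(samples) / len(samples)
    return max(abs(s - mean) for s in samples)


def sine_wave(amplitude, frequency_hz, sample_rate_hz, n_samples, offset=0.0):
    """Generate a harmonic test tone with a constant offset (equilibrium position)."""
    return [
        offset + amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    # One second of a 100 Hz tone with amplitude 0.5 around an offset of 0.1.
    tone = sine_wave(0.5, 100.0, 8000.0, 8000, offset=0.1)
    print(round(peak_amplitude(tone), 6))
```

The mean is subtracted first so that a constant offset in the recording chain is not mistaken for part of the oscillation.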
Amplitude-frequency characteristic of the recording-playback channel — dependence of the magnitude of signals at the output of the playback channel on the frequency of pre-recorded signals received at the input of the recording channel from a source with an unchanged level value.
Video recording and video playback equipment — a set of technical means used in video recording and video playback: video cameras, video recorders, video players.
Sound recording and sound playback equipment — a set of technical means used in sound recording and sound playback: microphones, amplifiers, mixers, tape recorders, dictaphones.
Articulation — 1. The set of actions of the pronunciation organs in the formation of speech sounds. 2. The process and assessment of speech intelligibility in the tested communication channel or for the output signal of the tested audio equipment.
VCR — a device designed for magnetic recording of video and audio information and its reproduction.
Video signal — an electrical signal with a wide frequency spectrum, carrying information about the image.
Video phonogram — a signalogram obtained as a result of video and sound recording.
Types of oral speech spectrograms — forms of spectrograms depending on the methods of spectral analysis and parameters of sound spectrographs: narrow-band, wide-band, dynamic spectrograms, spectral slices (frames, sections), LPC (linear prediction) spectra, etc.
Visualization of magnetic tracks — the process of developing recorded magnetic tracks for the purpose of subsequent measurement and determination of their relative positions.
Signal dropout — a short-term interruption or significant weakening of the reproduced signal caused by defects in the recording medium or the operating features of the recording and playback device.
An utterance is a minimal product of textual activity, including the mental, physiological, intellectual and linguistic abilities of the speaker (writer).
Pitch of voice (sound) is a subjective qualitative measure of the sensation of voice (sound), associated with the impact of its fundamental frequency and timbre on the organs of hearing. A quantitative assessment of the pitch of voice (sound) is expressed by the value of the frequency of sound vibrations and does not always coincide with a subjective assessment.
Pitch of voice (sound) — the quality of the voice that depends on the frequency of vibrations of the vocal folds per unit of time: the more vibrations occur per unit of time, the higher the pitch of the voice (and of sound in general).
Harmonic components (harmonics, overtones) — elementary, pure vibrations that together form more complex forms of sound vibrations; derivatives of the fundamental frequency of these vibrations, exceeding it by an integer number of times. The set of harmonic component values determines the timbre of the voice and is individual for each speaker.
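The integer relationship between the harmonic components and the fundamental frequency can be shown with a short Python sketch. The function name `harmonic_frequencies` is hypothetical, and the sketch counts the fundamental itself as harmonic number 1, which is one common convention rather than something the definition above prescribes.

```python
def harmonic_frequencies(fundamental_hz, count):
    """Frequencies of the first `count` harmonics, starting with the fundamental (k = 1)."""
    if fundamental_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    if count < 0:
        raise ValueError("count must not be negative")
    return [fundamental_hz * k for k in range(1, count + 1)]


if __name__ == "__main__":
    # Illustrative fundamental of 120 Hz.
    print(harmonic_frequencies(120.0, 4))
```

For a fundamental of 120 Hz the first four values are 120, 240, 360 and 480 Hz.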
Vowel sounds — speech sounds formed by the free passage of air in the mouth, consisting mainly of voice (voice tone) in the almost complete absence of noise.
Diagnostic features of oral speech — features by which it is possible to establish the territorial, social, physiological and other characteristics of the speaker.
Dialect (from the Greek dialektos — conversation, speech, dialect) — a variety of a given language used as a means of communication by people connected by a close territorial, social or professional community. A distinction is made between territorial and social dialects.
Dialogue is a process of direct exchange of messages between two subjects, in which there is a constant change in the roles of the speaker and the listener. Dialogue between people usually implies the presence of a purposeful exchange of messages, mutual understanding of the partners, a certain equivalence of all activities in the process of exchanging messages, expansion of knowledge, skills and abilities of each of them.
Dialogic speech (from the Greek dialogos — conversation, talk of two) — a form (type) of speech consisting of an exchange of statements-replicas, the linguistic composition of which is influenced by direct perception, activating the role of the addressee in the speech activity of the addresser.
Range — a value characterizing the maximum limits of change in the features of sounding speech; the difference between the maximum and minimum values of a feature.
A dictaphone is a device designed for recording speech, including dictation that is subsequently typed up as text (transcribed).
Genres of colloquial speech. Three main genres are distinguished depending on the number of communicants: monologue, dialogue and polylogue, with corresponding subtypes.
Sound is a flow of energy that causes mechanical vibrations of particles of an elastic medium; from a physiological point of view, it is mechanical vibrations of an elastic medium that are subjectively perceived by the human hearing organs, causing certain sensations in the human body.
Speech sounds are the minimal units of the speech chain that result from complex human articulatory activity and are characterized by certain acoustic and perceptual (associated with speech perception) properties. Speech sounds are segmental means, since they are related to the minimal linear units of language — phonemes. Supersegmental sound means, such as tone, stress, intonation, are related to units of greater length, such as a syllable, word, syntagma.
Intonation is the unity of interconnected components: melody, intensity, duration, speech tempo, and timbre of pronunciation. Some researchers also include pauses in the components. Together with stress, intonation forms the prosodic system of language.
Communicants — partners in speech communication (the speaker is the addresser, the listener is the addressee).
Context — a fragment of text that includes the unit selected for analysis, necessary and sufficient to determine the meaning of this unit. Microcontext — the minimal environment of the unit in which it realizes its meaning; macrocontext — the environment of the unit under study, allowing us to establish its function in the text as a whole.
Lexicon — the set of words in a language, its vocabulary.
The linguistic level is one of the planes of language examination, determined by the properties of units identified by the researcher. The following levels are usually distinguished: phonetic, phonological, morphological, lexical, syntactic, semantic, and pragmatic.
Magnetic recording is a recording system in which recording is performed by changing the residual magnetic state of the carrier in accordance with the signals of the information being recorded.
Magnetic reproduction is a reproduction system based on the interaction of a magnetic reproducing element with the recording track of a phonogram.
Magnetic erasure — an erasure system based on magnetic action on a phonogram.
Tape recorder (cassette, reel-to-reel, digital) — a device designed for magnetic sound recording and playback.
Melody of speech — a component of intonation. It is carried out by raising and lowering the voice in a phrase.
Monologue — speech addressed by the speaker to himself, not intended for a verbal response from another person.
The language norm is the set of the most stable traditional implementations of the language system, selected and consolidated in the process of social communication.
The non-verbal component of communication is the use of gestures and facial expressions, the actions of the participants in the communicative act, as well as the consituation (the situational context of communication).
The zero vowel is a vowel that is reconstructed during analysis as a functional unit, but is not realized phonetically.
Objects of videographic research — videograms, video recording and video projection equipment, video information carriers.
Objects of phonographic research — oral speech phonograms, sound recording equipment, audio information carriers.
Relative level of phonogram erasure — the ratio of the recording level of a phonogram after erasure to the recording level of the same phonogram before erasure.
Relative noise level — the ratio, expressed in decibels (or relative values), of the noise level to the useful signal level corresponding to the maximum (or other standardized) recording level.
Signal-to-noise ratio — the ratio of the useful signal level corresponding to the maximum (or other standardized) recording level to the noise level.
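The three level ratios defined above (relative level of erasure, relative noise level and signal-to-noise ratio) can be expressed in decibels with one helper. The following Python sketch assumes that the levels are amplitude-type quantities, such as RMS voltage, so the factor is 20; for power quantities the factor would be 10. The function names are hypothetical, and expressing the erasure ratio in decibels is an assumption of the example, since the definition above only speaks of a ratio.

```python
import math


def level_ratio_db(level, reference_level):
    """Ratio of two amplitude-type levels in decibels: 20 * log10(level / reference)."""
    if level <= 0 or reference_level <= 0:
        raise ValueError("levels must be positive")
    return 20.0 * math.log10(level / reference_level)


def relative_noise_level_db(noise_level, signal_level):
    """Noise level relative to the useful signal level (negative when noise is weaker)."""
    return level_ratio_db(noise_level, signal_level)


def signal_to_noise_ratio_db(signal_level, noise_level):
    """Useful signal level relative to the noise level."""
    return level_ratio_db(signal_level, noise_level)


def relative_erasure_level_db(level_after, level_before):
    """Recording level after erasure relative to the level before erasure."""
    return level_ratio_db(level_after, level_before)
```

With these definitions the signal-to-noise ratio and the relative noise level of the same recording differ only in sign.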
Polylogue — a conversation between several people.
Full pronunciation style — clear (announcer, lecturer, etc.) pronunciation.
Suitability of oral speech phonograms for expert forensic examination — the presence of comparability of the materials submitted for examination (the phonogram under examination and the phonogram of oral speech samples) and their quality (intelligibility, volume and naturalness), meeting the requirements imposed on oral speech phonograms in the production of examinations.
Intelligibility of oral speech phonograms — the clarity and distinctness of oral speech, which depends on the speaker's articulation, the properties of the sound recording and sound reproducing equipment, and the acoustic conditions during recording and playback of phonograms. The measure of intelligibility is the ratio of the number of elements of oral speech correctly understood by the listener to all those pronounced. Depending on the purpose and nature of the forensic examination of oral speech, one or another measure of intelligibility may be sufficient.
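The measure of intelligibility described above is a simple ratio, which a short Python sketch can make explicit. The function name `intelligibility` is hypothetical, and the choice of speech element to count (sounds, syllables, words or phrases) is left to the examiner.

```python
def intelligibility(correctly_understood, total_pronounced):
    """Share of pronounced speech elements that the listener understood correctly."""
    if total_pronounced <= 0:
        raise ValueError("total_pronounced must be positive")
    if not 0 <= correctly_understood <= total_pronounced:
        raise ValueError("correctly_understood must lie between 0 and total_pronounced")
    return correctly_understood / total_pronounced


if __name__ == "__main__":
    # Illustrative counts: 45 of 50 test words understood correctly.
    print(intelligibility(45, 50))
```

The result lies between 0 and 1 and can be multiplied by 100 to express it as a percentage.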
Colloquial vocabulary — words used in casual conversation.
Conversational style of pronunciation — unclear, careless speech.
Reduction — weakening of the articulation of a sound and changing its sound (this mainly applies to vowels in an unstressed position).
Speech signal — an electrical process obtained at the output of a microphone voiced by speech.
Stereotypical speech/free speech — this characteristic shows how the communicative act is linked to the recurrence/non-recurrence of the situation.
Speech rhythm — regular repetition of similar and commensurate speech units.
Articulation signals (speech signals) — words and sounds that people occasionally pronounce when they listen to someone for a long time in an informal setting (like uh-huh, aha, mm, um, n-da, nu, da, da-da, no-no, etc.).
Syntagma — a rhythmic and melodic unit of continuous speech, grammatically formed and expressing a relatively complete thought within a more complex whole (sentence).
Syntax — 1) rules for structuring a speech utterance; 2) a section of grammar that studies the processes of structuring speech (combinability and word order).
Situational similarity of oral speech phonograms — coincidence of the set of circumstances under which the recording of the phonogram under study and the phonograms — samples of oral speech — took place. Such circumstances include: the form of oral speech, the nature of the relationship between the interlocutors, the meaning of the conversation, the rate of speech, the emotional state of the interlocutors, etc.
Communication situation — conditions that are taken into account when analyzing the speech characteristics of the speaker and his interlocutor (the material environment of the interlocutors, events occurring at the moment of speech, the behavior of the interlocutor, etc.).
Syllable — the minimum unit of a rhythmic phrase.
Text — a material object that represents the implementation of the speaker's activity in the form of a set of linguistic means (units of various levels) and reflects the motives, intentions and speech capabilities of the speaker in the conditions of a specific communication act.
Theme (the given) — the starting point of the message; what is the basis of the statement; a component of the actual division of the sentence.
Narrow-band sonogram — the result of spectral analysis of the speech signal, in which the frequency resolution ensures the manifestation of the harmonic components of the fundamental frequency of the elements of oral speech.
Language levels. The phonetic level concerns the study and description of the specific features (articulatory, acoustic, auditory) of the realization of phonemes in the speech flow; the morphemic level, the structure of the word; the lexical level, words and word forms; the syntactic level, phrases and sentences; the semantic level, the meanings of words and differences in grammatical forms. The pragmatic level examines the connections between units of language and the user.
Phoneme — the smallest unit of the sound structure of the language, used to recognize and distinguish morphemes, words. It is realized in speech in the form of variants, shades (allophones).
Phonetics is a branch of linguistics that studies the sound side of language in its physical, articulatory and perceptual aspects.
A phonetic word is a group of syllables, including the main stressed syllable and the preceding (proclitic) and following (enclitic) syllables related to it.
Formant — the area of energy concentration in the spectrum of speech sound, determined by the resonant properties of the vocal tract. On average, three to four formants are observed in sound. Formants form the phonetic meaning of sound and carry information about the individual resonant properties of the speech-forming tract. Designated: F1, F2, etc. without specifying their frequencies.
Phrase — the basic unit of speech expressing a complete thought, semantic unity, the integrity of which is created by intonation (unifying phrasal intonation of one type or another and pauses separating a given phrase from neighboring ones, as well as a certain syntactic structure).
Phraseology — a section of linguistics that studies stable turns of speech.
Phonology is a section of linguistics that studies the structural and functional patterns of the sound structure of a language.
Hesitation is indecision, uncertainty or wavering, conveyed in speech by means of pauses (empty or filled).
Formant bandwidth is the interval along the frequency axis occupied by the formant. It is designated depending on the formant number: B1, B2, etc.
A broadband sonogram is the result of spectral analysis of a speech signal, in which frequency resolution provides visualization of the formant structure of elements of the speaker's oral speech, which allows for the objectification of the expert examination process.
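The difference between the narrow-band and broadband sonograms defined in this glossary comes down to the frequency resolution of the analysis. The following Python sketch uses the rule of thumb that the resolution is roughly the reciprocal of the analysis window duration; the exact factor depends on the window shape, and the window durations and fundamental frequency used in the example are illustrative assumptions, as are the function names.

```python
def analysis_bandwidth_hz(window_duration_s):
    """Rough frequency resolution of a spectral analysis window: about 1 / duration."""
    if window_duration_s <= 0:
        raise ValueError("window duration must be positive")
    return 1.0 / window_duration_s


def resolves_harmonics(window_duration_s, fundamental_hz):
    """True if the resolution is finer than the spacing between harmonics (equal to F0)."""
    return analysis_bandwidth_hz(window_duration_s) < fundamental_hz


if __name__ == "__main__":
    f0 = 120.0  # illustrative fundamental frequency of a voice, in Hz
    print(resolves_harmonics(0.025, f0))  # long window: narrow-band analysis
    print(resolves_harmonics(0.003, f0))  # short window: broadband analysis
```

A long window separates the individual harmonics, as in a narrow-band sonogram, while a short window smears them together and leaves the formant structure visible, as in a broadband sonogram.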
A cliché is a widespread «hackneyed» expression characterized by a faded lexical meaning and erased expressiveness.
Ellipsis is an omission of an element of an utterance that can be restored in a specific context or situation.
Language is a system of phonetic, lexical and grammatical means that is an instrument for expressing thoughts, feelings, and expressions of will, and serves as the most important means of human communication.

The following sources were used to prepare this collection of special terms and keywords: Linguistic Encyclopedic Dictionary. Moscow: Soviet Encyclopedia, 1990. — 685 p.; Potapova R.K. Speech: Communication, Information, Cybernetics. Moscow: Radio and Communications, 1997. — 528 p.; Identification of a Person by a Magnetic Recording of His Speech. Moscow: RFCFS, 1995. — 130 p., etc.

Bibliography

1. Identification of a person by a magnetic recording of his speech (Methodological manual for experts, investigators and judges). Moscow: RFCFS under the Ministry of Justice of the Russian Federation, 1995. — 130 p.
2. Kochetkov A.T., Serov V.N., Postavnin V.I., Vanin S.I., Goloshchapova T.I. Forensic examination of a video signal to identify identification features of video equipment and video media. Moscow: Forensic Center of the Ministry of Internal Affairs of the Russian Federation, 1998. — 40 p.
3. Bulletin of the Supreme Soviet of the USSR. Law on Amendments to the Fundamentals of Criminal Procedure of the USSR and Union Republics. Moscow, 1990. (Law of the USSR «On Amendments to Criminal Procedure of the USSR and Union Republics» of June 12, 1990).
4. Fant G. Acoustic Theory of Speech Production. Moscow: Nauka, 1964. — 285 p.
5. Lozhkevich A.A., Snetkov V.A., Chivanov V.A., Sharshunsky V.L. Fundamentals of Expert Forensic Research of Magnetic Phonograms. Moscow: All-Russian Research Institute of the Ministry of Internal Affairs of the USSR, 1977. — 172 p.
6. Ramishvili G.S., Chikoidze G.B. Forensic examination of speech phonograms and identification of the speaker. Tbilisi: «Metsniereba», 1991. — 265 p.
7. Kaledin A.I. et al. Fundamentals of forensic examination. Part I. General theory course. (Methodological manual for experts, investigators and judges). Moscow: RFCFS under the Ministry of Justice of the Russian Federation, 1997. — 430 p.
8. Zimnyaya I.A. On the method of studying the relationship between linguistic and individual characteristics in the spectral representation of a vowel sound (Methods of experimental speech analysis). Minsk, 1968. — 70 p.
9. Levi A.A. Sound recording in criminal proceedings. Moscow: Legal Literature, 1974. — 104 p.
10. Granovsky G.L. et al. Use of computers for the purposes of identifying tape recorders. Methodological letter. Moscow: VNII SE, 1990. — 26 p.
11. Kaganov A.Sh., Mikhailov V.G. Features of recording voice and speech samples for conducting an identification phonographic examination. Proceedings of the All-Russian scientific and practical conference «Criminalistics of the XXI century». Rostov: CSK LSE, 2001 (in press).
