Objective metrics for assessing the quality of video codecs.

IN SEARCH OF A SUPPORT POINT
One trend in the CCTV market is the sheer diversity of video codecs. Almost every developer of CCTV software and hardware strives to create its own video compression algorithms and to proclaim their superiority over the rest. Since a developer's claims do not always correspond to reality, the consumer needs an independent expert assessment of codec quality. Such assessments are usually carried out by well-known laboratories and scientific communities using two main methods: subjective testing and objective quality metrics.
In subjective testing, a group of experts is shown video fragments compressed with various codecs and rates the quality of each fragment on a fixed scale. The ratings are then processed statistically to obtain an aggregate quality indicator, for example the MOS (mean opinion score).
The advantage of this method is that the resulting scores are easy to interpret, since they are tied directly to human perception. Its significant disadvantages are the dependence of the results on the experts' experience and the fundamental non-reproducibility of the results. The latter also means that tests run by two competing firms can produce diametrically opposed results. The reliability of subjective assessments therefore remains an open question.
Of course, the consumer would like more reliable assessments of codec quality, and preferably repeatable ones. This is where objective video quality metrics come into play: with them, not only other experts but also the consumer can repeat every experiment. The testing process itself is easy to automate: press a button and get a result.
This article considers the objective metrics used to assess video quality, their advantages and disadvantages, and the problems that arise when they are used. Before turning to the metrics themselves, however, a few words are needed about how humans perceive video, since the features of perception affect the design of both compression algorithms and objective quality metrics.
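As an illustration of how a MOS is obtained, here is a minimal sketch in Python. The ratings and the 1 to 5 scale are hypothetical, and real test procedures usually add further statistical processing, such as screening out unreliable viewers.

```python
from statistics import mean, stdev

def mean_opinion_score(ratings):
    """Return the MOS and the sample standard deviation of the ratings."""
    return mean(ratings), stdev(ratings)

# Hypothetical ratings of one fragment by five viewers (1 = bad, 5 = excellent).
ratings = [4, 5, 3, 4, 4]
mos, spread = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}, standard deviation = {spread:.2f}")
```

The standard deviation is kept alongside the MOS because it shows how much the viewers disagree about the fragment.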

FEATURES OF HUMAN VISION
Knowing the features of human vision allows us to construct objective metrics for assessing video quality that most closely correlate with subjective assessments.
The most important properties of human vision include the following.
Sensitivity to changes in image brightness. Human vision can adapt to a wide range of brightnesses, and within each range a person can distinguish only a certain number of brightness levels. This resolving power depends not on the difference between brightness levels but on the ratio of this difference to the average brightness, i.e. on the contrast.
Frequency sensitivity. A person is much more sensitive to low-frequency noise than to high-frequency noise. This is due to the unevenness of the amplitude-frequency characteristic of the human visual system.
Color perception. Some colors can coexist in human perception (for example, a reddish yellow is perceived as orange), while others, the opponent colors, cannot. This feature is exploited when images are represented with various color-difference schemes.
Masking in the spatial domain. The detection threshold of a video signal rises in the presence of another signal with similar characteristics. As a result, additive noise is much more noticeable in smooth image areas than in high-frequency (textured) ones, where it is masked. The masking effect is most pronounced when both signals have the same orientation and location.
Masking in the time domain. Because of the persistence of vision, a person does not immediately detect a change in the brightness of a scene in a video sequence.

REQUIREMENTS FOR OBJECTIVE VIDEO QUALITY METRICS
Objective metrics must satisfy several requirements. The following are given in [1]:
1. Relevance of the metric: the subjectively “best” video fragments should receive the “best” metric values. This property can be measured quantitatively, for example with the Pearson correlation coefficient, or assessed graphically, as shown in the figure.

[Figure: graphical assessment of metric relevance]

2. Monotonicity of the metric: ideally, the difference between the objective ratings of two video fragments should have the same sign as the difference between the subjective ratings of the same fragments. It is estimated with Spearman's rank correlation coefficient.
3. Consistency of the metric: the deviation of its values from the values predicted from the subjective assessments should not be large. It is calculated as follows. First, a series of subjective assessments of each video fragment is collected and processed statistically to find their standard deviation. Then the values of the objective metric are calculated, and the number of values lying more than two standard deviations away from the subjective assessments is counted.
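The three criteria can be sketched in a few lines of Python. This is an illustration rather than the procedure from [1]: the scores are hypothetical, and the objective values are assumed to have already been mapped onto the subjective scale.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient (relevance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation coefficient (monotonicity)."""
    return pearson(ranks(x), ranks(y))

def outlier_ratio(predicted, mos, mos_std):
    """Share of fragments predicted more than 2 sigma away from the MOS (consistency)."""
    outliers = sum(abs(p - m) > 2 * s for p, m, s in zip(predicted, mos, mos_std))
    return outliers / len(mos)

# Hypothetical data for five video fragments.
mos = [1.8, 2.6, 3.1, 4.0, 4.6]        # subjective scores
mos_std = [0.3, 0.3, 0.3, 0.3, 0.3]    # spread of the viewers' ratings
predicted = [2.0, 2.5, 3.9, 3.8, 4.5]  # objective metric on the same scale

print(pearson(predicted, mos), spearman(predicted, mos),
      outlier_ratio(predicted, mos, mos_std))
```

A good metric has both correlation coefficients close to 1 and an outlier ratio close to 0.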

OBJECTIVE IMAGE QUALITY METRICS
The dissertation [2] considers six classes of image quality metrics:
1) Pixel.
2) Correlation.
3) Contour.
4) Spectral.
5) Contextual.
6) Based on the human visual system.

1. Pixel metrics
Pixel metrics include, first of all, variations of the Minkowski metric. One example is the peak signal-to-noise ratio (PSNR), defined as the logarithm of the ratio of the maximum possible signal energy to the mean squared error (MSE).
This metric is rightly criticized for failing to meet the three requirements above. It is widely used nonetheless, but not everyone knows about one nuance in its application: the MSE must first be calculated for the individual regions (color channels, video frames, etc.) and averaged, and only then is the logarithm of the averaged MSE taken.
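A minimal sketch of PSNR for a sequence, with the MSE averaged over frames before the logarithm is taken. The 8-bit peak value of 255, the decibel scaling and the tiny frames are assumptions made for illustration.

```python
from math import log10

def mse(a, b):
    """Mean squared error between two equally sized 2-D pixel arrays."""
    total = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def psnr_sequence(ref_frames, test_frames, peak=255):
    """PSNR in dB computed from the MSE averaged over all frames."""
    errors = [mse(r, t) for r, t in zip(ref_frames, test_frames)]
    mean_error = sum(errors) / len(errors)
    if mean_error == 0:
        return float("inf")  # identical sequences
    return 10 * log10(peak ** 2 / mean_error)

# Two hypothetical 2x2 frames; the second is distorted more than the first.
ref_frames = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
test_frames = [[[11, 21], [31, 41]], [[53, 63], [73, 83]]]

print(psnr_sequence(ref_frames, test_frames))
```

Averaging the per-frame PSNR values instead would give a different, higher figure here, because the logarithm compresses the contribution of the badly distorted frame.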
Another possible pixel metric is the maximum difference between corresponding pixels. Here, it is recommended to calculate several of the largest differences and find their mean square value.
The metrics discussed above share one limitation: they compare images only at full resolution. It may be useful to compare images at several scales. The human visual system is known to evaluate a low-frequency copy of the image first and only then to go into details. Differences between images at a coarse resolution can therefore be given large weights, and differences in high-frequency details small ones. Metrics of this kind are used in machine vision systems.
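One way to sketch such a multi-scale comparison is shown below. The 2x2 averaging pyramid and the weights are assumptions chosen for illustration, not values taken from any particular machine vision system.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D pixel arrays."""
    total = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def downsample(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]

def multiscale_error(ref, test, weights=(1.0, 2.0, 4.0)):
    """Weighted mean of MSEs; weights[0] is full resolution, later ones are coarser."""
    total = 0.0
    for i, weight in enumerate(weights):
        if i > 0:
            ref, test = downsample(ref), downsample(test)
        total += weight * mse(ref, test)
    return total / sum(weights)

flat = [[0] * 4 for _ in range(4)]
# High-frequency distortion: a +1/-1 checkerboard.
checker = [[1 if (x + y) % 2 == 0 else -1 for x in range(4)] for y in range(4)]
# Low-frequency distortion: a uniform offset of 1.
offset = [[1] * 4 for _ in range(4)]

print(multiscale_error(flat, checker), multiscale_error(flat, offset))
```

Both distortions have the same full-resolution MSE, but the checkerboard vanishes at the coarser scales, so the uniform offset receives the larger multi-scale error.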

2. Correlation metrics
Correlation measures are closely related to distance measures. If two images are identical, the correlation coefficient equals 1. If the squared error equals the energy of the image (for example, when all pixel values of the other image are zero), the correlation measure is zero.
Correlation can be calculated both between image pixels and between the vectors formed by them (for example, the correlation of angles between vectors).
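The two limiting cases can be checked with a small sketch. The normalization by the energy of the reference image is one common variant and is assumed here; the pixel values are hypothetical.

```python
def normalized_cross_correlation(ref, test):
    """Sum of pixel products divided by the energy of the reference image."""
    num = sum(p * q for ra, rb in zip(ref, test) for p, q in zip(ra, rb))
    energy = sum(p * p for row in ref for p in row)
    return num / energy

ref = [[10, 20], [30, 40]]
black = [[0, 0], [0, 0]]

print(normalized_cross_correlation(ref, ref))    # identical images
print(normalized_cross_correlation(ref, black))  # second image is all zeros
```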

3. Contour metrics
Many studies have shown that contours are the most informative part of an image. Contours are what the human visual system identifies first, and contour analysis is widely used in machine vision. The quality of the contours is therefore indicative of the quality of the image. Examples of contour degradation include line breaks, line blurring, line displacement and false contours. To compare images by their contours, the contours must first be extracted from the original image, then extracted from the reconstructed image by the same method, and finally compared, for example by calculating their correlation.
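A minimal sketch of this procedure follows. It assumes the Sobel operator for contour extraction and the Pearson correlation of the gradient magnitudes for the comparison; other edge detectors and comparison rules are equally possible.

```python
from math import hypot, sqrt

def sobel_magnitude(img):
    """Gradient magnitude at interior pixels (3x3 Sobel kernels), as a flat list."""
    out = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out.append(hypot(gx, gy))
    return out

def contour_correlation(ref, test):
    """Pearson correlation between the contour maps of two images."""
    a, b = sobel_magnitude(ref), sobel_magnitude(test)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0  # at least one image has no contour structure at all
    return cov / sqrt(va * vb)

# A vertical step edge, and the same edge displaced by one pixel.
ref = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
displaced = [[0, 0, 100, 100, 100, 100] for _ in range(6)]

print(contour_correlation(ref, ref), contour_correlation(ref, displaced))
```

In this small example the displaced edge produces a contour map that is uncorrelated with the original one, which illustrates how sensitive such a comparison is to line displacement.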

4. Metrics in the spectral domain
After calculating the Fourier transform of the image, it is possible to compare the amplitude and phase of the resulting spectrum. There are proposals for constructing image quality assessment metrics on this basis.
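The idea can be sketched as follows. The naive transform is only suitable for tiny images, and the two measures (mean squared difference of magnitudes and of wrapped phases) are assumptions chosen for illustration.

```python
import cmath

def dft2(img):
    """Naive 2-D discrete Fourier transform; adequate only for tiny images."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]

def spectral_distortion(ref, test):
    """Return (mean squared magnitude difference, mean squared phase difference)."""
    pairs = [(a, b) for ra, rb in zip(dft2(ref), dft2(test)) for a, b in zip(ra, rb)]
    magnitude = sum((abs(a) - abs(b)) ** 2 for a, b in pairs) / len(pairs)
    # phase(a * conj(b)) is the phase difference wrapped to (-pi, pi].
    phase = sum(cmath.phase(a * b.conjugate()) ** 2 for a, b in pairs) / len(pairs)
    return magnitude, phase

ref = [[52, 55, 61, 66],
       [70, 61, 64, 73],
       [63, 59, 55, 90],
       [67, 61, 68, 104]]
# Circularly shift every row by one pixel: the content is kept but displaced.
shifted = [row[-1:] + row[:-1] for row in ref]

mag_err, phase_err = spectral_distortion(ref, shifted)
print(mag_err, phase_err)
```

A circular shift leaves the amplitude spectrum unchanged (up to rounding) and alters only the phase, so the two measures capture different kinds of distortion.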

5. Contextual metrics
Contextual metrics exploit the correlation between neighboring image pixels and its weakening in a distorted image.
To obtain such a metric, one must be able to calculate a multidimensional probability distribution function for the pixel values in a certain neighborhood and to detect changes in this function.
Another approach to the local assessment of image distortion is to calculate and compare local histograms of the original and distorted images, for example for blocks of 16 × 16 pixels. The comparison can be performed with, for example, the Spearman rank correlation criterion.
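The local-histogram approach can be sketched as below. The number of histogram bins, the 8-bit pixel range and the averaging of block scores are assumptions made for illustration.

```python
from math import sqrt

def histogram(pixels, bins=8, peak=256):
    """Histogram of 8-bit pixel values with equally wide bins."""
    counts = [0] * bins
    for p in pixels:
        counts[min(int(p) * bins // peak, bins - 1)] += 1
    return counts

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation of two equally long sequences."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 1.0 if rx == ry else 0.0  # perfectly flat histogram
    return cov / sqrt(vx * vy)

def local_histogram_similarity(ref, test, block=16):
    """Mean Spearman correlation of block histograms (partial edge blocks are skipped)."""
    scores = []
    for y0 in range(0, len(ref) - block + 1, block):
        for x0 in range(0, len(ref[0]) - block + 1, block):
            cells = [(y, x) for y in range(y0, y0 + block) for x in range(x0, x0 + block)]
            a = histogram([ref[y][x] for y, x in cells])
            b = histogram([test[y][x] for y, x in cells])
            scores.append(spearman(a, b))
    return sum(scores) / len(scores)

# A 16x32 horizontal ramp and the same ramp brightened by 32 levels.
ref = [[4 * x for x in range(32)] for _ in range(16)]
brighter = [[p + 32 for p in row] for row in ref]

print(local_histogram_similarity(ref, ref), local_histogram_similarity(ref, brighter))
```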

6. Metrics that take into account the properties of human vision
One way to construct metrics of this class is to pre-filter the images with bandpass filters that imitate human visual perception.
Another way is to apply a wavelet transform to the original and distorted images, so that each image is represented at several scales. Then, for each subband of the wavelet domain, a scale weight is chosen, and the metric calculated locally for that subband is multiplied by it. Depending on the task, these weights can vary. For example, if high-frequency components (line clarity, etc.) are important, the weights of the high-frequency subbands can be increased.
The metric can be calculated for the entire subband, or locally, for its blocks with subsequent averaging in one way or another.
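A minimal sketch of a subband-weighted metric, assuming a single level of the Haar wavelet transform and the MSE as the per-subband measure. The subband names and the weights are hypothetical.

```python
def haar_level(img):
    """One level of a 2-D Haar transform; returns the four subbands as 2-D lists."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for y in range(len(img) // 2):
        rows = {name: [] for name in bands}
        for x in range(len(img[0]) // 2):
            a, b = img[2 * y][2 * x], img[2 * y][2 * x + 1]
            c, d = img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1]
            rows["LL"].append((a + b + c + d) / 2)  # local average
            rows["LH"].append((a - b + c - d) / 2)  # differences between columns
            rows["HL"].append((a + b - c - d) / 2)  # differences between rows
            rows["HH"].append((a - b - c + d) / 2)  # diagonal differences
        for name in bands:
            bands[name].append(rows[name])
    return bands

def weighted_subband_error(ref, test, weights):
    """Weighted mean of the per-subband MSEs."""
    br, bt = haar_level(ref), haar_level(test)
    total = 0.0
    for name, weight in weights.items():
        diffs = [(p - q) ** 2 for ra, rb in zip(br[name], bt[name]) for p, q in zip(ra, rb)]
        total += weight * sum(diffs) / len(diffs)
    return total / sum(weights.values())

flat = [[0] * 4 for _ in range(4)]
checker = [[1 if (x + y) % 2 == 0 else -1 for x in range(4)] for y in range(4)]

equal = {"LL": 1, "LH": 1, "HL": 1, "HH": 1}
detail = {"LL": 1, "LH": 1, "HL": 1, "HH": 4}  # emphasize fine diagonal detail

print(weighted_subband_error(flat, checker, equal),
      weighted_subband_error(flat, checker, detail))
```

The checkerboard distortion falls entirely into the HH subband, so raising the weight of that subband raises the reported error.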
In [2], many metrics of the classes considered were studied and their "independence" was examined. Metrics placed close together in the figure are correlated, and therefore redundant.

[Figure: relative placement of the metrics studied in [2]]

EXPERIMENTS ON ASSESSING THE QUALITY OF VIDEO CODECS. A PROMISING METRIC
Specialists at the Faculty of Computational Mathematics and Cybernetics of Moscow State University have developed software and have for years been testing various video codecs with both subjective and objective quality metrics. The results are available on the compression website [3], where the corresponding software can also be downloaded.
In these experiments, the best results were shown by the new metric SSIM (Structural Similarity index). As the name suggests, it evaluates the structural similarity of images. The theory behind the metric is explained in detail in the authors' work [4]; here we give only a general scheme for its calculation.

[Figure: general scheme for calculating SSIM]
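The SSIM formula from [4] can be sketched in simplified form. The version below is an assumption-laden illustration: it uses plain statistics over non-overlapping 8 × 8 blocks instead of the sliding Gaussian window of [4], with the constants K1 = 0.01 and K2 = 0.03 given there for 8-bit images.

```python
def ssim_block(x, y, peak=255):
    """SSIM index of two equally long flat pixel lists, from their global statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) * (p - mx) for p in x) / n
    vy = sum((q - my) * (q - my) for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stabilizing constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def mean_ssim(ref, test, block=8):
    """Average SSIM over non-overlapping blocks (partial edge blocks are skipped)."""
    scores = []
    for y0 in range(0, len(ref) - block + 1, block):
        for x0 in range(0, len(ref[0]) - block + 1, block):
            cells = [(y, x) for y in range(y0, y0 + block) for x in range(x0, x0 + block)]
            scores.append(ssim_block([ref[y][x] for y, x in cells],
                                     [test[y][x] for y, x in cells]))
    return sum(scores) / len(scores)

# A hypothetical 8x8 gradient and a copy brightened by 20 levels.
ref = [[16 * x + 2 * y for x in range(8)] for y in range(8)]
brighter = [[p + 20 for p in row] for row in ref]

print(mean_ssim(ref, ref), mean_ssim(ref, brighter))
```

A uniform brightness shift leaves the structure and contrast terms at 1 and lowers only the luminance term, so the score stays fairly close to 1.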

WHAT'S NEXT?

The world community is making significant efforts to develop new, more effective objective metrics for video quality assessment. Research is concentrated in the following areas:

development of adequate models of human vision;
construction of adaptive models of human vision;
construction of metrics that need little or nothing of the original video sequence to assess quality.

The leading community in this research is the Video Quality Experts Group (VQEG), whose materials are available on the Internet, as are the sources referenced in this article.

Literature
1. Winkler S. Digital Video Quality: Vision Models and Metrics. Wiley, 2005. 192 p.
2. Avcibas I. Image Quality Statistics and Their Use in Steganalysis and Compression. PhD Thesis. Bogazici Univ., 2001. 113 p.
3. http://compression.ru
4. Wang Z., Bovik A., Sheikh H., Simoncelli E. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. on Image Processing, Vol. 13, No. 4, 2004.