Video surveillance requires special data compression solutions.
With modern video surveillance technologies, large volumes of high-definition images and megapixel formats are no longer a problem. Gigabit Ethernet networks and terabyte hard drives allow the focus to be on the ultimate goal of CCTV systems: collecting high-quality information that is essential for security. However, this requires data compression methods adapted to the task. Methods borrowed from multimedia applications, such as H.264 or MPEG-4, limit the capabilities of video surveillance systems. At first glance they save costs, because they reduce bandwidth requirements, but ultimately they drive up the cost of the overall system.
Users often assume that if products from different manufacturers work according to the same standards, they are comparable in terms of cost, quality and flexibility. This is a significant misconception.
The differences between products can be very significant even in a relatively homogeneous area such as multimedia. They are even more pronounced between areas with fundamentally different purposes, such as multimedia and CCTV.
Standards like H.264 leave considerable freedom in how data compression is implemented, but exploiting that freedom is expensive. Many video surveillance manufacturers with small markets for their products therefore avoid new development and resort to cheap off-the-shelf solutions from the multimedia sphere. When used for purposes they were not designed for, such solutions can prove unsuitable because of the compromises they entail. In effect, some manufacturers may have been misleading users for years: they save on development costs and shift problems they could well have solved onto the shoulders of users.
The so-called P-frame chain problem illustrates what can happen when data compression products are developed without taking the specific requirements of video surveillance into account.
The Problem with Interframe Compression
Video data compression can be frame-by-frame, as in MJPEG, or interframe, as in MPEG-2, MPEG-4, and H.264. With MJPEG, each individual frame is compressed independently of the others. Interframe compression divides the frames into groups. The first frame in a group is the reference frame and is encoded independently; for the remaining frames, called intermediate frames, only the changes relative to the reference frame and the preceding frames are encoded.
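The difference between the two approaches can be sketched on a toy one-dimensional "video", where each frame is a single number. This is purely illustrative, not a real codec: frame-by-frame coding stores every frame whole, while interframe coding stores the first frame of a group whole and only the change for each following frame.

```python
# Toy comparison of frame-by-frame vs. interframe encoding.
frames = [100, 101, 103, 106, 110]          # toy "pixel values", one per frame

# Frame-by-frame (MJPEG-like): each frame is encoded independently
framewise = [("I", f) for f in frames]

# Interframe (MPEG-like): reference frame + delta to the previous frame
interframe = [("I", frames[0])] + [
    ("P", frames[i] - frames[i - 1]) for i in range(1, len(frames))
]

print(framewise)   # [('I', 100), ('I', 101), ('I', 103), ('I', 106), ('I', 110)]
print(interframe)  # [('I', 100), ('P', 1), ('P', 2), ('P', 3), ('P', 4)]
```

The small deltas in the interframe version are what make the method efficient, and also what make every P-frame dependent on its predecessors.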
Figure: Single-frame compression versus interframe compression. In most products, intermediate frames are created only with reference to previous image frames.
Since the changes from frame to frame are often small, interframe compression generates far less data than frame-by-frame methods, reducing the costs of storing and transmitting the compressed video. However, it has significant disadvantages. Since the intermediate frames contain only the changes, all frames used to calculate those changes must be held in memory during decoding, which increases the cost of decompression. And if one of the frames referenced by the current frame is lost, full decompression can no longer be performed without distortion.
For multimedia applications, the compromises associated with this problem are often only a minor inconvenience: forward/backward skipping in DVD recordings is only possible in large steps. In television, interference can cause frames to be lost, which distorts subsequent frames of the sequence. Channel switching in digital broadcasting involves relatively long waiting times. Strictly real-time live transmission does not exist in multimedia at all, since the compression methods used tolerate data loss and introduce delays of up to several seconds. But while these inconveniences can be tolerated in multimedia applications, in CCTV such shortcomings are significant and affect security as a whole.
The P-frame chain problem
Typically, interframe video compression operates with chains of so-called P-frames. The frames of a sequence, or group of pictures (GOP), form a chain that begins with a reference frame, the so-called I-frame, i.e. a frame that does not depend on any other frame.
Figure: Formation of P-frame chains containing image changes.
Figure: Impact of frame loss on subsequent frames of a P-frame chain.
In the frames of a P-frame chain, only image changes are compressed and stored. Each new reference frame (I-frame) begins a new chain, which is extended by P-frames obtained by encoding the changes against the respective previous frame. To decompress any frame of such a chain, all preceding frames of the chain, including the I-frame, must be decompressed first. If any frame of the chain is lost, all subsequent frames can be decompressed only with distortions. Depending on the frame rate and the chain length, gaps of up to several seconds may then appear in the image stream.
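The loss propagation described above can be modeled with a minimal sketch. Each P-frame holds only the delta against its immediate predecessor, so a single lost frame breaks the chain for everything after it. This is a toy model, not a real decoder.

```python
# Toy model of a GOP with chained P-frames: decoding frame k requires
# every earlier frame of the chain, starting from the I-frame.

def decode_chained_gop(i_frame, p_deltas, lost=frozenset()):
    """Decode a GOP of chained P-frames; a lost frame corrupts all later ones."""
    frames = [i_frame]           # the I-frame decodes independently
    corrupted = False
    for idx, delta in enumerate(p_deltas, start=1):
        if idx in lost:
            corrupted = True     # reference broken: the chain cannot recover
        if corrupted:
            frames.append(None)  # all later frames decode with distortion
        else:
            frames.append(frames[-1] + delta)
    return frames

# GOP: I-frame value 100, then five P-deltas; losing frame 2 corrupts 2..5
print(decode_chained_gop(100, [1, 2, 3, 4, 5]))            # [100, 101, 103, 106, 110, 115]
print(decode_chained_gop(100, [1, 2, 3, 4, 5], lost={2}))  # [100, 101, None, None, None, None]
```

Note that the damage lasts until the next I-frame arrives, which, depending on GOP length and frame rate, can be several seconds.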
For CCTV, such a chained data structure has serious drawbacks: it permits frame loss followed by artifacts, whereas freedom from distortion under partial data loss is considered one of the main requirements in video surveillance.
Figure: If, under interframe compression, a channel with half the frame rate is derived from a live real-time channel, the output consists of distorted images with artifacts.
If this requirement cannot be met, then the implementation of many typical functions and the solution of the video surveillance tasks themselves are possible only with restrictions. Here are some examples.
Live real-time viewing and recording at different frame rates
This is one of the typical methods for reducing video storage costs: the so-called time-lapse method, recording images at a variable frame rate. The savings it achieves are of an order of magnitude unattainable by video compression alone. For example, a low frame rate of 5 frames per second is quite sufficient for documenting certain processes, yet simultaneous live viewing in real time is often needed as well. If only a single video stream is available from the camera, recording at a reduced rate by simply thinning frames is unproblematic with frame-by-frame compression such as MJPEG, but generally impossible with interframe compression: thinning destroys the P-frame chains, which are required in their entirety for decompression. The typical compromises are either to record at a higher frame rate than required, or to slow live playback down to a frame rate that permits simultaneous recording. With the first compromise, storage costs can end up higher than with MJPEG despite the use of H.264.
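Why thinning works for frame-by-frame compression but not for chained P-frames can be shown with a small sketch. It is a toy model under the same assumptions as above: frames are just identifiers, and a chained P-frame is decodable only if its direct predecessor was kept.

```python
# Sketch: frame thinning (keeping every n-th frame) for recording at a
# lower rate than the live stream.

def thin_mjpeg(frames, step):
    # Each JPEG frame decodes on its own, so any subset remains decodable.
    return frames[::step]

def thin_chained_gop(frame_ids, step):
    # Thinning a P-chain removes exactly the references the kept frames
    # need: a kept P-frame without its direct predecessor is undecodable.
    kept = frame_ids[::step]
    decodable = [f for i, f in enumerate(kept)
                 if i == 0 or kept[i - 1] == f - 1]  # frame 0 is the I-frame
    return kept, decodable

print(thin_mjpeg(list(range(10)), 2))        # [0, 2, 4, 6, 8], all decodable
print(thin_chained_gop(list(range(10)), 2))  # kept [0, 2, 4, 6, 8], decodable [0]
```

In the chained case only the I-frame survives thinning intact, which is exactly why the compromises named above (recording too fast, or playing back too slowly) become necessary.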
Video Analysis of Image Content
Video analysis is often performed on video streams with a low frame rate. In this case, only those frames should be analyzed that correspond to the speed of the observed processes. If the camera has a large viewing angle and only slow movements occur in its field of view, a stream of a few frames per second may be sufficient to capture these movements completely. Taking such factors into account reduces the load on the system and the overall costs of video analysis, since decompression of video data in the host computers of video management systems accounts for a significant part of those costs.
With frame thinning ruled out, the channel cannot satisfy the requirements of video analysis simultaneously with recording and live real-time viewing, so the video analysis algorithm is forced to decompress and analyze every frame of the live stream, even those it does not need. Skipping frames is unacceptable in principle because of the decompression artifacts that would result: the system would interpret the distortions as motion in the frame and raise false alarms. Thus, if 5 frames per second would suffice for video analysis but the channel must decompress 25 frames per second, the decompression load on the system, and accordingly the cost, increases fivefold.
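The load estimate above is simple arithmetic; the figures are the article's own example, written out here for clarity.

```python
# Decompression load when chained P-frames force full-rate decoding.
stream_fps = 25        # live stream rate that must be fully decoded
analysis_fps = 5       # rate that would suffice for the observed scene

decode_factor = stream_fps / analysis_fps
print(f"decompression load factor: {decode_factor:.0f}x")
```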
Moving video analysis into the camera does not solve the problem. On the one hand, the camera's computing capabilities are limited compared to a host computer, so many performance-hungry methods are simply not feasible. On the other hand, even where they are implemented in the camera, the user becomes highly dependent on the product and its manufacturer.
In addition to the problem described here and the limitations associated with it, P-frame chains cause a number of other problems that share one common property: arbitrary frame skipping must be prohibited. This makes certain functions and requirements difficult or even impossible to implement, for example:
Creating video archives with thinning over time (Fading Long Term Memory). The assumption is that the older the image frames, the less valuable they are; to free up and save storage, aging recordings are thinned by removing certain frames, reducing their frame rate.
Exporting videos at a lower frame rate than they are stored in the video recordings, which is often necessary to reduce information in specific situations.
High demands on playback comfort. While single-frame stepping forward and backward and jump-free slow motion are relatively easy to implement, synchronous playback of several recorded channels, for observing a situation from different angles simultaneously, imposes very strict requirements. In addition, P-frame chains cause jumps when seeking to a frame, which greatly reduces playback comfort.
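The first item in the list, age-dependent thinning of archives, can be sketched as a small retention policy. The policy, the rule values, and the function name are all hypothetical illustrations; the point is only that such thinning presupposes frames that decode independently.

```python
# Toy "Fading Long Term Memory" policy: the older a frame, the more
# aggressively the archive is thinned. Rules are (minimum_age, keep_every),
# sorted by increasing age so the strictest matching rule wins.
from datetime import datetime, timedelta

def fade_archive(frames, now, rules):
    """frames: (timestamp, frame_id) pairs; returns the frames to keep."""
    kept = []
    for ts, fid in frames:
        age = now - ts
        keep_every = 1                      # recent frames: keep everything
        for min_age, step in rules:
            if age >= min_age:
                keep_every = step           # older frames: thin more strongly
        if fid % keep_every == 0:
            kept.append((ts, fid))
    return kept

now = datetime(2024, 1, 10)
ages_days = [0, 0, 8, 8, 40, 40]            # two fresh, two week-old, two month-old
frames = [(now - timedelta(days=d), i) for i, d in enumerate(ages_days)]
rules = [(timedelta(days=7), 2), (timedelta(days=30), 4)]

print([fid for _, fid in fade_archive(frames, now, rules)])  # [0, 1, 2, 4]
```

With chained P-frames this policy is unusable, because the frames it removes are references for the frames it keeps.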
Adequate video data compression for security television
Nevertheless, interframe compression can be aligned with the requirements of video surveillance. To make the decompression of P-frames independent of the presence of all frames in the chain, one must make use of degrees of freedom in the standards that normally play no role. Technically, this means intervening in the compression process so that within each group of pictures (GOP) the changes for every P-frame are calculated with reference to the reference frame, the I-frame, rather than to the preceding P-frame, as is the usual practice. With this implementation, the P-frames in a GOP are not linked into a chain, and the loss of individual frames does not affect the decompression of subsequent ones.
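The modification can be shown in the same toy model used above: every P-frame now carries its delta against the I-frame rather than against its predecessor, so a lost frame affects only itself. Again, an illustrative sketch, not a real codec.

```python
# Toy model of unchained P-frames: each P-frame encodes its change
# relative to the I-frame, so it decodes without any other P-frame.

def decode_unchained_gop(i_frame, deltas_from_i, lost=frozenset()):
    frames = [i_frame]
    for idx, delta in enumerate(deltas_from_i, start=1):
        if idx in lost:
            frames.append(None)             # only the lost frame is missing
        else:
            frames.append(i_frame + delta)  # each P decodes from I alone
    return frames

# Same image sequence as the chained example; losing frame 2 costs one frame
print(decode_unchained_gop(100, [1, 3, 6, 10, 15], lost={2}))
# [100, 101, None, 106, 110, 115]
```

The deltas are larger than in the chained version (they accumulate against the I-frame), which is exactly the bit-rate penalty the next paragraph describes.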
The H.264 standard allows for the creation of such structures, but they are not used in the multimedia sphere, since this approach leads to a decrease in the efficiency of data compression, to a higher bit rate and the associated increased costs of data transmission or storage.
Figure: Interframe compression with unchained P-frames.
In CCTV, this disadvantage is largely outweighed by the gain in flexibility and alternative cost savings, such as time-lapse recording. The ideal data compression product is one that gives the user the freedom to choose between the conflicting goals of flexibility and compression efficiency. Such products not only exist, but are also offered on the market.
An important functional parameter of these compression methods, specially modified for video surveillance, is the ability to generate reference frames (I-frames) on demand, without delay. Without this ability, many video surveillance processes suffer long delays, since image quality or resolution can change only when a new I-frame appears in the data stream. If one must wait for a new group of pictures (GOP) to begin, control of the equipment and processes becomes sluggish, and important information, such as alarm frames, can even be lost.
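On-demand I-frame generation can be sketched as a small state machine. The `Encoder` class and its method names are hypothetical, invented for illustration; the point is only that an alarm can start a new GOP immediately instead of waiting for the scheduled boundary.

```python
# Sketch: an encoder that normally emits an I-frame every gop_length
# frames, but can be told to emit one immediately (e.g. on an alarm).

class Encoder:
    def __init__(self, gop_length):
        self.gop_length = gop_length
        self.pos = 0            # position within the current GOP
        self.force_i = False

    def request_i_frame(self):  # called by alarm handling, for example
        self.force_i = True

    def next_frame_type(self):
        if self.force_i or self.pos % self.gop_length == 0:
            self.force_i = False
            self.pos = 1        # a new GOP starts here
            return "I"
        self.pos += 1
        return "P"

enc = Encoder(gop_length=4)
types = []
for t in range(8):
    if t == 5:
        enc.request_i_frame()   # alarm: do not wait for the GOP boundary
    types.append(enc.next_frame_type())
print(types)  # ['I', 'P', 'P', 'P', 'I', 'I', 'P', 'P']
```

The forced I-frame at position 5 means a quality or resolution change, or an alarm recording, takes effect on the very next frame.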
Conclusion
There is no point in using data compression to save some 10% of bandwidth if the equipment then operates at an unnecessarily high resolution and bit rate, or if insufficient flexibility in accessing image frames sharply reduces the system's effectiveness. Relatively simple changes to data compression methods can achieve significant improvements and acceptable trade-offs between the conflicting goals of cost and functionality.
Video surveillance-modified data compression products are superior to standard products borrowed from the multimedia industry in this regard. They provide a better balance between cost and flexibility.