What Do We Have Today?
Intelligent analysis, video analytics, intelligent video surveillance systems: it has become unfashionable lately to talk about «simple» motion detectors. But in fact, if you leave aside license plate recognition systems, what is sold as video analytics is still mostly motion detectors. Occasionally there are inverse variants («rest detectors», so to speak): abandoned-object detectors and «museum» detectors. In essence, however, all of these also belong to the category of motion detectors.
The fashion for intelligent features arose at the end of the last century. By then motion detectors had reached a level acceptable for practical use, and they began to be hung with all sorts of bells and whistles, some useful and some less so. First among these was automatic adaptation to changing lighting, weather and other conditions. Sometimes there were extras such as analysis of the shape or behavior of the observed object. In practice, however, both are usually implemented very simply. Automatic adaptation means having several (from two to a hundred) different sets of settings that are switched depending on the time of day, the season or a few simple features. Behavior analysis usually comes down to treating movement in one direction as suspicious and movement in the other as acceptable. The reason is that truly intelligent systems require extremely long tuning to the specifics of the site, of the likely intruders, and so on.
As an anecdote, let me tell you about a system based on neural networks, a classic example of a self-learning system. It showed remarkable results as a prototype, but to train it, it had to be shown several thousand videos shot in various conditions, and, most unpleasantly, half of them had to contain the very situation to be detected: an intruder. Imagine the task: at one specific site, film a person imitating an intruder at least a thousand times, covering as wide a range of ways and manners of penetration as possible. And to generalize the procedure to other sites, the same would have to be done at thousands of sites. That is millions of experiments. Worse still, the experimental model, which could handle only a few dozen examples, ran more or less acceptably on a 3 GHz Pentium. Scale it up to learn from millions of examples, and you would need a cluster of at least a thousand computers, all to process a single video signal.
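The «automatic adaptation» and direction-based «behavior analysis» described above really can be this simple. Here is a minimal Python sketch. It assumes a hypothetical `DetectorProfile` with two made-up parameters and a crude two-season, day/night split. Real systems may use anywhere from two to a hundred profiles and switch them on other features as well. The direction rule only checks whether the motion vector has a positive component along a forbidden direction.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DetectorProfile:
    # Hypothetical parameter set; real detectors expose many more settings.
    name: str
    sensitivity: float  # assumed: fraction of changed pixels that counts as motion
    min_object_px: int  # assumed: smaller blobs are ignored (noise, birds, leaves)


# Hypothetical profile table keyed by (season, is_daytime).
PROFILES = {
    ("summer", True): DetectorProfile("summer-day", 0.02, 400),
    ("summer", False): DetectorProfile("summer-night", 0.05, 600),
    ("winter", True): DetectorProfile("winter-day", 0.03, 500),
    ("winter", False): DetectorProfile("winter-night", 0.08, 800),
}


def season_of(ts: datetime) -> str:
    # Crude two-season split (April to September counts as summer): an assumption.
    return "summer" if 4 <= ts.month <= 9 else "winter"


def is_daytime(ts: datetime) -> bool:
    # Fixed 07:00 to 19:00 "day": an assumption; a real system might use a light sensor.
    return 7 <= ts.hour < 19


def select_profile(ts: datetime) -> DetectorProfile:
    """'Automatic adaptation' implemented as a plain table lookup."""
    return PROFILES[(season_of(ts), is_daytime(ts))]


def is_suspicious_direction(dx: float, dy: float, forbidden: tuple[float, float]) -> bool:
    """'Behavior analysis': motion with a positive component along the
    forbidden direction (e.g. towards a fence) is flagged; the rest is not."""
    fx, fy = forbidden
    return dx * fx + dy * fy > 0
```

For example, `select_profile(datetime(2024, 1, 15, 23, 0)).name` gives `"winter-night"`. The sketch shows how little intelligence is actually involved: a table lookup and a dot product.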
Let's get back to the present. What is the state of video analytics today? Simply put, demand exceeds what is possible. In recent years, the need for systems capable of replacing humans in analyzing video signals has grown, and most security practitioners recognize it. The main reason is that video cameras are cheap and their number keeps growing. In Britain, for example, hundreds of thousands of cameras have been installed as part of «city schemes»; together with cameras at private facilities, the total number of cameras available to the police now amounts to 4.2 million. At the same time, the threat of terrorism continues to grow, as does the threat of ordinary crime. And yet, despite the huge number of cameras, they are of little help. This is partly due to false economy on lighting, lenses, cameras and transmission systems, while the final quality is determined by the single worst element (remember that a squadron sails only as fast as its slowest ship). But to a large extent, video surveillance systems are ineffective because using them requires large numbers of highly qualified specialists who must constantly keep up their skills: video surveillance operators.
In fact, even in Great Britain, the most advanced country in terms of video surveillance, no more than half of all monitoring posts are staffed around the clock by any operators at all, let alone specially trained ones. Video is, of course, recorded, and recordings of ever better quality are piling up in ever greater quantities. But what is to be done with them? The British police, for instance, allow one man-week for a careful analysis of a single tape holding 24 hours of recording. Naturally, such analysis is done only after the fact, that is, when a crime has come to light by means other than the video surveillance system, which therefore did nothing to prevent or stop it.
Yes, video surveillance operators, especially experienced and specially trained ones, are very expensive. Most control rooms are not staffed around the clock, and even when operators are present, there are one or two of them for hundreds of cameras. Why, then, do people keep installing more and more cameras? There are several reasons. The first is objective. The far-sighted British are counting on Moore's law and the progress of technology, and are preparing systems for the time when they can be used effectively. After all, new computers can be deployed at monitoring stations in a few days, while installing cameras and communication networks takes years. Incidentally, this is why most city systems in England require live video to be delivered to the central site in high quality.
The second reason is subjective, if not selfish. Take the widely known case of a Spanish mayor who demanded that a contractor install a large video wall in a room designed for a single operator, making no secret of the fact that it was needed only to show off to neighboring mayors (the contractor turned out to be a friend of the well-known Vlado Damjanovski, and the case received some publicity among experts).
So the demand for automated video analysis is huge. Yet what is actually on offer are very simple systems that qualify as video analytics only thanks to the efforts of manufacturers' advertising departments. Moreover, these systems (motion detectors) are, as a rule, very difficult to configure, install and operate, and are therefore not used very often.
Another feature of our time is the transition to IP technologies for transmitting video. I will not go into technical details, but I will say the main thing: once video has been heavily compressed, it is too late to analyze it. The distortions and artifacts introduced by compression algorithms are far more dangerous than any noise or natural phenomena, because they are not random at all: they occur exactly where and when something interesting is happening. Of course, with light compression, especially with megapixel cameras, the overall result is not so bad, but the natural solution for developers is to move the analysis «into the camera». There are many examples. Dozens of well-known and lesser-known companies offer distributed analysis systems of one kind or another, which select only particularly suspicious areas and transmit them in high quality, usually followed by further analysis at the central station.
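One way to picture analysis «in the camera» is block-wise frame differencing on the raw, uncompressed frames, before the encoder has introduced any artifacts. Only the tiles that changed would then be sent in high quality. This is a minimal sketch. It assumes grayscale frames given as lists of lists of 0 to 255 values. `changed_tiles` and its thresholds are hypothetical, not any vendor's actual algorithm.

```python
def changed_tiles(prev, curr, tile=8, pixel_thresh=25, tile_frac=0.2):
    """Return (tile_row, tile_col) indices of tiles in which at least
    `tile_frac` of the pixels changed by more than `pixel_thresh` levels.

    Runs on raw frames, i.e. before compression, so the decision is not
    affected by codec artifacts. All thresholds are illustrative assumptions.
    """
    h, w = len(curr), len(curr[0])
    selected = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            changed = total = 0
            # Edge tiles may be smaller than `tile` x `tile`.
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    total += 1
                    if abs(curr[y][x] - prev[y][x]) > pixel_thresh:
                        changed += 1
            if changed / total >= tile_frac:
                selected.append((ty // tile, tx // tile))
    return selected
```

The selected tiles would be encoded at high quality and the rest at a low bit rate or not at all. The cost is that this decision is only as good as a simple motion detector.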
In essence, all these distributed systems are an attempt to implement, on cheap small processors, at least part of what has only recently become feasible on powerful Pentiums and specialized DSP chips.
So far, the most significant practical result is that the bit rate (and with it the quality) is reduced when nothing significant is changing in the scene, much as even primitive activity detectors in old multiplexers allowed recording quality to be redistributed dynamically. The most serious drawback is that, although everyone claims that their metadata transmission complies with the MPEG-7 and MPEG-4 standards, in practice these are non-standard extensions. As a result, such systems are not yet compatible with one another and rule out third-party control systems or additional analysis of all the video streams.
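The redistribution of quality by activity can be illustrated with a simple allocation rule: every camera keeps a guaranteed minimum bit rate, and the remaining budget is shared in proportion to each camera's current activity. This is a minimal sketch. It assumes per-camera activity scores from some simple detector. The proportional rule, the floor and the name `allocate_bitrate` are hypothetical, not a documented multiplexer or encoder algorithm.

```python
def allocate_bitrate(activity, total_kbps, floor_kbps):
    """Split `total_kbps` across cameras.

    Each camera gets `floor_kbps`; the spare budget is divided in proportion
    to its activity score. With no activity anywhere, the budget is split
    evenly. This allocation rule is an illustrative assumption.
    """
    n = len(activity)
    spare = total_kbps - n * floor_kbps
    if spare < 0:
        raise ValueError("total budget is below the per-camera floor")
    total_activity = sum(activity)
    if total_activity == 0:
        return [total_kbps / n] * n
    return [floor_kbps + spare * a / total_activity for a in activity]
```

For example, `allocate_bitrate([0, 0, 3, 1], 4000, 250)` leaves the two quiet cameras at their 250 kbit/s floor and splits the remaining 3000 kbit/s between the two active ones in a 3:1 ratio.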
I have started on a somewhat gloomy and pessimistic note, but in fact the technology continues to develop, if not as rapidly as we hoped 10 years ago. What new trends have emerged in recent years?
Recording analysis. A number of companies, most of them targeting the long-suffering UK police, are developing solutions for the automated analysis of recordings with flexible criteria for detecting suspicious situations. Note that with recordings the video quality is, of course, much worse than live, but the recording can be rewound repeatedly to find the best settings, which, alas, is impossible in real time. The search can be based on a wide variety of criteria. For example, if people linger for a long time in places known for drug dealing, the relevant frames are passed on for manual review. Of course, you cannot do without manual (or, more accurately, «eyeball») viewing, but at least you do not have to look through kilometers of recordings (what is the right unit now: megaframes? gigabytes?) with your own eyes.
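The drug-dealing example comes down to a dwell-time rule applied to object tracks. This is a minimal sketch. It assumes that an upstream tracker has already produced per-person tracks as `(frame_index, x, y)` tuples and that the zone is an axis-aligned rectangle. `Zone`, `loitering_frames` and the threshold values are hypothetical. Because the rule runs on a recording, the thresholds can be retuned and the search rerun as often as needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    # Hypothetical axis-aligned rectangle in image coordinates.
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def loitering_frames(track, zone, fps, max_seconds):
    """Return the frame indices at which one person's continuous stay in
    `zone` exceeds `max_seconds`.

    `track` is a list of (frame_index, x, y) with increasing frame indices.
    Leaving the zone resets the timer; gaps in the track are treated as
    continuous presence (a simplifying assumption).
    """
    flagged = []
    entered = None
    for frame, x, y in track:
        if zone.contains(x, y):
            if entered is None:
                entered = frame
            if (frame - entered) / fps > max_seconds:
                flagged.append(frame)
        else:
            entered = None
    return flagged
```

The flagged frame indices are exactly what would be handed to a human for manual review.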
Operator assistance. These are roughly the same algorithms as in recording analysis, but running in real time. A trained operator can adjust them interactively as the situation changes, so the combined human-machine system (a kind of cyborg) works much better than one person and incomparably better than 10 computers. Such systems also provide tools meant not so much for video analysis as simply for making life easier: image enhancement, noise removal, zoom control, interactive replay of the last few moments of recording, and so on. Of video analytics proper, especially popular (although I would not say they work well) are tools for the real-time tracking of a selected person. As a rule, these algorithms use very simple criteria, such as shirt color and continuity of movement, but developments are under way that additionally analyze a person's size, shape and face. For example, a group of European companies and universities is running a research project aimed at comprehensive analysis: simple criteria are used in simple conditions, and complex ones, including face comparison, are brought in when, for example, a person may have changed clothes. Besides video analytics itself, integration with data from other sensors is (slowly) developing, allowing analysis not only of the image but of the entire situation at the facility. I would like to emphasize that all of the systems mentioned are extremely expensive and complex, essentially experimental, and are used only at particularly important facilities such as airports or central government buildings, where special importance is combined with special complexity due to the huge number of entirely legitimate visitors.
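The very simple tracking criteria mentioned above can be sketched as a gating step. In each new frame, keep only the detections that lie close to the person's last position (continuity of movement) and whose shirt color is close to the target's, then take the nearest one. This is a minimal sketch that assumes an upstream detector supplies `(x, y, (r, g, b))` candidates. `pick_next` and both thresholds are hypothetical. It also shows why such trackers lose a person who changes clothes, which is where the more complex criteria are meant to take over.

```python
import math


def pick_next(prev_pos, target_rgb, candidates, max_jump, max_color_dist):
    """Choose the detection in the next frame that continues the track.

    candidates: list of (x, y, (r, g, b)) detections.
    A candidate qualifies if it is within `max_jump` pixels of `prev_pos`
    (continuity of movement) and its color is within `max_color_dist` of
    `target_rgb` in plain RGB distance (an illustrative choice of metric).
    Returns the nearest qualifying candidate, or None if the track is lost.
    """
    best, best_jump = None, math.inf
    for x, y, rgb in candidates:
        jump = math.dist(prev_pos, (x, y))
        if jump > max_jump or math.dist(rgb, target_rgb) > max_color_dist:
            continue
        if jump < best_jump:
            best, best_jump = (x, y, rgb), jump
    return best
```

A `None` result is the point at which an operator, or a slower face-comparison stage, would have to step in.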
Moore's law is a legendary observation by Gordon Moore about the pace of integrated circuit manufacturing technology (the number of components per chip). He made it in 1965, and it became famous once it was generalized into the statement that «the performance of the average computer doubles every eighteen months». Moore's law recently celebrated its 40th anniversary and, broadly speaking, still holds true.
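To make the popular «doubling every eighteen months» formulation concrete: if performance doubles every T = 1.5 years, then after t years it has grown by a factor of 2^(t/T). Applying this naively to the 40 years mentioned above gives the following figure. It is only an illustrative extrapolation of the popular formulation, not a measured result:

```latex
P(t) = P_0 \cdot 2^{t/T}, \qquad T = 1.5\ \text{years}

\frac{P(40)}{P_0} = 2^{40/1.5} = 2^{26.\overline{6}} \approx 1.07 \times 10^{8}
```

This compounding is what the long-term bet described earlier relies on: cameras and networks installed today are expected to outlive several generations of much faster analysis hardware.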