CCTV cameras are ubiquitous in most major cities, where they are used primarily for forensic analysis of events. In other words, they work after the fact: monitoring thousands of cameras in real time is laborious and far too expensive, and it raises questions about privacy and surveillance. Where they prove useful is in reconstructing what led up to and caused an event, such as a traffic accident or a crime.
The industry has been trying for several decades, with limited success, to tackle the real-time problems of CCTV using software and artificial intelligence (AI). Video analytics can be programmed to recognize specific objects or sequences, but this has a narrow range of application. For instance, in cases where a known object presents itself with little variation, a camera could read a bar code or a license plate, or even recognize faces. The problem in most urban environments is that the scenes captured by CCTV are too complex and the behaviors too unpredictable.
This seems like an intractable problem as long as you approach it from the perspective of positive recognition of a known pattern. However, there is a different, more promising approach using machine learning. Advanced video analytics software learns what is “normal” in a scene, and reacts to unknown “dwell-motion patterns” (think of someone moving in a no-go zone) or “density anomalies” (think of a piece of luggage abandoned in a metro station).
Take perimeter security, which is a common use for CCTV cameras. Each camera has a fixed view of a known scene in which the same things happen in repetitive patterns. However, imagine that an intruder enters the view of the camera at an unusual time (at night) and there is the sound of breaking glass. Both of these pieces of data appear to the analytics program as anomalies because they stand out from the normal pattern. At this point, the program doesn’t know what it is looking at (in this case, an intruder), but it can alert a human to view the anomalous footage.
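To make the idea of a learned "normal" concrete, here is a minimal sketch in Python. It assumes each camera reports a single motion level per observation and that normal behavior depends mainly on the hour of day. The class name `HourlyBaseline`, the z-score threshold, and the sample values are all hypothetical and chosen for illustration; real video analytics products model far richer features.

```python
from collections import defaultdict
from statistics import mean, pstdev


class HourlyBaseline:
    """Learns the typical motion level per hour of day for one fixed camera.

    Hypothetical illustration: a real system would model much more than
    a single number per hour.
    """

    def __init__(self, threshold=3.0, min_std=1.0):
        self.threshold = threshold      # z-score above which we flag an anomaly
        self.min_std = min_std          # floor so quiet hours do not divide by ~0
        self._samples = defaultdict(list)

    def learn(self, hour, motion_level):
        """Record one observation from logged (historical) footage."""
        self._samples[hour].append(motion_level)

    def score(self, hour, motion_level):
        """Return how many standard deviations the value is from normal."""
        history = self._samples.get(hour)
        if not history:
            return float("inf")         # nothing ever seen at this hour
        mu = mean(history)
        sigma = max(pstdev(history), self.min_std)
        return abs(motion_level - mu) / sigma

    def is_anomaly(self, hour, motion_level):
        return self.score(hour, motion_level) > self.threshold


baseline = HourlyBaseline()
for level in (40, 42, 38, 41):          # busy mid-afternoon history
    baseline.learn(14, level)
for level in (0, 1, 0, 0):              # quiet 3 a.m. history
    baseline.learn(3, level)

print(baseline.is_anomaly(14, 41))      # typical afternoon activity
print(baseline.is_anomaly(3, 25))       # strong motion in the middle of the night
```

The same level of motion that is unremarkable in the afternoon stands out at night, and the program flags it without knowing what caused it.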
It may turn out to have been the neighborhood raccoon knocking a bottle off a dumpster at the back of the building. The security personnel can flag this as a “non-alertable” anomaly, which the analytics program learns to ignore. Or, they can immediately recognize it as a break-and-enter situation, and alert the police or security personnel to investigate.
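One simple way to picture this feedback loop is a filter that remembers anomalies an operator has dismissed and suppresses new ones that look similar. The sketch below is an assumption, not a description of any product: the `FeedbackFilter` class, the two-number feature vectors, and the distance radius are hypothetical stand-ins for whatever representation a real system uses.

```python
import math


class FeedbackFilter:
    """Suppresses anomalies that resemble ones an operator has dismissed."""

    def __init__(self, radius=1.0):
        self.radius = radius            # how close counts as "the same kind"
        self._dismissed = []            # feature vectors flagged non-alertable

    def dismiss(self, features):
        """Operator marks this anomaly as non-alertable."""
        self._dismissed.append(tuple(features))

    def should_alert(self, features):
        """Alert only if no dismissed example lies within the radius."""
        return all(
            math.dist(features, seen) > self.radius
            for seen in self._dismissed
        )


alerts = FeedbackFilter(radius=1.0)
raccoon = (0.2, 3.0)                    # hypothetical (object size, hour) features
print(alerts.should_alert(raccoon))     # first sighting reaches the operator
alerts.dismiss(raccoon)                 # operator flags it as non-alertable
print(alerts.should_alert((0.3, 3.1)))  # a similar event is now suppressed
print(alerts.should_alert((5.0, 9.0)))  # a different event still alerts
```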
So, what has the video analytics software actually accomplished, since it still took a human being to do the real work? It has taken over the burden of actively monitoring hundreds or thousands of CCTV cameras around the city, and dramatically reduced the amount of video that city personnel have to look at. It has turned CCTV cameras into sensors.
It has also taught itself. Similar approaches have been tried, but they always relied on human operators to teach the software what should be considered normal, which is labor-intensive. In this case, the video analytics uses machine learning, reviewing thousands of hours of logged video from each camera to construct a model of what is normal. The only thing it needs to be taught is what anomalies to ignore — and it gets smarter over time.
The positive recognition approach of earlier analytics software isn’t incompatible with the anomaly approach. In fact, at this point, there are very large libraries of known objects, such as license plates, and new algorithms are being developed every day. A good video analytics solution should have the ability to import algorithm libraries and use pipelining logic to link them together, alongside anomaly detection.
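As a rough illustration of what pipelining logic might look like, the sketch below chains an anomaly gate with a recognition step and a watchlist check. Every name here (`run_pipeline`, `anomaly_gate`, `plate_reader`, `watchlist_check`, `WATCHLIST`) is hypothetical, and the plate reader is a stand-in for an imported algorithm library rather than a real recognition call.

```python
from typing import Callable, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]

WATCHLIST = {"ABC123"}                  # hypothetical plates of interest


def run_pipeline(stages, event):
    """Pass an event through each stage; a stage returning None stops it."""
    for stage in stages:
        event = stage(event)
        if event is None:
            return None
    return event


def anomaly_gate(event):
    """Only anomalous events continue to the costlier recognition stages."""
    return event if event.get("anomaly_score", 0.0) > 3.0 else None


def plate_reader(event):
    """Stand-in for an imported license plate recognition library."""
    plate = event.get("plate_text")
    return {**event, "plate": plate} if plate else event


def watchlist_check(event):
    """Mark the event for alerting if the recognized plate is of interest."""
    return {**event, "alert": event.get("plate") in WATCHLIST}


STAGES = [anomaly_gate, plate_reader, watchlist_check]

print(run_pipeline(STAGES, {"anomaly_score": 5.2, "plate_text": "ABC123"}))
print(run_pipeline(STAGES, {"anomaly_score": 1.0, "plate_text": "ABC123"}))
```

Placing the anomaly gate first means the recognition stages only run on the small fraction of footage that stands out.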
With multi-access edge computing — putting the processing resources closer to groups or constellations of CCTV cameras — a city’s network doesn’t have to be burdened with hauling terabits of streaming video across the core of the municipal network. Only anomalous footage has to be transmitted for viewing by the relevant personnel; the normal footage can be stored locally and uploaded in off hours to the city data center.
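The routing decision at the edge can be sketched in a few lines. This is a simplified model under stated assumptions: the `Clip` and `EdgeNode` types, the clip sizes, and the method names are invented for illustration, and a real deployment would also handle storage limits, retries, and scheduling.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    camera_id: str
    size_mb: float
    anomalous: bool


@dataclass
class EdgeNode:
    """Hypothetical edge site serving a group of nearby cameras."""
    sent_now: list = field(default_factory=list)
    stored_locally: list = field(default_factory=list)

    def handle(self, clip):
        """Send anomalous clips right away; keep normal footage on site."""
        if clip.anomalous:
            self.sent_now.append(clip)
        else:
            self.stored_locally.append(clip)

    def off_hours_upload(self):
        """Return the stored clips for bulk upload and clear local storage."""
        batch, self.stored_locally = self.stored_locally, []
        return batch

    def realtime_megabytes(self):
        """Traffic placed on the core network during busy hours."""
        return sum(clip.size_mb for clip in self.sent_now)


node = EdgeNode()
node.handle(Clip("cam-01", 120.0, anomalous=False))
node.handle(Clip("cam-02", 15.0, anomalous=True))
node.handle(Clip("cam-03", 110.0, anomalous=False))

print(node.realtime_megabytes())        # only the anomalous clip crosses the core
print(len(node.off_hours_upload()))     # normal clips wait for the off-hours batch
```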
The best video analytics solutions should also be able to integrate any other kind of data into the mix. The software should be able to use the audio data captured by the cameras to construct an “audio normal” model, but also be able to use data from other municipal systems, such as access control systems.
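One possible use of access control data, offered here only as an assumption about how such integration could work, is to check whether a camera anomaly near a door coincides with a legitimate badge swipe. The function name `is_explained_by_badge`, the 30-second window, and the event format are hypothetical.

```python
from datetime import datetime, timedelta


def is_explained_by_badge(anomaly_time, door_id, badge_events,
                          window=timedelta(seconds=30)):
    """Return True if a badge swipe at the same door occurred near the anomaly.

    badge_events is an iterable of (door_id, timestamp) pairs, a hypothetical
    export format from an access control system.
    """
    return any(
        event_door == door_id and abs(anomaly_time - event_time) <= window
        for event_door, event_time in badge_events
    )


badge_log = [
    ("door-7", datetime(2024, 1, 10, 2, 14, 50)),
    ("door-2", datetime(2024, 1, 10, 2, 15, 5)),
]

# Motion at door 7 shortly after a valid swipe there: likely a staff member.
print(is_explained_by_badge(datetime(2024, 1, 10, 2, 15, 0), "door-7", badge_log))

# Motion at door 9 with no matching swipe: worth an operator's attention.
print(is_explained_by_badge(datetime(2024, 1, 10, 2, 15, 0), "door-9", badge_log))
```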
Cities are keen to adopt IoT into their smart city infrastructure, but while sensors are not expensive, installing them and connecting them to the network is a major undertaking. This is what makes the use of video analytics so appealing. It takes CCTV, an already established piece of infrastructure, and turns it into a very valuable sensor network. Think of it as a rare instance of new technology reversing obsolescence, instead of creating it.
Peter Buccellato, IoT business development lead, is part of Nokia’s Enterprise group, concentrating on analytics and IoT solutions for enterprises, government entities and telecom service providers. His background includes extensive experience in sales engineering, professional services, product management, sales, and education of complex software solutions.