Friday, September 19, 2008

4CIF - resolution or image size?

In a previous posting I commented on how people claim to be able to sell a 4CIF solution that uses the same amount of storage as someone else's 2CIF. That can only be true if a genuinely different compression technique lets the 4CIF stream run at the same bitrate. If the technique is basically the same, say conventional MPEG-4 Part 2, then it's usually a price war that is forcing the low bidders to cram video into an abnormally thin pipe, and they end up with an awful picture caused by over-compression.

But there's something else I'd like to say. It's not life threatening but it sadly reflects a lack of understanding in our industry.

There is a way to improve your video (assuming you let the bitrate rise with it, of course) - move from CIF to 2CIF or even 4CIF. But what is CIF? It stands for Common Intermediate Format, and it is the name given to a particular number of horizontal and vertical pixels (picture elements) in an image. For the purposes of this posting I'll stick with NTSC, which has about 480 visible horizontal lines from top to bottom.

A CIF image is 352 pixels across by 240 down. 2CIF has double the information going across, i.e. 704 x 240, and this is useful because the human eye is more interested in left-right activity than up-down, so the more information the better. 4CIF is as good as NTSC gets, a full 704 x 480.
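To make the arithmetic concrete, here is a minimal Python sketch of the three image sizes above. The dimensions come straight from this posting; the dictionary and function names are just illustrative.

```python
# NTSC CIF-family image sizes as given in the post: (width, height) in pixels.
IMAGE_SIZES = {
    "CIF": (352, 240),
    "2CIF": (704, 240),
    "4CIF": (704, 480),
}


def pixel_count(name):
    """Total number of pixels in the named image size."""
    width, height = IMAGE_SIZES[name]
    return width * height


def relative_to_cif(name):
    """How many times more pixels the named size carries than CIF."""
    return pixel_count(name) / pixel_count("CIF")


if __name__ == "__main__":
    for name in IMAGE_SIZES:
        print(f"{name}: {pixel_count(name)} pixels, {relative_to_cif(name):.0f}x CIF")
```

The names 2CIF and 4CIF simply reflect two and four times the pixel count of CIF.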

What we have described here are a number of different image sizes, exactly like computer display modes, from IBM's 1987 VGA (640x480) through SVGA (800x600) right up to QXGA (2048x1536). Notice I haven't used the word resolution yet. But isn't VGA a resolution? No, it's an image size.

We can better define resolution as pixels per inch (ppi), just like printer resolutions are often measured in dots per inch (dpi). And it is this tie-in to distances in the real world, on the monitor, that is of fundamental importance.

If we had a monitor that measures 704x480 pixels and we put it into quad mode, then a CIF image in one quarter of the screen will look identical to a 4CIF image right next to it in another corner. The only difference is that the 4CIF image consumes up to 4 times the bandwidth to carry detail we cannot see until we digitally zoom in (typically on recorded video). So now imagine standing in front of a PC video management system showing 16 cameras in a 4x4 array. If the monitor is a modest 1024x768, and even if all of the screen were used to show video (which is not the case), then each image gets only 256x192 pixels, which means you will not see any difference between CIF and 4CIF until you make one camera window much larger.
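The two layouts above can be checked with a small Python sketch. It assumes the whole screen is divided evenly between camera windows and ignores aspect ratio and the quality of the scaling filter; the function names are illustrative only.

```python
CIF = (352, 240)
FOUR_CIF = (704, 480)


def tile_size(monitor, cols, rows):
    """Pixels available to each camera window when the screen is split evenly."""
    width, height = monitor
    return width // cols, height // rows


def larger_image_helps(tile, smaller_image):
    """True only if the window has more pixels than the smaller image in some
    dimension, so a larger image could show detail the smaller one cannot."""
    return tile[0] > smaller_image[0] or tile[1] > smaller_image[1]


if __name__ == "__main__":
    layouts = [
        ("quad on 704x480", tile_size((704, 480), 2, 2)),
        ("4x4 on 1024x768", tile_size((1024, 768), 4, 4)),
        ("full screen 1024x768", tile_size((1024, 768), 1, 1)),
    ]
    for label, tile in layouts:
        print(label, tile, "4CIF visibly better than CIF:", larger_image_helps(tile, CIF))
```

Under these assumptions, only the full-screen window has enough pixels for 4CIF to show more than CIF does.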

So, if you have a fixed size camera window, then as you increase your image size (CIF, 2CIF etc.) your resolution increases (there are more pixels per inch on the screen) and clarity increases, so there is clearly a close relationship. However if you increase your image size (CIF, 2CIF etc.) but you also increase the size of the camera window, your resolution will not change and the clarity stays exactly the same.
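As a rough illustration, resolution in this sense can be computed as image pixels divided by the physical width of the camera window. The window widths in inches below are made-up values, and the sketch ignores the fact that the monitor's own pixel density caps what can actually be displayed.

```python
def window_ppi(image_width_px, window_width_in):
    """Source pixels per inch of on-screen window width."""
    return image_width_px / window_width_in


if __name__ == "__main__":
    # Fixed 4-inch-wide window (hypothetical size): a bigger image means more ppi.
    print("CIF in 4 in window: ", window_ppi(352, 4))
    print("4CIF in 4 in window:", window_ppi(704, 4))
    # Double the window width along with the image width: ppi is unchanged.
    print("4CIF in 8 in window:", window_ppi(704, 8))
```

Doubling both the image width and the window width leaves the pixels per inch where it started, which is the point of the paragraph above.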

Resolution is influenced by image size, but not only by image size. They are related but not the same thing. It is the same lack of understanding that leaves people scanning 5x7 photos at extremely high resolutions (say 8MB per photo) and then wondering why the result looks identical on a computer monitor to a 100kB version of the same photo. It is because computer monitors are generally limited to resolutions of about 72-96 ppi, so anything higher is simply not visible. Another good example is home digital cameras, which are now in the 8-10MP range. Unless you zoom in or print at poster size, the image looks identical on a PC monitor to one from a humble 3MP camera. All you're doing is taking up more hard drive space.
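The scanning example can be put into numbers with a short Python sketch. The 600 dpi scan setting is an assumption chosen for illustration; the 96 ppi figure is the upper monitor value mentioned above, and the photo is assumed to be shown at its physical 5x7 inch size.

```python
def pixels_for(width_in, height_in, density):
    """Pixel dimensions of an area of the given size in inches at a given dpi or ppi."""
    return round(width_in * density), round(height_in * density)


def fraction_visible(scanned, displayed):
    """Share of the scanned pixels that the monitor can show at physical size."""
    return (displayed[0] * displayed[1]) / (scanned[0] * scanned[1])


if __name__ == "__main__":
    scanned = pixels_for(5, 7, 600)   # assumed 600 dpi scan
    displayed = pixels_for(5, 7, 96)  # 96 ppi monitor, photo shown at 5x7 inches
    print("scanned:", scanned)
    print("displayed:", displayed)
    print(f"visible share: {fraction_visible(scanned, displayed):.2%}")
```

With these assumed numbers the monitor shows only a few percent of the scanned pixels; the rest matter only when you zoom in or print.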

There you go - 4CIF is an image size, not a resolution.

Saturday, September 6, 2008

Intelligent video analysis at the edge

In Sam Pfeifle's recent Security Systems News article (http://www.securitysystemsnews.com/index.php?p=article&id=ss2008096JPzLB) "Milestone aggregates analytics" he quotes Milestone Systems' Chief Marketing and Sales Officer, Eric Fullerton:

"It's a myth," he [Eric] said, referring to the trend toward putting analytics on the edge, on cameras or encoders, "to think that you're going to reduce what you send back over the Ethernet. You're going to need to have the full recording." But, he said, those analytics are great for creating metadata and tagging video as it's streamed back.

Eric is spot on. People are going to stream video to a centralized recording system, regardless of whether it is an NVR or Direct-to-iSCSI, in which case there is no way you are going to send alarm video only. The only exception to this is if you're Recording at the Edge, as in the case of Bosch's encoders with embedded storage. In this scenario video is never transmitted across the network to be recorded, so analytics at the edge does not affect this either way. If you consider a telecoms operator with 5,000 cell towers, with the need for analytics and recording at each location, then analytics at the edge becomes very attractive, with the benefit of the edge sending the actual alarm to some centralized point for processing.

Eric is also completely right about the critical value of metadata, which is information that adds value to the video. And edge devices have the responsibility of creating and sending this metadata, which is precisely what the Bosch encoders and IP cameras do. Data like 'A red object stayed in this area of the image for 3s, then moved to this other area for 30s'. The secret behind unlocking the power of this metadata is not only in aggregating it to reduce false alarms, but also in being able to do forensic searches. Such searches are always done on recorded video, and by definition are done after the fact. They allow you to mine months of video searching for whatever you want, even if you never thought of setting up the rule months ago. For example, 'tell me every time someone parked in front of the main gate in the last 4 weeks'. Bosch has developed such a tool, called Forensic Search, which is a licensable part of Archive Player. I personally find this feature to be invaluable because it allows me to record a day of video from a real camera with real and typical events I want to detect, and then to test whether my rules work. I get instant results because I don't have to wait for the events to happen in real time.
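To illustrate the idea of searching recorded metadata with a rule defined after the fact, here is a minimal Python sketch. The record layout, field names and search function are entirely hypothetical and are not the Bosch metadata format or the Forensic Search interface; they only show why stored metadata makes this kind of query cheap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MetadataEvent:
    # Hypothetical record layout, invented for this example.
    timestamp: datetime
    object_type: str      # e.g. "car", "person"
    region: str           # named area of the image, e.g. "main_gate"
    dwell_seconds: float  # how long the object stayed in the region


def forensic_search(events, object_type, region, min_dwell_seconds, since):
    """Return recorded events matching a rule that was written after recording."""
    return [
        e for e in events
        if e.object_type == object_type
        and e.region == region
        and e.dwell_seconds >= min_dwell_seconds
        and e.timestamp >= since
    ]


if __name__ == "__main__":
    now = datetime(2008, 9, 6, 12, 0)
    recorded = [
        MetadataEvent(now - timedelta(days=2), "car", "main_gate", 600.0),
        MetadataEvent(now - timedelta(days=10), "person", "main_gate", 45.0),
        MetadataEvent(now - timedelta(days=40), "car", "main_gate", 900.0),
        MetadataEvent(now - timedelta(days=5), "car", "loading_dock", 1200.0),
    ]
    # "Every car that stayed at the main gate for a minute or more in the last 4 weeks."
    hits = forensic_search(recorded, "car", "main_gate", 60.0, now - timedelta(weeks=4))
    for hit in hits:
        print(hit.timestamp, hit.object_type, hit.region, hit.dwell_seconds)
```

Because the search runs over stored records rather than live video, the rule can be changed and re-run as often as needed.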

I agree that metadata is critical, which is why all Bosch devices stream copious amounts of it, because, more often than not, you don't know what you're looking for until it's happened and then you're busy looking for a needle in a haystack because your rules weren't set up before.

Finally, a comment on pricing, or as Milestone's Channel Marketing Manager Mark Wilson alludes to, value. Yes, prices will continue to drop while accuracy increases. For unit costs to drop sustainably I suspect the number of deployments will either have to rise or the cost of sale will have to drop. It's these real-world dynamics that have brought Bosch to the point of delivering embedded intelligent video analysis in all our encoders and IP cameras, with a functionality set and price point that is targeted at the mass market - the everyday applications like watching for people loitering, smoking on fire exits, leaving shopping carts or boxes by doors, people (but not small animals) crossing a 5-mile perimeter fence line, or cars parking next to it.

Although I agree analytics could live anywhere, there is a strong argument for keeping 'everyday analytics' at the edge, including metadata generation, bandwidth minimization via recording at the edge and the sheer economies of scale of dividing the workload among many tiny edge devices, not to mention the elimination of the single central point of failure.