There’s an interesting article over at Economist.com about semantic video analysis – using computers to try to recognise what a picture or video is about.
To a microprocessor, a photograph of James Bond might as well depict a cat in a tree. That can make tracking down a video on the web or searching through a film archive a painstaking task, unless someone has written a full and accurate description of each item being examined. Anyone who has tried to find a clip on YouTube will know how rare that is.
Well, researchers from Queen Mary, University of London have made some progress. They say their computer programme can now tell the difference between water and a human being, and can sometimes identify more complex images such as a person lying on a beach. It works by a similarity algorithm: the programme first has to be fed many tagged images of water and human skin. It then looks for similarities in colour and shape between those examples and the new image.
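To make the idea of similarity matching concrete, here is a minimal sketch in Python. It is not the researchers' method: it assumes an image is just a list of (r, g, b) pixels, compares colour only (no shape), and the names `colour_histogram`, `similarity` and `classify` are made up for illustration.

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Quantise (r, g, b) pixels into coarse buckets and return normalised counts."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))

def classify(pixels, tagged_examples):
    """Return the tag whose example histogram is most similar to the new image."""
    hist = colour_histogram(pixels)
    return max(tagged_examples, key=lambda tag: similarity(hist, tagged_examples[tag]))

# Hypothetical tagged training data: one flat colour per tag.
tagged_examples = {
    "water": colour_histogram([(20, 80, 200)] * 10),
    "skin": colour_histogram([(220, 170, 140)] * 10),
}
label = classify([(30, 90, 210)] * 5, tagged_examples)
```

A real system would need far more examples per tag and would compare texture and shape as well as colour.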
What I think is really interesting is how they used an evolutionary algorithm. I blogged about these a week ago and they seem to be popping up everywhere.
Once the computer has identified the colours, textures, colour-distributions and horizontal lines in the groups with the most blocks, those blocks are subjected to a mathematical algorithm called the Pareto Archived Evolution Strategy. This uses the principles of evolutionary biology (generating a lot of slightly different variations, selecting the best among them, and then using that to generate another set of variations, and so on) to reach what is, if all has gone well, the right answer.
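The mutate-select-repeat loop described in that quote can be sketched in a few lines of Python. This is a heavily simplified illustration in the spirit of the Pareto Archived Evolution Strategy, not the researchers' code: the two-objective toy problem in `objectives` is an assumption, and the random removal from a full archive is a crude stand-in for the crowding procedure the real algorithm uses.

```python
import random

def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one (minimising)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def objectives(x):
    # Toy problem (assumption): two goals that cannot both be zero at once.
    return (x ** 2, (x - 2) ** 2)

def evolve(generations=2000, sigma=0.3, archive_size=50, seed=0):
    rng = random.Random(seed)
    current = rng.uniform(-5, 5)
    archive = [(current, objectives(current))]
    for _ in range(generations):
        # Generate a slightly different variation of the current solution.
        candidate = current + rng.gauss(0, sigma)
        cand_obj = objectives(candidate)
        # Reject it if something already in the archive is better all round.
        if any(dominates(obj, cand_obj) for _, obj in archive):
            continue
        # Otherwise drop whatever the candidate beats, then archive it.
        archive = [(x, obj) for x, obj in archive if not dominates(cand_obj, obj)]
        archive.append((candidate, cand_obj))
        if len(archive) > archive_size:
            archive.pop(rng.randrange(len(archive)))  # simplification, see above
        current = candidate
    return archive

archive = evolve()
```

The result is not one "best" answer but a set of trade-offs, none of which is beaten on both objectives by another member of the set.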
In other words, the computer tries to work out the rules that detect whatever you are looking for as accurately as possible (presumably with as few false positives as possible). This matters because different objects are given away by different features. Water can be recognised by its colour and texture, while its shape doesn’t help at all. Similarly, size and shape would let you detect a mobile phone, but colour would be no use.
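As a rough illustration of why some features make better rules than others, the Python sketch below scores two candidate rules by their detection rate and false positive rate. The four labelled examples and the helper `score_rule` are invented for this post and have nothing to do with the researchers' data.

```python
def score_rule(rule, examples):
    """Return (detection_rate, false_positive_rate) for a yes/no rule over labelled examples."""
    positives = [f for f, is_target in examples if is_target]
    negatives = [f for f, is_target in examples if not is_target]
    detection = sum(rule(f) for f in positives) / len(positives)
    false_pos = sum(rule(f) for f in negatives) / len(negatives)
    return detection, false_pos

# Made-up examples: True means the picture really shows water.
examples = [
    ({"colour": "blue", "shape": "irregular"}, True),
    ({"colour": "blue", "shape": "round"}, True),
    ({"colour": "grey", "shape": "irregular"}, False),
    ({"colour": "green", "shape": "round"}, False),
]

def by_colour(features):
    return features["colour"] == "blue"

def by_shape(features):
    return features["shape"] == "irregular"

colour_score = score_rule(by_colour, examples)  # (1.0, 0.0)
shape_score = score_rule(by_shape, examples)    # (0.5, 0.5)
```

On this toy data the colour rule finds all the water with no false positives, while the shape rule does no better than guessing.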
The programme also looks at the context of the image: objects which routinely appear together. It’s an interesting piece of work. I really wish there were a way of automatically tagging photos of everybody on Facebook through face detection.