INTEL HAD TWO video recognition demos at a pre-IDF event Sunday, one on low-power remote facial recognition, the other called Oasis. Neither did anything really new; what mattered was how things were done rather than the end result.
Yes, it is a banana
Oasis has a simple premise: put an object in its field of view, the software recognizes it, and contextual info is projected on the surface near the object. The demo had a mock kitchen with food items being recognized and recipes projected near them. You can also use your hands to interact with projected items, making the entire setup fairly interactive.
The Oasis setup, laptop, camera and projector
The idea is simple: take a small projector and a depth camera (RGB-D) and hook them up to a laptop. The laptop reads the camera, does object recognition, and projects the desired information. Because the camera reads depth as well as images, it can tell when your hand touches the surface and register taps and ‘clicks’. It isn’t magical, but the technology is there to do what is needed for this level of interactivity.
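To make the depth-based touch idea concrete, here is a minimal sketch of how a tap could be told apart from a hovering hand. It is not Intel's method, which the demo did not detail. It assumes a depth frame is a grid of distances in millimetres, that a baseline reading of the empty surface is available, and that a fingertip resting on the surface sits a few millimetres above it. The function names and the 4 to 15 mm band are hypothetical.

```python
# Sketch only: names and thresholds are assumptions, not Intel's implementation.

def classify_pixel(surface_mm, observed_mm, touch_band_mm=(4, 15)):
    """Label one depth pixel by how far it sits above the empty surface."""
    height = surface_mm - observed_mm  # camera looks down, so nearer = higher
    low, high = touch_band_mm
    if height < low:
        return "surface"   # nothing there, or just sensor noise
    if height <= high:
        return "touch"     # about a finger's thickness above the surface
    return "hover"         # hand is in view but not touching


def find_touches(surface, frame, touch_band_mm=(4, 15)):
    """Return (row, col) positions whose depth reading looks like a touch."""
    return [
        (r, c)
        for r, row in enumerate(frame)
        for c, observed in enumerate(row)
        if classify_pixel(surface[r][c], observed, touch_band_mm) == "touch"
    ]


if __name__ == "__main__":
    surface = [[1000, 1000, 1000], [1000, 1000, 1000]]
    frame = [[1000, 990, 900], [1000, 1000, 1000]]
    print(find_touches(surface, frame))
```

A real system would also need to group neighbouring touch pixels into fingertips and map camera coordinates to projector coordinates, which this sketch leaves out.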
Moving to the facial recognition demo, Intel was showing off how to recognize a face using the lowest possible power on the device, or how to do more than the device has the CPU power to achieve. This is useful not only for laptop battery life, but also for simple usability.
You know what facial recognition is: you show it a picture of Bob, and if all goes well, you get a result like, ‘This is a picture of Bob’. This simple problem that humans are very good at is a real pain for computers, taking tons and tons of CPU time with limited success. What Intel was showing has two main advances: parceling off the workload over a network, and deciding when to parcel off that work.
The device that has the camera does a little initial work: it does face detection, and then crops the face out of the picture or video. If the device has enough CPU power to do the recognition, it does. If it doesn’t, it sends the cropped image out over the net to a server which IDs the person and sends back ‘Bob’. It was not yet programmed to say that Bob is, necessarily, your uncle.
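The flow described above, detect, cut out the face, then recognize locally or remotely, can be sketched as follows. This is an illustration under stated assumptions, not Intel's code: the image is a plain grid of pixel values, the face box is assumed to come from some detector, and `local_recognizer` and `remote_recognizer` are hypothetical callables standing in for the on-device model and the server round trip.

```python
# Sketch only: the recognizers are placeholders supplied by the caller.

def crop(image, box):
    """Cut a (top, left, height, width) box out of a row-major pixel grid."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def identify(image, box, can_run_locally, local_recognizer, remote_recognizer):
    """Crop the detected face, then hand it to whichever recognizer applies."""
    face = crop(image, box)  # only the face region is processed or sent
    recognizer = local_recognizer if can_run_locally else remote_recognizer
    return recognizer(face)


if __name__ == "__main__":
    image = [[r * 10 + c for c in range(4)] for r in range(4)]
    name = identify(
        image,
        box=(1, 1, 2, 2),
        can_run_locally=False,
        local_recognizer=lambda face: "local guess",
        remote_recognizer=lambda face: "Bob",
    )
    print(name)
```

Cropping before sending is what keeps the network payload small: only the face region travels, not the whole frame.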
A low end netbook could possibly do a frame or two a second, but not 60FPS video. So the netbook handles the frame or two a second it can manage and sends the rest out to the remote server, with both sets of results recombined on the netbook. This allows low end hardware to do much more in realtime than it should be capable of.
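One way to picture the split is a simple stride: keep every Nth frame on the device and ship the rest. The sketch below assumes frames carry sortable IDs and that recognition results come back as a mapping from frame ID to name. How Intel's demo actually scheduled frames was not described, so `split_frames` and `recombine` are hypothetical names for illustration.

```python
# Sketch only: a stride-based split, assumed rather than taken from the demo.

def split_frames(frame_ids, video_fps, local_fps):
    """Divide frames into those processed locally and those sent out."""
    frame_ids = list(frame_ids)
    if local_fps <= 0:
        return [], frame_ids  # device cannot keep up at all
    stride = max(1, round(video_fps / local_fps))
    local, remote = [], []
    for index, frame_id in enumerate(frame_ids):
        (local if index % stride == 0 else remote).append(frame_id)
    return local, remote


def recombine(local_results, remote_results):
    """Merge per-frame results from both sides back into frame order."""
    merged = {**remote_results, **local_results}
    return [merged[frame_id] for frame_id in sorted(merged)]


if __name__ == "__main__":
    local, remote = split_frames(range(60), video_fps=60, local_fps=2)
    print(local, len(remote))
    print(recombine({0: "Bob"}, {1: "Alice"}))
```

With one second of 60FPS video and a device that manages two frames a second, the stride is 30, so the device keeps two frames and sends 58.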
The flip side of this is of course battery life. If you are scanning a crowd on your phone, the last thing you want is to suck your battery dry before you get a result. With a little intelligence, the phone could weigh the cost of sending the pictures over the net against the cost of doing it locally.
If the cost of doing it locally is too high, either in watts used or CPU time allocated, then the software can farm the image processing out. In its purest form, Intel is giving the devices the data that lets them make intelligent decisions on where to process the information. Once again, nothing earth shattering, but it could be useful if done right. S|A
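As a rough sketch of that decision, the code below compares an estimated energy cost for recognizing an image on the device against the cost of keeping the radio on long enough to send it. Everything here is an assumption for illustration: the cost model (power multiplied by time), the parameter names, and the numbers in the example are not Intel's, and a real device would measure these values rather than hard-code them.

```python
# Sketch only: a toy cost model, not the one used in Intel's demo.
from dataclasses import dataclass


@dataclass
class Costs:
    local_joules: float
    remote_joules: float
    local_seconds: float
    remote_seconds: float


def estimate(image_bytes, cpu_watts, local_seconds,
             radio_watts, uplink_bytes_per_s, round_trip_s):
    """Estimate energy and time for local versus remote recognition."""
    remote_seconds = round_trip_s + image_bytes / uplink_bytes_per_s
    return Costs(
        local_joules=cpu_watts * local_seconds,
        remote_joules=radio_watts * remote_seconds,
        local_seconds=local_seconds,
        remote_seconds=remote_seconds,
    )


def choose(costs, deadline_s=None):
    """Pick the cheaper option in energy among those that meet the deadline."""
    options = {
        "local": (costs.local_joules, costs.local_seconds),
        "remote": (costs.remote_joules, costs.remote_seconds),
    }
    if deadline_s is not None:
        feasible = {k: v for k, v in options.items() if v[1] <= deadline_s}
        if feasible:  # if nothing meets the deadline, fall back to energy only
            options = feasible
    return min(options, key=lambda name: options[name][0])


if __name__ == "__main__":
    small_crop = estimate(50_000, cpu_watts=2.0, local_seconds=0.5,
                          radio_watts=1.0, uplink_bytes_per_s=500_000,
                          round_trip_s=0.1)
    print(choose(small_crop))
```

The same function covers both cases in the article: a small cropped face over a decent link favours the server, while a large image or a slow link tips the balance back to the device.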