Tuesday, May 11, 2010

Visual search: because text search is so nineties


Imagine you are walking down the street when you see a stunning red sportscar flash by. It’s a new model – you’ve never seen it before in your life and have no idea what brand it is. But it’s so drop-dead gorgeous you just have to find out what it is. So you whip out your mobile phone and take a picture. The picture gets sent to a search engine, which recognises the car and gives you pictures, videos and information on the latest model, including pricing and colour options. If you think this sounds like a far-fetched vision of the future then think again: this sort of technology is already here.


Systems that recognise faces, vehicle makes, models, colours and number plates are already deployed around the world. Image and object recognition software is not new; research dates back to the mid-1960s. However, it was only in the late 1990s that facial recognition software became popular, finding use in airports and banks. Automatic number plate recognition (ANPR) technology, meanwhile, has been in use since the late 1970s, especially in Britain, the USA and several European countries such as Germany. But only in the last few years has the technology gone mainstream and been applied to everyday objects and people.

Things started heating up in August 2006 when Google acquired Neven Vision, a company specialising in face and object recognition. Then in December 2009 Google unveiled Google Goggles, a feature that allows people to perform an Internet search simply by taking and submitting a photograph of an object. Through its database of billions of objects, Google Goggles can recognise album covers, books, artwork, landmarks, logos, businesses, products, barcodes and text. Taking a photo of a can of beer is now the same as typing in ‘Castle Lager’. At the moment Google Goggles is not great at recognising animals, plants, cars and furniture, but these are still early days. And although it can recognise faces, Google has not yet enabled this feature because of privacy concerns; it is expected to follow once those concerns are addressed.
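Under the hood, matching a submitted photo against a database of images comes down to comparing compact image signatures rather than raw pixels. Here is a toy sketch of the idea using a simple "average hash" fingerprint; this is an illustration only, not Google's actual method, and the tiny 2x2 "images" and names are invented for the example.

```python
# Toy visual search: fingerprint each image with an "average hash"
# (one bit per pixel: brighter or darker than the image's mean), then
# rank indexed images by Hamming distance to the query's fingerprint.

def average_hash(pixels):
    """Fingerprint an image given as a 2D list of grayscale values (0-255)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def search(query, index):
    """Return (label, image) entries sorted by similarity to the query."""
    q = average_hash(query)
    return sorted(index, key=lambda item: hamming(q, average_hash(item[1])))

# Tiny 2x2 grayscale "images" stand in for real photos.
index = [
    ("red sportscar", [[200, 210], [190, 40]]),
    ("blue sedan",    [[30, 40],  [220, 230]]),
]
query = [[205, 215], [185, 45]]   # a slightly different shot of the sportscar
print(search(query, index)[0][0])  # → red sportscar
```

Real systems use far richer features (and far bigger indexes), but the principle is the same: a near-duplicate photo produces a near-identical fingerprint, so the right match rises to the top.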

Another great feature of Google Goggles is that it can automatically translate text. You simply snap a picture of the text (using your Android phone), which is then automatically recognised and translated. At the moment only languages written in the Latin alphabet (such as German, French, Italian and Spanish) can be translated, but languages in other scripts, like Chinese, Arabic and Hindi, will follow.

Google upped the ante in April this year by acquiring the visual search engine company Plink. With the addition of Plink, Google Goggles will enable people to identify paintings and artworks with their phones. And in September 2009, Google bought reCAPTCHA, a start-up focusing on converting scanned text into digital characters. The company was bought mainly to help Google accurately scan text for Google Books, but its technology can also be used in Google's many other optical recognition ventures (such as Picasa).

Google is leading the way when it comes to visual search, from Google Goggles to the Picasa digital image organiser that can recognise faces and search for only those people. However, Apple is fighting back in a big way. In August 2009 a new augmented reality service appeared on Apple’s iPhone. Yelp’s business review app lets users activate a feature called ‘the Monocle’, which uses the phone’s GPS and compass to display markers for restaurants, bars and other businesses. The information is displayed over the camera’s image. But that’s not all. Apple has taken out several patents for recognising objects through visual inputs, RFID (Radio Frequency ID) readers or GPS readings. Like Google, the company is also developing facial recognition software. In fact, it is already in use in Apple’s iPhoto photo organisation software, which incorporates a system that allows people to tag their friends in photos and search for their friends’ faces using the Spotlight feature. Other companies are cottoning on as well – for example, Sony’s Picture Motion Browser (PMB) scans and analyses photos by counting the number of people in each picture and detects identical faces so they can be tagged accordingly.
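The face-tagging trick these photo organisers perform usually boils down to reducing each detected face to a numeric "embedding" and grouping faces whose embeddings are close together. The sketch below illustrates that grouping step only; the two-number embeddings and the distance threshold are invented for the example, not taken from any vendor's software.

```python
# Hedged sketch: greedy grouping of face embeddings for photo tagging.
# Faces whose embeddings lie within `threshold` of a group's first
# member are assumed to show the same person.
import math

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_faces(embeddings, threshold=0.5):
    """Assign each face index to the first group whose anchor is nearby."""
    groups = []   # each group is a list of indices into `embeddings`
    for i, emb in enumerate(embeddings):
        for group in groups:
            if distance(emb, embeddings[group[0]]) < threshold:
                group.append(i)
                break
        else:
            groups.append([i])   # no nearby group: start a new person
    return groups

# Three detected faces: 0 and 2 are (by construction) the same person.
faces = [(0.1, 0.9), (0.8, 0.2), (0.15, 0.88)]
print(group_faces(faces))   # → [[0, 2], [1]]
```

Once the faces are grouped, tagging one face in a group lets the software label every other face in that group automatically, which is essentially what iPhoto's and PMB's tagging features offer the user.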

Soon, it will be possible to take a picture of any object or person around you and have a search engine give you all the information you want about those objects and people. This is part of the rise of augmented reality, which adds things like graphics and sounds to the natural world. Imagine taking a picture of the Eiffel Tower and your smartphone telling you things like the history of the tower, the admission price for the day, the temperature and wind at the top of the tower and so on. Visual search is just the beginning of augmented reality, since accurate computer vision is essential for augmented reality to work. But that's another story.
