Conceptual Image Recognition for End Users

11 Jan 2020

Over the last few years, cloud photo services have rightly gotten a lot of attention for expanding their use of facial recognition to the ordinary consumer. For example, in Google Photos, I can now see a menu of hundreds of people who have appeared in my photos over the years, and click on one example of that face to see all the other places it appears in my collection. That is a neat trick with a lot of privacy implications, but I have lately taken interest in a more generic but arguably powerful feature: the automated tagging of almost everything else (objects, qualities, features) in the image.

For instance, I can search my Google Photos for “green building,” and get hits like this from my personal photo archive:

[Image: a Google Photos search for “green building”]

It’s not perfect, since the system is obviously picking up on “green” in the landscapes too, but at least some of those photos do also contain green buildings, or buildings with green in them.

Most cloud photo services have this feature, but the sophistication varies. Google’s is the best I have seen. I can search for a noun paired with an adjective describing its state (e.g., “sleeping person” or “running animal”) and have a decent chance of getting something useful. On Dropbox, I can search for simple nouns (e.g., “person,” “tree,” “rock,” “car”) and get something back, but modifiers don’t seem to have much of an effect. On Flickr, a photo sharing service for photography buffs, I was surprised to find that only very basic nouns return any results within my own collection.¹ But as the technology proliferates, it is bound to get better across all these platforms.

I like this feature because it makes my photos a lot more useful. But it also has commercial implications that are likely to shake out over time. One way to think about what these consumer-grade image recognition features are doing is that they allow end users to turn their photo collections into personal stock image galleries. And stock images are big business. As an article from the Seattle Times reports, over half of the revenue taken in by photography juggernaut Getty Images comes from selling stock photos.

Now, a search for “green building” on Getty Images returns far more than my own personal collection does. Getty is probably using a combination of manual and automated tagging, with a high level of conceptual sophistication (does “green” refer to the color or to environmental qualities?), to return its results. But those results are not that much better than what I, someone with a modest but high-quality photo collection spanning a decade or more, can generate.

If I’m an organization that has its own modest collection of in-house photos, there is suddenly a lot more in those photos that is useful to me, because image recognition gives me a set of generic tags that I can use to “find” things. I know I will be watching this feature closely this year and in the years to come.
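
For readers who want to see what “finding” things by tag might look like in practice, here is a minimal sketch in Python. It is not how Google, Dropbox, or Flickr actually work; the filenames, the tags, and the `build_index` and `search` helpers are all hypothetical, and the sketch simply assumes some recognition service has already attached a set of tags to each photo.

```python
from collections import defaultdict

# Hypothetical output of an image-recognition service: filename -> tags.
PHOTO_TAGS = {
    "office_2014.jpg": {"building", "green", "window"},
    "retreat_2016.jpg": {"tree", "green", "person"},
    "lobby_2018.jpg": {"building", "person"},
}


def build_index(photo_tags):
    """Map each tag to the set of photos that carry it."""
    index = defaultdict(set)
    for photo, tags in photo_tags.items():
        for tag in tags:
            index[tag].add(photo)
    return index


def search(index, query):
    """Return photos tagged with every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    # Keep only photos that appear under all of the query words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


index = build_index(PHOTO_TAGS)
print(search(index, "green building"))  # prints ['office_2014.jpg']
```

Note that matching on independent tags like this cannot tell a green building from a building in front of a green lawn, which is the same kind of imprecision visible in my own “green building” results.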

¹ This depends on the breadth of subject matter in your photos, obviously. The results are better when you search the entire platform, but seem to privilege results that are hand-tagged by the users themselves. Hand-tagging is a different work-to-reward proposition.