Diffbot, the "Visual Learning Robot," Helps Web Content Go Mobile

Sep 01, 2011

Diffbot is one of those applications (and companies) you probably are not even aware of when you use it, but that's not necessarily a problem for the company's co-founder and CEO Michael Tung.

That's because his product is a "visual learning robot" that hundreds of developers are using to translate web content into better mobile apps, and as such it stays pretty much under the hood.

"We've invented this visual ID algorithm," says Tung. "One of our core insights is that the entire web can be classified down to 30 page types. There are product pages, event pages, news pages -- we can identify them visually with 99.999 accuracy."

Diffbot's technology identifies each page's components, such as nav bars and footers, as part of its identification process. Because of common design standards, pages within the same category look very much alike.

One current Diffbot customer is AOL's recently launched Editions, a personalized daily magazine for the tablet.

"Editions uses thousands of news sites," says Tung. "They send the URLs to our servers, and our technology analyzes all of that content and extracts the headlines, bylines, and other key elements and assigns them to a topic. When Diffbot analyzes the front page of Huffington Post, for example, it can identify what the top story is because it recognizes the features that define the top story there, and it knows what the topic is, also."

He continues, "It's not just a black box. It leverages all that human knowledge and the work of all those news editors out there."

A key factor driving Diffbot's growth is the problem mobile devices like the iPad have in interfacing with web content. "It's kind of a crappy experience," as Tung puts it. "And this is a huge opportunity for us. Because we extract that web data and make it easy for developers to use it in creating mobile apps."

For me, one of Diffbot's most surprising features is that it works not only in English but in some 250 languages, "because we leverage Wikipedia" (which publishes in all those languages). "The ontology structure of data on Wikipedia allows us to use it as a training set," which lets Diffbot visually analyze a web page in any language by recognizing each page's telltale visual components.

The company, with a staff of five, is based in Palo Alto and was incubated by Stanford's StartX. Tung and his co-founder, Leith Abdulla, are Stanford grads. The company charges customers like AOL a licensing fee and a usage fee, and Tung says the startup is already profitable.
