Diffbot is one of those applications (and companies) you probably are not even aware of when you use it, but that's not necessarily a problem for the company's co-founder and CEO Michael Tung.
That's because his product is a "visual learning robot" that hundreds of developers are using to translate web content into better mobile apps, so it stays pretty much under the hood.
"We've invented this visual ID algorithm," says Tung. "One of our core insights is that the entire web can be classified down to 30 page types. There are product pages, event pages, news pages -- we can identify them visually with 99.999 percent accuracy."
Diffbot's technology identifies each page's components, such as nav bars and footers, as part of its identification process. Because web design follows common standards, pages of the same type tend to look very similar to one another.
One customer using Diffbot at present is AOL's recently launched Editions, which is a personalized daily magazine for the tablet.
"Editions uses thousands of news sites," says Tung. "They send the URLs to our servers, and our technology analyzes all of that content and extracts the headlines, bylines, and other key elements and assigns [them] to a topic. When Diffbot analyzes the front page of Huffington Post, for example, it can identify what the top story is because it recognizes the features that define the top story there, and it knows what the topic is, also."
He continues, "It's not just a black box. It leverages all that human knowledge and the work of all those news editors out there."
A key factor driving Diffbot's growth is the problem mobile devices like the iPad have in interfacing with web content. "It's kind of a crappy experience," as Tung puts it. "And this is a huge opportunity for us. Because we extract that web data and make it easy for developers to use it in creating mobile apps."
For me, one of Diffbot's most surprising features is that it works not only in English but in some 250 languages, "because we leverage Wikipedia" (which publishes in all those languages). "The ontology structure of data on Wikipedia allows us to use it as a training set," which lets Diffbot visually analyze a web page in any language by recognizing each page's telltale visual components.
The company, with a staff of five, is based in Palo Alto and was incubated by Stanford's StartX. Tung and his co-founder, Leith Abdulla, are Stanford grads. The company charges customers like AOL a licensing fee and a usage fee, and Tung says the startup is already profitable.