Diffbot launches AI-powered knowledge graph of 1 trillion facts about people, places, and things

August 31, 2018

If you’ve ever performed a Google search for a celebrity, a famous landmark, or a product, you’ve likely encountered the infoboxes that sometimes sit to the right of the results page. They’re filled with information from Google’s Knowledge Graph, an entities database used to enhance search results on the web and in smart speakers like Google Home. Most of the Knowledge Graph’s more than 1.6 billion facts are crowdsourced from human teams, who regularly comb through millions of websites for answers to common questions about people, places, and things.

But if you ask Mike Tung, there’s a better way to do it.

He’s the founder of Diffbot, a Mountain View, California-based startup whose mission is to convert the web’s unstructured data into structured data — or, as Tung put it, “extracting knowledge in an automated way from documents.” Diffbot is publicly launching this week after a years-long private pilot program.

“We’re trying to build the first comprehensive map of human knowledge … by analyzing every page on the internet,” Tung told VentureBeat in a phone interview.

It’s a lofty goal, but Diffbot, which grew out of Tung’s artificial intelligence (AI) work at Stanford, spent five years building the tools necessary to accomplish it. Leveraging a combination of computer vision and natural language processing, Diffbot’s web crawler can parse the layout and structure of virtually any webpage — about 90 percent of the web and 20 or so page types, Tung claims — for facts, figures, and abstract relationships between objects. (Typical examples include a product page on Amazon.com or an executive bio on a company’s webpage.)

“We call it knowledge-as-a-service,” Tung said. “Right now, 30 percent of a knowledge worker’s job is data gathering. There’s a big opportunity in the market for a horizontal knowledge graph: a database of information about people, businesses, and things.”

Data extracted by Diffbot’s crawler feeds into an enormous database called the Diffbot Knowledge Graph, or DKG, comprising more than a trillion facts and 10 billion entities. (Tung said it’s adding facts at a rate of 130 million per month.) Core categories include people (skills, employment history, education, social profile), companies, locations (mapping data, addresses, business types, zoning information), articles (every news article, dateline, byline from anywhere on the web, in any language), discussions (chats, social sharing, and conversations), and images (organized using image recognition and metadata collection).

All of this is accessible via API calls and manipulable with Diffbot DQL, the company’s custom query syntax. Clients can view results from the DKG in a list, map, or table layout in Diffbot’s web-based UI, or from within third-party content management systems or analytics platforms.

Among those clients are Microsoft, eBay, Yandex, and DuckDuckGo, which are using it to enhance the quality of their search results. Other customers include Cisco, Salesforce, Crunchbase, Hubspot, Adobe, Instapaper, and Onswipe.

“Simply put, Diffbot is using the power of AI on a scale we’ve never seen before,” said Aydin Senkut, founder and managing director of Felicis Ventures, one of Diffbot’s investors. “It’s the first profitable AI company on record; they are the ‘secret ingredient’ powering applications from many of the largest companies in tech.”

In a demo, Tung showed me how it worked. Say you wanted to perform a one-off search for a brand of shoe. In Diffbot’s web dashboard, you’d type the sneaker brand into a Google-like search bar and hit enter; within milliseconds you’d get a product profile synthesized from sources around the web.

Looking for news articles instead? Same process: Typing in an author’s name yields every article they’ve ever published online (across languages, too). Searching for a person, on the other hand, pulls up a CV-like work history pieced together from dozens (or hundreds) of bios, articles, and publicly available profiles.

One of Diffbot’s unique strengths is its ability to quickly drill down by entity, Tung explained. It’s helpful in tasks like job recruitment: the appropriate DQL string (e.g., “type:Person employments.employer.name:'Diffbot'”) can collate every employee at a given company, along with their job title, skills, educational background, and social media profiles, all in one place.
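To make the shape of such a query concrete, here is a minimal Python sketch that assembles a DQL-style string like the one Tung described and URL-encodes it for an API call. The helper names, the `token` and `query` parameter names, and the `https://example.com/dql` address are all hypothetical placeholders rather than Diffbot’s documented API; only the query syntax itself comes from the example above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_dql(entity_type, filters):
    """Assemble a DQL-style string: a type clause followed by field:'value' clauses.

    Only the syntax shown in the article's example is reproduced here.
    Quoting rules for values that contain quotes are not known from the
    article, so such values are rejected instead of guessed at.
    """
    parts = [f"type:{entity_type}"]
    for field, value in filters.items():
        if "'" in str(value):
            raise ValueError("values containing single quotes are not handled in this sketch")
        parts.append(f"{field}:'{value}'")
    return " ".join(parts)


def build_request_url(base_url, token, query):
    """URL-encode the query. Parameter names are assumptions, not Diffbot's API."""
    return f"{base_url}?{urlencode({'token': token, 'query': query})}"


# The recruiting example from the article: everyone employed at Diffbot.
query = build_dql("Person", {"employments.employer.name": "Diffbot"})

# Hypothetical endpoint and token, used only to show the encoding step.
url = build_request_url("https://example.com/dql", "YOUR_TOKEN", query)

# Decoding the URL recovers the original query string unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Swapping the employer name or adding further field filters to the dictionary would narrow the result set in the same way, assuming the fields exist in the graph.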

“This is the holy grail of machine learning — capturing all the world’s knowledge in one place,” Tung said.

Google’s Knowledge Graph has historically faced criticism for lacking attribution and omitting sources of conflicting information, but Tung said that Diffbot’s automated approach kills two birds with one stone. Not only is Diffbot more comprehensive than manually curated databases like Google’s Knowledge Graph, but it’s more accurate, too — Diffbot’s crawler regularly refreshes the DKG with new information and its machine learning algorithms are smart enough to pass over sites with histories of producing “logically inconsistent” facts.

“That’s one of the reasons why we fuse information together from different sources,” Tung said. “Our scale is such that there’s minimal potential for errors. We’d bet the business on it.”
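As a rough illustration of what fusing information from different sources can mean, the Python sketch below resolves conflicting claims by trust-weighted voting. This is a generic technique chosen for illustration, not Diffbot’s actual algorithm, which the article does not describe; the site names, trust weights, and the `fuse_facts` helper are all hypothetical.

```python
from collections import defaultdict


def fuse_facts(observations, trust=None, default_trust=1.0):
    """Pick one value per (entity, attribute) by trust-weighted voting.

    observations: iterable of (source, entity, attribute, value) tuples.
    trust: optional mapping of source -> weight. A source with a history of
    inconsistent facts could be given a low weight (hypothetical scheme).
    """
    trust = trust or {}
    votes = defaultdict(lambda: defaultdict(float))
    for source, entity, attribute, value in observations:
        votes[(entity, attribute)][value] += trust.get(source, default_trust)

    # For each fact, keep the value with the highest total weight.
    return {
        key: max(candidates.items(), key=lambda item: item[1])[0]
        for key, candidates in votes.items()
    }


# Three hypothetical sites disagree about one fact.
observations = [
    ("site-a.example", "Diffbot", "launched", "2008"),
    ("site-b.example", "Diffbot", "launched", "2008"),
    ("site-c.example", "Diffbot", "launched", "2009"),
]

# site-c is down-weighted, standing in for a source with a poor track record.
fused = fuse_facts(observations, trust={"site-c.example": 0.2})
```

With more sources per fact, a single unreliable site has less influence on the fused value, which is the intuition behind Tung’s point about scale.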

Diffbot launched in 2008 and counts 28 employees among its core staff of engineers and data scientists. It previously raised $10 million in a funding round led by VC Tencent, Felicis Ventures, and Amplify Ventures.
