Google’s Knowledge-Based Trust Project, pt. 1

Last month, Google made serious waves throughout the SEO community by publishing a paper titled “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources” (available for free; link on the right side of the page). The gist of the paper is that Google engineers are attempting to create a way to algorithmically judge the accuracy of statements on a web page (or across an entire website; I’ll get to that later).

This is the first in a planned series on the Knowledge-Based Trust (KBT) paper and the possibility of its implementation in Google’s ranking algorithm. This article looks at the big picture: what the overall goal is, what challenges Google faces in achieving it, and what worries are associated with such a project. Later articles will explore the KBT paper and the preceding Knowledge Vault (KV) paper. The KV paper outlines Google’s procedure for creating an algorithmically populated knowledge base, which a KBT system would refer to when evaluating the accuracy of published content.


So, what is the big picture?

The paper is motivated by the following pair of observations:

“Quality assessment for web sources is of tremendous importance in web search. It has been traditionally evaluated using exogenous signals such as hyperlinks and browsing history. However, such signals mostly capture how popular a webpage is.”

Google’s engineers correctly identify a disparity between the goal of providing quality content to users and the methods presently used to judge quality. The current methods of quality assessment weigh the popularity of a page pretty heavily: backlinks and traffic numbers mean a lot in terms of ranking.

Google (and other search engines) also include factors in their assessment that are designed to capture other dimensions of quality. Page speed, ease of navigation, and mobile usability all factor into a page’s ranking. Notice that these are assessments of the quality of a page or site, as opposed to the quality of the content on that page or site. This is a serious shortcoming when Google’s aim is to help users find quality content.


One way to address this is to evaluate the accuracy of the information on a page. If Google can rank pages at least in part according to the accuracy of the content, they can deliver better quality content to users. So far Google has written about two components of their plan to assess content accuracy:

The Knowledge Vault (KV), an algorithmically created and maintained knowledge base, much larger than any previously existing public knowledge base. It collects what are called “triples”, which are arrangements of subject, predicate, and object.

The Knowledge-Based Trust (KBT) evaluation methodology, which checks triples extracted from a web page against the triples in the KV. Once the triples on a page have been extracted and evaluated, the page is rated according to the estimated probability that a fact stated on it is correct. This could be scaled to an entire website, in theory.
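
To make the two components concrete, here is a minimal Python sketch of the idea. It is not Google’s implementation: the `Triple` class, the toy `KNOWLEDGE_BASE`, and the `page_accuracy` function are all hypothetical names invented for illustration, and the score is a plain fraction of matching facts, whereas the paper describes a considerably more involved probabilistic model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Toy stand-in for a knowledge base (assumption: one accepted
# object per subject/predicate pair).
KNOWLEDGE_BASE = {
    ("Barack Obama", "nationality"): "USA",
    ("Paris", "capital_of"): "France",
}


def page_accuracy(triples):
    """Return the fraction of checkable triples that match the knowledge base.

    Triples whose (subject, predicate) pair is unknown to the knowledge base
    are skipped. Returns None if nothing on the page can be checked.
    """
    checkable = [
        t for t in triples if (t.subject, t.predicate) in KNOWLEDGE_BASE
    ]
    if not checkable:
        return None
    correct = sum(
        1 for t in checkable
        if KNOWLEDGE_BASE[(t.subject, t.predicate)] == t.obj
    )
    return correct / len(checkable)


# Hypothetical triples extracted from a single page.
page = [
    Triple("Barack Obama", "nationality", "USA"),    # matches
    Triple("Paris", "capital_of", "Germany"),        # contradicts
    Triple("Atlantis", "located_in", "Atlantic"),    # not checkable
]

score = page_accuracy(page)
print(score)
```

Here one of the two checkable facts agrees with the knowledge base, so the page scores 0.5. The third triple is ignored rather than penalized, which is a design choice of this sketch: a fact the knowledge base has never seen is not evidence that the page is wrong.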

By constructing a massive knowledge base and then checking website content against it, Google can assess the accuracy of information on a website. By adding this to their ranking factors, Google could provide users with a more complete evaluation of a website’s quality. In short, Google would be able to add “accuracy of data” as a ranking factor.

Join me in two weeks for the next part of my KBT series: Understanding the Knowledge Vault. I’ll be diving into the Knowledge Vault paper and explaining Google’s methodology for generating the knowledge base used in Knowledge-Based Trust evaluation.
