As Google's algorithm keeps getting smarter, its latest, more predictive method of identifying duplicate content within a site has made things more challenging for site owners. Google may now treat URLs that follow a similar URL pattern as possible duplicate content. If your website has many pages with similar URL patterns, Google may deem their content duplicate. This can happen even if each page is uniquely written, so it is best to revisit pages with similar URLs and optimize their URL structure. That helps Google understand the content better, so those pages are not left out when indexing.

In a recent SEO hangout on March 5th, site owner Ruchit Patel asked Google's John Mueller why thousands of his URLs were not being indexed correctly.

John Mueller responded, “What tends to happen on our side is we have multiple levels of trying to understand when there is duplicate content on a site. And one is when we look at the page’s content directly and we kind of see, well, this page has this content, this page has different content, we should treat them as separate pages.

“The other thing is kind of a broader predictive approach that we have where we look at the URL structure of a website where we see, well, in the past, when we’ve looked at URLs that look like this, we’ve seen they have the same content as URLs like this. And then we’ll essentially learn that pattern and say, URLs that look like this are the same as URLs that look like this.”
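Google has not published how this pattern learning works, so the following is only a toy sketch in Python of the idea Mueller describes, built on our own simplifying assumptions. Each crawled page is reduced to a content fingerprint (here just the letters "A" and "B"), and only query parameters are considered, not path segments. The function names `learn_ignorable_params` and `predict_duplicate` and the example.com URLs are hypothetical. The sketch marks a parameter as "ignorable" when every group of crawled URLs that differ only in that parameter returned the same content. It then predicts that unseen URLs differing only in ignorable parameters are duplicates, without looking at their content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def split_url(url: str) -> tuple[str, dict[str, str]]:
    """Split a URL into 'host/path' and a dict of its query parameters."""
    parts = urlsplit(url)
    return parts.netloc + parts.path, dict(parse_qsl(parts.query))


def learn_ignorable_params(observed: dict[str, str]) -> set[str]:
    """Toy pattern learner (hypothetical, NOT Google's actual algorithm).

    `observed` maps each crawled URL to a fingerprint of its content.
    A query parameter is treated as ignorable if at least one group of
    URLs differs only in that parameter, and every such group shares a
    single content fingerprint.
    """
    all_params: set[str] = set()
    for url in observed:
        all_params.update(split_url(url)[1])

    ignorable: set[str] = set()
    for param in all_params:
        # Group URLs that are identical except for the value of `param`.
        groups: dict[tuple, list[str]] = defaultdict(list)
        for url, fingerprint in observed.items():
            base, params = split_url(url)
            if param not in params:
                continue
            rest = tuple(sorted((k, v) for k, v in params.items() if k != param))
            groups[(base, rest)].append(fingerprint)
        # Only groups with 2+ URLs give evidence about this parameter.
        comparable = [fps for fps in groups.values() if len(fps) > 1]
        if comparable and all(len(set(fps)) == 1 for fps in comparable):
            ignorable.add(param)
    return ignorable


def predict_duplicate(url_a: str, url_b: str, ignorable: set[str]) -> bool:
    """Predict that two URLs are duplicates if they differ only in ignorable parameters."""
    def key(url: str) -> tuple:
        base, params = split_url(url)
        kept = tuple(sorted((k, v) for k, v in params.items() if k not in ignorable))
        return base, kept
    return key(url_a) == key(url_b)


# Hypothetical crawl history: "ref" never changed the content, "color" did.
observed = {
    "https://example.com/shoes?color=red&ref=mail": "A",
    "https://example.com/shoes?color=red&ref=ad": "A",
    "https://example.com/shoes?color=blue&ref=mail": "B",
}
ignorable = learn_ignorable_params(observed)
print(ignorable)  # -> {'ref'}
print(predict_duplicate("https://example.com/shoes?color=blue&ref=x",
                        "https://example.com/shoes?color=blue&ref=y", ignorable))  # -> True
print(predict_duplicate("https://example.com/shoes?color=red&ref=x",
                        "https://example.com/shoes?color=blue&ref=x", ignorable))  # -> False
```

Because the prediction is made from URLs alone, a rule learned from only a few examples can over-generalize. In that case, genuinely different pages that share a URL pattern could be treated as duplicates and skipped, which is why a clear, distinctive URL structure is worth the effort.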

Mueller explained that Google may not crawl or index a URL if, based on its similarity to other URLs, it predicts that the page is a duplicate of another page. In January, Mueller also noted that duplicate content, to some degree, will not affect your search rankings. Google understands that some pages, especially on sites such as eCommerce stores, will contain a certain amount of duplicate content, and its algorithm is built to handle situations like this. So if Google has not crawled or indexed some of your web pages, don't be alarmed: it is not penalizing you for duplicate content. Those pages are more likely being skipped for now until Google understands the content better.

