Tutorial: How to use the OnCrawl near duplicate detector
Written by Francois Goube
Updated over a week ago

This feature has evolved and this tutorial is currently being updated.

This article will soon be replaced with updated content on detecting and resolving duplicate and near-duplicate content in OnCrawl.

Video transcription

Hello SEO folks. I'm François. I want to walk you through our Duplicate Content tab.

With OnCrawl, you can identify any type of duplication issue, whether it's in your title tags, your description tags, your H1 tags, or in the body content that your pages display to users.

You can click on every metric.

Let's click on this one: pages with near duplicates.

Clicking on it gives you an exhaustive list of your pages with near duplicates.

You can add columns, such as the content similarity ratio. You can sort this list. You can even add a filter: for example, give me all pages that have more than 80% similarity. There we go.

If we sort this, you can see that we only have pages with a similarity ratio of more than 80%.
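The same filter and sort can be sketched outside the interface, for instance on an exported list of pages. This is only a minimal illustration: the sample rows and the field names `url` and `similarity` are assumptions made for the example, not OnCrawl's actual export format.

```python
# Hypothetical rows, shaped like a list of pages with near duplicates.
# The field names "url" and "similarity" are assumptions for illustration.
pages = [
    {"url": "/shoes/red", "similarity": 0.93},
    {"url": "/shoes/blue", "similarity": 0.87},
    {"url": "/about", "similarity": 0.42},
    {"url": "/shoes/green", "similarity": 0.80},
]


def filter_by_similarity(rows, threshold=0.80):
    """Keep rows strictly above the threshold, highest similarity first."""
    kept = [row for row in rows if row["similarity"] > threshold]
    return sorted(kept, key=lambda row: row["similarity"], reverse=True)


high_similarity = filter_by_similarity(pages)
for row in high_similarity:
    print(f'{row["url"]}: {row["similarity"]:.0%}')
```

Note that "more than 80%" is read here as a strict comparison, so a page at exactly 80% is left out.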

Let's go back to the Duplicate Content tab.

We also aggregate your pages by type of duplication issues, so that it's easy for you to prioritize your work on this kind of SEO issue.

We have the exhaustive view of all your pages: the ones with no duplicated content, and the ones with duplicated content issues. We also group your pages by type of near-duplicate issue and by page group, so that you can see where your near duplicates are located.

And then we have something we are very proud of: a representation of all clusters of duplicated pages. That is, we have grouped together pages with very similar content.

So each of these squares represents a group of pages with similar content, and the number on the square is the number of pages in the group.

Here you can see at the bottom right that I have six pages dealing with the same or very similar content.
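To make the idea of a cluster concrete, here is a small sketch that groups pages whose text is very similar. It is not OnCrawl's algorithm, which the video does not describe: it swaps in a simple, well-known approach (Jaccard similarity on three-word shingles, merged with union-find). The URLs, sample texts, and the 0.7 threshold are all made up for the example.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles that two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_pages(pages, threshold=0.8):
    """Group URLs whose texts are at least `threshold` similar."""
    parent = {url: url for url in pages}

    def find(url):
        # Follow parent links to the cluster representative.
        while parent[url] != url:
            parent[url] = parent[parent[url]]
            url = parent[url]
        return url

    sets = {url: shingles(text) for url, text in pages.items()}
    for a, b in combinations(pages, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for url in pages:
        clusters.setdefault(find(url), []).append(url)
    # Only groups of two or more pages count as near-duplicate clusters.
    return [sorted(group) for group in clusters.values() if len(group) > 1]


# Hypothetical page texts, keyed by URL.
sample = {
    "/a": "buy red running shoes online with free delivery today",
    "/b": "buy red running shoes online with free delivery now",
    "/c": "contact our support team by phone or email",
}
clusters = cluster_pages(sample, threshold=0.7)
print(clusters)
```

In this sketch, the number shown on each square would correspond to the length of each group in `clusters`. Comparing every pair of pages is fine for a handful of URLs but would not scale to a full crawl.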

And we give you something very cool: this is information about how you manage your canonicals within your website.

We tell you whether a canonical is set on your pages, and we tell you something very important: whether the canonicals within a group point to different URLs.

For example, these three pages here don't have the same canonical. That's an issue because these three pages appear to deal with very similar content, but their canonicals don't point to the same page.

And we'll see later that this is very impactful on your crawl frequency. So you'd better be aware of that.

And if the canonicals within a group all point to the same URL, you have no problem.
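The canonical check described above can be sketched as a small function that looks at each cluster and reports whether its pages agree on a canonical. The row layout, the cluster ids, and the three status labels are assumptions for illustration. Treating a cluster where only some pages declare a canonical as "not matching" is also a choice made for this sketch.

```python
from collections import defaultdict

# Hypothetical rows: (cluster id, page URL, declared canonical or None).
rows = [
    ("c1", "/p/a", "/p/a"),
    ("c1", "/p/b", "/p/a"),
    ("c2", "/q/a", "/q/a"),
    ("c2", "/q/b", "/q/b"),
    ("c2", "/q/c", None),
    ("c3", "/r/a", None),
    ("c3", "/r/b", None),
]


def canonical_status(rows):
    """Classify each cluster by how its pages declare their canonical."""
    targets_by_cluster = defaultdict(set)
    for cluster_id, _url, canonical in rows:
        targets_by_cluster[cluster_id].add(canonical)

    status = {}
    for cluster_id, targets in targets_by_cluster.items():
        declared = targets - {None}
        if not declared:
            status[cluster_id] = "no canonical set"
        elif len(declared) == 1 and None not in targets:
            # Every page declares a canonical and they all agree.
            status[cluster_id] = "matching"
        else:
            # Different targets, or only some pages declare one.
            status[cluster_id] = "not matching"
    return status


statuses = canonical_status(rows)
for cluster_id, label in sorted(statuses.items()):
    print(cluster_id, label)
```

Clusters labeled "not matching" are the ones to review first, since their pages look alike but point to different canonical URLs.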

You can filter these datasets by number of pages within the groups or by similarity ratio, so it is very actionable.

You can click on any square to access the list of pages with near duplicates.

Talking about actionable data, we also give you the view by page depth and by path so that you can jump into the part of your website that has a lot of near duplicates.

I hope you will enjoy this tool. Feel free to give us your feedback; we would really appreciate that. Feel free to send your questions on Twitter at @oncrawl (or use the blue Intercom chat button at the bottom right of your screen).

Thanks for listening, guys, and happy crawling!
