Duplicate content leads to SEO issues that can hurt your rankings. Duplicate content refers to content that appears in more than one place, whether on a single website or across multiple websites. It causes trouble for crawlers because it is hard to tell which URL is the most relevant for a given query. With user experience in mind, search engines will rarely display multiple pages with the same content for one query, so they are forced to choose the version likely to be the best. This can keep relevant pages out of the search results and therefore cost you traffic. Duplicate content can lead to three main issues:
- confusion over which version to index
- difficulty directing link metrics (authority, trust, anchor text, link juice) to the right page, or splitting them between different versions
- inability to rank the right version for queries
With OnCrawl, you can easily find your groups of duplicate and near-duplicate pages. You will also be able to see whether the canonical strategy you have put in place manages your duplicate content, or whether there are still problems. We split your problematic duplicate content based on whether there are multiple canonical URLs for the cluster, or whether canonical URLs are simply not set.
You can filter your clusters by number of pages and also by content similarity. Read more about this graph here.
By clicking on a specific cluster, you will access further details about the URLs in this cluster.
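To give a feel for what "content similarity" between near duplicates can mean, here is a minimal sketch. How OnCrawl computes similarity is not covered in this article, so the sketch uses Jaccard similarity over three-word shingles as a stand-in. The function names `shingles` and `similarity` are made up for this example.

```python
def shingles(text, size=3):
    """Split text into overlapping word sequences ("shingles")."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)
```

A score of 1.0 means the two texts share every shingle, and a score close to 1.0 suggests a near duplicate. The threshold you pick is a judgment call.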
You can also examine what type of content is duplicated:
Here, you can see that 191 pages have a duplicate title. Click on the card to see the pages with duplicate titles.
However, there are different types of duplicate content. Some of them will hurt your rankings whereas others are harmless. Let’s focus on the ones penalizing your SEO.
What are the best practices?
To avoid these duplicate content issues, there are some best practices you can follow. Most of the time, content that is found at several URLs should be canonicalized. This can be done with 301 redirects, rel=canonical, or the parameter handling tools in Google Webmaster Central.
A 301 redirect is in most cases the most relevant solution, especially for URL issues. It tells search engines which version of a page is the original and sends the duplicate to the primary one. Moreover, when multiple well-ranked pages are redirected to a single one, they no longer compete with each other and create a stronger relevancy and popularity signal. The resulting page can thus rank better.
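As an illustration, a 301 redirect can be sketched as a tiny WSGI application. This is only a sketch under assumptions: the `REDIRECTS` map and its paths are hypothetical, and on a real site redirects are more often configured in the web server or CMS than written by hand.

```python
# Hypothetical map of duplicate paths to the primary URL they should point to.
REDIRECTS = {
    "/index.html": "http://www.mywebsite.com/",
    "/home": "http://www.mywebsite.com/",
}


def app(environ, start_response):
    """Minimal WSGI app: answer known duplicate paths with a 301 redirect."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells crawlers the move is permanent and names the primary URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"primary page"]
```

To try it locally, the app can be served with the standard library's `wsgiref.simple_server.make_server`.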
Rel=canonical works in much the same way as a 301 redirect, except that it is easier to implement. It can also be used for content copied from other websites: it tells search engines that the copied article was placed on your website intentionally and that all the weight of that page should pass to the original one. If you need further details about how rel=canonical works, we previously wrote an article on that subject.
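If you want to check which canonical URL a page declares, a short script can read it from the HTML. The sketch below uses Python's standard `html.parser` module. The class and function names are made up for this example, and it returns every canonical it finds so that pages declaring several are easy to spot.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html):
    """Return the list of canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

An empty list means no canonical is set, and more than one entry means the page declares conflicting canonicals.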
The meta robots tag with the combined values "noindex, follow" is useful for pages that should not appear in a search engine's index. Bots can crawl these pages and follow their links, but will not index them.
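Assuming the combined tags here are meta robots directives such as "noindex, follow", the small sketch below shows how the content of such a tag can be interpreted. The function name is hypothetical. The defaults (index and follow when nothing is specified) and the "none" shorthand follow the commonly documented behavior of the robots meta tag.

```python
def parse_robots_directives(content):
    """Interpret the content attribute of a <meta name="robots"> tag."""
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    return {
        # "none" is shorthand for "noindex, nofollow".
        "index": "noindex" not in directives and "none" not in directives,
        "follow": "nofollow" not in directives and "none" not in directives,
    }
```

For "noindex, follow", the result says the page is kept out of the index while its links are still followed.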
Google Webmaster Tools offers several services, including setting a preferred domain for your site and handling URL parameters. However, these settings apply only to Google. Your changes will not be taken into account by Bing or other search engines.
Further methods which can be implemented
Setting a preferred domain is a very basic step that should be implemented on every site. It simply tells search engines whether the site should be displayed with or without www in the search engine results pages.
Be careful when linking internally. If you decide that the canonical version of a website is www.mywebsite.com/, then all internal links should go to http://www.mywebsite.com/page.html and not to http://mywebsite.com/page.html.
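A quick way to spot internal links that point to the wrong version is to compare each link's host with the canonical one. This is a minimal sketch: `SITE_HOSTS` is an assumed list of host variants for the same site, and the list of `hrefs` is expected to come from your own crawl or HTML parsing.

```python
from urllib.parse import urljoin, urlsplit

CANONICAL_HOST = "www.mywebsite.com"  # the canonical version chosen for the site
SITE_HOSTS = {"www.mywebsite.com", "mywebsite.com"}  # assumed variants of the same site


def non_canonical_internal_links(page_url, hrefs):
    """Return internal links that point to a non-canonical host variant."""
    flagged = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        host = urlsplit(absolute).hostname
        if host in SITE_HOSTS and host != CANONICAL_HOST:
            flagged.append(absolute)
    return flagged
```

Relative links inherit the host of the page they are on, so they are only flagged when the page itself is served from a non-canonical host.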
When republishing or syndicating content, be sure to add a link back to the original.
Write unique product descriptions
It might take more time, but writing your own descriptions instead of reusing the manufacturer's can help you rank above other sites that use the duplicated descriptions.
How to improve your content and avoid duplicate content issues?
Here are the main situations where duplicate content happens. This is what you should avoid:
Parameters such as click tracking or analytics codes can lead to duplicate content issues. More generally, any set of similar URLs pointing to identical pages is a problem. Google regards the www, non-www, .com, .com/index.html, http and https versions of a URL as different pages even if they serve the same content, so they are seen as duplicate content.
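To see how many URL variants collapse to the same page, you can normalize each URL and group the results. The sketch below rests on assumptions: the `TRACKING_PARAMS` list, the preferred scheme and host, and the idea that every URL passed in belongs to the same site are all choices made for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust to whatever your analytics setup appends.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid"}


def normalize(url, scheme="http", host="www.mywebsite.com"):
    """Map a URL variant of the site to a single normalized form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat /index.html as the same page as its directory URL.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def group_variants(urls):
    """Group URLs by their normalized form."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)
```

Any group with more than one URL is a candidate for a 301 redirect or a rel=canonical pointing at the normalized form.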
Copied or syndicated information
If you share an article, a quote or a comment from someone you admire, or simply to illustrate your own articles, it will be seen as duplicate content, even if you have linked back to the source website or URL. Google will value these pieces of content poorly, which can lower the overall quality score of your domain.
Duplicate product information
If you own an ecommerce website, you have probably encountered this problem. It occurs when you describe your products with the manufacturers’ item descriptions hosted on their own websites. These manufacturers may sell the same product to many different sellers, so the same description appears on many different websites. This is pure duplicate content.
Sorting and multi-pages lists
An ecommerce website like Amazon offers filter and sorting options that generate unique URLs. Most categories contain a large number of products, and the listing pages change depending on how the list is ordered. For example, if you sort 30 items by price or in alphabetical order, you will end up with two pages with the same content but with different URLs.
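One way to confirm that two sorted views really show the same content is to fingerprint the items on each listing page while ignoring their order. This is a sketch, not a description of how any search engine works. The function name is made up for this example, and the item identifiers are assumed to come from your own crawl.

```python
import hashlib


def listing_fingerprint(item_ids):
    """Hash the items shown on a listing page, ignoring their order."""
    joined = "\n".join(sorted(item_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two listing URLs with the same fingerprint contain the same items in a different order, which makes them good candidates for a shared canonical URL.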
For any questions about duplicate content, feel free to drop us a line @Oncrawl_CS