Rel canonical is a tag placed in the <head> of your HTML, or sent in the HTTP header of the page, that tells search engines which version of a piece of content is the most authoritative and which ones are copies. It was created to avoid duplicate content issues and to improve your SEO.
Some pieces of content can appear in multiple places on your website and thus be seen as duplicate content. Duplicate content can create serious indexing and ranking issues. Rel canonical gives you a way to tell search engines which version of the content is the original. Search engines can then credit that primary version, associate the copies with the right URL, and display the right version in search results.
Google uses canonical declarations as recommendations, not rules, and will choose a different canonical page if the canonical URL you declare does not seem to be the most authoritative page containing the same or very similar content.
As an effective strategy for managing duplicate content, canonicalization is essential to creating an optimized website. It also offers an improved user experience: users do not have to work out which version of a page is the best or the most likely to be the original.
Oncrawl helps you spot errors in your canonical strategy more easily:
Here you can see whether your canonicals are set up correctly.
In evaluating your page's canonical declaration, Oncrawl assigns it one of four possible states:
A “matching” canonical declaration: The page declares itself as the canonical version.
A “not matching” declaration: The page declares a different page as the canonical version.
“Not set”: The page doesn’t declare a canonical version.
“Too many”: The page has more than one canonical tag on it.
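The four states can be illustrated with a short Python sketch. This is not Oncrawl's implementation: the `canonical_state` helper and the example URLs are hypothetical, only tags in the HTML are inspected (not the HTTP header), and URLs are compared as plain strings after relative links are resolved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; a missing rel is None.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def canonical_state(page_url, html):
    """Hypothetical helper: classify a page's canonical declaration."""
    collector = CanonicalCollector()
    collector.feed(html)
    if not collector.hrefs:
        return "not set"
    if len(collector.hrefs) > 1:
        return "too many"
    # Resolve a relative href against the page URL before comparing.
    declared = urljoin(page_url, collector.hrefs[0])
    return "matching" if declared == page_url else "not matching"
```

For example, `canonical_state("https://www.example.com/shoes/", html)` returns "matching" only when the single canonical tag in `html` points back to that same URL.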
As with our other features, you can access more details by clicking on the graph to see the list of associated pages in the Data Explorer:
Problems with canonicals
Often, problems with canonicals stem from the fact that the pages that list one another as canonicals send mixed signals to Google. In this case, Google will often ignore the canonical declaration.
One common cause is a group of very similar (or identical) pages that don't all declare the same page as their canonical URL.
Another common problem is a page that declares another page with very different content as its canonical URL.
Exploring Canonical conflicts in the Duplicate content dashboard, or adding a column for Near-duplicate status to your Data Explorer report and filtering for "Canonical conflicts" can help pinpoint pages with this sort of error.
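As a rough illustration of the first kind of conflict (similar pages that do not all declare the same canonical URL), here is a minimal Python sketch. It assumes the groups of similar pages are already known; the `find_canonical_conflicts` function, its inputs and the example URLs are hypothetical and do not reflect how Oncrawl detects conflicts.

```python
def find_canonical_conflicts(clusters, declared_canonicals):
    """Return the clusters whose pages disagree about their canonical URL.

    clusters: dict mapping a cluster id to a list of similar page URLs.
    declared_canonicals: dict mapping a page URL to the canonical URL it
        declares, or None when the page declares nothing.
    """
    conflicts = {}
    for cluster_id, urls in clusters.items():
        # A page with no declaration (None) also counts as disagreeing
        # with pages in the same cluster that do declare a canonical.
        targets = {declared_canonicals.get(url) for url in urls}
        if len(targets) > 1:
            conflicts[cluster_id] = targets
    return conflicts


clusters = {
    "red-shoes": ["https://www.example.com/shoes?color=red",
                  "https://www.example.com/shoes/red"],
    "boots": ["https://www.example.com/boots",
              "https://www.example.com/boots?sort=price"],
}
declared = {
    "https://www.example.com/shoes?color=red": "https://www.example.com/shoes",
    "https://www.example.com/shoes/red": "https://www.example.com/shoes/red",
    "https://www.example.com/boots": "https://www.example.com/boots",
    "https://www.example.com/boots?sort=price": "https://www.example.com/boots",
}
conflicts = find_canonical_conflicts(clusters, declared)
```

Here only the "red-shoes" group is reported, because its two pages point to two different canonical URLs.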
Best practices for canonicals
There are multiple situations where canonicals should be used to handle duplicate content:
Multiple URLs: e-commerce websites that offer filter options such as price, size, color or category generate many URLs with duplicate content
HTTP, HTTPS, WWW: a search engine can see http://www.mywebsite.com, https://mywebsite.com and https://www.mywebsite.com as different websites and will index them as such
Mobile URL: mobile URLs like m.mywebsite.com are seen as duplicate content
Country URL: content remains the same even if you are using specific country URLs. However, if the language is different, you may want search engines to offer separate results
Session ID URLs, breadcrumb links, printer-friendly versions, permalinks: these URLs are generated automatically
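To make these situations concrete, the sketch below maps several URL variants to a single preferred URL, which is the value you would then declare as the canonical. Everything site-specific here is an assumption made for illustration: the choice of https://www.mywebsite.com as the preferred version, the parameter names in `IGNORED_PARAMS`, and the idea that mobile and desktop pages share the same paths.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed site preferences, for illustration only.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.mywebsite.com"
HOST_VARIANTS = {"mywebsite.com", "www.mywebsite.com", "m.mywebsite.com"}
# Hypothetical parameters that change the URL without changing the content.
IGNORED_PARAMS = {"sessionid", "color", "size", "price", "print"}


def preferred_url(url):
    """Return the preferred version of a URL, or the URL itself if foreign."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in HOST_VARIANTS:
        return url  # Not one of our hosts: leave it untouched.
    # Keep only the query parameters that actually select different content.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in IGNORED_PARAMS]
    # Rebuild with the preferred scheme and host, dropping any fragment.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST,
                       parts.path or "/", urlencode(kept), ""))
```

With these assumptions, `http://www.mywebsite.com/shoes/?color=red&sessionid=abc` and `http://m.mywebsite.com/shoes/` both map to `https://www.mywebsite.com/shoes/`. Country-specific URLs in different languages are deliberately left out, since those may deserve separate results.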
How to optimize your content with rel canonical?
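First of all, you need to choose which URL is the main one. Then insert the following tag, pointing to your preferred URL, at the top of the <head> section of each duplicate page (the preferred page can also declare itself):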
<link rel="canonical" href="http://www.yourdomain.com/your-main-url/" />
Most CMS solutions integrate that tag automatically.
Here are a few rules to follow if you want to implement rel canonical correctly:
Verify that the rel canonical target exists; otherwise, it will return a 404 error
Check that the rel canonical target does not have a noindex robots meta tag
Insert the rel canonical link in either the <head> of the page or the HTTP header and not in the <body>
Include no more than one rel canonical per page. When more than one is specified, all rel canonicals will likely be ignored
A large part of the duplicate page’s content should also be on the canonical version
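Some of these rules can be checked offline with a short script. The following is a minimal sketch, not a complete audit: the `check_rules` and `header_canonicals` helpers are hypothetical, the 404 and content-similarity rules are not covered because they need a live request or a similarity measure, and the header parsing only handles the simple form `<URL>; rel="canonical"`. Pages that omit the `<body>` tag are treated as all `<head>`.

```python
import re
from html.parser import HTMLParser


class HeadBodyScanner(HTMLParser):
    """Records canonical links by location, plus robots meta directives."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.head_canonicals = []
        self.body_canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body":
            self.in_body = True
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            bucket = self.body_canonicals if self.in_body else self.head_canonicals
            bucket.append(attributes.get("href"))
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.robots.append((attributes.get("content") or "").lower())


def scan(html):
    scanner = HeadBodyScanner()
    scanner.feed(html)
    return scanner


def header_canonicals(link_header):
    """Extract canonical targets from the value of an HTTP Link header."""
    return re.findall(r'<([^>]+)>\s*;\s*rel="?canonical"?', link_header or "", flags=re.I)


def check_rules(page_html, target_html, link_header=None):
    """Return a list of rule violations for a page and its canonical target."""
    page, target = scan(page_html), scan(target_html)
    issues = []
    if page.body_canonicals:
        issues.append("canonical in <body>")
    if len(page.head_canonicals) + len(header_canonicals(link_header)) > 1:
        issues.append("more than one canonical")
    if any("noindex" in content for content in target.robots):
        issues.append("target is noindex")
    return issues
```

An empty list means none of the checked rules is violated. A page that declares one canonical in its `<head>` and another in its HTTP header is reported as having more than one canonical.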
For any questions about rel=canonical, feel free to drop us a line @Oncrawl_CS
You can also find this article by searching for:
URLs canónicas, contenido duplicado
URLs canoniques, contenu dupliqué