The purpose of canonical tags
Rel canonical is a tag that tells search engines which URL is the most authoritative version of the content and which ones are copies. Canonical URLs are one signal that helps to avoid duplicate content issues and to optimize your SEO.
Some content can appear in several places on your website and so be treated as duplicate content. Duplicate content can create serious indexing and ranking issues. Rel canonical lets you tell search engines which version of the content is the original, so that they credit that primary version, associate the copies with the right URL, and display the right version in search results.
Google and Oncrawl
Google uses canonical declarations as recommendations, not rules, and will choose a different canonical page if the canonical URL you declare does not seem to be the most authoritative page containing the same or very similar content.
Tag placement | Google's evaluation | Oncrawl's evaluation |
In the HTTP header | Strong signal | Analyzed |
Within the HTML <head> | Strong signal | Analyzed |
In XML sitemaps | Weak signal | Not taken into account |
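To make the two strong-signal placements in the table concrete, here is a minimal Python sketch that collects canonical declarations from an HTTP Link header and from the HTML <head>. The function and class names are hypothetical and chosen for illustration only; this is not how Oncrawl or Google parse pages, and the URLs are placeholders.

```python
import re
from html.parser import HTMLParser


class HeadCanonicalParser(HTMLParser):
    """Collects href values of <link rel="canonical"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def canonicals_from_html(html):
    """Return canonical URLs declared in the <head> of an HTML document."""
    parser = HeadCanonicalParser()
    parser.feed(html)
    return parser.canonicals


def canonicals_from_link_header(value):
    """Return canonical URLs from a Link header such as '<https://a/>; rel="canonical"'."""
    found = []
    # Each entry looks like <url>; param=value and entries are comma-separated.
    for url, params in re.findall(r'<([^>]*)>([^,]*)', value):
        if re.search(r'rel\s*=\s*"?[^";]*\bcanonical\b', params, re.I):
            found.append(url)
    return found
```

For example, `canonicals_from_link_header('<https://www.example.com/a/>; rel="canonical"')` returns a one-item list with that URL, and a canonical link placed in the <body> is deliberately not collected by `canonicals_from_html`.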
In Oncrawl, you can use canonical declarations to examine different SEO-related issues. Here are some common examples:
see which pages are canonical (signaling they should be indexed), and which aren't (signaling that they shouldn't be indexed)
find pages that don't have a canonical URL
find groups of similar pages that don't share the same canonical URL
determine whether the use of canonical declarations is effective in managing similar or duplicate content across your site
Ensuring that similar pages declare the same canonical URL, and that canonical pages are crawlable and indexable, can increase the reliability of canonical signals on your site in Google's eyes.
Oncrawl's canonical-related data
In the Crawl report
Oncrawl helps you spot errors in your canonical strategy more easily:
Here you can see whether your canonicals are set up correctly.
In evaluating your page's canonical declaration, Oncrawl assigns it one of the following states:
A matching canonical declaration: The page declares itself as the canonical version.
A not matching declaration: The page declares a different page as the canonical version.
Not set: The page doesn’t declare a canonical version, or declares multiple, different pages as canonicals.
In crawls before July 2024, the page canonical evaluation could also be listed as Too many canonicals (The page has more than one canonical tag on it). This state is no longer used, to align with Google's behavior of allowing multiple canonical tags as long as they contain the same value. However, it is still present in certain graphs to allow you to view older crawls correctly.
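The states above can be sketched as a small Python function. This is an illustration of the rules as described here, not Oncrawl's implementation: the function name is hypothetical, and URLs are compared as exact strings, whereas a real crawler would probably normalize them first.

```python
def page_canonical_evaluation(page_url, declared_canonicals):
    """Classify a page from the canonical URLs it declares.

    declared_canonicals: list of canonical URLs found on the page
    (from the HTML <head> and/or the HTTP header).
    """
    targets = set(declared_canonicals)
    # No declaration, or several declarations pointing to different URLs.
    if len(targets) != 1:
        return "not set"
    # Exactly one distinct target (possibly declared several times).
    return "matching" if page_url in targets else "not matching"
```

Note that two identical declarations collapse into one target, which mirrors the post-July 2024 behavior described above.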
Canonical evaluation in the Data explorer
As with our other features, you can access deeper details by clicking on the graph to see the list of associated pages in the Data explorer:
The Data explorer contains multiple fields that allow you to evaluate your canonical strategy, both for individual pages and for groups of pages that appear similar:
For individual pages:
Canonicals: lists the canonical URL determined for the page
Rel canonical: lists all rel canonical declarations found on the page
Page canonical evaluation: indicates the type of suggestion made to search engines (matching: page is the canonical version; not matching: page has a different canonical version; and not set: the canonical version can't be determined)
Canonical declaration: indicates, in the case of multiple canonical declarations on the page, whether they are multiple similar declarations (pointing to the same page and evaluated as if there were only one), or whether there are multiple conflicting declarations (pointing to different pages and evaluated as if no declaration was made).
For clusters of similar pages (groups):
Cluster ID: shared ID of a group of similar pages
Cluster canonical evaluation: indicates whether or not all pages in the group declare the same canonical URL (matching), different URLs (not matching), or don't have a canonical URL (not set).
Content similarity ratio
Near-duplicate status: indicates whether or not the pages' similarity is explained by on-page tags, such as canonicals or hreflang declarations. Examples can include managed with canonicals (the cluster canonical evaluation is matching), canonical conflicts (the cluster canonical evaluation is not matching), or no management strategy (the cluster canonical evaluation is not set and all pages in the cluster each have multiple conflicting declarations).
Has near-duplicate content
Starting in July 2024, in new crawls, some pages may have shifted from No management strategy to Managed with canonicals, as Oncrawl began, like Google, to accept multiple declarations of the same canonical URL on a single page.
This may mean that you have less problematic duplicate content now than before July 2024.
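As a rough illustration of the cluster canonical evaluation, the Python sketch below classifies a group of similar pages from each page's resolved canonical URL. The function name and input shape are hypothetical. The handling of mixed clusters (some pages with a canonical, some without) is an assumption, since the description above does not cover that case; here they are treated as not matching.

```python
def cluster_canonical_evaluation(canonical_by_page):
    """Classify a cluster of similar pages.

    canonical_by_page: dict mapping each page URL in the cluster to its
    resolved canonical URL, or None when the page's canonical is not set.
    """
    resolved = list(canonical_by_page.values())
    # No page in the cluster has a usable canonical URL.
    if all(canonical is None for canonical in resolved):
        return "not set"
    # Every page has a canonical URL and they all agree.
    if None not in resolved and len(set(resolved)) == 1:
        return "matching"
    # Assumption: disagreement or partial coverage counts as not matching.
    return "not matching"
```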
Analyzing problems with canonicals
Often, problems with canonicals stem from the fact that the pages that list one another as canonicals send mixed signals to Google. In this case, Google will often ignore the canonical declaration.
One common reason is if a group of very similar (or identical) pages don't all declare the same page as their canonical URL.
Another common problem is when a page declares another page with very different content as its canonical URL.
Exploring Canonical conflicts in the Duplicate content dashboard, or adding a column for Near-duplicate status to your Data explorer report and filtering for Canonical conflicts can help pinpoint pages with this sort of error.
Finding pages with multiple canonical declarations
Particularly if you use multiple means to declare canonicals (for example, HTTP headers and HTML <link rel="canonical" href=...> tags), some pages may end up with more than one canonical declaration. As long as all declarations on the page indicate the same canonical URL, Oncrawl will accept this as valid.
For crawls starting in July 2024, to find pages with multiple declarations, use the Canonical declarations field (column) in the Data explorer, or in OQL filters throughout Oncrawl. This field can have different values:
Multiple similar declarations: This page contains multiple declarations, but Oncrawl has determined they all indicate the same canonical URL. The page is evaluated as though it has only one unique declaration, which can be either matching (the current page is the canonical version), or not matching (another page is the canonical version).
Multiple conflicting declarations: The page contains multiple declarations, but they indicate different canonical URLs. The page is evaluated as though the canonical URL could not be determined, or is not set.
No declaration: The page does not have any canonical URLs declared
Unique declaration: The page has only one canonical URL declared
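The four values can be reproduced with a short Python sketch. It assumes you already have the list of canonical URLs declared on a page; the function name is hypothetical and URLs are compared as exact strings, which may differ from how Oncrawl compares them.

```python
def canonical_declaration(declared_canonicals):
    """Return the declaration type for a page's list of declared canonical URLs."""
    if not declared_canonicals:
        return "No declaration"
    if len(declared_canonicals) == 1:
        return "Unique declaration"
    # Several declarations: similar if they all name the same URL.
    if len(set(declared_canonicals)) == 1:
        return "Multiple similar declarations"
    return "Multiple conflicting declarations"
```

A page that declares the same URL in both its HTTP header and its HTML <head> would fall under Multiple similar declarations in this sketch.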
Best practices for canonicals
There are multiple situations where canonicals should be used for duplicate content:
Multiple URLs: e-commerce websites which offer filter options like prices, sizes, colors, categories have a lot of URLs with duplicate content.
HTTP, HTTPS, WWW: a search engine can see http://www.mywebsite.com, https://mywebsite.com and https://www.mywebsite.com as different websites and will index them as such.
Mobile URL: mobile URLs like m.mywebsite.com are seen as duplicate content.
Country URL: content remains the same even if you are using specific country URLs. However, if the language is different, you may want search engines to offer separate results.
Session ID URLs, breadcrumb links, printer-friendly versions, permalinks: these are generated automatically.
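Several of the situations above come down to mapping URL variants onto one preferred URL. The Python sketch below shows one way to compute that preferred URL. Everything in the configuration is an assumption made for illustration: the preferred scheme and host, the list of host variants, and the query parameters treated as not changing the content. Whether filter parameters should be canonicalized away depends on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site configuration.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.mywebsite.com"
HOST_VARIANTS = {"mywebsite.com", "www.mywebsite.com", "m.mywebsite.com"}
IGNORED_PARAMS = {"sessionid", "color", "size", "price"}


def preferred_url(url):
    """Map an HTTP/HTTPS, WWW, mobile, or parameterized variant to one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host not in HOST_VARIANTS:
        return url  # Not one of our hosts: leave untouched.
    # Drop parameters assumed not to change the main content.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", urlencode(query), "")
    )
```

The result is the value you would place in the href of the rel canonical tag on each variant.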
How to optimize your content with rel canonical?
First of all, you need to choose which URL is the main one, and then insert the following tag at the top of your preferred URL's <head> section:
<link rel="canonical" href="http://www.yourdomain.com/your-main-url/" />
Most CMS solutions integrate that tag automatically.
Here are the few rules you should respect if you want to integrate your rel canonical correctly:
Verify that the rel canonical target exists; otherwise, the canonical URL will return a 404 error.
Check that the rel canonical target does not have a noindex robots meta tag.
Insert the rel canonical link in either the <head> of the page or the HTTP header, and not in the <body>.
Include no more than one rel canonical per page. When more than one is specified and they point to different URLs, all rel canonicals will be ignored.
A large part of the duplicate page’s content should also be on the canonical version.
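Some of these rules can be checked automatically. The Python sketch below is a hypothetical checker, not an Oncrawl feature: it takes the page's HTML, plus the HTTP status and HTML of the canonical target (which you would fetch yourself), and returns a list of problems. It flags only conflicting declarations, in line with the July 2024 behavior described earlier, and it does not attempt the content-similarity rule.

```python
from html.parser import HTMLParser


class CanonicalRuleParser(HTMLParser):
    """Records canonical links by section and any noindex robots meta tag."""

    def __init__(self):
        super().__init__()
        self.section = None  # "head", "body", or None before either is seen
        self.head_canonicals = []
        self.body_canonicals = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            in_body = self.section == "body"
            (self.body_canonicals if in_body else self.head_canonicals).append(attr.get("href"))
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True


def check_canonical_rules(page_html, target_status, target_html):
    """Return a list of rule violations (empty when none are found)."""
    page = CanonicalRuleParser()
    page.feed(page_html)
    target = CanonicalRuleParser()
    target.feed(target_html)

    problems = []
    if target_status == 404:
        problems.append("canonical target returns 404")
    if target.noindex:
        problems.append("canonical target has a noindex robots meta tag")
    if page.body_canonicals:
        problems.append("rel canonical found in <body>")
    if len(set(page.head_canonicals)) > 1:
        problems.append("conflicting rel canonical declarations")
    return problems
```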