Using the Oncrawl Data Explorer, you can identify and explore hreflang page clusters, that is, groups of pages that reference one another as hreflang translations.
Each cluster is assigned a unique ID. To view the pages in the cluster for a specific URL, use the shortcut in the Data Explorer that opens the Oncrawl Query Language filter for that cluster.
Our data
Hreflang declarations tell search engines which language a page is written in, and which other languages are available.
Oncrawl analyzes all methods of hreflang declaration during a crawl. The following three methods are fully supported:
1. HTML hreflang declarations in the page <head>
Like canonical tags, hreflang tags are <link> elements and take the following form:
<link rel="alternate" hreflang="en" href="https://www.yoursite.com/your_translated_page" />
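To check which HTML declarations a page actually exposes, you can extract them yourself. The following is a minimal sketch using only the Python standard library; the class and function names are illustrative, it is not how Oncrawl extracts this data, and for simplicity it does not check that the tag sits inside <head>.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("hreflang"):
            self.alternates.append((attributes["hreflang"], attributes.get("href")))


def extract_html_hreflangs(html):
    """Return every hreflang declaration found in an HTML document."""
    parser = HreflangParser()
    parser.feed(html)
    return parser.alternates


sample = (
    "<html><head>"
    '<link rel="alternate" hreflang="en" '
    'href="https://www.yoursite.com/your_translated_page" />'
    "</head><body></body></html>"
)
print(extract_html_hreflangs(sample))
```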
2. Hreflang declarations in the page's HTTP headers
This method is particularly useful if your page is a file in a format other than HTML, such as an image or a PDF.
These declarations look like this:
Link: <https://www.yoursite.com/your_original_page>; rel="alternate"; hreflang="en", <https://www.yoursite.com/your_translated_page>; rel="alternate"; hreflang="es"
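If you want to read these declarations from a response yourself, a Link header can be split into its individual entries. This is a simplified sketch, not a complete parser for every valid Link header; the function name is illustrative, and the "es" language code in the sample is an assumption made for the example.

```python
import re

# One parameter such as rel="alternate" or hreflang=en.
PARAM = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return (hreflang, url) pairs for rel="alternate" entries in a Link header."""
    alternates = []
    # Each entry is <url> followed by its parameters, up to the next "<".
    for url, raw_params in re.findall(r"<([^>]+)>([^<]*)", value):
        params = {}
        for name, quoted, bare in PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        if "alternate" in params.get("rel", "").lower().split() and "hreflang" in params:
            alternates.append((params["hreflang"], url))
    return alternates


header_value = (
    '<https://www.yoursite.com/your_original_page>; rel="alternate"; hreflang="en", '
    '<https://www.yoursite.com/your_translated_page>; rel="alternate"; hreflang="es"'
)
print(parse_link_header(header_value))
```

An entry that has no rel="alternate" or no hreflang parameter is skipped, since it does not declare a language alternate.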
3. Hreflang declarations in sitemaps
By declaring hreflang references in sitemaps, you are able to keep all of your references in a single place. However, this also means you need to keep your sitemaps updated when you add translations or localizations.
Sitemap declarations add an <xhtml:link> tag and its properties to the listing for each page with hreflang translations:
<url>
<loc>https://www.yoursite.com/your_original_page</loc>
<xhtml:link
rel="alternate"
hreflang="en"
href="https://www.yoursite.com/your_translated_page"/>
</url>
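To list the hreflang references declared in a sitemap, you can read the <xhtml:link> entries for each <url>. The sketch below assumes a complete sitemap file that declares the standard sitemap and XHTML namespaces on its <urlset> element; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Namespaces used by sitemaps that carry hreflang annotations.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def extract_sitemap_hreflangs(xml_text):
    """Map each <loc> URL to the (hreflang, href) pairs listed with it."""
    root = ET.fromstring(xml_text)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        result[loc] = [
            (link.get("hreflang"), link.get("href"))
            for link in url.findall("xhtml:link", NS)
            if link.get("rel") == "alternate"
        ]
    return result


sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.yoursite.com/your_original_page</loc>
    <xhtml:link rel="alternate" hreflang="en"
                href="https://www.yoursite.com/your_translated_page"/>
  </url>
</urlset>
""".strip()

print(extract_sitemap_hreflangs(sample_sitemap))
```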
If you use this method for your hreflang declarations, you may want to enable the following option in your crawl settings:
Expand the Sitemaps section of the crawl settings.
Tick the Allow soft mode box.
This ensures that we'll analyze sitemaps regardless of whether you follow the rules of the sitemaps protocol. Many sites ignore these rules with regard to where they place sitemaps.
You do not need to specify sitemap URLs.
How hreflang works
In the examples above, "en" is the language code for English, but you can also specify the country, such as "en-GB" for English spoken in Great Britain, or "en-DE" for English speakers in Germany.
Using hreflang tags allows Google to offer the correct regional or language version of your page to users. Hreflang tags send a strong signal to Google that one page is a translation or localization of another, which helps boost the ranking of new translations. They are a key element of an internationalized website.
Hreflang page clusters
On any page, hreflang declarations list all of the equivalent pages for other languages or regions. Together, these pages form a cluster.
When hreflangs are correctly implemented, each page in a cluster has an hreflang reference to every other page in the cluster, including a reference to itself. For example, in a simplified ideal cluster of three pages (English, Spanish, and French versions), each of the three pages declares all three URLs.
All pages that are referenced as hreflangs should be canonical and indexable pages.
However, some of the links within a cluster are often missing or incorrect, which produces clusters with errors.
Oncrawl makes it easy to check whether you've added the correct language alternates on all of the necessary pages.
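To make the idea of a complete cluster concrete, here is a small sketch that checks a cluster for missing self, outbound, and inbound declarations. It is an illustration only and not Oncrawl's implementation; the function name, the report keys, and the sample URLs are assumptions made for the example.

```python
def audit_cluster(declarations):
    """Check a cluster for missing hreflang declarations.

    declarations maps each page URL to the set of URLs it declares as hreflang.
    """
    # The cluster is every page that declares or is declared.
    cluster = set(declarations)
    for targets in declarations.values():
        cluster |= set(targets)

    report = {}
    for page in sorted(cluster):
        declared = set(declarations.get(page, ()))
        report[page] = {
            "missing_self": page not in declared,
            "missing_outbound": sorted(cluster - declared - {page}),
            "missing_inbound": sorted(
                other
                for other in cluster
                if other != page and page not in declarations.get(other, ())
            ),
        }
    return report


EN = "https://www.yoursite.com/en/page"
ES = "https://www.yoursite.com/es/page"
FR = "https://www.yoursite.com/fr/page"

# The Spanish page forgets to declare the French page.
report = audit_cluster({EN: {EN, ES, FR}, ES: {EN, ES}, FR: {EN, ES, FR}})
for page, issues in report.items():
    print(page, issues)
```

In this sample, the Spanish page is reported with a missing outbound declaration and the French page with a missing inbound declaration, while the English page is clean.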
Available hreflang data
As with all of our data, you can click through to the Data Explorer to view the hreflang URLs, languages, specific issues, and page clusters for each URL.
Additional hreflang columns can also be added to any Data Explorer results:
Hreflang hrefs: a list of all the URLs referenced in hreflang links on the page.
Hreflang langs: a list (in the same order as the hreflang hrefs) of all the languages referenced in hreflang links on the page.
Hreflang errors: a list of all of the errors (if any) found for the page. See below for a full list of errors.
Hreflang error details: a link to an overview of hreflang use for the page, including a link to the Oncrawl Query Language filter for the page cluster, and error details for each error encountered.
Hreflang cluster ID: an Oncrawl ID that uniquely identifies each cluster of pages that reference each other as hreflang translations.
Hreflang source: a list (in the same order as the hreflang hrefs) of the locations where Oncrawl found each hreflang reference. The source of an hreflang declaration in this list can be HTML, Header, or the URL of a sitemap.
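Because the hreflang hrefs, langs, and source lists share the same order, they can be recombined into one record per declaration when you work with exported data. This is a sketch under the assumption that you have already loaded the three columns as Python lists; the function name and record keys are hypothetical and do not correspond to an Oncrawl export format.

```python
def pair_hreflang_columns(hrefs, langs, sources):
    """Combine three parallel lists into one record per hreflang declaration."""
    if not (len(hrefs) == len(langs) == len(sources)):
        raise ValueError("hreflang columns must have the same length")
    return [
        {"href": href, "lang": lang, "source": source}
        for href, lang, source in zip(hrefs, langs, sources)
    ]


rows = pair_hreflang_columns(
    ["https://www.yoursite.com/en/page", "https://www.yoursite.com/es/page"],
    ["en", "es"],
    ["HTML", "https://www.yoursite.com/sitemap.xml"],
)
for row in rows:
    print(row)
```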
Optimizing hreflang use
Setting up your crawl for Hreflang analysis
Hreflang data is automatically included in every crawl. You don't have to do anything special.
If your site is translated or localized and you want to analyze complete hreflang data, the only thing you need to do is to make sure that all pages of the site are crawled, no matter what language they're in.
Your site has translations in different directories, such as https://www.yoursite.com/es and https://www.yoursite.com/en
If there are links from your start URL to each directory, you don't need to do anything else. Way to go!
If you're not sure, it doesn't hurt to list all of the directories as start URLs. You can do this in the Start URL section of the crawl settings.
Your site has translations in different subdomains, such as https://es.yoursite.com and https://en.yoursite.com
If there are links from your start URL to each subdomain, scroll down to the section Subdomains in the crawl settings and tick the box for Crawl encountered subdomains. That's it!
If you're not sure, it doesn't hurt to list all language or regional subdomains as start URLs. You can do this in the Start URL section of the crawl settings. If you list all your translation URLs as start URLs, you do not need to enable the Crawl encountered subdomains option unless you want to explore additional subdomains during the crawl.
You might also want to take a look at the article on crawling subdomains if you have other subdomains you don't want to crawl.
Your site has translations on different domains, such as https://www.mysite.es and https://www.mysite.co.uk
In the Start URL section of the crawl settings, you must list all domains as start URLs. This is the only way to get your site's hreflang data for all domains at once.
Identifying Hreflang issues
Oncrawl will help you identify hreflang issues on your site.
Two key charts, the Hreflang issues chart and the Non-indexable pages declared as hreflang chart, can be found in the default dashboard under Crawl report > Indexability > Rel alternate.
Certain issues may prevent Google from taking your hreflang declarations into account. These issues include the following groups of errors:
Missing declarations
Missing outbound declarations: This page is missing declarations to some pages in the cluster.
Missing inbound declarations: Some pages in the cluster don't declare this page.
Missing self declaration
Missing self declaration: This page doesn't declare itself as alternate.
Incorrect language code
Incorrect language code: This page declares an hreflang with an incorrect language code.
Duplicate hreflang declarations
Multiple hreflangs for the same language: This page declares multiple hreflangs for the same language.
Same hreflang for multiple languages: This page declares an hreflang multiple times, and for multiple languages.
Hreflang set in multiple places: This page declares an hreflang multiple times, and in several places (sitemaps, header, HTML).
Page is hreflang for multiple languages: This page is declared as alternate by other pages for multiple languages.
Duplicate hreflang declaration: This page declares an hreflang multiple times, for the same language.
Conflicting x-default declarations
Conflicting x-default declarations: This page declares an x-default URL that isn't the same as the x-default URL declared by at least one other page in the cluster.
Non-indexable pages declared as hreflang
Hreflangs with bad status code: This page is part of a cluster containing pages with a 5xx status or that did not respond.
3xx hreflang: This page is part of a cluster containing pages with a 3XX status.
4xx hreflang: This page is part of a cluster containing pages with a 4XX status.
Non-indexable hreflang by meta robots: This page is part of a cluster containing pages that are non-indexable by meta robots.
Non-indexable hreflang by robots.txt: This page is part of a cluster containing pages that are non-indexable by robots.txt.
Non-canonical hreflang: This page is part of a cluster containing non-canonical pages.
Non-indexable page: This page is not indexable, but is part of an hreflang cluster.
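Some of the declaration errors above can be illustrated with a short script. The sketch below looks for two kinds of duplicate declarations on a single page and for conflicting x-default URLs across a cluster. It is a simplified illustration, not the logic Oncrawl uses; the function names, result keys, and sample URLs are assumptions.

```python
from collections import defaultdict


def find_duplicate_declarations(pairs):
    """pairs is the list of (lang, href) declarations made by one page."""
    hrefs_by_lang = defaultdict(set)
    langs_by_href = defaultdict(set)
    for lang, href in pairs:
        hrefs_by_lang[lang.lower()].add(href)
        langs_by_href[href].add(lang.lower())
    return {
        # Several different URLs declared for one language.
        "multiple_hrefs_for_language": {
            lang: sorted(hrefs) for lang, hrefs in hrefs_by_lang.items() if len(hrefs) > 1
        },
        # One URL declared for several languages.
        "multiple_languages_for_href": {
            href: sorted(langs) for href, langs in langs_by_href.items() if len(langs) > 1
        },
    }


def find_x_default_conflicts(cluster):
    """cluster maps each page URL to its list of (lang, href) declarations.

    Returns each x-default URL with the pages declaring it, or an empty dict
    when the pages agree on a single x-default URL (or declare none).
    """
    pages_by_x_default = defaultdict(list)
    for page, pairs in cluster.items():
        for lang, href in pairs:
            if lang.lower() == "x-default":
                pages_by_x_default[href].append(page)
    if len(pages_by_x_default) <= 1:
        return {}
    return {href: sorted(pages) for href, pages in pages_by_x_default.items()}


A = "https://www.yoursite.com/en/page"
B = "https://www.yoursite.com/en/other"
C = "https://www.yoursite.com/fr/page"
duplicates = find_duplicate_declarations([("en", A), ("en", B), ("fr", C), ("fr-CA", C)])
conflicts = find_x_default_conflicts({
    A: [("x-default", "https://www.yoursite.com/")],
    C: [("x-default", "https://www.yoursite.com/select-language")],
})
print(duplicates)
print(conflicts)
```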
Some issues indicate crawl issues that may or may not reveal underlying hreflang issues:
Pages with too many hreflangs
Too many hreflangs: This page is part of a cluster that couldn't be fully processed because of its size. In most cases, this only occurs when large numbers of pages incorrectly declare the homepage as their hreflang.
Page clusters with pages that are declared as hreflang alternates but that couldn't be found in crawl
This last error covers all of the following situations:
Pages on a subdomain that is not crawled. This is a site audit (crawl) error. Make sure that the crawl settings allow the Oncrawl bot to crawl subdomains.
Pages on a different domain. This is a site audit (crawl) error. Make sure that the crawl settings include all domains in the list of start URLs.
The crawl hit a maximum depth or a maximum number of URLs before reaching the hreflang pages. This is a site audit (crawl) error. Modify the maximum depth and number of URLs of the crawl.
Pages are not linked to the rest of the site. This is an implementation error. Make sure these pages appear in a sitemap or are linked to by other pages in the site.
Pages do not exist. This is an implementation error. Modify the hreflang URL to point to an existing page, or create a page at the URL that is currently listed.
Examining hreflang page clusters
In the Data Explorer, click on the arrow beside the URL in the first column.
In the pop-up menu that opens, scroll down to the HREFLANG section.
Choose View all pages in its hreflang cluster.
Best practices
In order to prevent Google from treating your translations as duplicate content, it is especially important to indicate translations of pages when content may look very similar to a search engine. This may be the case in the following examples:
user-created page content (e.g. forums, comments, or product reviews)
regional variants (e.g. a page in South African English and a page in American English)
translations of entire sections or entire sites
When using hreflang tags, here are some best practices to optimize your pages and avoid common errors:
Make sure that links are reciprocal. If, on Page A, you have an hreflang link to Page B, the opposite should also be true. On Page B there should be an hreflang link to Page A. Non-reciprocal links are ignored by Google.
Make sure that pages declare an hreflang link to themselves.
List the URL in the correct format: "https://www.yoursite.com/your_page". Don't skip the "https://"!
Use the correct code for the language and, optionally, the region. Some studies show Google recognizes certain errors and corrects for them; others show the opposite. It's easiest to err on the side of caution and use the correct codes. Languages are given in the two-letter ISO 639-1 format. Optionally, you can also add a region in the ISO 3166-1 Alpha 2 format. If the targeted language uses multiple scripts, you can also choose to specify the script. For example, to indicate a page in the Simplified Chinese script for users in Taiwan, you can use the code:
zh-Hans-TW
Indicate a generic language page, such as "en", if you have a series of regional pages in that language ("en-US", "en-GB", "en-CA", "en-ZA", "en-AU"). This page will be used for all regions you didn't specify. In this example, that might include English-speaking New Zealand, or even a user browsing in English from France.
Remember that the language code is required. Never use a region code by itself.
Use the optional value hreflang="x-default" for language selection pages or for pages that automatically redirect to the user's language.
Until recently, Google required you to list all available translations of a page. This practice is still strongly encouraged. However, even if you can't list all translations of the page, you must include reciprocal rel="alternate" links between each translated page and the page in the main or original language of your site.
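Two of the practices above, the URL format and the shape of the language code, are easy to check before you publish. The following is a minimal sketch with illustrative function names. It only checks the shape of a code (two-letter language, optional four-letter script, optional two-letter region, or x-default); it does not verify that the language or region actually exists in the ISO lists, so it cannot tell a region code used alone from a language code.

```python
import re
from urllib.parse import urlparse

# language, optional script, optional region, for example zh-Hans-TW
HREFLANG_SHAPE = re.compile(r"[a-z]{2}(-[a-z]{4})?(-[a-z]{2})?", re.IGNORECASE)


def has_valid_shape(code):
    """Return True if the hreflang value looks like language[-script][-region]."""
    return code.lower() == "x-default" or HREFLANG_SHAPE.fullmatch(code) is not None


def is_absolute_url(href):
    """Return True if the URL includes the scheme (http or https) and a host."""
    parts = urlparse(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for code in ["en", "en-GB", "zh-Hans-TW", "x-default", "en_GB", "eng"]:
    print(code, has_valid_shape(code))

for href in ["https://www.yoursite.com/your_page", "www.yoursite.com/your_page"]:
    print(href, is_absolute_url(href))
```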
You can also check out our list of 5 common hreflang errors to avoid.