Using the OnCrawl Data Explorer, you can identify and explore hreflang page clusters, that is, groups of pages that reference one another as hreflang translations.
Each cluster is assigned a unique ID. To view the pages in the cluster for a specific URL, you can use a shortcut to the OnCrawl Query Language filter, described in "Examining hreflang page clusters" below.
Hreflang tags are link elements in the <head> section of your HTML that tell search engines which language the page is written in, and which other language versions are available. Like canonical tags, hreflang tags take the following form:
<link rel="alternate" hreflang="en" href="https://www.yoursite.com/your_translated_page" />
In this example, "en" is the language code for English, but you can also specify the country, such as "en-GB" for English spoken in Great Britain, or "en-DE" for English speakers in Germany.
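To make the tag format concrete, here is a minimal Python sketch that pulls the language code and URL out of each hreflang tag in a page's HTML. It uses only the standard library; the names HreflangParser and extract_hreflangs are illustrative and are not part of OnCrawl.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collects (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here.
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang") and attr.get("href"):
            self.alternates.append((attr["hreflang"], attr["href"]))


def extract_hreflangs(html):
    """Return the hreflang declarations found in an HTML string, in order."""
    parser = HreflangParser()
    parser.feed(html)
    return parser.alternates
```

Calling extract_hreflangs on the example tag above would return a single pair: the code "en" and the URL in the href attribute.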
Using hreflang tags allows Google to offer the correct regional or language version of your page to users. They send a strong signal to Google that one page is a translation or localization of another, which helps boost the ranking of new translations. Hreflang tags are a key element of an internationalized website.
Hreflang page clusters
On any page, hreflang declarations list all of the equivalent pages for other languages or regions. Together, these pages form a cluster.
When hreflangs are correctly implemented, each page in a cluster will have an hreflang reference to every other page in the cluster, including a reference to itself. Here is an example of a simplified ideal cluster:
All pages that are referenced as hreflangs should be canonical and indexable pages.
However, it often happens that some of the links within a cluster are missing or incorrect, producing clusters with errors:
OnCrawl makes it easy to check whether you've added the correct language alternates on all of the necessary pages.
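In a correctly implemented cluster, every page carries the same set of hreflang tags: one for each page in the cluster, itself included. The short Python sketch below illustrates that idea by generating the tag set from a mapping of language codes to URLs. The function name cluster_tags and the example.com URLs are hypothetical.

```python
def cluster_tags(cluster):
    """Return the hreflang tags that every page in the cluster should carry.

    `cluster` maps an hreflang code to the URL of that language version.
    Because the same list goes on every page, each page references all of
    the others and also references itself.
    """
    return [
        f'<link rel="alternate" hreflang="{code}" href="{url}" />'
        for code, url in sorted(cluster.items())
    ]


# Hypothetical two-page cluster.
example_cluster = {
    "fr": "https://www.example.com/fr/",
    "en": "https://www.example.com/en/",
}
for tag in cluster_tags(example_cluster):
    print(tag)
```

Pasting the same output into the head of both the English and the French page gives a complete, reciprocal cluster.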
Available hreflang data
As with all of our data, you can click through to the Data Explorer to view the hreflang URLs, languages, specific issues, and page clusters for each URL.
Additional hreflang columns can also be added to any Data Explorer results:
- Hreflang hrefs: a list of all the URLs referenced in hreflang links on the page.
- Hreflang langs: a list of all the languages referenced in hreflang links on the page.
- Hreflang errors: a list of all of the errors (if any) found for the page. See below for a full list of errors.
- Hreflang error details: a link to an overview of hreflang use for the page, including a link to the OnCrawl Query Language filter for the page cluster, and error details for each error encountered.
- Hreflang cluster ID: an OnCrawl ID that uniquely identifies each cluster of pages that reference each other as hreflang translations.
Optimizing hreflang use
Setting up your crawl for Hreflang analysis
Hreflang data is automatically included in every crawl. You don't have to do anything special.
If your site is translated or localized and you want to analyze complete hreflang data, the only thing you need to do is to make sure that all pages of the site are crawled, no matter what language they're in.
If your translations are in subdirectories of the same domain:
- If there are links from your start URL to each directory, you don't need to do anything else. Way to go!
- If you're not sure, it doesn't hurt to list all of the directories as start URLs. You can do this in the "Start URL" section of the crawl settings.
If your translations are on subdomains:
- If there are links from your start URL to each subdomain, scroll down to the section "Subdomains" in the crawl settings and tick the box for "Crawl encountered subdomains". That's it!
- If you're not sure, it doesn't hurt to list all language or regional subdomains as start URLs. You can do this in the "Start URL" section of the crawl settings. If you list all your translation URLs as start URLs, you do not need to enable the "Crawl encountered subdomains" option unless you want to explore additional subdomains during the crawl.
You might also want to take a look at the article "How can I crawl some subdomains and not others" if you have other subdomains you don't want to crawl.
If your translations are on different domains:
- In the "Start URL" section of the crawl settings, you must list all domains as start URLs. This is the only way to get your site's hreflang data for all of the domains at once.
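If you aren't sure which subdomains or domains your translations live on, one way to build the list of start URLs is to collect the distinct hosts that appear in your hreflang declarations. The Python sketch below does this with the standard library. It is only an illustration (start_url_candidates is a made-up name), and you would still enter the resulting URLs yourself in the "Start URL" section of the crawl settings.

```python
from urllib.parse import urlsplit


def start_url_candidates(hreflang_urls):
    """Return one root URL per distinct scheme and host, in first-seen order."""
    seen = []
    for url in hreflang_urls:
        parts = urlsplit(url)
        if not parts.scheme or not parts.netloc:
            continue  # skip relative or malformed URLs
        origin = f"{parts.scheme}://{parts.netloc}/"
        if origin not in seen:
            seen.append(origin)
    return seen
```

This works at the level of hosts, so it helps with the subdomain and multiple-domain cases. For subdirectories, you would list the directory URLs themselves.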
Identifying Hreflang issues
OnCrawl will help you identify hreflang issues on your site.
Two key charts, the "Hreflang issues" chart and the "Non-indexable pages declared as hreflang" chart, can be found in the default dashboard under Crawl report > Indexability > Rel alternate.
Certain issues may prevent Google from taking your hreflang declarations into account. These issues include the following groups of errors:
- Missing outbound declarations: This page is missing declarations to some pages in the cluster.
- Missing inbound declarations: Some pages in the cluster don't declare this page.
Missing self declaration
- Missing self declaration: This page doesn't declare itself as an alternate.
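The three missing-declaration errors above can be illustrated with a small Python sketch. It assumes you already have, for a single cluster, a mapping from each page URL to the set of URLs that page declares as hreflang. The function name and data shape are assumptions made for this example and do not describe how OnCrawl computes these errors internally.

```python
def missing_declarations(declared):
    """Report missing outbound, inbound and self declarations per page.

    `declared` maps each page URL to the set of URLs it lists as hreflang.
    All pages are assumed to belong to the same cluster.
    """
    cluster = set(declared)
    for hrefs in declared.values():
        cluster |= set(hrefs)

    report = {}
    for page in sorted(cluster):
        others = cluster - {page}
        outbound = set(declared.get(page, ()))
        inbound = {src for src, hrefs in declared.items()
                   if page in hrefs and src != page}
        errors = []
        if others - outbound:
            errors.append("Missing outbound declarations")
        if others - inbound:
            errors.append("Missing inbound declarations")
        if page not in outbound:
            errors.append("Missing self declaration")
        if errors:
            report[page] = errors
    return report
```

A page that appears in the report with no errors at all is simply left out, so an empty result means the cluster is complete.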
Incorrect language code
- Incorrect language code: This page declares an hreflang with an incorrect language code.
Duplicate hreflang declarations
- Multiple hreflangs for the same language: This page declares multiple hreflangs for the same language.
- Same hreflang for multiple languages: This page declares an hreflang multiple times, and for multiple languages.
- Hreflang set in multiple places: This page declares an hreflang multiple times, and in several places (sitemaps, HTTP header, HTML).
- Page is hreflang for multiple languages: This page is declared as alternate by other pages for multiple languages.
- Duplicate hreflang declaration: This page declares an hreflang multiple times, for the same language.
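As a rough illustration of three of these duplicate errors, the Python sketch below inspects the list of (language code, URL) pairs declared on one page. The mapping from each error name to a concrete test is our reading of the descriptions above, not OnCrawl's actual implementation, and duplicate_issues is a hypothetical name.

```python
from collections import Counter, defaultdict


def duplicate_issues(pairs):
    """Return the set of duplicate-related issues for one page's declarations.

    `pairs` is a list of (hreflang code, URL) tuples found on the page.
    """
    hrefs_by_lang = defaultdict(set)
    langs_by_href = defaultdict(set)
    for lang, href in pairs:
        hrefs_by_lang[lang.lower()].add(href)
        langs_by_href[href].add(lang.lower())

    issues = set()
    # One language code pointing at several different URLs.
    if any(len(hrefs) > 1 for hrefs in hrefs_by_lang.values()):
        issues.add("Multiple hreflangs for the same language")
    # One URL declared under several different language codes.
    if any(len(langs) > 1 for langs in langs_by_href.values()):
        issues.add("Same hreflang for multiple languages")
    # The exact same (code, URL) pair repeated.
    counts = Counter((lang.lower(), href) for lang, href in pairs)
    if any(n > 1 for n in counts.values()):
        issues.add("Duplicate hreflang declaration")
    return issues
```

The other two errors in this group need data from outside the page (sitemaps, HTTP headers, or the declarations on other pages), so they are not covered by this sketch.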
Non-indexable pages declared as hreflang
- Hreflangs with bad status code: This page is part of a cluster containing pages with a 5xx status or that did not respond.
- 3xx hreflang: This page is part of a cluster containing pages with a 3xx status.
- 4xx hreflang: This page is part of a cluster containing pages with a 4xx status.
- Non-indexable hreflang by meta robots: This page is part of a cluster containing non-indexable pages by meta robots.
- Non-indexable hreflang by robots.txt: This page is part of a cluster containing non-indexable pages by robots.txt.
- Non-canonical hreflang: This page is part of a cluster containing non-canonical pages.
- Non-indexable page: This page is not indexable, but is part of an hreflang cluster.
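The status-code errors in this group follow a simple rule, sketched below in Python. The function names are illustrative, and a missing response is represented here as None, which is an assumption of this example.

```python
def status_error(status):
    """Map an HTTP status (or None for no response) to an error label."""
    if status is None or 500 <= status <= 599:
        return "Hreflangs with bad status code"
    if 300 <= status <= 399:
        return "3xx hreflang"
    if 400 <= status <= 499:
        return "4xx hreflang"
    return None  # 2xx and anything else: no status-related error


def cluster_status_errors(statuses):
    """Return the status-related error labels that apply to a cluster.

    `statuses` maps each URL in the cluster to its HTTP status or None.
    """
    labels = {status_error(s) for s in statuses.values()}
    labels.discard(None)
    return labels
```

Because the errors are reported at cluster level, one redirected or broken page is enough to flag every page in its cluster.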
Some issues indicate crawl issues that may or may not reveal underlying hreflang issues:
Pages with too many hreflangs
- Too many hreflangs: This page is part of a cluster that couldn't be fully processed because of its size. In most cases, this only occurs when large numbers of pages incorrectly declare the homepage as their hreflang.
Page clusters with pages that are declared as hreflang alternates but that couldn't be found in crawl
This last error covers all of the following situations:
- Pages on a subdomain that is not crawled. This is a site audit (crawl) error. Make sure that the crawl settings allow the OnCrawl bot to crawl subdomains.
- Pages on a different domain. This is a site audit (crawl) error. Make sure that the crawl settings include all domains in the list of start URLs.
- The crawl hit a maximum depth or a maximum number of URLs before reaching the hreflang pages. This is a site audit (crawl) error. Increase the maximum depth or the maximum number of URLs of the crawl.
- Pages are not linked to the rest of the site. This is an implementation error. Make sure these pages appear in a sitemap or are linked to by other pages in the site.
- Pages do not exist. This is an implementation error. Modify the hreflang URL to point to an existing page, or create a page at the URL that is currently listed.
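The first two situations in this list can often be told apart just by comparing hosts. The Python sketch below is a deliberately naive illustration: it treats the start URL's host, minus a leading "www.", as the base domain and does not consult a public suffix list. The name host_relation is hypothetical.

```python
from urllib.parse import urlsplit


def host_relation(start_url, hreflang_url):
    """Describe how an hreflang URL's host relates to the start URL's host."""
    start = (urlsplit(start_url).hostname or "").lower()
    target = (urlsplit(hreflang_url).hostname or "").lower()
    if not start or not target:
        return "unknown"  # relative or malformed URL
    if start == target:
        return "same host"
    # Naive base domain: strip a leading "www." from the start host.
    base = start[4:] if start.startswith("www.") else start
    if target == base or target.endswith("." + base):
        return "subdomain"
    return "different domain"
```

A result of "subdomain" points to the subdomain crawl setting, and "different domain" points to the list of start URLs. For "same host", the cause is more likely one of the other situations listed above.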
Examining hreflang page clusters
To view all of the pages in the hreflang cluster for a specific URL from the Data Explorer:
- Click on the arrow beside the URL in the first column.
- In the pop-up menu that opens, scroll down to the "HREFLANG" section.
- Choose "View all pages in its hreflang cluster".
In order to prevent Google from treating your translations as duplicate content, it is especially important to indicate translations of pages when content may look very similar to a search engine. This may be the case in the following examples:
- user-created page content (e.g. forums, comments, or product reviews)
- regional variants (e.g. a page in South African English and a page in American English)
- translations of entire sections or entire sites
When using hreflang tags, here are some best practices to optimize your pages and avoid common errors:
- Make sure that links are reciprocal. If, on Page A, you have an hreflang link to Page B, the opposite should also be true. On Page B there should be an hreflang link to Page A. Non-reciprocal links are ignored by Google.
- Make sure that pages declare an hreflang link to themselves.
- Use absolute URLs that include the protocol: "https://www.yoursite.com/your_page". Don't skip the "https://"!
- Use the correct code for the language and, optionally, the region. Some studies show Google recognizes certain errors and corrects for them; others show the opposite. It's easiest to err on the side of caution and use the correct codes. Languages are given in the two-letter ISO 639-1 format. Optionally, you can also add a region in the ISO 3166-1 alpha-2 format. If the targeted language uses multiple scripts, you can also choose to specify the script using its four-letter ISO 15924 code. For example, to indicate a page in the Simplified Chinese script for users in Taiwan, you can use the code "zh-Hans-TW".
- Indicate a generic language page, such as "en", if you have a series of regional pages in that language ("en-US", "en-GB", "en-CA", "en-ZA", "en-AU"). This page will be used for all regions you didn't specify. In this example, that might include English-speaking New Zealand, or even a user browsing in English from France.
- Remember that the language code is required. Never use a region code by itself.
- Use the optional value hreflang="x-default" for language selection pages or for pages that automatically redirect to the user's language.
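Several of the best practices above can be checked mechanically. Here is a minimal Python sketch that checks the shape of an hreflang code (language, optional script, optional region, or "x-default") and whether the URL is absolute. It only tests the pattern, not whether the codes actually exist in the ISO lists, so it can't catch a region code used alone (such as "gb") or an invalid region such as "UK". The name check_declaration is made up for this example.

```python
import re
from urllib.parse import urlsplit

# Two-letter language, optional four-letter script, optional two-letter region.
HREFLANG_RE = re.compile(r"[a-z]{2}(-[a-z]{4})?(-[a-z]{2})?", re.IGNORECASE)


def check_declaration(code, url):
    """Return a list of problems found in one hreflang declaration."""
    problems = []
    if code.lower() != "x-default" and not HREFLANG_RE.fullmatch(code):
        problems.append("hreflang code is not language[-script][-region]")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("URL is not absolute")
    return problems
```

An empty list means the declaration passed both checks. To validate the codes themselves, you would compare them against the ISO 639-1 and ISO 3166-1 lists.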
Until recently, Google required you to list all available translations of a page. This practice is still strongly encouraged. However, even if you can't list all translations of the page, you must include reciprocal rel="alternate" links between each translated page and the page in the main or original language of your site.
You can also check out our list of 5 common hreflang errors to avoid.
If you still have questions about hreflangs and translated pages, feel free to drop us a line at @oncrawl_cs or click on the Intercom button at the bottom right of your screen to start a chat with us.