If your website redirects users to the appropriate version based on the location of their IP address or on their browser language, and then sets a cookie to keep track of the user's preferred site version, keep reading.
This kind of redirect can mean that your site doesn't play well with bots.
OnCrawl crawls with bots that use a US-based IP address. If you redirect them as if they were US-based users, they might be unable to reach pages on other versions of the site, even if you set a different version as your Start URL.
Because the bot never clicks to select a different country version of the site, no override cookie is ever created, and the bot will continue to be redirected whenever it requests a page.
You can resolve this problem by crawling the other version of your site (for example, www.mysite.fr) with a modified crawl profile.
How it works
We'll set the user preference cookie to the appropriate site version, and pass this information to the server in the HTTP headers.
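To make the mechanism concrete, here is a minimal sketch of how a geo-redirecting server might choose a site version. It is not OnCrawl's code or any particular website's implementation: the cookie name "country", the host names, and the function choose_version are all hypothetical. It only illustrates why a request from a US-based IP address lands on the US version unless a preference cookie is sent along with it.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping of country codes to site versions (illustration only).
SITE_VERSIONS = {"US": "www.mysite.com", "FR": "www.mysite.fr"}
DEFAULT_VERSION = "www.mysite.com"
PREFERENCE_COOKIE = "country"  # hypothetical cookie name


def choose_version(ip_country, cookie_header=""):
    """Return the host a geo-redirecting server might send the visitor to."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    preference = cookies.get(PREFERENCE_COOKIE)
    if preference is not None and preference.value in SITE_VERSIONS:
        # An explicit preference cookie overrides the IP-based guess.
        return SITE_VERSIONS[preference.value]
    # No usable cookie: fall back to the country inferred from the IP address.
    return SITE_VERSIONS.get(ip_country, DEFAULT_VERSION)


# A bot on a US-based IP address, first without and then with the cookie.
without_cookie = choose_version("US")
with_cookie = choose_version("US", "country=FR")
print(without_cookie)  # www.mysite.com
print(with_cookie)     # www.mysite.fr
```

In this sketch the bot's IP address never changes; only the presence of the cookie decides which version is served.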
How to do it
Find the cookie information
First, you will need the name and value of the cookie that determines a user's location. The easiest way to get them is to ask your website's development team, but you can also find them on your own.
You can use the inspection tools in your browser to find the cookie name and value. Both depend on the website.
These instructions are for Chrome.
- Visit the website.
- Right-click and choose "Inspect".
- Switch to the "Application" tab at the top of the console that appears.
- In the sidebar at the left, scroll down to "Cookies". Click to expand this section.
- Still in the sidebar, under "Cookies", select the website.
- You may want to filter the cookies for just those created by the website itself. On ikea.com, for example, we used the filter bar at the top of the list of cookies (on the right) to filter for "ikea.com".
- Look for a location cookie. It usually has a name related to "country" or "location" and a value that looks like a country name or a two- or three-letter country code, such as "FR" or "FRA" (for France).
You'll need to note both the cookie name (the first column) and the value (the second column) for the version of the site that you want the bot to crawl.
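If the site sets many cookies, a short script can help narrow the list. The sketch below assumes you have copied the cookies as a single "name=value; name=value" string (the format that document.cookie returns in the browser console, which leaves out HttpOnly cookies). The sample string and the helper find_location_cookies are hypothetical, and the name hints are only the two mentioned above, so treat the result as a list of candidates to check by hand.

```python
from http.cookies import SimpleCookie

# Substrings to look for in cookie names; extend this tuple for your site.
NAME_HINTS = ("country", "location")


def find_location_cookies(cookie_string):
    """Return (name, value) pairs whose name suggests a location setting."""
    cookies = SimpleCookie()
    cookies.load(cookie_string)
    return [
        (name, morsel.value)
        for name, morsel in cookies.items()
        if any(hint in name.lower() for hint in NAME_HINTS)
    ]


# Hypothetical cookie string, for illustration only.
sample = "session_id=abc123; preferred_country=FR; theme=dark"
candidates = find_location_cookies(sample)
print(candidates)  # [('preferred_country', 'FR')]
```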
Create a new crawl profile
- From the project home page, click on "+ Set up new crawl".
- Click "+ Create Crawl Profile". This will ask you to pick a crawl profile to copy and ask you to provide a name. Choose the profile that is being redirected, and provide a name such as "mysite.fr" if you plan to crawl the French version of the site.
Set a cookie via HTTP headers
- Enable the extra settings at the top of the page.
- Scroll down to "HTTP headers" and click to expand the section.
- Enter "Set-Cookie: CookieName=CookieValue" in the text box. Replace "CookieName" with the name of your cookie and "CookieValue" with the value you want to use.
- Save the new crawl profile by clicking on "Save" at the bottom of the settings page.
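If you prepare crawl profiles for several site versions, you may want to generate the header line rather than type it each time. The helper below is a hypothetical sketch, not part of OnCrawl: it formats a cookie name and value in the "Set-Cookie: CookieName=CookieValue" form described above and rejects a few characters that would break a simple name=value pair. The name "country" and value "FR" are placeholders.

```python
def build_cookie_header_line(name, value):
    """Format a cookie as 'Set-Cookie: name=value' for the HTTP headers box."""
    name, value = name.strip(), value.strip()
    if not name or not value:
        raise ValueError("cookie name and value must not be empty")
    # Reject characters that would break a simple name=value pair.
    if any(ch in " \t;,=" for ch in name):
        raise ValueError(f"invalid character in cookie name: {name!r}")
    if any(ch in " \t;," for ch in value):
        raise ValueError(f"invalid character in cookie value: {value!r}")
    return f"Set-Cookie: {name}={value}"


# Hypothetical name and value; use the ones you noted for your own site.
header_line = build_cookie_header_line("country", "FR")
print(header_line)  # Set-Cookie: country=FR
```

Paste the printed line into the "HTTP headers" text box of the new crawl profile.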
Crawl your site
You can now launch or schedule a crawl using the new profile.
If you still have questions about overriding default cookie settings and how this can help in the case of unwanted automatic geographic redirects, drop us a line at @oncrawl_cs or click on the Intercom button at the bottom right of your screen to start a chat with us.
This article can also be found by searching for:
redirections géographiques, crawler une version internationale de mon site, automatiquement redirigé, problème avec mon Start URL, redirecciones geográficas, rastrear una versión international del sitio web, problema de Start URL