URLs appear in many formats. The OnCrawl crawler resolves certain types of URLs encountered during a crawl into a standard form in order to process them.

URLs with parameters

OnCrawl may change the order of the parameters in a URL's query string for the purpose of comparison and analysis.

This means that you might find URLs with parameters in the crawl results that don't exist on your site in exactly the same form, character for character. However, the URLs used in the crawl results will always be functionally equivalent to the actual URLs found on your site.

For example:

https://www.example.com/product?q=search&utm=email&parameter1=value1

may be treated as:

https://www.example.com/product?parameter1=value1&q=search&utm=email

This allows us to treat all URLs that have the same parameters in a different order (q=search&utm=email&parameter1=value1, parameter1=value1&q=search&utm=email, utm=email&q=search&parameter1=value1, and so on) as the same page.

Note: We always keep multiple values for the same parameter in their original order. For example, if your query string contains parameter1=value1&parameter1=value2 and we change the order of the parameters, the re-ordered URL will still contain the exact string parameter1=value1&parameter1=value2.
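The following Python sketch illustrates the idea; it is not OnCrawl's implementation. It assumes parameters are sorted alphabetically by name, which matches the example above, and the function name sort_query_params is hypothetical. Because Python's sort is stable, repeated values of the same parameter keep their original order, as described in the note.

```python
from urllib.parse import urlsplit, urlunsplit


def sort_query_params(url: str) -> str:
    """Return url with its query parameters sorted by name (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.query:
        return url
    # Split the raw query string so that the original encoding of each pair is kept
    pairs = parts.query.split("&")
    # list.sort() is stable: pairs sharing a name stay in their original relative order
    pairs.sort(key=lambda pair: pair.split("=", 1)[0])
    return urlunsplit(parts._replace(query="&".join(pairs)))


print(sort_query_params("https://www.example.com/product?q=search&utm=email&parameter1=value1"))
# https://www.example.com/product?parameter1=value1&q=search&utm=email
```

Working on the raw query string, rather than decoding it with urllib.parse.parse_qsl, avoids changing how individual values are encoded.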

Relative vs complete URLs

OnCrawl resolves all relative URLs into complete URLs. A relative URL and the complete URL it resolves to are counted as a single URL, rather than as two different URLs on your site.

For example:

/contact/

will be recorded as:

https://www.example.com/contact/
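Standard relative-URL resolution can be sketched with Python's urllib.parse.urljoin. This is an illustration rather than OnCrawl's code, and the page URL below is a hypothetical example of where the links were found.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page on which the links were found
PAGE_URL = "https://www.example.com/about/team.html"

# Three ways of linking to the same contact page
links = ["/contact/", "../contact/", "https://www.example.com/contact/"]

# Resolving each link against the page URL collapses them into one complete URL
resolved = {urljoin(PAGE_URL, href) for href in links}
print(resolved)  # {'https://www.example.com/contact/'}
```

A <base> element on the page, if present, changes the URL that relative links resolve against; this sketch ignores it.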


WWW versions vs versions without WWW

OnCrawl considers a URL with a www hostname and an otherwise identical URL without the www prefix to be one and the same URL.

For example:

https://example.com/

is treated as the same URL as:

https://www.example.com/
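One way to treat both hostnames as the same is to compare URLs after removing a leading www. from the host. The Python sketch below is an assumption about how such a comparison could work; it does not describe which form OnCrawl displays, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def without_www(url: str) -> str:
    """Build a comparison key by dropping a leading 'www.' from the host (hypothetical helper)."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if netloc.lower().startswith("www."):
        netloc = netloc[len("www."):]
    return urlunsplit(parts._replace(netloc=netloc))


print(without_www("https://www.example.com/") == without_www("https://example.com/"))  # True
```

Stripping www. rather than adding it is an arbitrary choice: either direction works as long as both variants map to the same key.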

URLs with page anchors or hash symbols (#)

OnCrawl truncates (removes and ignores) the content in a URL following a hash (#).

For example:

https://www.example.com/product#specs

will be recorded as:

https://www.example.com/product

Note: We keep hashbangs (#!) and the content that follows them, because some JavaScript applications use the hashbang as a rendering indicator.
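The fragment rule, including the hashbang exception, can be sketched in Python with urllib.parse.urlsplit. The function name is hypothetical and this is not OnCrawl's code.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_fragment(url: str) -> str:
    """Remove the #fragment from url unless it is a hashbang (#!)."""
    parts = urlsplit(url)
    # urlsplit returns the fragment without the leading '#', so a hashbang starts with '!'
    if parts.fragment.startswith("!"):
        return url
    return urlunsplit(parts._replace(fragment=""))


print(drop_fragment("https://www.example.com/product#specs"))  # https://www.example.com/product
print(drop_fragment("https://www.example.com/#!/product"))     # https://www.example.com/#!/product
```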

URLs with special characters vs percent-encoded characters

We percent-encode special and non-ASCII characters in URLs: each byte of such a character is written as a % sign followed by two hexadecimal digits.

For example:

https://www.example.com/product/5ways-to-encode- URL

will be treated as the same as:

https://www.example.com/product/%35ways-to-encode-%20URL
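A rough Python sketch of this equivalence: decode any existing percent-encoding in the path, then re-encode it with urllib.parse.quote, so that a raw path and an encoded form of it end up identical. The function name is hypothetical and only the path is handled. Decoding also turns an encoded slash (%2F) into a literal /, which a production normalizer would need to avoid.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def normalize_path_encoding(url: str) -> str:
    """Return url with a consistently percent-encoded path (hypothetical helper)."""
    parts = urlsplit(url)
    # Decode first so raw and already-encoded characters become identical,
    # then encode everything except letters, digits, "_.-~" and "/"
    path = quote(unquote(parts.path), safe="/")
    return urlunsplit(parts._replace(path=path))


raw = "https://www.example.com/product/5ways-to-encode- URL"
encoded = "https://www.example.com/product/%35ways-to-encode-%20URL"
print(normalize_path_encoding(raw))  # https://www.example.com/product/5ways-to-encode-%20URL
print(normalize_path_encoding(raw) == normalize_path_encoding(encoded))  # True
```

Non-ASCII characters are encoded as UTF-8 by default, so a path such as /café/ becomes /caf%C3%A9/.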

Trailing slashes

OnCrawl adds a missing trailing slash.

For example:

https://www.example.com

will be treated as the same as:

https://www.example.com/
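The example above corresponds to a URL with an empty path. A minimal Python sketch of that case follows; it only adds the slash when the path is empty, because on other paths (such as /product versus /product/) a trailing slash can point to a different resource. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def add_root_slash(url: str) -> str:
    """Give a URL with an empty path the root path '/' (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(add_root_slash("https://www.example.com"))          # https://www.example.com/
print(add_root_slash("https://www.example.com/product"))  # https://www.example.com/product
```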

Default ports

OnCrawl removes default ports from the URL. Default ports are 80 for HTTP and 443 for HTTPS. All other ports are left in the URL.

For example:

https://www.example.com:443/

will be treated as the same as:

https://www.example.com/
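The port rule can be sketched in Python as below. The function name is hypothetical; the sketch only compares the port against the scheme's default and leaves every other port in place.

```python
from urllib.parse import urlsplit, urlunsplit

# Default port for each scheme, as listed above
DEFAULT_PORTS = {"http": 80, "https": 443}


def drop_default_port(url: str) -> str:
    """Remove the port from url when it is the default for the URL's scheme (hypothetical helper)."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; .port is None when no port is given
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        # Strip the trailing ':port' while keeping any userinfo or IPv6 brackets
        parts = parts._replace(netloc=parts.netloc.rsplit(":", 1)[0])
    return urlunsplit(parts)


print(drop_default_port("https://www.example.com:443/"))   # https://www.example.com/
print(drop_default_port("https://www.example.com:8443/"))  # https://www.example.com:8443/
```

Note that http://www.example.com:443/ is left unchanged, since 443 is only the default for HTTPS.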

Lower- and upper-case characters

OnCrawl's analysis is not case sensitive. Uppercase and lowercase characters make no difference in a URL: URLs are analyzed as lowercase strings.

For example:

https://www.example.com/EBooks/

will be treated as the same as:

https://www.example.com/ebooks/
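Grouping URLs under a lowercase key, the simplest reading of the description above, can be sketched in Python as follows. The function name is hypothetical and this is an illustration, not OnCrawl's code.

```python
from collections import defaultdict


def group_case_insensitive(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that differ only in letter case under a lowercase key (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[url.lower()].append(url)
    return dict(groups)


print(group_case_insensitive(["https://www.example.com/EBooks/", "https://www.example.com/ebooks/"]))
# {'https://www.example.com/ebooks/': ['https://www.example.com/EBooks/', 'https://www.example.com/ebooks/']}
```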