
What counts as a single URL in Oncrawl?

Understand how Oncrawl handles URL variations during a crawl

URL normalization options in Oncrawl

URLs can exist in multiple formats while pointing to the same page. Oncrawl applies normalization rules to ensure consistent crawling and analysis.

You can control part of this behavior in the URL normalization section of your crawl settings.


Available options

  • Normalize URL encoding

  • Rewrite // to /

  • Crawl URLs with parameters

  • Reorder query params

  • Filter URL parameters


Normalize URL encoding

This option normalizes encoded and decoded URLs across both the URL path and query parameters.

When enabled, Oncrawl standardizes encoding to reduce inconsistencies between equivalent URL formats.

Examples

Encoded and decoded characters:

https://www.example.com/%7Eabout/
https://www.example.com/~about/

UTF-8 encoded vs decoded paths:

https://www.example.com/%D1%82%D0%B5%D1%81%D1%82/
https://www.example.com/тест/

Case-insensitive encoding:

%aa = %AA
%aA = %AA

Query string normalization:

?a=10&b=20 → ?a=10&b=20
?a=b&&&&c=d → ?a=b&c=d
?&a=b → ?a=b
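The examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oncrawl's implementation: the function names are hypothetical, and the article does not say whether Oncrawl stores the encoded or the decoded form, so the sketch only compares paths after decoding.

```python
import re
from urllib.parse import unquote, urlsplit, urlunsplit


def uppercase_escapes(text: str) -> str:
    """Upper-case the hex digits of every percent-escape (%aa -> %AA)."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def collapse_query(query: str) -> str:
    """Drop the empty segments left by stray '&' separators."""
    return "&".join(part for part in query.split("&") if part)


def normalize_encoding(url: str) -> str:
    """Hypothetical helper: apply both rules to the path and the query string."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        uppercase_escapes(parts.path),
        collapse_query(uppercase_escapes(parts.query)),
        parts.fragment,
    ))


def same_path(a: str, b: str) -> bool:
    """Treat two paths as equivalent if they decode to the same text."""
    return unquote(a) == unquote(b)
```

With this sketch, `same_path("/%7Eabout/", "/~about/")` is true, and `normalize_encoding` turns `?a=b&&&&c=d` into `?a=b&c=d`.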

Why use it

Enable this option to:

  • reduce false duplicate URLs

  • avoid canonical mismatches

  • prevent hreflang inconsistencies

  • reduce crawl size caused by equivalent URL variations

Important

When this option is disabled, Oncrawl keeps the default behavior described in this article and does not enforce additional encoding normalization.


Rewrite // to /

This option rewrites repeated slashes in the URL path as a single slash.

Example

https://www.example.com/path//subpage/ → https://www.example.com/path/subpage/
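As a rough illustration of this rewrite (a sketch with a hypothetical function name, not Oncrawl's code), the path can be isolated first so that the `//` after the scheme is left alone:

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Rewrite runs of '/' in the path as a single '/'."""
    parts = urlsplit(url)
    # Only the path is touched; the scheme separator and query stay as they are.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))
```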

Why use it

Enable this option to reduce duplicate URLs caused by malformed paths.

Disable it to identify broken or malformed URLs.


Crawl URLs with parameters

This option allows Oncrawl to crawl URLs that contain query parameters.

Example

https://www.example.com/products?q=shoes

Why use it

Enable this option if parameterized URLs are part of your site structure, such as:

  • faceted navigation

  • search result pages

  • filtered category pages

If disabled, URLs with query parameters are not crawled.


Reorder query params

This option normalizes the order of query parameters to reduce crawl quota usage and prevent false duplicates.

Example

?q=search&utm=email&sort=asc → ?q=search&sort=asc&utm=email

Why use it

Enable this option to:

  • reduce duplicate parameterized URLs

  • improve crawl consistency

  • optimize crawl quota usage

Important

The order of repeated values for the same parameter is preserved.
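A minimal sketch of this behavior, assuming parameters are sorted alphabetically by name (which the example above suggests but the article does not state outright). The function name is hypothetical. A stable sort keeps repeated values of the same parameter in their original order:

```python
def reorder_query(query: str) -> str:
    """Sort query parameters by name, keeping repeated names in original order."""
    pairs = [part.partition("=") for part in query.split("&") if part]
    # list.sort is stable, so equal names keep their relative position.
    pairs.sort(key=lambda pair: pair[0])
    return "&".join(name + sep + value for name, sep, value in pairs)
```

For instance, `reorder_query("b=2&a=1&b=1")` gives `a=1&b=2&b=1`: `a` moves to the front, while the two `b` values stay in the order they appeared.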


Filter URL parameters

This option allows you to include or exclude specific query parameter names.

Examples

utm_source
utm_medium
utm_term

Why use it

Use this option to:

  • remove tracking parameters

  • reduce duplicate URLs

  • focus on meaningful URL variations
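The effect of an include or exclude list can be sketched as follows. The function and its `mode` argument are hypothetical and only illustrate the idea; they do not reflect how the setting is expressed in Oncrawl.

```python
def filter_query(query: str, names, mode: str = "exclude") -> str:
    """Keep or drop query parameters by name.

    mode="exclude" removes the listed names; mode="include" keeps only them.
    """
    listed = set(names)
    kept = []
    for part in query.split("&"):
        if not part:
            continue
        name = part.partition("=")[0]
        # Keep the parameter when its membership matches the chosen mode.
        if (name in listed) == (mode == "include"):
            kept.append(part)
    return "&".join(kept)
```

Excluding `utm_source`, `utm_medium`, and `utm_term` from `q=shoes&utm_source=news&utm_medium=email` leaves `q=shoes`.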


Other URL handling rules in Oncrawl

The following rules are always applied by Oncrawl, regardless of your settings.


Relative URLs are resolved

Relative paths are converted to full URLs.

/contact/ → https://www.example.com/contact/
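Python's standard library performs the same kind of resolution, which makes it easy to check what a relative link resolves to. The base URL below is an assumed example page, not one taken from this article.

```python
from urllib.parse import urljoin

# Assumed page on which the relative links were found.
base = "https://www.example.com/blog/post/"

# A root-relative link replaces the whole path of the base URL.
absolute = urljoin(base, "/contact/")

# A parent-relative link is resolved against the directory of the base URL.
sibling = urljoin(base, "../about/")
```

Here `absolute` is `https://www.example.com/contact/` and `sibling` is `https://www.example.com/blog/about/`.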

Session IDs are removed

Oncrawl removes session identifiers such as PHPSESSID or JSESSIONID.

This behavior cannot be disabled.


Fragments are ignored

Everything after # is ignored.

/page#section → /page

Exception: hashbang fragments (#!) are preserved.
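The fragment rule and its hashbang exception can be sketched like this (a hypothetical helper, shown only to make the rule concrete):

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Drop the '#...' part of a URL unless it is a hashbang ('#!...')."""
    base, fragment = urldefrag(url)
    if fragment.startswith("!"):
        # Hashbang URLs are kept as they are.
        return url
    return base
```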


Default ports are removed

Oncrawl strips the port from a URL when it is the default for the protocol:

  • :80 (HTTP)

  • :443 (HTTPS)

https://www.example.com:443/ → https://www.example.com/
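A short sketch of this rule, with a hypothetical function name. A port is only dropped when it matches the default for the URL's own scheme, so `:8443` or `http://...:443` would be left in place:

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def drop_default_port(url: str) -> str:
    """Remove ':80' from http URLs and ':443' from https URLs."""
    parts = urlsplit(url)
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        # Cut the trailing ':port' from the network location.
        netloc = parts.netloc.rsplit(":", 1)[0]
        return urlunsplit(parts._replace(netloc=netloc))
    return url
```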

Hostnames are case-insensitive

https://www.MySite.com/ = https://www.mysite.com/

URL paths are case-sensitive

/Books/ ≠ /books/
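The two case rules can be illustrated together: only the hostname is lowercased, and the path is left exactly as written. This is a simplified sketch with a hypothetical function name; it lowercases the whole network location and assumes the URL carries no user credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    """Lowercase the hostname while leaving the path untouched."""
    parts = urlsplit(url)
    # Assumption: netloc holds only host (and optional port), no 'user:password@'.
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))
```

After this step, `https://www.MySite.com/Books/` and `https://www.mysite.com/Books/` compare equal, but neither matches `https://www.mysite.com/books/`.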

About encoding behavior

When Normalize URL encoding is disabled, Oncrawl applies only its standard handling of encoded characters. Equivalent URLs may still be interpreted consistently, but they are not fully normalized across all encoding variations.
