
Link Dataset

Understanding the information collected by Oncrawl about a website's links

The link dataset in the Data Explorer contains information about the links discovered during a crawl.

Information in the dataset

This dataset contains information about the link origin. The origin is the page on which the link was found, or the page that it links from.

It also contains information about the link target. The target is the page that the link points to.

This dataset also contains information about the link itself:

  • Anchor: the link's anchor text

  • Juice: a value representing the page importance within the website transmitted by the link. (This value is not cumulative.)

  • Target status code

  • Link category:

    • Internal: links to another page on the website

    • External: links to a page on a different website

    • Self-referencing: links to a location on the same URL

  • Whether the link's rel attribute value allows the link to be followed:

    • Yes

    • No

Link position

In addition to these fields, the Link dataset provides a Link position field in Data Explorer.

This field identifies the main page zone where the link was found. Available values are:

• Header

• Main

• Nav

• Footer

• Aside

You can use the Link position filter in the Link dataset to analyze links by page zone and better understand sitewide navigation, contextual links, and template-based linking.

How link position is assigned

Link position is based on the page section that owns the link.

Current values follow this priority order:

header > footer > nav > aside > main

This means that when multiple page zones could apply, Oncrawl assigns the value according to this order.

Customize the link position rules

You can customize how Oncrawl classifies link position in the crawl configuration.

By default, Oncrawl uses the HTML tags header, footer, nav, and aside to assign links to page zones. Links that do not match any of these zones are classified as Main.
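As an illustration only, the default behavior described above can be sketched in Python with the standard library's HTML parser. This is not Oncrawl's implementation: the class name LinkZoneParser and the function classify_links are hypothetical, and the sketch assumes that a link's candidate zones are the header, footer, nav, and aside tags currently open around it, resolved with the priority order header > footer > nav > aside > main.

```python
from html.parser import HTMLParser

# Zone tags in the priority order stated in this article.
# "main" is the fallback when no zone tag surrounds the link.
ZONE_PRIORITY = ["header", "footer", "nav", "aside"]


class LinkZoneParser(HTMLParser):
    """Hypothetical helper: records (href, zone) for every <a href> found."""

    def __init__(self):
        super().__init__()
        self.open_zones = []  # zone tags currently open, outermost first
        self.links = []       # collected (href, zone) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ZONE_PRIORITY:
            self.open_zones.append(tag)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # Pick the highest-priority zone among the open ones.
                zone = next(
                    (z for z in ZONE_PRIORITY if z in self.open_zones),
                    "main",
                )
                self.links.append((href, zone))

    def handle_endtag(self, tag):
        if tag in self.open_zones:
            # Close the most recently opened zone with this tag name.
            last = len(self.open_zones) - 1 - self.open_zones[::-1].index(tag)
            del self.open_zones[last]


def classify_links(html):
    """Return a list of (href, zone) pairs for the given HTML string."""
    parser = LinkZoneParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, a link placed inside a nav element that is itself inside a header element is reported as "header" by this sketch, because header comes first in the priority order. The zone names are lowercase tag names here, whereas the Data Explorer displays them as Header, Footer, Nav, Aside, and Main.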

If your site uses a different HTML structure, you can replace these default rules with your own CSS selectors for Header, Footer, Navigation, and Aside.

When several selectors could apply, Oncrawl uses the closest matching ancestor. If several selectors match the same element, the first matching selector is used.
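The two tie-breaking rules above can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not Oncrawl's matching engine: the functions matches and zone_for_link are hypothetical, the rules are assumed to form one ordered list of (selector, zone) pairs, and only three selector forms are supported (a tag name, .class, and #id), whereas real CSS selectors are much richer.

```python
def matches(element, selector):
    """Match a tiny subset of CSS: 'tag', '.class', or '#id' (assumption)."""
    if selector.startswith("."):
        return selector[1:] in element.get("classes", [])
    if selector.startswith("#"):
        return element.get("id") == selector[1:]
    return element.get("tag") == selector


def zone_for_link(ancestors, rules):
    """Return the zone for a link.

    ancestors: the link's ancestor elements, closest first, each a dict
               with optional "tag", "classes", and "id" keys.
    rules:     ordered (selector, zone) pairs.
    """
    for element in ancestors:          # closest matching ancestor wins
        for selector, zone in rules:   # first matching selector wins
            if matches(element, selector):
                return zone
    return "Main"                      # no rule matched any ancestor


# Hypothetical custom rules for a site that does not use semantic tags.
rules = [
    (".site-top", "Header"),
    (".menu", "Nav"),
    ("#sidebar", "Aside"),
    (".site-bottom", "Footer"),
]

# A link inside <ul class="menu"> nested in <div class="site-top">.
ancestors = [
    {"tag": "ul", "classes": ["menu"]},
    {"tag": "div", "classes": ["site-top"]},
]
zone = zone_for_link(ancestors, rules)
```

In this model the link is classified as Nav, because the element carrying the menu class is closer to the link than the element carrying the site-top class. If a single element carried both classes, the first rule in the list would apply and the link would be classified as Header.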

To configure this setting, go to Crawl configuration → Analysis → Link Position. You can edit the default selectors and add multiple rules per zone.

Before launching the crawl, test your configuration on a few URLs using different page templates to make sure the rules behave as expected. Don't forget to save your changes.

This setting affects link position data in the Link dataset, but does not affect URL discovery during the crawl.

Links to other datasets

Beyond individual link fields, the Link dataset also provides shortcuts to other datasets, notably the pages dataset, through the menu available in the columns Origin: Full URL and Target: Full URL. Each shortcut opens the Data Explorer with the appropriate OQL query for the selected element.

Below is an example of the shortcuts available for target URLs:
