Optimize Content Analysis with the Exclude Content Feature

This feature lets you filter out dense, redundant content from your analysis so you can focus on what truly matters.

Updated over 2 weeks ago

How to set up the exclude content feature

In this example, we'll configure a crawl to analyze only the main content while excluding boilerplate elements like headers, footers, and cookie banners.

1 - Create a Dedicated Crawl Profile

Start by creating a crawl profile with an easily identifiable name, such as Boilerplate_content.

2 - Enable Content Exclusion

  • Navigate to the Content Analysis: Exclude Elements section.

  • Toggle the option to Yes (Exclude Boilerplate).

3 - Identify Elements for Exclusion

Use the Preview tool to inspect and highlight the content you want to exclude (e.g., header and footer sections).

4 - Select HTML Elements

  • Open the page you want to analyze and use Chrome's Inspect tool to locate the HTML code for the blocks you want to exclude.

  • Copy the corresponding CSS selectors (such as #id or .class) and paste them into the crawl configuration.

💡 Tip: You can use standard HTML tags like <header> and <footer>.
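
To make the effect of these selectors concrete, here is a minimal Python sketch of the general idea: text inside any element that matches an excluded selector is dropped, and everything else is kept. It is an illustration only, not the crawler's actual implementation. The class ExcludingTextExtractor, the function extract_text, and the sample page are all hypothetical, and the sketch only understands bare tag names, #id, and .class (no combinators or attribute selectors).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ExcludingTextExtractor(HTMLParser):
    """Collect page text, skipping everything inside excluded elements.

    Hypothetical helper for illustration; supports only tag, #id and .class.
    """

    def __init__(self, selectors):
        super().__init__()
        self.selectors = set(selectors)
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        candidates = {tag}
        if attrs.get("id"):
            candidates.add("#" + attrs["id"])
        for cls in (attrs.get("class") or "").split():
            candidates.add("." + cls)
        return bool(candidates & self.selectors)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside an excluded element, every nested tag deepens the skip.
        if self.skip_depth or self._matches(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, selectors=()):
    """Return the visible text of `html` minus the excluded elements."""
    parser = ExcludingTextExtractor(selectors)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample page with a header, a cookie banner and a footer.
SAMPLE_HTML = """
<html><body>
  <header><nav>Home Products Contact</nav></header>
  <div id="cookie-banner">We use cookies.</div>
  <main><h1>Blue widgets</h1><p>Our blue widgets ship worldwide.</p></main>
  <footer class="site-footer">Copyright Example Inc.</footer>
</body></html>
"""

full_text = extract_text(SAMPLE_HTML)
main_text = extract_text(SAMPLE_HTML, ["header", "footer", "#cookie-banner"])
print(main_text)
```

With the three selectors above, only the heading and paragraph inside the main block remain, which is the kind of result to look for when checking the preview.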

5 - Verify Exclusions

  • Use the Preview tool to ensure the targeted content is excluded.

  • The remaining text should represent the primary content for analysis in reports.

💬 FAQ

Will linking analysis be impacted?

No, these settings only affect content metrics such as similarity or word count. Link discovery and analysis remain unchanged.

What metrics will be affected?

All metrics derived from the page's full text are affected, including:

  • Word count

  • N-grams

  • near_duplicate signature

  • Similarity ratio
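
The following Python sketch shows why text-derived metrics shift once boilerplate is excluded. It is not the tool's own computation: the article does not say how similarity is calculated, so the sketch assumes a Jaccard ratio over word 3-grams as a stand-in, and the function names and sample texts are invented for illustration.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in `text` (lowercased, whitespace split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Assumed similarity measure: shared n-grams divided by all n-grams."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a and not b:
        return 0.0  # nothing to compare; avoids dividing by zero
    return len(a & b) / len(a | b)


# Made-up boilerplate that appears on every page of a hypothetical site.
HEADER = "home products pricing blog contact sign in"
FOOTER = "copyright example inc all rights reserved privacy terms"

# Made-up main content of two unrelated pages.
main_a = "our blue widgets ship worldwide within two days"
main_b = "read the return policy before sending an item back"

full_a = f"{HEADER} {main_a} {FOOTER}"
full_b = f"{HEADER} {main_b} {FOOTER}"

print("word count, full page:", len(full_a.split()))
print("word count, main only:", len(main_a.split()))
print("similarity, full pages:", jaccard_similarity(full_a, full_b))
print("similarity, main only:", jaccard_similarity(main_a, main_b))
```

In this toy case the two pages look partly similar when the shared header and footer are counted, and unrelated once only the main content is compared, while the word count drops to the length of the main content alone.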

Does this work with JS mode crawls?

Yes! If you enable JS mode, the preview feature will render the content loaded by JavaScript.
