How to set up the exclude content feature
In this example, we'll configure a crawl to analyze only the main content while excluding boilerplate elements like headers, footers, and cookie banners.
1 - Create a Dedicated Crawl Profile
Start by creating a crawl profile with an easily identifiable name, such as Boilerplate_content.
2 - Enable Content Exclusion
Navigate to the Content Analysis: Exclude Elements section.
Toggle the option to Yes (Exclude Boilerplate).
3 - Identify Elements for Exclusion
Use the Preview tool to inspect and highlight the content you want to exclude (e.g., header and footer sections).
4 - Select HTML Elements
Open the page you want to analyze and use Chrome's Inspect tool to locate the HTML code for the blocks you want to exclude.
Copy and paste the corresponding HTML selectors (like #id or .class) into the crawl configuration.
💡 Tip: You can use standard HTML tags like <header> and <footer>.
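To get a feel for what the selectors in this step do, here is a minimal Python sketch that removes matching elements from a page before counting words. It is only an illustration and not the crawler's actual implementation: the class ExcludingTextExtractor, the function extract_text, and the sample page are all hypothetical, and the matcher only understands bare tag names, #id, and .class.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ExcludingTextExtractor(HTMLParser):
    """Collect page text, skipping elements that match simple selectors.

    Hypothetical helper for illustration; assumes reasonably well-formed HTML
    where every non-void element is explicitly closed.
    """

    def __init__(self, selectors):
        super().__init__()
        self.selectors = selectors
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for sel in self.selectors:
            if sel.startswith("#") and attrs.get("id") == sel[1:]:
                return True
            if sel.startswith(".") and sel[1:] in classes:
                return True
            if sel == tag:
                return True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside an excluded element, track every nested element too.
        if self.skip_depth or self._matches(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def extract_text(html, selectors=()):
    """Return whitespace-normalized text with the selected elements removed."""
    parser = ExcludingTextExtractor(selectors)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Made-up sample page with typical boilerplate blocks around the main content.
PAGE = """
<html><body>
<header><a href="/">Home</a> <a href="/blog">Blog</a></header>
<div id="cookie-banner">We use cookies. <button>Accept</button></div>
<main><h1>Crawl budget basics</h1><p>Search engines crawl a limited number of pages.</p></main>
<div class="promo sidebar">Subscribe to our newsletter</div>
<footer>Copyright Example Inc.</footer>
</body></html>
"""

full_text = extract_text(PAGE)
main_text = extract_text(PAGE, ["header", "footer", "#cookie-banner", ".sidebar"])

print("Words before exclusion:", len(full_text.split()))
print("Words after exclusion:", len(main_text.split()))
print(main_text)
```

Running it on your own page source with the selectors you plan to use gives a rough idea of how much text will remain once the boilerplate is excluded.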
5 - Verify Exclusions
Use the Preview tool to ensure the targeted content is excluded.
The remaining text should represent the primary content for analysis in reports.
💬 FAQ
Will linking analysis be impacted?
No, these settings only affect content metrics such as similarity or word count. Link discovery and analysis remain unchanged.
What metrics will be affected?
All metrics derived from the full text are affected, including:
Word count
N-grams
near_duplicate signature
similarity ratio
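To see why shared boilerplate matters for these metrics, the following Python sketch compares two pages with and without a common header and footer. It uses Jaccard similarity over word bigrams as a stand-in. The product's actual similarity ratio and near-duplicate signature are not described here, so the functions word_ngrams and jaccard and the sample texts are assumptions made for illustration.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams (as tuples) found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up text shared by every page (navigation, cookie banner, footer).
BOILERPLATE = "home blog contact we use cookies accept all rights reserved"

# Made-up main content of two unrelated pages.
main_a = "how to configure a crawl profile step by step"
main_b = "ten tips for writing better product descriptions"

full_a = BOILERPLATE + " " + main_a
full_b = BOILERPLATE + " " + main_b

similarity_with_boilerplate = jaccard(word_ngrams(full_a), word_ngrams(full_b))
similarity_main_only = jaccard(word_ngrams(main_a), word_ngrams(main_b))

print("Similarity with boilerplate:", similarity_with_boilerplate)
print("Similarity on main content only:", similarity_main_only)
print("Word count with boilerplate:", len(full_a.split()))
print("Word count on main content only:", len(main_a.split()))
```

In this toy case the two pages look partly similar only because of the text they share outside the main content, which is the kind of distortion the exclusion setting is meant to remove.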
Does this work with JS mode crawls?
Yes! If you enable JS mode, the preview feature will render the content loaded by JavaScript.