Orphan pages are pages that aren't linked to from anywhere in your site structure. Since they can't be reached by following links on the website, users can't find them unless they know the URL.
We use the word "orphan" to indicate the lack of parent pages. A parent page is a page with an outgoing link to another page, called a child page. An orphan page, therefore, has no parents. Orphan pages are very hard for search engines to find, because bots discover pages by following links when they crawl a website.
However, search engine bots that already know an orphan page exists can still visit it.
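The parent/child definition above can be sketched on a link graph. This assumes you have a hypothetical `links` map (each page URL to the set of URLs it links to) plus a set of URLs you know exist from other sources, such as a sitemap or log files. The function `pages_without_parents` is purely illustrative and is not part of OnCrawl.

```python
from collections import defaultdict


def pages_without_parents(known_urls, links, start_url):
    """Return the known URLs that have no parent page.

    known_urls: every URL you know exists (hypothetical input, e.g. from a
        sitemap or log files, not only from following links).
    links: dict mapping each page URL to the set of URLs it links to.
    start_url: the home page, i.e. the crawl's entry point, which is not
        reported as an orphan even if nothing links to it.
    """
    parents = defaultdict(set)  # child URL -> URLs of its parent pages
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a link to itself doesn't give a page a parent
                parents[target].add(source)
    return sorted(
        url for url in known_urls if url != start_url and not parents[url]
    )


links = {
    "/": {"/shoes", "/about"},
    "/shoes": {"/shoes/red", "/"},
    "/old-promo": {"/shoes"},  # links out, but nothing links to it
}
known = {"/", "/shoes", "/about", "/shoes/red", "/old-promo"}
print(pages_without_parents(known, links, "/"))  # ['/old-promo']
```

This only checks for missing parents. A page whose only parent is itself an orphan would not be flagged, which is why comparing what a crawl from the home page actually reaches against other URL sources is the stricter test.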
Since bots can still visit orphan pages they already know about, one way to find these pages is to combine your crawl data with log data using our cross-analysis. This lets you discover all your pages: both the ones present in your website structure (discovered by the crawl) and the orphan pages, which the crawl can't reach by following links but which still appear in your log data.
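OnCrawl's cross-analysis does this comparison for you. Purely to illustrate the idea, here is a minimal sketch that treats every path Googlebot requested in your access logs, but that the crawl never reached, as an orphan. It assumes the common "combined" log format and identifies Googlebot by user agent only; the function names and sample log lines are hypothetical.

```python
import re

# Request path and user agent of a "combined" format access log line
# (an assumption: adapt the pattern to your own server's log format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_paths(log_lines):
    """Collect the paths requested by clients whose user agent says Googlebot.

    User agents can be spoofed; a real analysis would also verify the bot's
    IP address, which this sketch skips.
    """
    paths = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths


def orphan_paths(crawled_paths, log_lines):
    """Paths Googlebot requested that the link-following crawl never reached."""
    return googlebot_paths(log_lines) - set(crawled_paths)


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
crawled = {"/", "/shoes", "/shoes/red"}
logs = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /shoes HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /old-promo HTTP/1.1" 200 733 "-" "{GOOGLEBOT_UA}"',
]
print(sorted(orphan_paths(crawled, logs)))  # ['/old-promo']
```

A set difference is enough here because the crawl, by construction, only contains pages reachable through links; anything bots request outside that set has no path from your site structure.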
OnCrawl also displays the distribution of your orphan pages by page group, so you can spot trends in where they are located.
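How page groups are defined depends on your own OnCrawl configuration. As a hypothetical stand-in, the sketch below groups orphan URLs by their first path segment and counts them per group, which is enough to see whether orphans cluster in one section of the site. The grouping rule and function names are assumptions for illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit


def page_group(url):
    """Hypothetical grouping rule: the first path segment of the URL."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "home"


def orphan_distribution(orphan_urls):
    """Count orphan pages per page group, largest groups first."""
    return Counter(page_group(url) for url in orphan_urls).most_common()


orphans = [
    "https://www.example.com/blog/2019/summer-sale",
    "https://www.example.com/blog/2018/winter-sale",
    "https://www.example.com/products/discontinued-item",
]
print(orphan_distribution(orphans))  # [('blog', 2), ('products', 1)]
```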
Why do we get orphan pages?
Here are a few reasons you might end up with orphan pages:
- Pages linked to from external websites. If Google has indexed a page that isn't part of your site structure, it's often because an external website links to it. This produces an active page (a page that receives SEO hits) that is also an orphan page.
- Redirected pages. When you redirect a page, you should remove it from your site structure, since internal links should always go directly to the correct page. This leaves the old URL without any internal links.
- Non-canonical pages. When you successfully tell Google to index a different page using rel=canonical, the non-canonical page can become an orphan page.
- Expired pages on websites with many short-lived pages. These pages often expire while a crawl is in progress, which becomes a problem if they remain orphans for too long.
- Pages that returned errors which have since been corrected, but which Google keeps crawling for a while.
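Once you have a list of orphan URLs, you can make a first guess at which of the reasons above applies to each one by fetching it and looking at its status code and canonical tag. The sketch below assumes you already collected that data; the `OrphanCheck` record and the reason labels are hypothetical. Note that a live page with no canonical pointing elsewhere is only often the external-link case: it may simply be a page nobody linked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrphanCheck:
    """Result of fetching one orphan URL (hypothetical input format)."""
    url: str
    status: int                      # HTTP status code returned for the URL
    canonical: Optional[str] = None  # href of its rel=canonical tag, if any


def likely_reason(page: OrphanCheck) -> str:
    """Guess why a URL ended up orphaned, from its status and canonical tag."""
    if 300 <= page.status < 400:
        return "redirected page"
    if page.status >= 400:
        return "error or expired page"
    # Plain string comparison: URLs are not normalized in this sketch.
    if page.canonical and page.canonical != page.url:
        return "non-canonical page"
    return "live page outside the site structure"


checks = [
    OrphanCheck("https://www.example.com/old-promo", 301),
    OrphanCheck("https://www.example.com/ads/123", 404),
    OrphanCheck("https://www.example.com/shoes?sort=price", 200,
                canonical="https://www.example.com/shoes"),
    OrphanCheck("https://www.example.com/summer-landing", 200),
]
for check in checks:
    print(check.url, "->", likely_reason(check))
```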
Best practices for orphan pages
- Link all pages that could generate traffic, such as category pages or internal search result pages, into your website's structure.
- Avoid syntax errors in canonical tags: a malformed canonical URL points to an incorrect page, which may return HTTP 200 or an error.
- Make sure that your expired content delivers the appropriate status code (a 404 or a redirection to a newer version).
- Check your sitemap carefully for syntax errors.
- Reattach known orphan pages to your website structure, starting with the ones that bring the most value.
- Be aware that when you fix an orphan page by redirecting it, Google's bots may keep requesting the old URL for a while.
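Syntax errors in sitemaps and canonical tags can be caught with a simple check. The sketch below parses a sitemap with Python's standard library and flags any `<loc>` that isn't an absolute http(s) URL, which the sitemaps.org protocol requires; the same `url_problem` helper can be applied to a canonical tag's href, for which Google also recommends absolute URLs. Both function names are hypothetical, and the check only covers URL syntax, not whether the URL returns HTTP 200.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def url_problem(url):
    """Describe a syntax problem in url, or return None if it looks fine.

    Only checks that the URL is absolute (http or https scheme plus a host).
    """
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return f"not an absolute URL: {url!r}"
    return None


def sitemap_problems(xml_text):
    """List XML syntax errors and malformed <loc> URLs in a sitemap."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"XML syntax error: {err}"]
    problems = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        problem = url_problem(loc.text or "")
        if problem:
            problems.append(problem)
    return problems


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/shoes</loc></url>
  <url><loc>www.example.com/shoes/red</loc></url>
</urlset>"""
print(sitemap_problems(sitemap))
print(url_problem("http:/www.example.com/shoes"))  # canonical href with a typo
```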
Make sure you aren't wasting valuable organic traffic!
If you have any questions about orphan pages, feel free to drop us a line at @OnCrawl_CS or click the blue Intercom button at the bottom of the page to chat with us.