A sitemap.xml file lists the pages on your website that you want search engines to index. The Oncrawl bot can crawl your sitemaps and use their data for cross-analysis: comparing the URLs found during the crawl with the URLs declared in your sitemaps reveals ways to improve your SEO.
Accepted sitemap formats
Oncrawl can analyze sitemaps in the following formats:
XML: sitemap.xml, sitemap_index.xml
Gzip: sitemap.xml.gz, sitemap_index.xml.gz
Text files: sitemap.txt
Syndication feeds: sitemap.rss (RSS 2.0), sitemap.atom (Atom 0.3 or 1.0)
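For reference, a minimal XML sitemap in the standard sitemaps.org format looks like the sketch below (the domain and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>
```

A sitemap.txt file is even simpler: one absolute URL per line, nothing else.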
Best practices for sitemaps
When setting up a crawl:
Specify your sitemap URLs if you have sitemaps that won't be found in the robots.txt file or that don't have a standard name.
Remember that a sitemap must contain no more than 50,000 URLs and must be no larger than 50 MB (uncompressed). If you need to list more URLs, split them across multiple sitemap files and reference those files from a sitemap_index.xml file.
Sitemaps cannot be used as start URLs.
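When a site exceeds the per-file limits, a sitemap index file references the individual sitemaps. A minimal sketch, using placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
```

Each referenced sitemap is itself subject to the 50,000-URL and 50 MB limits.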
When using a sitemap:
Remember that scanning a sitemap is not the same as crawling the URLs in the sitemap. (If that's what you're trying to do, you can extract the URLs from your sitemap, then crawl the resulting list in URL list mode.)
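If you want to crawl the URLs listed in a sitemap, one way to build the list for URL list mode is to extract the <loc> entries yourself. A minimal sketch using only the Python standard library (the sample sitemap content is a placeholder):

```python
# Extract <loc> URLs from an XML sitemap so the resulting list
# can be crawled in URL list mode. Standard library only.
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> values from a <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

# Hypothetical sitemap content for illustration.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

for url in extract_sitemap_urls(sample):
    print(url)
```

Save the printed URLs to a file, then supply that file when setting up a crawl in URL list mode.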