How to explore all of the URLs in my sitemap
Sometimes you just want data on the URLs in your sitemap. It's possible to do this using a list of URLs.
Written by Rebecca Berbel
Updated over a week ago

OnCrawl will analyze and crawl any list of URLs you provide. If you want to crawl the URLs in your sitemap (and only these URLs), the only thing you need to do is provide your sitemap as a list of URLs.

Note: Do not enter your XML sitemap as your start URL. The OnCrawl bot will not know what to do with a .xml file. The steps below explain how to give the bot something it knows how to explore.

Step 1: Extract the URLs from your sitemap

First, you will need to extract the URLs from your sitemap.

There are many ways to do this, from online tools and desktop converters to bash scripts and command-line tools.

You will want to create one or more files in .txt or .csv format with one URL per line.

If you need help with the file format, you'll find additional information here.
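If you would rather script the extraction, here is a minimal sketch in Python. It assumes a standard sitemap that uses the sitemaps.org namespace and that you have already saved it locally. The function names (extract_urls, write_url_list) and the file names are illustrative only and are not part of OnCrawl. A sitemap index file, which lists other sitemaps rather than pages, would need each child sitemap processed in turn.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(xml_content):
    """Return the page URLs found in <url><loc> entries of a sitemap.

    xml_content may be a str or bytes holding the sitemap XML.
    Only <loc> elements directly under <url> are read, so image or
    video extension entries are ignored.
    """
    root = ET.fromstring(xml_content)
    urls = []
    for url_element in root.iter(SITEMAP_NS + "url"):
        loc = url_element.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def write_url_list(urls, path):
    """Write one URL per line to a .txt file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


# Example usage, assuming a local copy named sitemap.xml:
#   with open("sitemap.xml", "rb") as source:
#       write_url_list(extract_urls(source.read()), "sitemap-urls.txt")
```

The resulting .txt file has one URL per line, which is the format described above.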

Step 2: Zip your files

Zip your files into a .zip compressed archive. You can do this using free tools like 7-Zip (Windows) or by right-clicking and selecting "Compress" (Mac).
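If you are already scripting the previous step, the archive can also be created in Python with the standard zipfile module. This is only a sketch: the function name zip_url_files and the file names are hypothetical, and storing the files flat at the root of the archive is a choice made here for simplicity, not a documented OnCrawl requirement.

```python
import zipfile
from pathlib import Path


def zip_url_files(file_paths, archive_path):
    """Bundle the given .txt or .csv URL files into one .zip archive.

    Each file is stored under its bare file name, without parent
    folders (an assumption made for this sketch).
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in file_paths:
            file_path = Path(file_path)
            archive.write(file_path, arcname=file_path.name)
    return archive_path


# Example usage, assuming sitemap-urls.txt exists in the current folder:
#   zip_url_files(["sitemap-urls.txt"], "sitemap-urls.zip")
```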

Step 3: Upload your .zip files to OnCrawl

From the project home page, click on "Add data sources", then on "URL files".

Drag your .zip files and drop them within the dotted blue box.

Step 4: Launch a crawl in URL list mode

From the project home page, choose "+ Set up new crawl".

Click on Start URL to expand the section.

Choose the option "List of URLs".

In the drop-down menu, select the .zip file you uploaded in the previous step. 

Launch your crawl.

Going further

If you still have questions, drop us a line at @oncrawl_cs or click on the Intercom button at the bottom right of your screen to start a chat with us.

Happy crawling!

This article can also be found by searching for:
crawler les URLs dans mon sitemap, explorer un sitemap, rastrear las URLs en un sitemap, mapa del sitio
