To set up your log monitoring accurately, it is useful to work with at least one day of data. Ideally, this sample should contain lines for both of the following types of visits:
- SEO visits (e.g. with "www.google.com" in the referer field)
- Bot hits (e.g. with "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" as the User Agent).
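To check whether a log sample contains both visit types, the referer and user agent fields can be inspected directly. The sketch below is a minimal, illustrative example assuming the common Apache/Nginx "combined" log format, where the referer and user agent are the last two quoted fields; it is not OnCrawl's parser, and the field positions are an assumption about your log format.

```python
# Minimal sketch: classify access-log lines as SEO visits or bot hits.
# Assumes the Apache/Nginx "combined" format, where the referer and
# user agent are the two last double-quoted fields on each line.

def classify(line: str) -> str:
    """Return 'seo_visit', 'bot_hit', or 'other' for one log line."""
    parts = line.split('"')
    if len(parts) < 6:
        return "other"  # line does not match the assumed format
    referer, user_agent = parts[3], parts[5]
    if "Googlebot" in user_agent:
        return "bot_hit"
    if "www.google.com" in referer:
        return "seo_visit"
    return "other"


# Two sample lines (IPs, paths and timestamps are made up):
seo = ('1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /page HTTP/1.1" '
       '200 512 "https://www.google.com/" "Mozilla/5.0"')
bot = ('66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /page HTTP/1.1" '
       '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
       '+http://www.google.com/bot.html)"')
print(classify(seo))  # seo_visit
print(classify(bot))  # bot_hit
```

If both categories appear in a one-day sample, the sample is suitable for configuring the parser.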
To analyze your SEO traffic and bot behavior efficiently, we recommend that your logs cover the largest time period possible.
With 30 days of logs, you should be able to see that the bots have crawled a large number of distinct pages at least once.
OnCrawl will process your newly uploaded log files with content up to 60 days old.
Data that has already been processed in earlier uploads is, of course, kept. This allows you to carry out analyses over the longest possible period.
When OnCrawl analyzes your crawl, the last 45 days of logs are added into the mix to determine useful indicators such as Orphan Pages (pages only known to Google).
- 1 day of logs is enough to configure the parser / set up Log Monitoring
- 30 days or more are recommended to ensure a relevant analysis
- 45 days is the period used for the cross-analysis (Crawl + Logs)
- 60 days is the limit for data in a single new log file