Reducing the size of access log files before sending them to ftp.oncrawl.com makes them much faster to transfer and to process. This can be achieved with compression and pre-filtering.
In short: "grep google" and "gzip" are your friends
On average, compressing text files reduces their size by 85%.
OnCrawl can read multiple compression formats:
- tar (+gz)
You can use your favorite compression program to compress access log files before sending them. Please note that RAR files are NOT supported.
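As a minimal sketch, assuming a log file called access.log (a hypothetical name; use your own file), either of the commands below compresses it: as a plain gzip file, or as a tar archive compressed with gzip. The sample log lines are made up for illustration only.

```shell
# Create a small sample log (hypothetical content, for illustration only).
printf '%s\n' \
  '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' \
  '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"' \
  > access.log

# Option 1: gzip. -c writes to stdout, so the original access.log is kept.
gzip -c access.log > access.log.gz

# Option 2: a gzip-compressed tar archive (-c create, -z gzip, -f output file).
tar -czf access-logs.tar.gz access.log
```

The tar option is mostly useful when you want to bundle several log files into a single upload.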
On average, pre-filtering reduces log files by 90%, sometimes by as much as 99%. Access logs contain a lot of information that OnCrawl discards because it is not relevant to SEO analysis.
Pre-filtering is as easy as keeping only the lines that contain the word google, in lower case.
If you work on Linux or macOS, this can be achieved with grep:
grep google MY_LOG_FILE > MY_FILTERED_LOG_FILE
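To see what the filter keeps, here is a small sketch on a made-up three-line log (access.log and access.filtered.log are hypothetical names). Because grep matches anywhere on the line, the pattern catches Googlebot's user agent, which contains google.com, but also any other line where google appears, such as a visit whose referrer is https://www.google.com/.

```shell
# Hypothetical sample: one Googlebot hit, one visit referred by google.com,
# and one unrelated browser hit.
cat > access.log <<'EOF'
66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
198.51.100.4 - - [10/Oct/2023:13:55:38 +0000] "GET /b HTTP/1.1" 200 512 "https://www.google.com/" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
EOF

# Keep only the lines containing "google" (the match is case-sensitive).
grep google access.log > access.filtered.log

# Compare line counts before and after filtering.
wc -l < access.log
wc -l < access.filtered.log
```

On this sample, the filtered file keeps 2 of the 3 lines; only the Windows browser hit is dropped.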
Combining compression and pre-filtering usually reduces the file size by 95%. That's why we recommend using both.
On Linux or macOS, just run:
grep google MY_LOG_FILE | gzip > MY_SUPER_FILTERED_LOG_FILE.gz
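When logs are split across several files, the same pipeline can be run in a loop. This is a minimal sketch assuming plain-text files named like access-2023-10-10.log (hypothetical names and made-up content); each one becomes a filtered, gzip-compressed copy with a .gz suffix, and the originals are left untouched.

```shell
# Create two hypothetical daily log files for illustration.
printf '%s\n' \
  '66.249.66.1 - - [10/Oct/2023:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' \
  '203.0.113.7 - - [10/Oct/2023:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"' \
  > access-2023-10-10.log
printf '%s\n' \
  '203.0.113.9 - - [11/Oct/2023:00:00:03 +0000] "GET /x HTTP/1.1" 404 0 "-" "Mozilla/5.0"' \
  '66.249.66.2 - - [11/Oct/2023:00:00:04 +0000] "GET /y HTTP/1.1" 200 256 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' \
  > access-2023-10-11.log

# Filter and compress each file in one pass.
# The glob only matches names ending in .log, so the .gz outputs are not picked up on a rerun.
for f in access-*.log; do
  grep google "$f" | gzip > "$f.gz"
done
```

If your server has already rotated older logs into .gz files, `gzip -dc FILE.gz | grep google | gzip > FILE.google.gz` applies the same filter without unpacking them to disk first (FILE is a placeholder).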