Welcome to OnCrawl's Getting Started Tutorials. Today we're going to walk through how to upload a log file to OnCrawl.
Servers record the details of every request they receive from bots or visitors to provide the content for a URL. Each request is a log line, and looks something like this. A line might include a variety of different data, but for SEO, your log lines must include:
- query path or the full URL
- date with time and timezone
- User Agent of the visitor
- Referer, or the site from which the visitor came before arriving on your URL
- And the status code of the requested URL
If your files or your analyses contain multiple domains or subdomains, you must also include the vhost, if it's not already present in the full URL.
If your files contain HTTP and HTTPS hits, you also must add the scheme or protocol (that is: http or https), or the port of the request, if it's not already part of the full URL. If this isn't present, we'll assume all requests are for HTTPS.
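As a rough illustration of that fallback, here is a minimal Python sketch. The function name `resolve_scheme` and the mapping of ports 80 and 443 are assumptions made for this example, not OnCrawl's actual parser logic.

```python
def resolve_scheme(scheme=None, port=None):
    """Pick the scheme for a hit (hypothetical helper, for illustration only)."""
    # An explicit scheme field in the log line always wins.
    if scheme:
        return scheme.lower()
    # Otherwise fall back to the standard ports, if the port was logged.
    if port == 80:
        return "http"
    if port == 443:
        return "https"
    # Neither scheme nor a recognizable port: assume HTTPS, as described above.
    return "https"
```

For instance, `resolve_scheme(port=80)` gives `"http"`, while a line with neither field is treated as HTTPS.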
We also highly recommend you include:
- client IP: which Google recommends using to detect fake Googlebot hits
- size of response in bytes
- response time, which is not present in this example
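To make the fields listed above concrete, here is a minimal Python sketch that splits one log line into them. It assumes the common Apache/Nginx "combined" format; the sample line, the pattern, and the name `parse_line` are made up for this example, and your server's format may well differ.

```python
import re
from datetime import datetime

# Assumed layout: the "combined" log format. This is an illustration only.
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't fit the pattern."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Date with time and timezone, e.g. 10/Oct/2023:13:55:36 +0200
    fields["date"] = datetime.strptime(fields["date"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # Servers write "-" when no response size is recorded.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields

# Hypothetical sample line, not taken from a real server.
SAMPLE = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0200] '
    '"GET /blog/post-1 HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

record = parse_line(SAMPLE)
```

Note that this format carries no vhost, scheme, or response time, so those would need to be added to the server's log configuration if you need them.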
For SEO purposes, we'll concentrate only on lines in your log files that provide useful information:
- Organic hits, or lines with a search engine as a referrer
- Search engine bot hits, or lines with an official search engine User-Agent and IP address
Any other lines will be filtered out.
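The filtering idea can be sketched in a few lines of Python. The search engine names and bot tokens below are short illustrative lists chosen for this example, not the lists OnCrawl uses, and the match is deliberately loose. The IP address check for bots is left out here.

```python
from urllib.parse import urlparse

# Illustrative lists only; a real filter would be more complete and stricter.
SEARCH_ENGINE_LABELS = {"google", "bing", "duckduckgo", "yandex"}
BOT_TOKENS = ("googlebot", "bingbot", "yandexbot", "duckduckbot")

def classify(referer, user_agent):
    """Return "bot", "organic", or None for a line that would be filtered out."""
    ua = user_agent.lower()
    if any(token in ua for token in BOT_TOKENS):
        return "bot"
    # urlparse gives no hostname for an empty or "-" referer.
    host = urlparse(referer).hostname or ""
    if SEARCH_ENGINE_LABELS & set(host.split(".")):
        return "organic"
    return None
```

A visit arriving from `https://www.google.com/` would come back as `"organic"`, while a direct visit with no referer would come back as `None` and be dropped.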
To upload a log file to OnCrawl, you will first need to activate log monitoring for this project, and convert your project to an advanced project.
We'll ask for some information about your log files. It's okay if you don't know the answers, but if you do, this helps us parse your files later:
- Do they contain lines for HTTP?
- Multiple subdomains?
- Multiple domains?
- What is your server's type and timezone?
- Do you use a cache server?
- Do all files have the same format?
Use an FTP client to connect to the OnCrawl server. Make sure that:
- One, your log files are zipped as a .zip, .gz or .7z file, or formatted as a CSV file
- And, two, your company's firewall is open to FTP connections.
You have a secure, private space to upload your log files. By default the login and password are the same as for your OnCrawl account, but you can contact us to change that if necessary.
Once logged in, choose the folder that corresponds to the project you're uploading log files for and upload your files here.
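If you'd rather script these steps than use an FTP client, a minimal Python sketch using the standard `gzip` and `ftplib` modules might look like this. The helper names are invented for this example, and the host, credentials, and folder name are placeholders that you would replace with your own.

```python
import gzip
import shutil
from ftplib import FTP
from pathlib import Path

def gzip_log(src):
    """Compress a log file to <name>.gz next to the original and return its path."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

def upload(archive, host, user, password, project_folder):
    """Upload one archive into the folder that matches the project."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(project_folder)
        with open(archive, "rb") as handle:
            ftp.storbinary(f"STOR {Path(archive).name}", handle)

# Example call with placeholder values (not real connection details):
# upload(gzip_log("access.log"), "ftp.example.com", "user", "password", "my-project")
```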
Return to OnCrawl to check how your files are parsed, that is: broken down into fields.
Every server is different. OnCrawl tests common patterns to try to find one that fits your log lines.
Here is the result.
We explain any difficulties we found. In this case:
- The automatic parser couldn't decide whether the lines were HTTP or HTTPS. This is an error, and we can't continue until it's fixed.
- We couldn't find which subdomain requests should be applied to. This is a warning, so it should be fixed to prevent errors in analysis, but isn't required.
- We didn't find any SEO visits. This is a warning, so if you know there are no organic visits in this file, you can ignore it.
You can re-map the fields found in your file by switching to manual mode. For each field, choose the right type of information. You might not need to use all of the labels in the dropdown menu, and that's ok.
When you're done, click CHECK LOGS FORMAT to see the results of your changes.
When OnCrawl can identify all of the information it needs, you'll see the result EVERYTHING SEEMS OK, even if you still have warnings.
Click CONFIGURATION IS NOT OK to go back in to make more changes.
If you use a CDN, it might replace the IP address of the visitor or bot with its own. If this is the case, to find Googlebots, we'll need to remove the IP verification step.
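To see why a replaced IP address breaks this step, here is a minimal Python sketch of the reverse and forward DNS check that Google documents for verifying Googlebot. The function name is made up for this example, and this is not a description of OnCrawl's own implementation.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")

def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must lead back to the same IP address.
        return ip in forward(host)[2]
    except OSError:
        # No DNS record, or the lookup failed.
        return False
```

If the log line holds the CDN's IP address instead of the bot's, the reverse lookup returns the CDN's hostname and every genuine Googlebot hit fails this check.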
You can also supply the server's timezone if it isn't available in the log lines.
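As an illustration of what supplying the timezone does, this small Python sketch attaches a fixed UTC offset to a timestamp that was logged without one. The function name, the timestamp format, and the offset are assumptions for this example, and a fixed offset does not follow daylight saving changes.

```python
from datetime import datetime, timedelta, timezone

def attach_offset(timestamp, fmt, utc_offset_hours):
    """Parse a timestamp with no timezone and mark it with the server's UTC offset."""
    naive = datetime.strptime(timestamp, fmt)
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))

# Hypothetical log timestamp from a server running at UTC+2.
hit_time = attach_offset("2023-10-10 13:55:36", "%Y-%m-%d %H:%M:%S", 2)
```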
If you can't get your logs set up right, you can also contact us, and we'll be happy to help you get your log parsing sorted out.
When you're sure that you are happy with your parsing, click CONFIGURATION IS OK.
Once you've gotten log monitoring up and running, you can see an overview of main statistics from the project home page.
You can also take a look at the history of your log processing. We also show you the number of errors and filtered lines, which we ignore. These don't count towards your quotas.
Click on a file to view a sample of each type of line.
You will need to upload your log data regularly to make sure that there are no gaps in your data: even small gaps make it look like there's been a drop in SEO activity on your site.
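One simple way to catch gaps before they show up in your reports is to list the days with no uploaded data. Here is a minimal Python sketch; it assumes you keep track of which days your uploaded files cover, and the function name is invented for this example.

```python
from datetime import date, timedelta

def missing_days(uploaded_days):
    """Return the days between the first and last upload that have no data."""
    present = set(uploaded_days)
    if not present:
        return []
    gaps = []
    current = min(present)
    last = max(present)
    while current <= last:
        if current not in present:
            gaps.append(current)
        current += timedelta(days=1)
    return gaps

# Hypothetical upload history with one missing day.
gaps = missing_days([date(2023, 10, 1), date(2023, 10, 2), date(2023, 10, 4)])
```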
Plenty of solutions exist to automate uploading log files to keep you from forgetting and to make your job easier.
If you're interested in automating your uploads, or if you have questions, reach out to us from the OnCrawl interface by clicking on the blue Intercom button at the bottom of the screen, or tweet to us at OnCrawl_CS.
See you next time!
Until then, happy crawling.