This article will take you through the steps to set up log monitoring with OnCrawl:

  1. Set up your log data
  2. Validate your log file parsing
  3. Set up the FTP transfer for log data
  4. Monitor log processing
  5. Explore the log monitoring dashboards

Step 1: Set up your log data

Required fields:

  • query path (/blog/) or full URL (https://www.oncrawl.com/blog/)
  • vhost, if not already present in the full URL
  • scheme (http or https), if not already present in the full URL
  • date with time and timezone
  • User Agent
  • referer
  • status code
  • port of the request (80 on HTTP / 443 on HTTPS)

Optional but highly recommended fields:

  • client IP: used to detect fake Googlebot hits.
  • size of response in bytes

If you use a server cache, the client IP in your logs corresponds to the cache server's IP address. In that case, our Googlebot IP address verification cannot identify any bot hits, because none of the IPs belong to Google.

If this applies to you and you still want to retrieve data about bot hits, let us know so that we can deactivate the Google IP address verification. Keep in mind that this can introduce bias in your log data, as some of the hits we identify as coming from Google may in fact come from fake Googlebots.
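
For reference, this kind of verification is typically based on a reverse DNS lookup, as documented by Google: the client IP must resolve to a googlebot.com or google.com hostname, and that hostname must resolve back to the same IP. Here is a minimal sketch of that check in Python; it illustrates the general method, not OnCrawl's exact implementation:

# Sketch of the reverse-DNS check commonly used to verify Googlebot IPs.
# It shows why a cache server's IP cannot be verified: it does not resolve
# to a googlebot.com or google.com hostname.
import socket

def is_verified_googlebot(ip):
    try:
        host = socket.gethostbyaddr(ip)[0]       # reverse DNS lookup
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return socket.gethostbyname(host) == ip  # forward-confirm the hostname
    except socket.gaierror:
        return False

print(is_verified_googlebot("66.249.73.145"))    # IP from the sample bot hit below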

If required by your in-house policies, you can filter your log files to remove lines that are not bot hits or SEO (organic) visits.
Find more on how to filter log lines here.
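
As an illustration only (the user-agent and referer patterns and the file names below are examples to adapt to your own setup), a minimal filtering script in Python could look like this:

# Illustrative sketch: keep only lines that look like bot hits or SEO (organic) visits.
# The patterns are examples; adjust them to the bots and search engines that matter
# for your site.
import re

BOT_PATTERN = re.compile(r"Googlebot|bingbot", re.IGNORECASE)     # bot user agents
SEO_REFERER = re.compile(r'"https?://(www\.)?google\.[^"]*"')     # organic referer

with open("access.log") as src, open("access.filtered.log", "w") as dst:
    for line in src:
        if BOT_PATTERN.search(line) or SEO_REFERER.search(line):
            dst.write(line)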

Sample log line for a bot hit:

www.oncrawl.com:80 66.249.73.145 - - [07/Feb/2018:17:06:04 +0000] "GET /blog/ HTTP/1.1" 200 14486 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" 

Sample log line for an SEO visit:

www.oncrawl.com:80 37.14.184.94 - - [07/Feb/2018:17:06:04 +0000] "GET /blog/ HTTP/1.1" 200 37073 "https://www.google.es/" "Mozilla/5.0 (Linux; Android 7.0; SM-G920F Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36" "-"
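
As a sanity check before uploading, you can verify that a few lines contain all of the required fields. The sketch below parses the sample bot hit above with a regular expression matching that exact layout (vhost:port, client IP, date, request, status code, size, referer, User Agent); it is only an example and will need adjusting if your own log format differs:

# Sketch: parse one log line in the sample format above and list the required fields.
import re

LOG_RE = re.compile(
    r'^(?P<vhost>\S+):(?P<port>\d+) (?P<client_ip>\S+) \S+ \S+ '
    r'\[(?P<datetime>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('www.oncrawl.com:80 66.249.73.145 - - [07/Feb/2018:17:06:04 +0000] '
        '"GET /blog/ HTTP/1.1" 200 14486 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"')

match = LOG_RE.match(line)
if match:
    print(match.groupdict())   # all required fields should show up here
else:
    print("Line does not match the expected format")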

If you host multiple subdomains (for example, https://www.oncrawl.com and https://fr.oncrawl.com) on the same server, make sure that the two subdomains are clearly identified in your logs. One way to do this is to verify that the full URL is used instead of just the query path.

  • If your logs do not contain the fields mentioned above, please contact your IT team to add them (otherwise the dashboards might lack important data).
  • If your logs contain the fields mentioned above, please move on to the next step.

Learn how to configure the right log format when using Apache or Nginx here.

Step 2: Validate your Log Files using the OnCrawl Parsing Process

From the project page in OnCrawl, click on the "ADD DATA SOURCES" button.

This will take you to the space where you can add any additional data sources. You will already be looking at the "ADD LOGS" tab.

Next, enable the fields that correspond to the formatting of your log files.

OnCrawl should then automatically process your log files according to the settings you've chosen.

Step 3: Upload your log files to your OnCrawl FTP Account

You have access to your log files, and they are now ready to be uploaded to OnCrawl.

First, make sure that your network firewall is open for FTP connections.
Please note that OnCrawl does not use SFTP.

Use an FTP client (e.g. FileZilla or any other) to connect. You will need the following information:

  • Server: ftp://ftp.oncrawl.com or by IP: 23.251.134.79 
  • Username: OnCrawl Username 
  • Password: OnCrawl Password 
  • Ports: 21, and 10090 to 10990

OnCrawl does not use an authentication key for the FTPS connection.

Once connected, you will see folders for each project in your account:

  • Directory: Your project name 

Open the folder for your project and drop the file(s) there.
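
If you prefer to script the upload (for example as a first step towards automating it), a minimal sketch using Python's standard ftplib over FTPS could look like the following; the credentials, project folder name, and local file name are placeholders to replace with your own values:

# Sketch: upload one log file over FTPS (explicit TLS) with Python's standard ftplib.
# The credentials, project folder, and file name below are placeholders.
from ftplib import FTP_TLS

ftps = FTP_TLS("ftp.oncrawl.com")          # connects on port 21 by default
ftps.login("YOUR_ONCRAWL_USERNAME", "YOUR_ONCRAWL_PASSWORD")
ftps.prot_p()                              # switch the data channel to TLS
ftps.cwd("your-project-name")              # the folder named after your project

with open("access-2018-02-07.log", "rb") as log_file:
    ftps.storbinary("STOR access-2018-02-07.log", log_file)

ftps.quit()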

You're (almost) done.
If you've properly configured Step 2 and your log files have the right format, move on to check the Log Manager Tool (Step 4).

Learn how to secure your FTP connection here (this makes sure you're using FTPS)
Learn how to automate the daily upload of your log files here.

Step 4: Monitor the Log Processing from OnCrawl Log Manager Tool

OnCrawl's Log Manager Tool allows you to monitor the processing of your log files.

You can find the "LOG MANAGER TOOL" button where the "ADD LOGS" button used to be.

What sort of information can be monitored?

OnCrawl activity displays information about OnCrawl parsing jobs: files Queued for parsing, Currently being parsed, Queued for export, and Currently being exported.

At this first stage, OnCrawl evaluates the data contained in your log files.

At the second stage, the Log Manager Tool displays graphs for the uploading process and the parsing process.

Finally, the tool shows an explorable data table with information including File Name, Deposit Date, File Size, OK Lines, Erroneous Lines, and Filtered lines.

File Names are clickable and explorable.

Processed Files Legend:

  • High values in the "File Size" and "OK Lines" columns are fine.
  • A small number of lines in the "Erroneous Lines" and "Warnings" columns should not be a problem.
  • On the contrary, a large number of lines in the "Erroneous Lines" and "Warnings" columns indicates a parsing error. In that case, contact us using the OnCrawl chat box; we are happy to help.

Step 5: Explore OnCrawl Log Monitoring Dashboards

You are now all set. Click on the "SHOW LOGS MONITORING" button and begin your log analysis.
