Apache
Recommended format:
LogFormat "%{HOST}i:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_host_ssl
%{HOST}i: hostname of the request
%p: server port
%h: client IP
%l: remote logname. Not used by OnCrawl, but standard in log formats
%u: remote user. Not used by OnCrawl, but standard in log formats
%t: request date and time
%r: first line of the request
%>s: response status code
%b: size of the response in bytes
%{Referer}i: referer
%{User-Agent}i: user agent
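To check that a server really writes lines in this layout, it can help to parse a few of them. The following is a minimal Python sketch, not OnCrawl's parser: the regular expression, the `parse_line` helper, and the sample line are all illustrative assumptions. It also assumes that quoted fields contain no escaped double quotes.

```python
import re

# Regex mirroring the combined_host_ssl layout:
# host:port client_ip logname user [time] "request" status bytes "referer" "user agent"
# Assumption: quoted fields never contain an escaped double quote.
LINE_RE = re.compile(
    r'(?P<host>[^:\s]+):(?P<port>\d+) '
    r'(?P<client_ip>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical log line written in the recommended format.
sample = (
    'www.example.com:443 66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /page?id=1 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
fields = parse_line(sample)
print(fields["host"], fields["port"], fields["status"])  # www.example.com 443 200
```

A line that returns `None` here is a hint that the server is still writing a different format.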
Nginx
Recommended format:
log_format combined_vhosts '$host:$server_port $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
IIS
Recommended formats: W3C Extended Log File Format or NCSA Common Log File Format.
W3C Extended Log File Format
Example format containing time, client IP address, method, URI stem, protocol status and protocol version:
#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2001-05-02 17:42:15
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0
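Because the W3C format declares its own columns in the `#Fields:` directive, a reader can map each value to its field name without hard-coding the layout. The sketch below is one possible way to do this in Python; the `parse_w3c` function name is an assumption for illustration, and it assumes values are separated by whitespace and contain no embedded spaces.

```python
def parse_w3c(lines):
    """Yield one dict per log entry, keyed by the latest #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as #Software or #Date are skipped;
            # #Fields defines the column names for the entries that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Ignore entries that do not match the declared column count.
        if len(values) == len(fields):
            yield dict(zip(fields, values))


log = """#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2001-05-02 17:42:15
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0
"""
entries = list(parse_w3c(log.splitlines()))
print(entries[0]["cs-uri-stem"], entries[0]["sc-status"])  # /default.htm 200
```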
NCSA Common Log File Format
The following example lists these properties:
IP address
Remote log name of the user (blank in this example and represented by a "-")
Domain\user
Date and time
Command
Status code returned
Bytes of data sent
172.21.13.45 - Microsoft\fred [08/Apr/2001:17:39:04 -0800] "GET /scripts/iisadmin/ism.dll?http/serv HTTP/1.0" 200 3401
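The properties listed above can be pulled out of an NCSA line with a regular expression. This is a rough Python sketch under the assumption that the request field contains no double quotes; `NCSA_RE` and `parse_ncsa` are names chosen for illustration only.

```python
import re
from datetime import datetime

# Layout: client_ip logname user [time] "request" status bytes
NCSA_RE = re.compile(
    r'(?P<client_ip>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_ncsa(line):
    """Return a dict of fields with a timezone-aware datetime, or None."""
    match = NCSA_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Example timestamp: 08/Apr/2001:17:39:04 -0800
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Raw string so that the backslash in "Microsoft\fred" is kept literally.
sample = (
    r'172.21.13.45 - Microsoft\fred [08/Apr/2001:17:39:04 -0800] '
    r'"GET /scripts/iisadmin/ism.dll?http/serv HTTP/1.0" 200 3401'
)
entry = parse_ncsa(sample)
print(entry["client_ip"], entry["status"], entry["size"])  # 172.21.13.45 200 3401
```

Note that this format carries no hostname, port, referer, or user agent, so those values would have to come from elsewhere.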
CloudFlare
If you use CloudFlare, you need an Enterprise plan to access your raw log files. You then need to activate Enterprise Log Share.
You can also set up a daily upload of your log files to our FTP server, using the credentials you received by email.
Best practices
If you host multiple subdomains on the same server:
Make sure the subdomains are clearly identified in each log entry. One way to do this is to provide the full URL instead of only the request query string.
Regardless of your log format, the following information is required:
URL: Since the full URL isn't usually found in server logs, we reconstruct it from some or all of the following fields:
- scheme (http/https): can be determined from the port or the SSL protocol
- hostname
- path: can be found in the HTTP request
- query string: usually included in the path
date
referer
user agent
HTTP status
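As an illustration of how a full URL can be rebuilt from the fields listed above, here is a small Python sketch. It is not OnCrawl's actual logic: the `build_url` helper is hypothetical, and it assumes that port 443 means https and any other port means http.

```python
def build_url(host, port, request_line):
    """Rebuild a full URL from the hostname, server port and request line."""
    port = int(port)
    # Assumption: port 443 implies https, every other port implies http.
    scheme = "https" if port == 443 else "http"
    # The request line looks like: METHOD /path?query PROTOCOL
    parts = request_line.split()
    if len(parts) < 2:
        raise ValueError(f"unexpected request line: {request_line!r}")
    target = parts[1]  # path, including the query string if there is one
    # Keep the port in the URL only when it is not the default for the scheme.
    netloc = host if port in (80, 443) else f"{host}:{port}"
    return f"{scheme}://{netloc}{target}"


print(build_url("www.example.com", "443", "GET /page?id=1 HTTP/1.1"))
# https://www.example.com/page?id=1
```

If the hostname or port is missing from a log line, the scheme and host cannot be recovered this way, which is why the recommended formats include both.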
Regardless of your log format, the following information is optional:
response size (in bytes)
load time
client IP: This is used to verify that lines with a Googlebot user agent are genuine bot visits. This can be left empty if the user agent does not contain "Googlebot".
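The text above does not say how the client IP is checked. One common approach, documented by Google, is a reverse DNS lookup followed by a forward lookup that must return the original IP. The Python sketch below shows that approach under stated assumptions: the `is_genuine_googlebot` name, the list of accepted domain suffixes, and the injectable `reverse` and `forward` resolver parameters are illustrative choices. The default forward resolver handles IPv4 only.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Check an IP with a reverse DNS lookup, then confirm with a forward lookup."""
    try:
        # Reverse lookup: IP -> hostname.
        hostname = reverse(ip)[0]
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward lookup: the hostname must resolve back to the same IP.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False
```

Calling `is_genuine_googlebot(client_ip)` with the default resolvers performs live DNS queries, so in practice the results would usually be cached per IP. The resolver parameters exist so the function can be exercised without network access.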