If you are a GCP user and you store your log data in a Google Cloud Storage bucket, we can help you automate uploading log data to OnCrawl.
Before beginning, you must make sure that you have activated the log monitoring feature for your project and followed the setup steps here.
If you choose to, you can also use the manual FTP upload process for the initial activation of your log monitoring and to configure the parser. The following steps will automate the process of regularly providing log data to OnCrawl for monitoring.
Make your Google Cloud Storage files available for frequent retrieval
Set up and authorize a Service Account for the OnCrawl platform
Log in to your GCP account.
On the IAM configuration page, add a new service account for OnCrawl.
In the GCS browser, click on the name of the bucket containing your log files.
In the “Permissions” tab, click “Add” and search for the service account you just created.
Select the Cloud Storage > Storage Object Viewer role and save.
The service account now has read access to the GCS objects stored in the bucket.
Here's Google's official documentation: Creating a service account
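If you prefer to script this permission step, here is a minimal sketch using the google-cloud-storage Python client. The bucket name and service account email are hypothetical placeholders; substitute your own values.

from google.cloud import storage

# Hypothetical values: replace with your own bucket and service account.
BUCKET_NAME = "your-log-bucket"
ONCRAWL_SA = "oncrawl-logs@your-project.iam.gserviceaccount.com"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Grant the service account read access to the objects in the bucket,
# equivalent to assigning the Storage Object Viewer role in the console.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {f"serviceAccount:{ONCRAWL_SA}"},
})
bucket.set_iam_policy(policy)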
Generate an HMAC key
OnCrawl will use the HMAC keys associated with the service account to access objects in GCS.
To generate these keys, follow these steps:
Go to the GCS settings page.
In the “Interoperability” tab, go to the “Service account HMAC” section, and click “Create a key for a service account”.
Select the service account that you created in the steps above, and click “Create key”.
An access key and a secret key will be displayed: these are the keys that need to be sent to OnCrawl.
With these keys, OnCrawl will be able to use the service account and have read access to the files in your GCS bucket.
Here's Google's official documentation: Creating an HMAC key
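This step can also be scripted. Below is a minimal sketch using the same Python client, with the same hypothetical service account email. Note that the secret is only returned once, at creation time, so store it securely.

from google.cloud import storage

ONCRAWL_SA = "oncrawl-logs@your-project.iam.gserviceaccount.com"  # hypothetical

client = storage.Client(project="your-project")

# Create an HMAC key pair for the service account.
metadata, secret = client.create_hmac_key(service_account_email=ONCRAWL_SA)
print("Access key:", metadata.access_id)
print("Secret key:", secret)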
Provide the following information to OnCrawl
Please contact us (use the blue chat button at the bottom right of your screen) to put automated log file uploading in place.
You will need the following information:
Bucket name
Prefix. This is the first part of the name of the directory in the bucket where your log files are stored.
Note: Your website might use more than one prefix. Please provide all of the prefixes for log monitoring data on the domains and subdomains you want analyzed.
For example, if your website uses directories www-2019/, www-2020/, www-2021/, and shop-2021/, you can indicate all four directories with the prefixes www- and shop-.
An access key
A secret key*
* We recommend encrypting your GCS secret key before sending it to us.
This can be done very simply using Keybase.io. Once encrypted, your secret key can only be decrypted by a single recipient defined at the time of encryption.
For this, you must:
1. Paste your secret key in the "Message to encrypt" text box on this page: https://keybase.io/encrypt#oncrawl
2. Click "Encrypt".
3. Send us the entire result found in the "The secret message" text box.
This is a PGP message that can only be decrypted by the owner of the https://keybase.io/oncrawl account.
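Before sending the keys, you can check that they grant read access through GCS's S3-compatible interoperability endpoint, which is how HMAC keys are used. A sketch with boto3, using hypothetical bucket, prefix, and key values:

import boto3

# Hypothetical values: replace with your bucket, prefix, and HMAC keys.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="your-access-key",       # HMAC access key
    aws_secret_access_key="your-secret-key",   # HMAC secret key
)

# List a few log files under one of the prefixes you will provide.
response = s3.list_objects_v2(Bucket="your-log-bucket", Prefix="www-", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])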
Please make sure that the logs stored in your bucket under the provided prefixes meet the following criteria:
They contain both SEO (organic) visits and bot hits,
They contain logs only for the site you would like to analyze. Logs for non-production sites, partner sites, or domains not included in your project must not be included in the provided prefixes.
This information will allow us to configure the connection for you.
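As a quick sanity check before providing the prefixes, you can scan a sample log file for both kinds of lines. A rough sketch, assuming combined-format access logs where the referrer and user agent are the last two quoted fields (file name hypothetical):

import re

bot_hits = organic_visits = 0
with open("access.log") as f:
    for line in f:
        # In combined format, the quoted fields are: request, referrer, user agent.
        fields = re.findall(r'"([^"]*)"', line)
        if len(fields) < 3:
            continue
        referrer, user_agent = fields[-2], fields[-1]
        if "Googlebot" in user_agent:
            bot_hits += 1
        elif "google." in referrer:
            organic_visits += 1

print(f"bot hits: {bot_hits}, organic visits: {organic_visits}")

Both counts should be non-zero if the logs meet the criteria above.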
Once this has been set up, OnCrawl will retrieve new log files every hour throughout the day. This ensures that no log data is missing in OnCrawl, and enables you to follow important website events using the live log monitoring dashboard.
From then on, this data is processed at the end of each day: we aggregate events by URL and make them available in the Log Monitoring reports and in other reports that use cross-data analysis, like the SEO Impact Report.