If you store your log data in an Amazon S3 bucket, we can help you automate uploading log data to OnCrawl.
Before you begin, make sure that you have activated the log monitoring feature for your project and followed the setup steps here.
If you choose to, you can also use the manual FTP upload process for the initial activation of your log monitoring and to configure the parser. The following steps will automate the process of regularly providing log data to OnCrawl for monitoring.
In the first part of this article, you will automate the transfer of your files from S3 to OnCrawl by making files in your bucket available for frequent retrieval. Several times a day, OnCrawl will scan your S3 bucket and retrieve any new files available.
Once this is up and running, follow the instructions in the second part of this article to have your S3 account notify OnCrawl whenever new log files are available in your bucket. These files can then be retrieved, analyzed and made available as soon as possible in the Live Log feature.
1. Make your S3 files available for frequent retrieval
Please contact us (use the blue Intercom button at the bottom right of your screen) to put automated log file uploading in place.
You will need the following information:
- Bucket name
- Prefix. This is the first part of the name of the directory in the bucket in which your log files are found.
Note: Your website might use more than one prefix. Please provide all of the prefixes for log monitoring data on the domains and subdomains you want analyzed.
For example, if your website uses directories
shop-2021/, you can indicate all four directories with the prefixes
- An access key
- A secret key*
* We recommend encrypting your AWS secret key before sending it to us.
This can be done very simply using Keybase.io. Once encrypted, your secret key can only be decrypted by a single recipient defined at the time of encryption.
For this, you must:
1. Paste your secret key in the "Message to encrypt" text box on this page: https://keybase.io/encrypt#oncrawl
2. Click "Encrypt".
3. Send us the entire result found in the "The secret message" text box.
This is a PGP message that can only be decrypted by the owner of the https://keybase.io/oncrawl account.
Please make sure that the logs stored in your bucket under the provided prefixes meet the following criteria:
- They contain both SEO (organic) visits and bot hits,
- They contain logs only for the site you would like to analyze. Logs for non-production sites, partner sites, or domains not included in your project must not be included in the provided prefixes.
This information will allow us to configure the connection for you.
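To see which files a given prefix would cover before you send it to us, a minimal sketch like the following can help. It assumes that a prefix is matched as the beginning of the object key, which is how S3 treats prefixes. The function name and all of the object keys shown are hypothetical and used for illustration only.

```python
def keys_under_prefixes(keys, prefixes):
    """Return the keys that start with at least one of the given prefixes."""
    prefix_tuple = tuple(prefixes)
    return [key for key in keys if key.startswith(prefix_tuple)]


# Hypothetical object keys, for illustration only.
sample_keys = [
    "logs/www/2021-05-01.log.gz",
    "logs/www-staging/2021-05-01.log.gz",
    "logs/blog/2021-05-01.log.gz",
    "backups/db.sql",
]

# Without a trailing slash, the prefix also matches the staging directory.
broad = keys_under_prefixes(sample_keys, ["logs/www"])

# With a trailing slash, only the production directory is matched.
strict = keys_under_prefixes(sample_keys, ["logs/www/"])

print(broad)
print(strict)
```

In this example, the prefix without a trailing slash would also pick up the logs of a non-production site, which is exactly what the criteria above ask you to avoid.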
Once this has been set up, OnCrawl will retrieve new log files at regular intervals throughout the day. This ensures that no log data is missing in OnCrawl.
At this point, this data will be processed at the end of each day: we aggregate events by URL and make them available in the Log Monitoring reports and in other reports that use cross-data analysis, like the SEO Impact Report.
2. Make your S3 bucket ready for Live Logs and live file retrieval
Now that OnCrawl can regularly retrieve log files in your S3 bucket, you can make the live log monitoring feature as responsive as possible. To do this, you'll set up an S3 event notification from your bucket to OnCrawl that will automatically run whenever new files are available.
First, let the OnCrawl customer service manager know so that we can set things up on our end to allow your bucket to send us notifications.
If you need to automate live log monitoring for several projects in the same bucket, you only need to ask us to allow the bucket once.
We'll then let you know when it's your turn to create an S3 notification that tells OnCrawl when there is a new file to collect.
To create a notification, follow these steps:
- On your S3 bucket page, navigate to the "Properties" tab and scroll down to "Event notifications".
- Click on "Create event notification".
- Give your new notification a name that will allow you to find it again if necessary. For example:
- Indicate the prefix or suffix of your files in order to narrow down the files that trigger this notification. These should be the same prefixes you provided in the automation for regular retrieval earlier in this article.
- Set up the events that trigger the notification. Choose "All object create events" (Put, Post, Copy, Multipart upload completed) and "Restore completed" under "Restore object events".
We do not need to be notified of any other events. OnCrawl ignores notifications for any other event types.
- Next, configure the destination by selecting "SQS queue", and then selecting "Enter SQS queue ARN".
- Enter the OnCrawl ARN:
If you are unable to validate the notification because of an "Unknown Error" related to the "API response", it's likely that you've tried to set this up before we allowed your bucket.
Check in with your OnCrawl customer service manager to make sure your bucket is allowed, then set up the notifications for live log monitoring again.
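If you prefer to script the console steps above, the following is a minimal sketch and not an official OnCrawl procedure. It builds the notification configuration accepted by the boto3 call `put_bucket_notification_configuration`, using the event types `s3:ObjectCreated:*` (all object create events) and `s3:ObjectRestore:Completed` (restore completed). The function names, the notification name, the prefixes and the queue ARN are placeholders: use the ARN provided by OnCrawl and your own prefixes. Since an S3 notification filter accepts a single prefix rule, the sketch creates one queue configuration per prefix.

```python
def build_notification_config(queue_arn, prefixes, name="oncrawl-live-logs"):
    """Build one SQS queue configuration per prefix (hypothetical helper)."""
    events = ["s3:ObjectCreated:*", "s3:ObjectRestore:Completed"]
    return {
        "QueueConfigurations": [
            {
                "Id": f"{name}-{index}",
                "QueueArn": queue_arn,
                "Events": list(events),
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
            for index, prefix in enumerate(prefixes, start=1)
        ]
    }


def apply_notification_config(bucket, config):
    """Send the configuration to S3. Requires boto3 and AWS credentials."""
    import boto3  # third-party library, imported only when this is called

    s3 = boto3.client("s3")
    # This call replaces the bucket's entire notification configuration.
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )


# Placeholder ARN and prefixes: replace them with the real values.
config = build_notification_config(
    "arn:aws:sqs:REGION:ACCOUNT_ID:QUEUE_NAME",
    ["logs/www/", "logs/blog/"],
)
print(config)
```

Note that `put_bucket_notification_configuration` overwrites any notifications that already exist on the bucket, so merge existing entries into the configuration before calling `apply_notification_config` if you have other notifications in place.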
How long does it take to process log files?
When setting up log monitoring:
- Processing time will depend on the amount of data that needs to be handled. This will take longer the first time you connect your site's log data: we will collect log data for the period of up to 90 days before you activated log monitoring, if it is available.
- Provide correct and complete information. If our first attempts fail because of incorrect information, it will take longer for your initial log data to become available.
When log monitoring is already in place:
- At the end of the day, the information in your log files is processed and analyzed. Log monitoring data is typically available within 24 hours of the logged activity.
Once S3 notifications for new files are set up:
- New live log monitoring data is automatically processed as soon as it is received.
- Live log monitoring for a project might be unavailable for a few minutes if you upload a large number of files at once, such as during setup or a major synchronization. This can show up as a short period with no activity in live log monitoring, and only affects the past few hours.
When the rate of files transferred for a project is too high, some files might not be processed until the rate goes down again. Don't worry: any missing information is not lost, just delayed. It will be added automatically within the next several hours.
If you've noticed a short gap in activity in your live log monitoring in the last few hours, you can check your S3 bucket to see if a large number of files have been created.
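One way to run this check is sketched below. It assumes each object is described by a dictionary with a `LastModified` datetime, which is the shape returned by the boto3 `list_objects_v2` call. The function names, the three-hour window and the sample data are illustrative assumptions, not OnCrawl thresholds.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def list_objects(bucket, prefix):
    """Yield object descriptions from S3. Requires boto3 and AWS credentials."""
    import boto3  # third-party library, imported only when this is called

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


def files_per_hour(objects, now, window=timedelta(hours=3)):
    """Count objects modified within the window, grouped by hour."""
    cutoff = now - window
    return Counter(
        obj["LastModified"].replace(minute=0, second=0, microsecond=0)
        for obj in objects
        if obj["LastModified"] >= cutoff
    )


# Hypothetical sample data standing in for the output of list_objects().
now = datetime(2021, 5, 1, 12, 0, tzinfo=timezone.utc)
sample_objects = [
    {"Key": "logs/www/a.log.gz", "LastModified": now - timedelta(minutes=30)},
    {"Key": "logs/www/b.log.gz", "LastModified": now - timedelta(minutes=45)},
    {"Key": "logs/www/c.log.gz", "LastModified": now - timedelta(hours=2)},
    {"Key": "logs/www/d.log.gz", "LastModified": now - timedelta(hours=6)},
]

counts = files_per_hour(sample_objects, now)
for hour, count in sorted(counts.items()):
    print(hour.isoformat(), count)
```

An hour with an unusually high count compared to the others suggests that a bulk upload coincided with the gap.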
Are you interested in log analysis and using Amazon S3 to store your log data? Let us know: we're happy to help!