Export all logs to an S3 bucket using the DAS (Database Activity Streams) feature in Aurora PostgreSQL RDS

Recently I had a requirement to publish, in real time, all of the activity happening on an RDS instance (Aurora in this case) to an S3 bucket as readable log files. After researching it and piecing together bits from different articles on the net, I thought of writing a comprehensive article covering everything needed to build this pipeline.

The following steps illustrate how we can configure real-time monitoring on AWS Aurora PostgreSQL using the DAS (Database Activity Streams) feature. After the successful setup of this pipeline, any action taken on the Aurora instance will be audited and logged to the S3 bucket.

The setup to achieve the above architecture is fairly simple. We need a PostgreSQL Aurora instance, and we then have to enable the activity stream feature for that instance. To do this, a KMS key must be created. Once the activity stream is enabled, AWS automatically creates a Kinesis data stream behind the scenes. This stream can be integrated with Firehose, along with a Lambda function, to dump the audit logs into the S3 bucket. The raw files in S3 can then be converted into formatted JSON using any online JSON formatter. Please look at the architecture diagram for easy understanding. The detailed steps are as follows.

[1] Create an Amazon Aurora instance
[2] Create a new KMS key
[3] Enable the Database Activity Stream on the Aurora cluster
[4] Create an S3 bucket to keep the DAS logs
[5] Create the AWS Lambda function to decrypt the stream
[6] Configure the Lambda function and all IAM permissions
[7] Create a Kinesis Data Firehose to stream the logs from Kinesis to the S3 bucket
[8] Verify that the pipeline is working properly

[1] Creating an Aurora PostgreSQL instance.

Go to the RDS page in the AWS console and click on the "Create database" button.

Follow the screenshots below to create an Aurora instance for DAS testing.

Choose the "Standard create" option on the RDS page and select the "Aurora (PostgreSQL Compatible)" option.

Choose the Dev/Test option if you are testing. Choose Production if you are creating a production instance.

For Aurora PostgreSQL, you can use database activity streams with the following DB instance classes:

db.r7g.*large

db.r6g.*large

db.r6i.*large

db.r6id.*large

db.r5.*large

db.r4.*large

db.x2g.*

Reference :- https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/DBActivityStreams.Overview.html

You can choose the VPC as per your needs and your organisation's network/domain setup.

You can disable "DevOps Guru"; it is not needed for this particular case.

You can choose the DB parameter group as per your requirements or leave it at the default.

Make sure "Enable encryption" is selected.

Click on "Create database" to create the Aurora cluster. It will take around 10-15 minutes for the instance to come online.
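If you prefer the command line, here is a minimal AWS CLI sketch of the same cluster creation. The identifiers, password, instance class and engine version below are example values, so adjust them to your environment.

# Create the encrypted Aurora PostgreSQL cluster, then add one writer instance.
aws rds create-db-cluster \
  --db-cluster-identifier das-test-cluster \
  --engine aurora-postgresql \
  --engine-version 15.4 \
  --master-username postgres \
  --master-user-password 'ChangeMe123!' \
  --storage-encrypted

aws rds create-db-instance \
  --db-instance-identifier das-test-instance-1 \
  --db-cluster-identifier das-test-cluster \
  --engine aurora-postgresql \
  --db-instance-class db.r6g.large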

[2] Create a New KMS key to configure DAS.

We need to create a KMS key before we can enable the activity stream on the instance. The KMS key will be used to configure the DAS (Database Activity Stream).

Search for KMS in the AWS console and follow the steps shown in the screenshots below.

Click on “Create a Key”

On the "Add labels" page, give your key a name in the "Alias" box.

Leave the default values on the "Define key administrative permissions" page.

Leave the default values on the "Define key usage permissions" page.

Let’s now submit the key creation form.
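If you prefer the CLI, the same key can be created like this (a minimal sketch; the alias name is just an example):

# Create a symmetric KMS key and give it an alias.
aws kms create-key --description "KMS key for Aurora Database Activity Stream"

# Use the KeyId returned by the previous command.
aws kms create-alias \
  --alias-name alias/aurora-das-key \
  --target-key-id <KeyId-from-previous-command>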

Once the key is created, let's go back to the RDS instance and enable the activity stream on it. We will have to specify the newly created key while enabling this option. Let's do that as follows.

[3] Enable Data Activity stream on the Aurora.

To enable the Database Activity Stream on the Aurora cluster, please follow the screenshots below. Select (click the radio button next to) your Aurora PostgreSQL cluster, then open the "Actions" drop-down and click "Start database activity stream".

Specify the KMS key that you created in step [2] and make sure to select the "Apply immediately" option; there is no need to wait for the scheduled maintenance window.

Select Asynchronous mode and the "Immediately" radio button.

After clicking "Start database activity stream", the RDS instance will show "Configuring activity stream" as its status:

Wait for the RDS instance to become available. It should take about 10 minutes.

Please note that once we activate the stream, a Kinesis data stream is automatically set up. It can be viewed in the configuration details of the Aurora instance, as shown below. We will later create a Kinesis Data Firehose that uses this stream as the source to dump the activity logs to S3.
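For reference, a rough CLI equivalent of starting the stream and finding the auto-created Kinesis stream name (the cluster ARN, identifier and KMS key are placeholders):

# Start the activity stream in asynchronous mode, applying it immediately.
aws rds start-activity-stream \
  --resource-arn arn:aws:rds:<region>:<account-id>:cluster:das-test-cluster \
  --mode async \
  --kms-key-id <your-kms-key-id> \
  --apply-immediately

# Check the stream status and the name of the Kinesis data stream that was created.
aws rds describe-db-clusters \
  --db-cluster-identifier das-test-cluster \
  --query 'DBClusters[0].[ActivityStreamStatus,ActivityStreamKinesisStreamName]' \
  --output table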

[4] Create an S3 bucket to store the logs in.

Search for S3 in the AWS services section and click on it.

Click on "Create bucket".

Fill in the details as shown below and create the bucket.

Select “Block all public access”

You can disable bucket versioning. It's not needed in this scenario.

Select "Server-side encryption with Amazon S3 managed keys (SSE-S3)".

Click on the "Create bucket" button to create the bucket.
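The same bucket setup can also be scripted; here is a rough sketch (the bucket name and region are examples):

# Create the bucket (LocationConstraint is required for regions other than us-east-1).
aws s3api create-bucket \
  --bucket my-das-logs-bucket \
  --region us-east-2 \
  --create-bucket-configuration LocationConstraint=us-east-2

# Block all public access.
aws s3api put-public-access-block \
  --bucket my-das-logs-bucket \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Default encryption with Amazon S3 managed keys (SSE-S3).
aws s3api put-bucket-encryption \
  --bucket my-das-logs-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'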

[5] Create the AWS Lambda function to decrypt the stream.

AWS has published a Lambda function on GitHub for this particular task of decrypting the Kinesis records and turning them into a readable format. Deploying it in AWS is a little tricky, so carefully follow the steps below as they are. Here we are using AWS CloudShell along with SAM CLI commands to create a CloudFormation package.

This Lambda requires Python 3.8 libraries, so we will use AWS CloudShell to bundle the libraries and deploy the function. Please follow the steps below to create and deploy the Lambda package in your AWS account.

Please follow the steps below in order:

Before you run the commands below, keep the Aurora ResourceID, RegionName and BucketName you created for this activity in a notepad. You will need them as inputs in the steps below.

ResourceID: You can get this from the "Configuration" tab of your Aurora cluster page (see the CLI snippet after this list for a quick way to fetch it).
RegionName: For example, us-east-1 or us-east-2 (whichever region you are using).
BucketName: The name of the S3 bucket you created for this activity.
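If you want to fetch the resource ID without the console, this describe call works (the cluster identifier is a placeholder):

# Print the DbClusterResourceId (it looks like cluster-ABCDEFGHIJKLMNOP).
aws rds describe-db-clusters \
  --db-cluster-identifier das-test-cluster \
  --query 'DBClusters[0].DbClusterResourceId' \
  --output text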

Step A:
Open CloudShell in your AWS console.

Run the commands below one by one (as shown in the screenshot).

python3 -m venv venv
source venv/bin/activate

Step B: Install Python 3.8 using the commands below.

We will install Python 3.8.17 from source and build it inside CloudShell.

Run the commands below one by one. (Refer to the screenshot.)

sudo yum install gcc openssl-devel bzip2-devel libffi-devel

When you get this prompt:
Total download size: 50 M
Is this ok [y/d/N]:

Type y and press enter.

When the above command completes, copy and run the commands below one by one, pressing Enter after each.

wget https://www.python.org/ftp/python/3.8.17/Python-3.8.17.tgz

tar xzf Python-3.8.17.tgz

cd Python-3.8.17

./configure --enable-optimizations

After the above command completes, run the command below.

sudo make altinstall
Wait for it to finish completely before proceeding to the next steps. It will take around 5 minutes for the command to finish.

Step C: Upload the zip file below to CloudShell. (Make sure everything happens in the same CloudShell session.) You will still be in the "Python-3.8.17" folder in CloudShell, so run cd .. to come one folder out of the Python folder path, as shown in the screenshot below.

Once you are back in the CloudShell home folder, as shown below:

Download the attached zip file to your local laptop; we will then upload it to CloudShell, into the home folder, as shown below.

The zip file can be found at https://github.com/aws-samples/aurora-das-processing for reference. (The file on GitHub has indentation errors in the Python file; I have fixed them all and attached the corrected zip file for you to use directly.)

You can check whether the file is present in the path by running ls -l.

Now unzip the uploaded file with the command below:

unzip aurora-das-processing-final

Go inside the extracted folder:

cd aurora-das-processing-Final

Now we have to deploy this package as the Lambda function.

Run the commands below one by one:

sam build

Now run the command below in CloudShell.

sam deploy --guided

Fill in the details as per your AWS environment, as shown below. (You have the details in your notepad.)

Setting default arguments for 'sam deploy'
=========================================
Stack Name [sam-app]: LambdaForKinesisAuroraV2    <- give any name of your liking for your CloudFormation stack
AWS Region [us-east-1]:                           <- enter your region name
Parameter BucketNamePrefix [dastestbucket]:       <- enter your destination S3 bucket name
Parameter KeyName []:                             <- just press Enter and leave it blank
Parameter RegionName [us-east-1]:                 <- just press Enter and leave it blank
Parameter AuroraResourceID []:                    <- enter your Aurora cluster resource ID
#Shows you resources changes to be deployed and require a 'Y' to initiate deploy
Confirm changes before deploy [y/N]:              <- just press Enter and leave it blank
#SAM needs permission to be able to create roles to connect to the resources in your template
Allow SAM CLI IAM role creation [Y/n]: Y          <- type Y and press Enter
#Preserves the state of previously provisioned resources when an operation fails
Disable rollback [y/N]:                           <- just press Enter and leave it blank
Save arguments to configuration file [Y/n]: Y     <- type Y and press Enter
SAM configuration file [samconfig.toml]:          <- just press Enter and leave it blank
SAM configuration environment [default]:          <- just press Enter and leave it blank

You can check the status and see the resources created by the stack in CloudFormation in the AWS console.

Click on the stack name and open it.

Click on the Lambda function and open it in a new window to configure it.

[6] Configure Lambda and all IAM permissions:

For the whole pipeline to work properly, we need the right configuration and permissions, so make sure the permissions of every component are the same as shown below.

Lambda Configuration:
Open the Lambda function you created and change the timeout to 1 minute.


Check the environment variables section to verify that your details are correct; if anything is not correct, you can edit it and enter the correct details (as shown in the screenshot).

On your Lambda main page, click on "Actions" as shown below and click "Publish new version".

Enter "1" as the new version description and click on "Publish" as shown below.
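If you prefer to script these two Lambda changes, here is a rough CLI sketch (the function name placeholder is whatever the SAM stack created in your account):

# Raise the timeout to 60 seconds, wait for the update, then publish version 1.
aws lambda update-function-configuration \
  --function-name <your-das-lambda-function-name> \
  --timeout 60

aws lambda wait function-updated \
  --function-name <your-das-lambda-function-name>

aws lambda publish-version \
  --function-name <your-das-lambda-function-name> \
  --description "1"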

As shown in the image below, click on the role name and open it in a new window. This is the IAM role that was created when the Lambda was deployed. We will check and set its permissions.

Make sure the Lambda's role has all the permissions below; if any are missing, please add them.

Once the above steps are done, we will move on to configuring the Kinesis Data Firehose to put the logs into the S3 bucket.

[7] Create a Firehose delivery stream to stream the logs from Kinesis to the S3 bucket:

Let's now create the Kinesis Data Firehose delivery stream using the Kinesis data stream as the source. This will relay the activity stream from the Kinesis stream to the S3 bucket. Make sure that you choose the Kinesis data stream as the source and the S3 bucket as the destination. In our case, the Kinesis stream was created automatically when we enabled the activity stream, and the S3 bucket was created in the section above. The name of the automatically created stream can be fetched from the configuration section of the Aurora RDS cluster.

Select the options as shown below and click on "Create delivery stream".

Amazon Kinesis Data Firehose definition:
Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.
With Kinesis Data Firehose, you don’t need to write applications or manage resources. You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified. You can also configure Kinesis Data Firehose to transform your data before delivering it.

Get the "Kinesis data stream" name from the "Configuration" tab of the Aurora RDS cluster. (The screenshot below is from our Aurora RDS page, where the Kinesis link is present.)

When we click the "Browse" button, our Kinesis stream name will be shown. (You can match the name against the RDS page and select the one that is yours.)

Give your Kinesis Data Firehose stream a name:

In the window below we will select the Lambda function that we created to transform/decrypt the data and put it into the S3 bucket.

When you click on "Version or alias", choose version 1 (remember, we published version 1 in the Lambda steps).

Choose the destination by clicking "Browse". You will see the name of the S3 bucket created in step [4].

Click on "Create delivery stream" and the Firehose stream will be created.
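For completeness, the console steps above roughly correspond to the CLI sketch below. All ARNs, role names and the delivery stream name are placeholders; the console remains the simpler path.

# Destination settings: deliver to the S3 bucket and run the decryption Lambda (version 1) as a processor.
cat > s3-destination.json <<'EOF'
{
  "RoleARN": "arn:aws:iam::<account-id>:role/<firehose-delivery-role>",
  "BucketARN": "arn:aws:s3:::<your-das-bucket>",
  "ProcessingConfiguration": {
    "Enabled": true,
    "Processors": [
      {
        "Type": "Lambda",
        "Parameters": [
          {
            "ParameterName": "LambdaArn",
            "ParameterValue": "arn:aws:lambda:<region>:<account-id>:function:<your-das-lambda>:1"
          }
        ]
      }
    ]
  }
}
EOF

# Create the delivery stream with the DAS Kinesis stream as the source.
aws firehose create-delivery-stream \
  --delivery-stream-name aurora-das-to-s3 \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration "KinesisStreamARN=arn:aws:kinesis:<region>:<account-id>:stream/<das-kinesis-stream-name>,RoleARN=arn:aws:iam::<account-id>:role/<firehose-delivery-role>" \
  --extended-s3-destination-configuration file://s3-destination.json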

Now let's give the Kinesis Firehose IAM role the proper permissions.

On your Kinesis Firehose page, scroll down and you will find the IAM role for your Firehose.

Click on the IAM role to open it. Make sure the IAM role has the permissions shown in the screenshots below; add any that are missing.

You need to manually add:
AmazonKinesisFirehoseFullAccess
AmazonS3FullAccess
AWSLambda_FullAccess

A policy named something like KinesisFirehoseServicePolicy-**************-us-east-2 will already be present, so there is no need to add it.
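These managed policies can also be attached from the CLI (the role name placeholder is the Firehose delivery role created for your stream):

# Attach the three AWS managed policies to the Firehose delivery role.
aws iam attach-role-policy --role-name <your-firehose-role-name> --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFirehoseFullAccess
aws iam attach-role-policy --role-name <your-firehose-role-name> --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name <your-firehose-role-name> --policy-arn arn:aws:iam::aws:policy/AWSLambda_FullAccess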

Final configuration for the IAM roles of Kinesis Firehose and Lambda.

Copy the names of both IAM roles (from your Kinesis Firehose and from the Lambda function) and paste them in a notepad somewhere.

Now go to your KMS key as shown below.

Add both IAM roles in the "Key users" section of the KMS key.

[8] Verify that the pipeline is working properly:

Perform the steps below first so that the pipeline starts working seamlessly. (Logs get decrypted only after these steps are done; a CLI sketch of the same steps follows below.)
Step a: Go to your Aurora cluster and stop it temporarily. Wait for it to stop completely.
Step b: Go to the S3 bucket that you created, delete all the folders inside it and leave it clean.
Step c: Then start your Aurora cluster again and wait for it to come fully online.
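For reference, these three steps roughly map to the following AWS CLI commands (the cluster identifier and bucket name are placeholders):

# Step a: stop the cluster and wait until it is fully stopped.
aws rds stop-db-cluster --db-cluster-identifier das-test-cluster

# Step b: empty the DAS bucket.
aws s3 rm s3://my-das-logs-bucket --recursive

# Step c: start the cluster again and wait for it to become available.
aws rds start-db-cluster --db-cluster-identifier das-test-cluster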

Connect to your Aurora instance using pgAdmin and create a few databases, create some tables in them, then drop some tables and drop a database; a few sample statements are sketched below.
Wait for about 10 minutes after doing these activities and then open your S3 bucket.
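For example, something like the following psql session will generate a mix of activity (the endpoint, user, database and table names are placeholders), and the final command lists what was delivered to the bucket:

# Generate some database activity (enter the password when prompted).
psql -h <aurora-cluster-endpoint> -p 5432 -U postgres -d postgres -c "CREATE DATABASE das_demo;"
psql -h <aurora-cluster-endpoint> -p 5432 -U postgres -d das_demo -c "CREATE TABLE demo_orders (id int, note text);"
psql -h <aurora-cluster-endpoint> -p 5432 -U postgres -d das_demo -c "DROP TABLE demo_orders;"
psql -h <aurora-cluster-endpoint> -p 5432 -U postgres -d postgres -c "DROP DATABASE das_demo;"

# After roughly 10 minutes, list what Firehose and the Lambda have written to the bucket.
aws s3 ls s3://my-das-logs-bucket/parsed/ --recursive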

Inside your S3 bucket you will find a folder named “parsed” as shown in the screenshot.

When you open the parsed folder you will find sub-folders as shown below. These sub-folders contain the logs for the different actions you have performed on the Aurora instance.

For example, "CREATE DATABASE/" will contain the logs for the new databases that were created.

Go inside the “CREATE DATABASE/” folder and open the text file.

You will see the content inside the file as shown below.

Copy the content and open the website https://jsonformatter.curiousconcept.com/

Paste the content of the txt file into the box on the website, as shown below.

When you click on the Process button, the data will be formatted nicely in a readable way, as shown below.
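If you would rather not paste logs into a website, you can also pretty-print a file locally. This is a sketch that assumes jq is installed and that the parsed file contains JSON records; the bucket name and object key are placeholders.

# Download the log file from S3 and pretty-print it with jq.
aws s3 cp 's3://my-das-logs-bucket/parsed/CREATE DATABASE/<file-name>.txt' - | jq .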