preloader
blog-post

Stream AWS CloudWatch logs to Snowflake

Table of Contents

CloudWatch logs provide insight into logs from many AWS services, but third-party observability/application performance monitoring(APM) tools like DataDog, New Relic, Splunk, Sumo Logic enables to perform more sophisticated visualization, alerting, and analysis. Most often, the CloudWatch logs are exported to observability/application performance monitoring(APM) tools using a forwarder Lambda function. CloudWatch logs can also be exported to Snowflake for data analysis and audits.

Learn how to stream AWS CloudWatch logs to Snowflake for easy data analysis. This article provides a step-by-step guide on how to set up the resources required to stream logs from CloudWatch to Snowflake using Terraform. Code used in this article can be found here. In this demo we will use an existing CloudWatch log group and setup CloudWatch log filters, AWS kinesis data firehose, S3 bucket, Snowflake storage integration and Snowpipe to ingest data into Snowflake. If you don’t have a need for near real time streaming, then streams and tasks or scheduled copy statements can be used to ingest the logs into Snowflake RAW layer.

All resource creation in AWS and Snowflake is done using Terraform. This blog assumes that you know the basics of Snowflake, AWS and Terraform.

Prerequisite

  • Download and install docker for your platform. Click here for instructions
  • Create an AWS account for the demo. Click here for instructions
  • Create a Snowflake account for the demo. Click here for instructions

Account setup

AWS

  • Login into your AWS cloud account and navigate to IAM –> Users and create a new user, say terraform. This new user will be used by Terraform to create the AWS resources

  • Edit the permissions of the user and attach the policy AdministratorAccess directly to the user.

    Attaching AdministratorAccess is for the ease of this demo. In an ideal scenario, you should only attach the required permission/policy to the terraform user.

  • Navigate to Security credentials of the IAM user and create a new access key with CLI access only

Snowflake

  • We will need a Snowflake user and role for Terraform to create the rest of the resources

  • Login into Snowflake and execute below SQL to create the user and role

    -- Create role and grant required access for TF to operate
    USE ROLE SECURITYADMIN;
    CREATE ROLE IF NOT EXISTS ENTECHLOG_TF_ROLE;
    
    CREATE USER IF NOT EXISTS ENTECHLOG_TF_USER DEFAULT_ROLE = ENTECHLOG_TF_ROLE;
    GRANT ROLE ENTECHLOG_TF_ROLE TO USER ENTECHLOG_TF_USER;
    
    GRANT CREATE ROLE ON ACCOUNT TO ROLE ENTECHLOG_TF_ROLE;
    GRANT CREATE USER ON ACCOUNT TO ROLE ENTECHLOG_TF_ROLE;
    GRANT MANAGE GRANTS ON ACCOUNT TO ROLE ENTECHLOG_TF_ROLE;
    
    GRANT ROLE ENTECHLOG_TF_ROLE TO ROLE SECURITYADMIN;
    GRANT ROLE ENTECHLOG_TF_ROLE TO ROLE SYSADMIN;
    
    USE ROLE ACCOUNTADMIN;
    GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE ENTECHLOG_TF_ROLE;
    
    USE ROLE SYSADMIN;  
    GRANT CREATE DATABASE ON ACCOUNT TO ROLE ENTECHLOG_TF_ROLE;
    GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE ENTECHLOG_TF_ROLE;
    

Development environment setup

For the purpose of the demo we will use a docker container called developer-tools which has Terraform and tools required for the demo. If you already have a machine with Terraform, AWS CLI then you can skip this step and proceed with Create resources using Terraform

Start container

  • Clone developer-tools repo

    git clone https://github.com/entechlog/developer-tools.git
    
  • cd into developer-tools directory and create a copy of .env.template as .env. For the purpose of demo, we don’t have to edit any variables

    cd developer-tools
    
  • Start the container

    docker-compose -f docker-compose-reg.yml up -d --build
    

Validate container

  • Validate the containers by running

    docker ps
    
  • SSH into the container

    docker exec -it developer-tools /bin/bash
    
  • Validate terraform version by running below command

    terraform --version
    

Create AWS profile

  • Create AWS profile named terraform by running the below command

    aws configure
    
  • Update the account details with values from AWS account setup

  • Test the profile by running the command

    aws sts get-caller-identity
    

    AWS profile can be also configured by setting the environment variables AWS_DEFAULT_REGION, AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID with the value from AWS account setup

Create resources using Terraform

AWS

  • To stream the data from CloudWatch to Snowflake, we would need the following resources to be created in AWS

    Resource Description
    CloudWatch Logs subscription filters Subscription filters are used to get real-time feed of log events from CloudWatch Logs to Kinesis Data Firehose. Within a log group, if we need only specific logs to be delivered to Firehose we can limit them using a filter pattern. Each log group can have up to two subscription filters associated with it
    Kinesis Data Firehose Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3 or to HTTP endpoints owned by third-party service providers, including Datadog, Dynatrace, LogicMonitor, New Relic, and Sumo Logic. Kinesis Data Firehose can also transform the data before delivering it to the destination
    S3 bucket S3 Bucket to store the CloudWatch Logs delivered by Kinesis Data Firehose
    S3 bucket notification Notify creation of new s3 objects to sqs queue used by Snowpipe for auto ingestion
    IAM roles and policies We would need multiple IAM roles and policies to make this integration work securely.
    1) Role to deliver the logs from CloudWatch to Firehose
    2) Role to deliver the logs from Firehose to S3
    3) Role for Snowflake to read data from S3
  • Clone the snowflake-examples repo to get started

  • cd into snowflake-examples/snow-cloudwatch-logs/terraform/aws. This directory contains terraform files with the required resource definition to create AWS components

  • Terraform variables are defined inside a file called variables.tf. The values for variables are specified in terraform.tfvars for local runs. Create terraform.tfvars from terraform.tfvars.template and update with the details about Snowflake storage integration and Snowflake sqs queue for receiving the s3 bucket notification. During the initial run Snowflake integration details won’t be available, so they will be set to the default value

  • Create the resources by applying the terraform template

    # install custom modules
    terraform init -upgrade
    
    # format code
    terraform fmt -recursive
    
    # plan to review the summary of changes
    terraform plan
    
    # apply the changes to target environment
    terraform apply
    
  • Capture the output from this step, this will be used as input variables for the Snowflake deployment. You could also avoid this two-step deployment and try to get the values by reference

    Variable Name Description
    aws_iam_role__s3_to_snowflake_delivery_role__arn IAM role used by snowflake to access the S3 bucket
    aws_s3_bucket__cloudwatch_logs__id S3 bucket name for creating the external stage in Snowflake

Snowflake

  • To stream the data from CloudWatch to Snowflake, we would need the following resources to be created in Snowflake

    Resource Description
    Database & Schema Snowflake database and schema for storing the RAW data
    Storage Integration Snowflake object to store generated identity and access management (IAM) entity for s3
    Stage Snowflake stage to specify where data files are stored (i.e. “staged”)
    File format Snowflake file format to encapsulate information of data files, such as file type (CSV, JSON, etc.)
    Table Snowflake table to persist the RAW data from S3
    Snowpipe Snowpipe to trigger the data ingestion copy statements based on file creation event in sqs
  • cd into snowflake-examples/snow-cloudwatch-logs/terraform/snowflake. This directory contains terraform files with the required resource definition to create Snowflake components

  • Create terraform.tfvars from terraform.tfvars.template and update with the Snowflake account and database details. Here are the variables specific to storage integration

    Name Description
    snowflake_storage_integration__storage_allowed_locations S3 location that storage integration will be allowed to access, this the s3 bucket path from the AWS terraform output
    snowflake__aws_role_arn IAM role name from AWS terraform output
  • Create the resources by applying the terraform template

    # install custom modules
    terraform init -upgrade
    
    # format code
    terraform fmt -recursive
    
    # plan to review the summary of changes
    terraform plan
    
    # apply the changes to target environment
    terraform apply
    
  • Capture the output from this step, this will be used as input variables for the Snowflake deployment. You could also avoid this two-step deployment and try to get the values by reference

    Variable Name Description
    snowflake_pipe__notification_channel This is the sqs queue which will be auto created when creating a snowpipe
    snowflake_storage_integration__storage_aws_external_id The external ID that is needed to establish a trust relationship. External ID is required to grant access to your AWS resources (i.e. S3) to a third party (i.e. Snowflake)
    snowflake_storage_integration__storage_aws_iam_user_arn The AWS IAM user created for Snowflake account. All S3 storage integrations use that IAM user

Update AWS resources

To grant correct permissions to AWS resources, update the aws terraform variables in terraform.tfvars with the output from Snowflake and re-run the AWS terraform template. This should update AWS resources with correct ARN, external id and the notification channel.

Generate CloudWatch logs

After configuring the resources in AWS and Snowflake, to stream CloudWatch logs, generate some logs by triggering the lambda function OR by performing the action which would generate the required logs.

Validate data in Snowflake

  • Validate the file format, states and integration by running below sql

    USE DATABASE <database-name>;
    USE SCHEMA <schema-name>;
    
    -- validate stages
    SHOW STAGES;
    DESC STAGE <stage-name>;
    LIST @<stage-name>;
    
    -- validate data in stage
    SELECT src.metadata$filename AS file_name,
      src.metadata$file_row_number AS file_row_number,
      src.metadata$file_content_key AS file_content_key,
      src.metadata$file_last_modified AS file_last_modified,
      src.metadata$start_scan_time,
      src.$1 AS cloudwatch_log
    FROM @<stage-name> src;
    
  • Validate the snowpipe by running following sql

    USE DATABASE <database-name>;
    USE SCHEMA <schema-name>;
    
    -- validate pipes
    SHOW PIPES;
    DESC PIPE <pipe-name>;
    SELECT SYSTEM$PIPE_STATUS('<database-name>.<schema-name>.<pipe-name>');
    
    -- validate copy history
    SELECT *
    FROM TABLE(information_schema.copy_history(TABLE_NAME=>'<database-name>.<schema-name>.<table-name>', START_TIME=> DATEADD(hours, -1, CURRENT_TIMESTAMP())));
    
    -- validate data in final table
    SELECT * FROM <database-name>.<schema-name>.<table-name>;
    
    SELECT 
    VALUE:timestamp::VARCHAR AS timestamp, 
    VALUE:id::VARCHAR AS id, 
    VALUE:message::VARCHAR AS message
    FROM <database-name>.<schema-name>.<table-name>,
    lateral flatten( input => "cloudwatch_log":logEvents)
    ORDER BY timestamp DESC;
    

Hope this was helpful. Did I miss something ? Let me know in the comments OR in the forum section.

This blog represents my own viewpoints and not of my employer, Snowflake. All product names, logos, and brands are the property of their respective owners.

References

Share this blog:
Comments

Related Articles