
How to chat with data in Snowflake using ChatGPT, dbt, and Streamlit

Unleash the Power of Chatting with Data: A Deep Dive into Snowflake, ChatGPT, dbt, and Streamlit

In the era of data-driven decision making, being able to ‘converse’ with your data can unlock new levels of insights and understanding. Imagine posing queries as if you were chatting with an analyst, and receiving responses in natural, human-like language. This is no longer the stuff of science fiction, but a reality made possible by the fusion of Snowflake, OpenAI’s ChatGPT, dbt, and Streamlit.

In this comprehensive guide, I will take you on a fascinating journey where we don’t just analyze data - we chat with it! Together, we will delve into the heart of setting up a Docker environment, initiating interactive queries with Snowflake, and crafting intuitive Streamlit applications to facilitate a seamless dialogue with your data.

Whether you’re a data professional seeking to elevate your skills or a curious enthusiast eager to dabble in the realms of data and AI, this guide has something for you. Not only will you understand the technology underpinning these interactions, but you’ll also gain hands-on experience as you follow along with the code.

By the end of this guide, you’ll be able to transform your approach to data, shifting from traditional querying to engaging, insightful, and intuitive conversations. So, are you ready to chat with your data?

The code used in this article can be found here. Dive in, explore, and let’s bring data to life!

This blog assumes that you have a basic understanding of Docker, dbt, Snowflake, and ChatGPT.

Prerequisites

  • Download and install Docker for your platform. Click here for instructions
  • Create a Snowflake account for the demo. Click here for instructions
  • Create an OpenAI account for the demo. Click here for instructions

Account setup

OpenAI

  • Log in to your OpenAI account, navigate to API keys, and create an API key named streamlit. This API key will enable Streamlit to interact with OpenAI

    If you already have an API key, it’s recommended to create a new one instead of reusing keys created for different use cases

Snowflake

  • A Snowflake user and role will be required for Streamlit to perform queries

  • Log into Snowflake and execute the SQL provided here to create the user, role, and warehouse

    Be sure to update the names and credentials based on your setup.
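
    If you just want a feel for the shape of that setup before opening the linked SQL, here is a minimal sketch. The object names (STREAMLIT_ROLE, STREAMLIT_WH, STREAMLIT_USER) are placeholders, and the SQL in the repo remains the source of truth for the exact names and grants:

    -- Illustrative sketch only; adjust names, password, and grants to your setup
    USE ROLE SECURITYADMIN;
    CREATE ROLE IF NOT EXISTS STREAMLIT_ROLE;

    USE ROLE SYSADMIN;
    CREATE WAREHOUSE IF NOT EXISTS STREAMLIT_WH
      WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
    GRANT USAGE ON WAREHOUSE STREAMLIT_WH TO ROLE STREAMLIT_ROLE;

    USE ROLE SECURITYADMIN;
    CREATE USER IF NOT EXISTS STREAMLIT_USER
      PASSWORD = '<strong-password>'
      DEFAULT_ROLE = STREAMLIT_ROLE
      DEFAULT_WAREHOUSE = STREAMLIT_WH;
    GRANT ROLE STREAMLIT_ROLE TO USER STREAMLIT_USER;

    -- The app also needs read access to the databases it will query,
    -- which the repo's SQL grants to this role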

Development environment setup

For this demo, we will utilize a Docker container named developer-tools, which comes pre-equipped with dbt. If you’ve already set up a machine with dbt, feel free to skip this step and proceed directly to setting up the container for Streamlit.

Container for dbt

  • Clone the developer-tools repo from here by running the command:

    git clone https://github.com/entechlog/developer-tools.git
    
  • Navigate into the developer-tools directory and create a copy of .env.template as .env. For this demo, there’s no need to alter any variables:

    cd developer-tools
    cp .env.template .env
    
  • Initiate the container using the following command

    docker-compose -f docker-compose-reg.yml up -d --build
    

Container for Streamlit

  • Start by cloning the snowflake-examples repository from GitHub. You can do this by running the following command in your terminal:

    git clone https://github.com/entechlog/snowflake-examples.git
    
  • Navigate into the snowflake-examples/snow-chat/streamlit/src/.streamlit/ directory, where you will find a file named secrets.toml.template. Create a copy of this file, rename it to secrets.toml, and update it with your OpenAI and Snowflake credentials (see the sketch after this list). These credentials will allow Streamlit to interact with Snowflake and OpenAI

  • This container will be utilized later when we start the Streamlit app
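
    For reference, the finished secrets.toml ends up looking something like the sketch below. Treat the section and key names as placeholders and follow secrets.toml.template for the exact layout the app expects:

    # Placeholder layout; match the section and key names in secrets.toml.template
    OPENAI_API_KEY = "sk-..."

    [snowflake]
    account = "<account_identifier>"
    user = "STREAMLIT_USER"        # the user created in the account setup step
    password = "<password>"
    role = "STREAMLIT_ROLE"
    warehouse = "STREAMLIT_WH"
    database = "<database>"
    schema = "<schema>"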

Validate container

  • Run the following command to verify the status of your containers:

    docker ps
    
  • To interact with your container via SSH, execute the following command:

    docker exec -it developer-tools /bin/bash
    
  • Once inside the container, validate the dbt version by running the following command

    dbt --version
    

Download dataset

Download your dataset in preparation for the upcoming steps. If you are using a dataset from an external source, ensure you have the appropriate permissions and rights to use the data. We will be using yellow taxi trip data from Kaggle, and the data dictionary for this data can be found here
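
Because the data is loaded with dbt seed later in this guide, the downloaded CSV needs to sit in the dbt project’s seed path. The exact folder and file names in the repo may differ, but the layout is roughly:

    snow-chat/dbt/
    ├── dbt_project.yml
    └── seeds/                          # or data/, depending on the project's seed-paths setting
        └── yellow_taxi_trip_data.csv   # placeholder file name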

Generate DW models

  • Using a prompt in a ChatGPT chat session, generate the data model for the demo

  • Here’s the generated model. Keep in mind that the generated model might vary slightly. Always review the output and make any necessary changes to better fit your needs

Table Name       | Fields
dim_vendor       | vendor_id (PK), vendor_code, vendor_name
dim_date         | date_id (PK), date, year, month, day, day_name, month_name, day_of_week, day_of_year, is_weekend, is_holiday
dim_rate         | rate_id (PK), rate_code, rate_name
dim_location     | location_id (PK), longitude, latitude
dim_payment_type | payment_type_id (PK), payment_type_code, payment_type_name
fact_trip        | trip_id (PK), vendor_id (FK), pickup_date_id (FK), dropoff_date_id (FK), pickup_location_id (FK), dropoff_location_id (FK), rate_id (FK), payment_type_id (FK), passenger_count, trip_distance, pickup_timestamp, dropoff_timestamp, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount

Generate dbt models

In this section, we will generate dbt model SQL and schema YAML files, and prepare them for integration with our project:

  • Generate the SQL files for the dbt model using this prompt
  • Generate the YAML files for the dbt schema using this prompt
  • Review the generated SQL and YAML for correctness and make any necessary changes. You can add additional prompts until you get the correct SQL and YAML
  • The required SQL and YAML files have been pre-copied into the models directory. Take a look at snow-chat/dbt/models for further details (a simplified example of one such model is sketched after this list)
  • Finally, copy the ChatGPT-generated dbt models into your dbt project’s models directory
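
To give a feel for what these prompts produce, a generated dimension model looks roughly like the sketch below. It assumes the seed table is exposed as trip_data_nyc and uses the payment type codes from the data dictionary; the actual models in snow-chat/dbt/models are the reference:

    -- Illustrative dim_payment_type model, not the repo's exact SQL
    {{ config(materialized='table') }}

    SELECT
        ROW_NUMBER() OVER (ORDER BY payment_type) AS payment_type_id,
        payment_type                              AS payment_type_code,
        CASE payment_type
            WHEN 1 THEN 'Credit card'
            WHEN 2 THEN 'Cash'
            WHEN 3 THEN 'No charge'
            WHEN 4 THEN 'Dispute'
            ELSE 'Unknown'
        END                                       AS payment_type_name
    FROM (SELECT DISTINCT payment_type FROM {{ ref('trip_data_nyc') }}) AS src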

Load data into Snowflake

Next, we will load our data into Snowflake. Here are the steps for a successful data load:

  • Open your code editor (e.g., VSCode) and navigate into the dbt directory. Here, create a new dbt profile file named profiles.yml and update it with your database connection details (a minimal sketch of this file appears at the end of this list)

  • Modify the dbt_project.yml file to connect to the profile you just created. Ensure that the profile value matches the name in profiles.yml exactly (in this case, snow_chat)

  • From within the developer-tools container and project directory, run dbt-set-profile to update the DBT_PROFILES_DIR. This step will make it easier for you to switch between multiple dbt projects in the future

  • Validate your dbt profile and its connection by running the command dbt debug

  • Execute dbt seed to load your RAW tables

  • Next, run dbt run to load your DW tables

  • Log into Snowflake and execute the below SQL commands to validate data in both RAW and DW layers

    SELECT 'No of records in RAW table' as desc, COUNT (*) as count FROM "TST_ENTECHLOG_RAW_DB"."YELLOW_TAXI"."TRIP_DATA_NYC"
    UNION ALL
    SELECT 'No of records in FACT table' as desc, COUNT (*) as count FROM "TST_ENTECHLOG_DW_DB"."FACT"."TRIPS";
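
As a reference for the first two steps above, a minimal profiles.yml sketch is shown below; the account, credential, database, and schema values are placeholders to replace with your own. The top-level key (snow_chat) is the profile name that the profile value in dbt_project.yml must match:

    snow_chat:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: <account_identifier>
          user: <dbt_user>
          password: <password>
          role: <dbt_role>
          warehouse: <dbt_warehouse>
          database: <default_database>   # e.g. the RAW database; model configs can override
          schema: <default_schema>
          threads: 4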
    

Create Streamlit app

Streamlit is an open-source app framework that lets Machine Learning and Data Science teams create beautiful web apps in minutes. In this section, we’ll guide you through the steps to create a Streamlit application:

  • Streamlit app details: The application is designed to scan all tables within the specified input schemas, gathering details such as table name, table description, column name, and column description. This information is then utilized in conjunction with OpenAI to facilitate the interactive chat functionality. For a more in-depth understanding, please refer to the source code here

    • get_table_context contains the logic to retrieve metadata from the Snowflake information schema
    • get_system_prompt incorporates the logic to read the prompt along with the table context for making the OpenAI call
    • The code is currently configured to use the `gpt-3.5-turbo` model. However, you can update this to use a different model based on your use case
  • Start the container: Use the following command to kickstart your container:

    docker-compose -f docker-compose-streamlit.yml up -d --build
    
  • Validate the containers: Confirm that your containers are running as expected with the following command:

    docker-compose -f docker-compose-streamlit.yml ps
    
  • SSH into the container: Gain access to your container’s shell by running:

    docker exec -it streamlit /bin/bash
    
  • Validate Streamlit version: Ensure you’re running the correct version of Streamlit by checking the version:

    streamlit --version
    
  • Verify the app’s status: Make sure your Streamlit app is running and listening properly. To do this, check the logs for any potential issues. If all is well, navigate to your Streamlit application at http://localhost:8501/

  • Verify the connections: Navigate to the 🌍 Validate Connection - Open AI and ❄️ Validate connection - Snowflake pages to verify if the connections are working correctly. If they are, then proceed to the main app page, 💻 Snow Chat

  • Start chatting: Now that your Streamlit app is up and running, you can start interacting with your Snowflake data. Feel free to ask any data-related questions you might have!
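
As an illustration of what happens under the hood, a question such as “What is the average tip amount by payment type?” is typically turned into a query against the DW model, roughly along these lines (illustrative only; the exact table names and SQL depend on what the app generates):

    SELECT p.payment_type_name,
           AVG(f.tip_amount) AS avg_tip_amount
    FROM   fact_trip f
    JOIN   dim_payment_type p
      ON   f.payment_type_id = p.payment_type_id
    GROUP BY p.payment_type_name
    ORDER BY avg_tip_amount DESC;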

Clean up resources

Once you’re finished with your tasks, it’s good practice to clean up the resources that are no longer needed. This will free up system resources:

To stop the container, use the down command with Docker Compose. The --volumes flag will also remove any volumes defined in your services, and the --remove-orphans flag will remove containers for services not defined in the Compose file. Run the following command in your terminal:

docker-compose -f docker-compose-streamlit.yml down --volumes --remove-orphans
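
If you also started the developer-tools container earlier, it can be brought down the same way, assuming the same compose file used during setup:

docker-compose -f docker-compose-reg.yml down --volumes --remove-orphans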

For a hands-on, guided experience, check out the video below

Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.

This blog represents my own viewpoints and not those of my employer, Snowflake. All product names, logos, and brands are the property of their respective owners.
