Unleash the Power of Chatting with Data: A Deep Dive into Snowflake, ChatGPT, dbt, and Streamlit
In the era of data-driven decision making, being able to ‘converse’ with your data can unlock new levels of insights and understanding. Imagine posing queries as if you were chatting with an analyst, and receiving responses in natural, human-like language. This is no longer the stuff of science fiction, but a reality made possible by the fusion of Snowflake, OpenAI’s ChatGPT, dbt, and Streamlit.
In this comprehensive guide, I will take you on a fascinating journey where we don’t just analyze data - we chat with it! Together, we will delve into the heart of setting up a Docker environment, initiating interactive queries with Snowflake, and crafting intuitive Streamlit applications to facilitate a seamless dialogue with your data.
Whether you’re a data professional seeking to elevate your skills or a curious enthusiast eager to dabble in the realms of data and AI, this guide has something for you. Not only will you understand the technology underpinning these interactions, but you’ll also gain hands-on experience as you follow along with the code.
By the end of this guide, you’ll be able to transform your approach to data, shifting from traditional querying to engaging, insightful, and intuitive conversations. So, are you ready to chat with your data?
The code used in this article can be found here. Dive in, explore, and let’s bring data to life!
This blog assumes that you have a basic understanding of Docker, dbt, Snowflake, and ChatGPT.
Log in to your OpenAI account, navigate to API keys, and create an API key named streamlit. This API key will enable Streamlit to interact with OpenAI.
If you already have an API key, it’s recommended to create a new one rather than reusing keys created for other use cases.
A Snowflake user and role will be required for Streamlit to perform queries
Log into Snowflake and execute the SQL provided here to create the user, role, and warehouse
Be sure to update the names and credentials based on your setup.
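The SQL itself lives in the linked script, but for orientation, a minimal sketch might look like the following. The role, user, and warehouse names here are placeholders, not the ones from the repo; treat the linked script as the source of truth.

```sql
-- Hypothetical names; use the SQL from the linked script as the source of truth
CREATE ROLE IF NOT EXISTS STREAMLIT_ROLE;

CREATE WAREHOUSE IF NOT EXISTS STREAMLIT_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

CREATE USER IF NOT EXISTS STREAMLIT_USER
  PASSWORD          = '<strong-password>'
  DEFAULT_ROLE      = STREAMLIT_ROLE
  DEFAULT_WAREHOUSE = STREAMLIT_WH;

GRANT ROLE STREAMLIT_ROLE TO USER STREAMLIT_USER;
GRANT USAGE ON WAREHOUSE STREAMLIT_WH TO ROLE STREAMLIT_ROLE;
-- Plus USAGE/SELECT grants on the databases, schemas, and tables the app will query
```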
For this demo, we will utilize a Docker container named developer-tools, which comes pre-equipped with dbt. If you’ve already set up a machine with dbt, feel free to skip this step and proceed directly to setting up the container for Streamlit.
Clone the developer-tools repo from here by running the command:
git clone https://github.com/entechlog/developer-tools.git
Navigate into the developer-tools directory and create a copy of .env.template as .env. For this demo, there’s no need to alter any variables:
cd developer-tools
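Assuming a Unix-like shell, the copy step (left implicit in the commands above) might look like:

```bash
cp .env.template .env
```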
Initiate the container using the following command
docker-compose -f docker-compose-reg.yml up -d --build
Start by cloning the snowflake-examples repository from GitHub. You can do this by running the following command in your terminal:
git clone https://github.com/entechlog/snowflake-examples.git
Navigate into the snowflake-examples\snow-chat\streamlit\src\.streamlit\ directory, where you will find a file named secrets.toml.template. Create a copy of this file and rename it to secrets.toml, then update it with your OpenAI and Snowflake credentials. These credentials will allow Streamlit to interact with Snowflake and OpenAI.
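The exact keys come from secrets.toml.template in the repo; as a rough, hypothetical sketch of the kind of values you will fill in:

```toml
# Hypothetical layout -- follow secrets.toml.template in the repo for the real keys
OPENAI_API_KEY = "sk-..."

[snowflake]
account   = "<account_identifier>"
user      = "<streamlit_user>"
password  = "<password>"
role      = "<streamlit_role>"
warehouse = "<streamlit_warehouse>"
database  = "<database>"
schema    = "<schema>"
```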
This container will be utilized later when we start the Streamlit app
Run the following command to verify the status of your containers:
docker ps
To interact with your container via SSH, execute the following command:
docker exec -it developer-tools /bin/bash
Once inside the container, validate the dbt version by running the following command
dbt --version
Download your dataset in preparation for the upcoming steps. If you are using a dataset from an external source, ensure you have the appropriate permissions and rights to use the data. We will be using the yellow taxi trip data from Kaggle; the data dictionary for this data can be found here.
Using a prompt in a ChatGPT chat session, generate the data model for the demo
Here’s the generated model. Keep in mind that the generated model might vary slightly. Always review the output and make any necessary changes to better fit your needs
| Table Name | Fields |
|---|---|
| dim_vendor | vendor_id (PK), vendor_code, vendor_name |
| dim_date | date_id (PK), date, year, month, day, day_name, month_name, day_of_week, day_of_year, is_weekend, is_holiday |
| dim_rate | rate_id (PK), rate_code, rate_name |
| dim_location | location_id (PK), longitude, latitude |
| dim_payment_type | payment_type_id (PK), payment_type_code, payment_type_name |
| fact_trip | trip_id (PK), vendor_id (FK), pickup_date_id (FK), dropoff_date_id (FK), pickup_location_id (FK), dropoff_location_id (FK), rate_id (FK), payment_type_id (FK), passenger_count, trip_distance, pickup_timestamp, dropoff_timestamp, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount |
In this section, we will generate the dbt model SQL and schema YAML files and prepare them for integration with our project; refer to snow-chat/dbt/models for further details. An illustrative sketch of one such model is shown below.
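As an illustration only (the real models live in snow-chat/dbt/models and may differ), a generated dimension model could look roughly like this, assuming the raw taxi data is available to dbt as a seed named trip_data_nyc with a vendorid column:

```sql
-- models/dim_vendor.sql (illustrative sketch; see snow-chat/dbt/models for the actual models)
SELECT
    ROW_NUMBER() OVER (ORDER BY vendor_code) AS vendor_id,
    vendor_code,
    vendor_name
FROM (
    SELECT DISTINCT
        vendorid AS vendor_code,
        CASE vendorid
            WHEN 1 THEN 'Creative Mobile Technologies'
            WHEN 2 THEN 'VeriFone Inc.'
            ELSE 'Unknown'
        END      AS vendor_name
    FROM {{ ref('trip_data_nyc') }}
) AS vendors
```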
Next, we will load our data into Snowflake. Here are the steps for a successful data load:
Open your code editor (e.g., VSCode) and navigate into the dbt directory. Here, create a new dbt profile file named profiles.yml and update it with your database connection details.
Modify the dbt_project.yml file to connect to the profile you just created. Ensure that the profile value matches the name in profiles.yml exactly (in this case, snow_chat); a sketch of what these files might contain follows.
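A minimal sketch of a profiles.yml for a Snowflake target is shown below. The account, credential, database, and schema values are placeholders for your own setup; the profile name is the part that must match dbt_project.yml.

```yaml
# profiles.yml (illustrative sketch -- replace the placeholder values with your own)
snow_chat:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <account_identifier>
      user: <dbt_user>
      password: <password>
      role: <dbt_role>
      warehouse: <dbt_warehouse>
      database: <target_database>   # e.g. the DW database validated later in this post
      schema: <target_schema>
      threads: 4
```

In dbt_project.yml, the corresponding entry is simply profile: snow_chat.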
From within the developer-tools container and project directory, run dbt-set-profile to update the DBT_PROFILES_DIR. This step will make it easier for you to switch between multiple dbt projects in the future.
Validate your dbt profile and its connection by running the command dbt debug
Execute dbt seed to load your RAW tables.
Next, run dbt run to load your DW tables.
Log into Snowflake and execute the below SQL commands to validate data in both RAW and DW layers
SELECT 'No of records in RAW table' as desc, COUNT (*) as count FROM "TST_ENTECHLOG_RAW_DB"."YELLOW_TAXI"."TRIP_DATA_NYC"
UNION ALL
SELECT 'No of records in FACT table' as desc, COUNT (*) as count FROM "TST_ENTECHLOG_DW_DB"."FACT"."TRIPS";
Streamlit is an open-source app framework that lets Machine Learning and Data Science teams create beautiful web apps in minutes. In this section, we’ll guide you through the steps to create a Streamlit application:
Streamlit app details: The application is designed to scan all tables within the specified input schemas, gathering details such as table name, table description, column name, and column description. This information is then utilized in conjunction with OpenAI to facilitate the interactive chat functionality. For a more in-depth understanding, please refer to the source code here.
get_table_context contains the logic to retrieve metadata from the Snowflake information schema.
get_system_prompt incorporates the logic to read the prompt along with the table context for making the OpenAI call. A simplified sketch of this flow follows.
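The snippet below is a simplified, hypothetical sketch of what these two functions conceptually do; the actual signatures, schema list, and prompt text are in the linked source code and will differ.

```python
# Simplified sketch of the idea behind get_table_context / get_system_prompt;
# see the linked source code for the real implementation.
import pandas as pd


def get_table_context(conn, database: str, schema: str) -> str:
    """Collect table and column metadata from the Snowflake information schema."""
    query = f"""
        SELECT table_name, column_name, data_type, comment
        FROM {database}.INFORMATION_SCHEMA.COLUMNS
        WHERE table_schema = '{schema}'
        ORDER BY table_name, ordinal_position
    """
    df = pd.read_sql(query, conn)  # conn is a Snowflake connection opened by the app
    lines = [
        f"{row.TABLE_NAME}.{row.COLUMN_NAME} ({row.DATA_TYPE}): {row.COMMENT or ''}"
        for row in df.itertuples()
    ]
    return "\n".join(lines)


def get_system_prompt(table_context: str) -> str:
    """Combine the chat instructions with the table context for the OpenAI call."""
    return (
        "You are a helpful analyst. Answer questions by writing Snowflake SQL "
        "against the tables described below.\n\n"
        f"<tableContext>\n{table_context}\n</tableContext>"
    )
```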
Start the container: Use the following command to kickstart your container:
docker-compose -f docker-compose-streamlit.yml up -d --build
Validate the containers: Confirm that your containers are running as expected with the following command:
docker-compose -f docker-compose-streamlit.yml ps
SSH into the container: Gain access to your container’s shell by running:
docker exec -it streamlit /bin/bash
Validate Streamlit version: Ensure you’re running the correct version of Streamlit by checking the version:
streamlit --version
Verify the app’s status: Make sure your Streamlit app is running and listening properly. To do this, check the logs for any potential issues. If all is well, navigate to your Streamlit application at http://localhost:8501/
Verify the connections: Navigate to the 🌍 Validate Connection - Open AI and ❄️ Validate connection - Snowflake pages to verify that the connections are working correctly. If they are, proceed to the main app page, 💻 Snow Chat.
Start chatting: Now that your Streamlit app is up and running, you can start interacting with your Snowflake data. Feel free to ask any data-related questions you might have!
Once you’re finished with your tasks, it’s good practice to clean up the resources that are no longer needed. This will free up system resources:
To stop the container, use the down command with Docker Compose. The --volumes flag will also remove any volumes defined in your services, and the --remove-orphans flag will remove containers for services not defined in the Compose file. Run the following command in your terminal:
docker-compose -f docker-compose-streamlit.yml down --volumes --remove-orphans
For a hands-on, guided experience, check out the video below.
Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.
This blog represents my own viewpoints and not those of my employer, Snowflake. All product names, logos, and brands are the property of their respective owners.