
How to set up dbt for a Postgres database


In this article, we will see how to initialize and configure a new dbt project for a Postgres database. The code used in this article can be found here.

  • Here we are using dbt seed to load raw and reference tables just for demo purposes.
  • In real use cases, you should never use dbt seed to load your raw layer, though it can be used to load simple reference tables. All other data ingestion into your raw layer will happen outside dbt.

Prerequisite

Download and install Docker for your platform. Click here for instructions.

Start Docker containers

Let’s create the infrastructure required for this demo using Docker containers.

  • Clone this repo

  • Open a new terminal and cd into the dbt-docker directory. This directory contains the infra components for this demo

  • Create a copy of .env.template as .env

  • Start the dbt container by running

    docker-compose up -d --build
    
  • Start the Postgres container by running

    docker-compose -f docker-compose-postgres.yml up -d --build
    
  • Validate the containers by running

    docker ps
    
  • Navigate to pgAdmin in your browser and log in with admin@admin.com and postgres

  • Click on Add New Server and enter the following details
    ✅ Name : dbt demo
    ✅ Host Name : postgres
    ✅ User : postgres
    ✅ Password : postgres

    If you changed the credentials in .env, make sure to use those values instead

  • Open a shell inside the dbt container by running

    docker exec -it dbt /bin/bash
    

Create dbt project

  • Validate dbt is installed correctly by running

    dbt --version
    
  • cd into your preferred directory

    cd /C/
    
  • Create a dbt project by running

    dbt init dbt-postgres
    

Configure dbt profile

  • Navigate into the dbt-postgres directory, create a new dbt profile file profiles.yml, and update it with the Postgres database connection details

    dbt-postgres:
      target: dev
      outputs:
        dev:
          type: postgres
          host: postgres
          user: postgres
          password: postgres
          port: 5432
          dbname: postgres
          schema: demo
          threads: 3
          keepalives_idle: 0 # default 0, indicating the system default
    

Configure dbt project

  • Edit dbt_project.yml to connect to the profile we just created. The value of profile should exactly match the name in profiles.yml

  • From the project directory, run dbt-set-profile to update DBT_PROFILES_DIR.

    dbt-set-profile is an alias for unset DBT_PROFILES_DIR && export DBT_PROFILES_DIR=$PWD

  • Validate the dbt profile and connection by running

    dbt debug
    
  • Update the name in dbt_project.yml to an appropriate project name (say dbt_postgres_demo)

dbt commands

Delete the contents of the data, macros, models, tests, and snapshots directories to follow along with this demo

Seeds

Seeds are CSV files in your dbt project that dbt can load into your data warehouse.

  • Copy the sample data from dbt-postgres\demo-artifacts\data to dbt-postgres\data

  • Review the seed configuration in dbt_project.yml. The seed configuration should have the project name followed by entries that match the structure of the data directory.

    seeds:
      dbt_postgres_demo:
        schema: raw # all seeds in this project will use the raw schema by default
        imdb:
          schema: imdb # seeds in the `data/imdb/` subdirectory will use the imdb schema
        sakila:
          schema: sakila # seeds in the `data/sakila/` subdirectory will use the sakila schema
          address:
            +column_types:
              phone: varchar(50)
        lookups:
          schema: lookups # seeds in the `data/lookups/` subdirectory will use the lookups schema
    
  • Load the seed files by running the below command

    dbt seed
    
  • Data will be loaded into schemas prefixed with the target schema from profiles.yml (e.g. demo_sakila instead of sakila). To fix this, we will create a small macro

Macros

Macros are pieces of code that can be reused multiple times.

  • Copy the macros from dbt-postgres\demo-artifacts\macros\utils to dbt-postgres\macros\utils

  • The generate_schema_name macro uses the custom schema when one is provided (see the sketch at the end of this section)

  • Seed all files by running the below command

    dbt seed
    

    This time the data will be loaded into the correct schemas without the target schema prefix

  • Seed selected files by running

    dbt seed --select address
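
The macro copied from demo-artifacts most likely resembles dbt's well-known generate_schema_name override shown below: when a custom schema is configured for a model or seed, that schema is used as-is instead of being appended to the target schema. This is a reference sketch; the actual file under macros/utils may differ slightly.

    {% macro generate_schema_name(custom_schema_name, node) -%}
        {#- Default dbt behaviour is <target schema>_<custom schema>.
            This override returns the custom schema on its own when provided. -#}
        {%- set default_schema = target.schema -%}
        {%- if custom_schema_name is none -%}
            {{ default_schema }}
        {%- else -%}
            {{ custom_schema_name | trim }}
        {%- endif -%}
    {%- endmacro %}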
    

Sources

Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools.

We could use either the ref or the source function to reference the data we seeded, but to stay close to a real use case, we will use the source function.
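
As a rough illustration of what that looks like in a model (the source name is taken from the seed schemas above and the column names are placeholders; the real definitions come from the files copied in the next step), a staging model selects from a declared source with the source function:

    -- models/staging/stg_sakila__customer.sql (illustrative sketch)
    select
        customer_id,
        first_name,
        last_name
    from {{ source('sakila', 'customer') }}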

  • Copy the source definition from dbt-postgres\demo-artifacts\models\sources\ to dbt-postgres\models\sources\

  • Test sources by running the below command

    dbt test --models source:*
    

Packages

dbt packages are in fact standalone dbt projects, with models and macros that tackle a specific problem area. A short usage sketch follows the steps below.

  • Create a new file named packages.yml and add the package configuration

  • Specify the package(s) you wish to add

    packages:
      - package: dbt-labs/dbt_utils
        version: 0.7.0
      - package: dbt-labs/codegen
        version: 0.4.0
    
  • Install the packages by running

    dbt deps
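
As a rough usage sketch (the model and column names are placeholders), a package macro such as dbt_utils.star can then be called from any model in this project:

    -- Illustrative only: select every column from the staging model except last_update
    select
        {{ dbt_utils.star(from=ref('stg_sakila__customer'), except=['last_update']) }}
    from {{ ref('stg_sakila__customer') }}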
    

Models

A model is a select statement. Models are defined in .sql files.
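
For example, a model file is just a select statement, optionally preceded by a config block that overrides the project-level settings. The sketch below is illustrative; the actual models copied in the next step will differ:

    -- models/marts/dim/dim_customer.sql (illustrative sketch)
    {{ config(materialized='table') }}

    select
        customer_id,
        first_name,
        last_name
    from {{ ref('stg_sakila__customer') }}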

  • Copy the model definition from dbt-postgres\demo-artifacts\models\ to dbt-postgres\models\

  • Review the model configuration in dbt_project.yml. The model configuration should have the project name followed by entries that match the structure of the models directory

    models:
      dbt_postgres_demo:
        materialized: table
        staging:
          schema: staging
          materialized: view
        marts:
          dim:
            schema: dim
            materialized: table
          fact:
            schema: fact
            materialized: table
        presentation:
          schema: analytics
          materialized: view

    The default materialized configuration can be specified at the project level. The materialized configuration can be overridden at each directory level

  • To define variables in a dbt project, add a vars config to the dbt_project.yml file (see the usage sketch at the end of this section)

    vars:
      start_date: '2005-01-01'
      high_date: '9999-12-31'
    
  • Build models by running the below commands

    dbt run --models stg_sakila__customer
    dbt run --models staging.*
    dbt run --models +fct_sales --vars '{"start_date": "2005-05-24"}'
    dbt run --models +tag:presentation-dim
    dbt run --models +tag:presentation-fact --vars '{"start_date": "2005-05-24"}'
    dbt run
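
As noted above, any model can read these variables with the var() function; the value passed on the command line with --vars overrides the project default. A minimal sketch, assuming a hypothetical staging model and date column:

    -- Illustrative only: model and column names are placeholders
    select *
    from {{ ref('stg_sakila__payment') }}
    where payment_date >= '{{ var("start_date") }}'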
    

Tests

Tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots).

Types of tests:

  • schema tests (more common)

  • data tests: specific queries that should return 0 records (see the sketch at the end of this section)

  • Generate the YAML for existing models by running the below command (this uses the codegen package installed earlier)

    dbt run-operation generate_model_yaml --args '{"model_name": "dim_customer"}'
    
  • schema tests can be added at the table level or the column level. Here is an example of a test under a column definition in the model YAML

    version: 2
    
    models:
      - name: dim_customer
        description: ""
        columns:
          - name: customer_id
            description: ""
            tests:
              - unique
              - not_null
    
  • Execute tests by running the below commands

    dbt test --models +tag:presentation-dim
    dbt test --models +tag:presentation-fact
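
As mentioned above, a data test is simply a select statement saved under the tests directory; the test fails if the query returns any rows. A minimal sketch, assuming the fct_sales model exposes an amount column (both names are taken from elsewhere in this demo, so treat them as assumptions):

    -- tests/assert_sales_amount_is_positive.sql (illustrative)
    -- Fails if any sale has a negative amount
    select *
    from {{ ref('fct_sales') }}
    where amount < 0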
    

Validate data

Run the below queries in pgAdmin to validate the transformed data.

-- Incremental demo
SELECT cast(date_id AS VARCHAR) as date_id
	,count(1) as rec_count
FROM analytics.sales
GROUP BY date_id
ORDER BY date_id;

-- Revenue by day
SELECT ad.day_name
	,af.film_rating
	,ac.customer_city
	,sum(amount) AS revenue
FROM analytics.sales asa
JOIN analytics.film af ON (af.film_id = asa.film_id)
JOIN analytics.date ad ON (ad.date_dim_id = asa.date_id)
JOIN analytics.customer ac ON (ac.customer_id = asa.customer_id)
GROUP BY (
		ad.day_name
		,af.film_rating
		,ac.customer_city
		)
ORDER BY revenue DESC LIMIT 100;
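
The "Incremental demo" query above is easiest to interpret if the sales model is materialized as incremental. A generic sketch of an incremental dbt model is shown below; it is not the demo's actual fct_sales code, and the names are placeholders:

    -- Illustrative incremental model
    {{ config(materialized='incremental', unique_key='sales_key') }}

    select *
    from {{ ref('stg_sakila__payment') }}
    {% if is_incremental() %}
    -- on incremental runs, only process rows newer than what is already loaded
    where payment_date > (select max(payment_date) from {{ this }})
    {% endif %}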

Docs

dbt docs provides a way to generate documentation for your dbt project and render it as a website.

  • You can add descriptions to models, columns, and sources in the related yml file

  • dbt also supports docs blocks using the Jinja docs tag (see the sketch at the end of this section)

  • Copy sample jinja docs from dbt-postgres\demo-artifacts\docs\ to dbt-postgres\docs\

  • Generate the documentation by running

    dbt docs generate
    
  • Publish the docs by running

    dbt docs serve --port 8085
    
  • Stop the published docs by pressing Ctrl + C
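
A docs block lives in a Markdown file inside the project and is wrapped in the Jinja docs tag; it can then be referenced from a description field in a yml file with the doc() function, for example description: '{{ doc("film_rating") }}'. A minimal sketch with an illustrative block name:

    {% docs film_rating %}
    Rating of the film as loaded from the source system.
    {% enddocs %}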

Clean Demo Resources

Open a new terminal and cd into the dbt-docker directory. Run the below command to delete the Docker containers and related volumes

docker-compose down --volumes --remove-orphans

Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.

