How to initialize and configure dbt for Postgres database

dbt | Published 12 Sep 2021 | Last Updated 12 Sep 2021 | Siva Nadesan

Overview

In this article, we will see how to initialize and configure a new dbt project for a Postgres database. The code used in this article can be found here.

  • Here we are using dbt seed to load raw and reference tables just for demo purposes.
  • In real use cases, you should never use dbt seed to load your raw layer, though it can be used to load simple reference tables. All other data ingestion into your raw layer will happen outside dbt.

Prerequisite

Download and install docker for your platform. Click here for instructions.

Start Docker containers

Let’s create the infrastructure required for this demo using Docker containers.

  • Clone this repo

  • Open a new terminal and cd into the dbt-docker directory. This directory contains the infra components for this demo

  • Create a copy of .env.template as .env

  • Start dbt container by running

    docker-compose up -d --build
    
  • Start Postgres container by running

    docker-compose -f docker-compose-postgres.yml up -d --build
    
  • Validate the containers by running

    docker ps
    
  • Navigate to pgAdmin in your browser and log in with admin@admin.com and postgres

  • Click on Add New Server and enter the following details
    ✅ Name : dbt demo
    ✅ Host Name : postgres
    ✅ User : postgres
    ✅ Password : postgres

    ⚠️ If you change the credentials in .env then please make sure to use them ⚠️

  • Open a shell in the dbt container by running

    docker exec -it dbt /bin/bash
    

Create dbt project

  • Validate dbt is installed correctly by running

    dbt --version
    
  • cd into your preferred directory

    cd /C/
    
  • Create dbt project by running

    dbt init dbt-postgres
    

Configure dbt profile

  • Navigate into the dbt-postgres directory, create a new dbt profile file profiles.yml, and update it with the Postgres database connection details

    dbt-postgres:
      target: dev
      outputs:
        dev:
          type: postgres
          host: postgres
          user: postgres
          password: postgres
          port: 5432
          dbname: postgres
          schema: demo
          threads: 3
          keepalives_idle: 0 # default 0, indicating the system default
    

Configure dbt project

  • Edit dbt_project.yml to point to the profile we just created. The value for profile should exactly match the profile name in profiles.yml (see the snippet after this list)

  • From the project directory, run dbt-set-profile to update DBT_PROFILES_DIR.

    dbt-set-profile is an alias for unset DBT_PROFILES_DIR && export DBT_PROFILES_DIR=$PWD

  • Validate the dbt profile and connection by running

    dbt debug
    
  • Update name in dbt_project.yml to an appropriate project name (say dbt_postgres_demo)
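
With both changes in place, the relevant lines of dbt_project.yml look roughly like this (a minimal sketch; the other generated settings are left as-is):

name: dbt_postgres_demo
profile: dbt-postgres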

dbt commands

  • ⚠️ Delete the contents of the data, macros, models, tests and snapshots directories to follow along with this demo

Seeds

Seeds are CSV files in your dbt project that dbt can load into your data warehouse.
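
A seed is just a plain CSV file with a header row, for example (a hypothetical lookup file; the actual demo data lives in demo-artifacts):

film_rating,rating_description
G,General Audiences
PG,Parental Guidance Suggested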

  • Copy the sample data from dbt-postgres\demo-artifacts\data to dbt-postgres\data

  • Review the seed configuration in dbt_project.yml. The seed configuration should have the project name followed by entries that match the structure of the data directory.

seeds:
  dbt_postgres_demo:
    schema: raw # all seeds in this project will use the raw schema by default
    imdb:
      schema: imdb # seeds in the data/imdb/ subdirectory will use the imdb schema
    sakila:
      schema: sakila # seeds in the data/sakila/ subdirectory will use the sakila schema
      address:
        +column_types:
          phone: varchar(50)
    lookups:
      schema: lookups # seeds in the data/lookups/ subdirectory will use the lookups schema
  • Load the seed files by running

    dbt seed
    
  • Data will be loaded into a schema with a dbt_ prefix. To fix this, we will create a small macro

Macros

Macros are pieces of code that can be reused multiple times.

  • Copy the macros from dbt-postgres\demo-artifacts\macros\utils to dbt-postgres\macros\utils

  • The macro generate_schema_name uses the custom schema when provided (see the sketch after this list)

  • Seed all the files again by running

    dbt seed
    

    This time the data will be loaded into the correct schemas without the dbt_ prefix

  • Seed only selected files by running

    dbt seed --select address
    
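As referenced above, generate_schema_name overrides dbt's default schema-name generation so that the custom schema is used as-is. A minimal sketch of such a macro, following the pattern documented by dbt (the actual macro in demo-artifacts may differ slightly):

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {# use the custom schema as-is instead of prefixing it with the target schema #}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}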

Sources

Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools.

We could use either the ref or the source function to reference the data we seeded, but to stay close to a real use case, we will use the source function.
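
A source definition is a yml file under the models directory. A minimal sketch for the seeded sakila schema (the table list here is illustrative; the full definitions are in demo-artifacts):

version: 2

sources:
  - name: sakila
    schema: sakila
    tables:
      - name: customer
      - name: address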

  • Copy the source definition from dbt-postgres\demo-artifacts\models\sources\ to dbt-postgres\models\sources\

  • Test the sources by running

    dbt test --models source:*
    

Packages

dbt packages are in fact standalone dbt projects, with models and macros that tackle a specific problem area.

  • Create a new file named packages.yml and add the package configuration

  • Specify the package(s) you wish to add

packages:
  - package: dbt-labs/dbt_utils
    version: 0.7.0
  - package: dbt-labs/codegen
    version: 0.4.0
  • Install the packages by running

    dbt deps
    

Models

A model is a select statement. Models are defined in .sql files.
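
For example, a staging model is usually a thin select over a source. A hypothetical sketch of what stg_sakila__customer might contain (the column list is illustrative; the actual model is in demo-artifacts):

-- models/staging/stg_sakila__customer.sql
select
    customer_id,
    first_name,
    last_name,
    email
from {{ source('sakila', 'customer') }}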

  • Copy the model definition from dbt-postgres\demo-artifacts\models\ to dbt-postgres\models\

  • Review the model configuration in dbt_project.yml. The model configuration should have the project name followed by entries that match the structure of the models directory

models:
  dbt_postgres_demo:
      materialized: table
      staging:
        schema: staging
        materialized: view
      marts:
        dim:
          schema: dim
          materialized: table
        fact:
          schema: fact
          materialized: table
      presentation:
        schema: analytics
        materialized: view

The default materialized configuration can be specified at the project level, and it can be overridden at each directory level.

  • To define variables in a dbt project, add a vars config to the dbt_project.yml file (an example of referencing a variable appears after this list)
vars:
  start_date: '2005-01-01'
  high_date: '9999-12-31'
  • Build models by running the below commands

    dbt run --models stg_sakila__customer
    dbt run --models staging.*
    dbt run --models +fct_sales --var '{"start_date": "2005-05-24"}'
    dbt run --models +tag:presentation-dim
    dbt run --models +tag:presentation-fact --var '{"start_date": "2005-05-24"}'
    dbt run
    
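The variables defined above (and overridden with --var in the commands) are read inside a model's SQL with the var() function, for example (the table and column names here are illustrative):

-- illustrative: filter rows in a model using the start_date variable
select *
from {{ source('sakila', 'payment') }}
where payment_date >= '{{ var("start_date") }}'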

Tests

Tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots).

Types of tests:

  • schema tests (more common)

  • data tests: specific queries that should return 0 records (an example appears at the end of this section)

  • Generate the yaml for existing models by running

    dbt run-operation generate_model_yaml --args '{"model_name": "dim_customer"}'
    
  • Schema tests can be added at the table level or the column level. Here is an example of a test under a column definition in the model yaml

version: 2

models:
  - name: dim_customer
    description: ""
    columns:
      - name: customer_id
        description: ""
        tests:
          - unique
          - not_null
  • Execute the tests by running

    dbt test --models +tag:presentation-dim
    dbt test --models +tag:presentation-fact
    
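As noted earlier, a data test is simply a select query saved under the tests directory; the test passes when the query returns zero rows. A hypothetical example against the demo's sales fact:

-- tests/assert_no_negative_sales_amount.sql
-- dbt test reports a failure if this query returns any rows
select *
from {{ ref('fct_sales') }}
where amount < 0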

Validate data

Validate the transformed data by running the below queries, for example in pgAdmin.

-- Incremental demo
SELECT cast(date_id AS VARCHAR) as date_id
	,count(1) as rec_count
FROM analytics.sales
GROUP BY date_id
ORDER BY date_id;

-- Revenue by day
SELECT ad.day_name
	,af.film_rating
	,ac.customer_city
	,sum(amount) AS revenue
FROM analytics.sales asa
JOIN analytics.film af ON (af.film_id = asa.film_id)
JOIN analytics.DATE ad ON (ad.date_dim_id = asa.date_id)
JOIN analytics.customer ac ON (ac.customer_id = asa.customer_id)
GROUP BY (
		ad.day_name
		,af.film_rating
		,ac.customer_city
		)
ORDER BY revenue DESC limit 100;

Docs

dbt docs provides a way to generate documentation for your dbt project and render it as a website.

  • You can add descriptions to models, columns, and sources in the related yml file

  • dbt also supports docs blocks using the jinja docs tag (see the example after this list)

  • Copy sample jinja docs from dbt-postgres\demo-artifacts\docs\ to dbt-postgres\docs\

  • Generate documents by running

    dbt docs generate
    
  • Publish the docs by running

    dbt docs serve --port 8085
    
  • Stop the docs server by pressing Ctrl + C
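
As mentioned above, a docs block lives in a markdown file (here under the docs directory) and is wrapped in the jinja docs tag; a minimal illustrative example:

{% docs dim_customer %}
One record per customer, built from the sakila customer data.
{% enddocs %}

The block can then be referenced from a yml description with description: '{{ doc("dim_customer") }}'.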

Clean demo resources

Open a new terminal and cd into the dbt-docker directory. Run the below command to delete the Docker containers and related volumes

docker-compose down --volumes --remove-orphans

Hope this was helpful. Did I miss something? Let me know in the forum section and I’ll add it in!

About The Author
Siva Nadesan

Siva Nadesan is a Principal Data Engineer. His passions include data engineering and writing technical blogs. He likes to learn new technologies and apply his knowledge to build solutions for real-world problems.
