The Medallion Architecture in Practice: Bronze, Silver, Gold with dbt
We use the medallion pattern on every data platform we build. Here is the exact dbt project structure, naming conventions, and SQL patterns we follow — and the mistakes we made before landing on them.
We have set up medallion architectures on Snowflake, BigQuery, and Databricks across a dozen engagements. The pattern itself is simple. Getting the details right — naming, where transformations belong, how to handle late-arriving data — that took us a few painful iterations.
This post is the playbook we wish we had when we started.
Bronze: The Raw Landing Zone
Bronze is where source data lands untouched. We do not transform anything here. The ingestion tool (Fivetran, Airbyte, a custom Airflow DAG) writes to schemas named raw_{source}:
Append-only or full-refresh. Depends on the source, but bronze never deletes.
Every table gets a _loaded_at timestamp. This is how we track freshness downstream.
Silver: Staging and Intermediate
This is where the real work happens. Silver has two sub-layers that serve different purposes.
Staging Models (stg_)
One staging model per source table. The job of a staging model is narrow: rename columns, cast types, and apply light cleaning. Nothing else.
-- models/staging/stripe/stg_stripe__charges.sql
with source as (
select * from {{ source('raw_stripe', 'charges') }}
),
renamed as (
select
id as charge_id,
customer as customer_id,
amount::number(12,2) / 100 as amount_usd, -- Stripe stores cents
currency,
status,
created::timestamp_ntz as created_at,
_fivetran_synced as _loaded_at
from source
where status != 'draft' -- Remove incomplete charges
)
select * from renamed
The naming convention stg_{source}__{table} is non-negotiable for us. Double underscore separates source from table. It looks odd the first time, but it pays off when you have 40+ staging models and need to find things fast.
Intermediate Models (int_)
Intermediate models handle the complex logic that does not belong in staging (too much business logic) or in marts (too granular, not consumption-ready). This is where we do:
Deduplication across sources
Pivoting and unpivoting
Window functions for sessionization
Joining related staging models before aggregation
-- models/intermediate/finance/int_orders__enriched.sql
with orders as (
select * from {{ ref('stg_shopify__orders') }}
),
charges as (
select * from {{ ref('stg_stripe__charges') }}
),
enriched as (
select
o.order_id,
o.customer_id,
o.order_date,
o.subtotal_usd,
o.discount_usd,
c.amount_usd as charged_amount_usd,
c.status as payment_status,
o.subtotal_usd - o.discount_usd as net_revenue_usd
from orders o
left join charges c on o.stripe_charge_id = c.charge_id
)
select * from enriched
Gold: The Marts
Marts are what your analysts, dashboards, and downstream applications consume. We follow Kimball-style dimensional modeling: fact tables (fct_) for events and measures, dimension tables (dim_) for descriptive attributes.
-- models/marts/finance/fct_revenue.sql
with orders as (
select * from {{ ref('int_orders__enriched') }}
),
final as (
select
order_id,
customer_id,
order_date,
net_revenue_usd,
payment_status,
-- Fiscal calendar
{{ fiscal_quarter('order_date') }} as fiscal_quarter,
{{ fiscal_year('order_date') }} as fiscal_year
from orders
where payment_status = 'succeeded'
)
select * from final
Marts are organized by business domain, not by source system:
YAML files prefixed with underscore. We keep source definitions and model configs in YAML files named _source__sources.yml and _source__models.yml. The underscore pushes them to the top of the directory listing, making them easy to find. Small thing, big quality-of-life improvement when you have 60+ models.
generate_schema_name macro. By default, dbt puts everything in your target schema. We override this so staging models land in a staging schema, intermediates in intermediate, and marts in marts_{domain}. This mirrors the medallion layers in the warehouse itself.
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- if custom_schema_name is none -%}
{{ target.schema }}
{%- else -%}
{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
Mistakes We Made Along the Way
Skipping the intermediate layer. On our first two engagements, we went straight from staging to marts. The mart models became bloated, hard to test, and impossible for new team members to understand. Adding the intermediate layer increased the model count but cut debugging time dramatically.
Putting business logic in staging. Early on, we had staging models calculating things like net_revenue and applying business rules. When the rules changed (they always do), we had to touch staging and every downstream model. Now staging is pure rename-and-cast. Business rules live in intermediate or marts.
Inconsistent naming. We had one project where half the team used stg_stripe_charges (single underscore) and the other half used stg_stripe__charges (double underscore). It caused merge conflicts, broken refs, and confusion for weeks. Pick a convention on day one and enforce it with a CI check.
The medallion pattern is not complicated. The discipline to follow it consistently, especially under deadline pressure, is the hard part. But every time we have cut corners on this structure, we have paid for it later. Every single time.