Data Quality Gates: How We Catch Bad Data Before It Hits the Dashboard
The dbt test suite and Great Expectations checks we run on every pipeline. Includes the actual YAML configs, custom tests, and the alerting setup that pages us when something breaks.
Last year, a retail client's daily revenue dashboard showed a 300% spike on a Tuesday. The CEO sent a celebratory Slack message. Then the data team realized a payment processor had started sending amounts in cents instead of dollars. By the time they caught it, the board had already seen the wrong number.
That engagement is why we now run data quality gates on every pipeline, in every environment, with zero exceptions.
Gate 1: Source Freshness
Before transforming anything, verify the source data actually arrived. Stale source data produces stale results that look correct — and that is worse than an outright failure.
Run dbt source freshness before every dbt build. If a source is stale beyond the error threshold, the pipeline stops. No silent propagation of old data.
# In the Airflow DAG or CI script
dbt source freshness --select source:raw_stripe
# Exit code 1 if any source exceeds error_after threshold
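The error_after threshold lives in the source definition itself. A sketch of the corresponding sources YAML, assuming a Stripe source loaded with a `_loaded_at` timestamp column (database, schema, table, and column names here are illustrative):

```yaml
# models/staging/stripe/_stripe__sources.yml -- illustrative names
version: 2
sources:
  - name: raw_stripe
    database: raw
    schema: stripe
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: charges
      - name: invoices
```

With `error_after` set, `dbt source freshness` exits non-zero when the newest `_loaded_at` is older than 24 hours, which is what stops the pipeline.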
Gate 2: Schema and Row-Level Tests
This is the core of our testing strategy. Every staging and mart model gets a set of dbt tests in its YAML definition.
The Minimum Test Suite
Every model gets a small set of tests at minimum. No exceptions, even for "temporary" models (those become permanent faster than anyone plans).
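A sketch of what that minimum can look like, using dbt's four built-in generic tests (unique, not_null, accepted_values, relationships); the model, column, and status values here are illustrative:

```yaml
# models/marts/_core__models.yml -- illustrative names
version: 2
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```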
Custom, Domain-Specific Tests
Generic tests catch generic problems. The bugs that actually hurt are domain-specific. Here are three custom tests from real engagements:
Test 1: Revenue should not swing more than 50% day-over-day.
This is the test that would have caught the cents-vs-dollars bug.
-- tests/assert_revenue_daily_swing.sql
-- Fails if daily revenue changes by more than 50% from the previous day
with daily_revenue as (
    select
        order_date,
        sum(net_revenue_usd) as revenue
    from {{ ref('fct_revenue') }}
    where order_date >= dateadd(day, -7, current_date())
    group by order_date
),

with_prev as (
    select
        order_date,
        revenue,
        lag(revenue) over (order by order_date) as prev_revenue
    from daily_revenue
),

pct as (
    select
        order_date,
        revenue,
        prev_revenue,
        case
            when prev_revenue = 0 then null
            else abs(revenue - prev_revenue) / prev_revenue
        end as pct_change
    from with_prev
)

-- Return rows that violate the threshold (dbt fails if any rows are returned)
select *
from pct
where pct_change > 0.50
    and prev_revenue > 100  -- Ignore low-volume days
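For intuition, the same threshold logic in plain Python, as a standalone sketch (the SQL test above is what actually runs in the pipeline):

```python
def daily_swings(revenues, threshold=0.50, min_prev=100):
    """Return (day_index, pct_change) pairs whose day-over-day swing
    exceeds the threshold, skipping low-volume previous days."""
    violations = []
    for i in range(1, len(revenues)):
        prev, cur = revenues[i - 1], revenues[i]
        if prev <= min_prev:  # mirrors the prev_revenue > 100 guard in the SQL
            continue
        pct_change = abs(cur - prev) / prev
        if pct_change > threshold:
            violations.append((i, pct_change))
    return violations

# A cents-vs-dollars bug turns 1,050 into 105,000: a 99x swing,
# flagged on the very first day it appears.
```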
Test 2: Every order should have a matching customer.
-- tests/assert_orders_have_customers.sql
select
    o.order_id,
    o.customer_id
from {{ ref('fct_orders') }} o
left join {{ ref('dim_customers') }} c
    on o.customer_id = c.customer_id
where c.customer_id is null
    and o.order_date >= dateadd(day, -30, current_date())
Test 3: Row counts should not drop unexpectedly.
-- tests/assert_staging_row_count.sql
-- stg_orders should never have materially fewer rows today than yesterday
with counts as (
    select
        _loaded_at::date as load_date,
        count(*) as row_count
    from {{ ref('stg_shopify__orders') }}
    where _loaded_at::date >= dateadd(day, -3, current_date())
    group by 1
),

with_prev as (
    select
        load_date,
        row_count,
        lag(row_count) over (order by load_date) as prev_count
    from counts
)

-- A window function cannot be filtered in the same select, so filter
-- in an outer query (dbt fails if any rows are returned)
select
    load_date,
    row_count,
    prev_count
from with_prev
where row_count < prev_count * 0.8  -- 20% drop threshold
Gate 3: Anomaly Detection on Marts
For tier-1 models (the ones that feed executive dashboards and financial reports), we add statistical anomaly checks using Great Expectations or dbt-expectations:
# models/marts/finance/_finance__models.yml
version: 2
models:
  - name: fct_revenue
    meta:
      tier: 1
      owner: "finance-analytics@client.com"
    tests:
      # Volume check: yesterday's partition must not be empty
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
          row_condition: "order_date = current_date() - 1"
      # Metric check: median daily revenue stays inside its expected band
      - dbt_expectations.expect_column_quantile_values_to_be_between:
          column_name: net_revenue_usd
          quantile: 0.5
          min_value: 500
          max_value: 50000
The Alerting Layer
Tests are useless if nobody sees the failures. We wire dbt test results into Slack using a simple post-run hook:
# In Airflow: after the dbt build task
@task()
def notify_test_failures():
    """Parse dbt's run_results.json and alert on failures."""
    import json
    import os
    from pathlib import Path

    from slack_sdk import WebClient

    results_path = Path("target/run_results.json")
    results = json.loads(results_path.read_text())
    # Each entry in the run_results.json artifact carries a unique_id
    # (e.g. "test.analytics.assert_revenue_daily_swing"), not a nested node
    failures = [
        r for r in results["results"]
        if r["status"] == "fail" and r["unique_id"].startswith("test.")
    ]
    if not failures:
        return

    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    failure_list = "\n".join(
        f"• `{f['unique_id']}` — {(f.get('message') or '')[:100]}"
        for f in failures[:10]
    )
    client.chat_postMessage(
        channel="#data-alerts",
        text=f":rotating_light: *{len(failures)} dbt test(s) failed*\n\n{failure_list}",
    )
Severity Levels: Not Everything Is a Blocker
A critical insight we picked up after a few engagements: not every test failure should stop the pipeline.
In dbt terms, a test's severity decides what happens on failure: error stops the pipeline, while warn logs the failure and alerts but lets the run continue. We reserve warn for anomalies that need investigation but are not necessarily wrong, such as a revenue swing above 50% or unexpected nulls in optional fields.
The first time we set everything to error, a warn-worthy anomaly in a non-critical table blocked a morning pipeline for three hours while the team investigated what turned out to be a legitimate business event (a flash sale). Now we are deliberate about what blocks and what alerts.
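Downgrading a blocking test to an alert is a one-line dbt config; severity defaults to error. A sketch for the swing test from earlier:

```sql
-- At the top of tests/assert_revenue_daily_swing.sql:
-- 'warn' surfaces failures in the run summary (and our Slack hook)
-- without stopping the build
{{ config(severity = 'warn') }}
```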
Making It Stick
The biggest challenge with data quality is not technical — it is cultural. Tests rot when the team stops writing them for new models. We enforce it with CI:
# .github/workflows/dbt-ci.yml (simplified)
# Fails the PR if any model lacks a unique + not_null test on its primary key
dbt ls --resource-type model --output json | \
python scripts/check_test_coverage.py --min-tests 2
Every pull request that adds a new model must include at least a unique and not_null test on the primary key. No exceptions. The CI check enforces it so humans do not have to nag each other.
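The contents of check_test_coverage.py are not shown here; one way such a check can work is to read dbt's target/manifest.json and count the tests attached to each model, rather than parsing the `dbt ls` stream. A sketch, with names and the CLI shape assumed:

```python
# scripts/check_test_coverage.py -- illustrative sketch, not the real script
import json
import sys
from collections import Counter

def undertested_models(manifest, min_tests=2):
    """Return unique_ids of models with fewer than min_tests tests attached."""
    models = {
        uid for uid, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }
    counts = Counter()
    for node in manifest["nodes"].values():
        if node["resource_type"] == "test":
            # A test node lists the models it touches in depends_on.nodes
            for dep in node.get("depends_on", {}).get("nodes", []):
                if dep in models:
                    counts[dep] += 1
    return sorted(m for m in models if counts[m] < min_tests)

if __name__ == "__main__":
    manifest = json.loads(open("target/manifest.json").read())
    failing = undertested_models(manifest, min_tests=2)
    for uid in failing:
        print(f"FAIL {uid}: fewer than 2 tests on this model")
    sys.exit(1 if failing else 0)
```

Exiting non-zero when any model is under-tested is what makes the GitHub Actions job fail the PR.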
Data quality is not a project with a finish line. It is a practice. The gates we described here — freshness checks, schema tests, anomaly detection — take about two days to set up on a new project. The bugs they catch in the first month alone make that investment pay for itself several times over.