2026 · Personal project

NYC Parking Violations Analytics (dbt + DuckDB)

A dbt project that models NYC parking violations with a medallion architecture (bronze -> silver -> gold). The pipeline cleans raw data, enriches it with fee logic, and delivers analytics-ready tables for reporting.

ETL/ELT
Medallion architecture for NYC parking violations

Portfolio highlights

  • End-to-end ELT pipeline in dbt with DuckDB.
  • Layered modeling with clear lineage from raw to curated outputs.
  • Data quality tests and documentation generation.

Architecture

  • Bronze models stage raw tables without transformation.
  • Silver models standardize columns, add flags, and join in the fee logic.
  • Gold models deliver aggregated business metrics.
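The three layers can be sketched in a few lines of SQL. This is a minimal illustration, not the project's models: it uses Python's built-in sqlite3 as a stand-in for DuckDB so it runs without extra packages, and every table name, column name, and fee amount in it is invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for DuckDB here. Column names and sample rows
# are hypothetical and do not reflect the project's actual schema or fees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_violations (summons_number TEXT, violation_code TEXT, state TEXT);
    INSERT INTO raw_violations VALUES
        ('1001', '21', 'ny'),
        ('1002', '21', 'NJ'),
        ('1003', '46', 'NY');

    CREATE TABLE raw_codes (violation_code TEXT, fee_usd INTEGER);
    INSERT INTO raw_codes VALUES ('21', 65), ('46', 115);

    -- Bronze: stage the raw tables without transformation.
    CREATE VIEW bronze_violations AS SELECT * FROM raw_violations;
    CREATE VIEW bronze_codes AS SELECT * FROM raw_codes;

    -- Silver: standardize a column, add a flag, and join in the fee.
    CREATE VIEW silver_violations AS
    SELECT
        v.summons_number,
        v.violation_code,
        UPPER(v.state) AS registration_state,
        UPPER(v.state) <> 'NY' AS is_out_of_state,
        c.fee_usd
    FROM bronze_violations AS v
    LEFT JOIN bronze_codes AS c ON v.violation_code = c.violation_code;

    -- Gold: aggregate to a reporting metric.
    CREATE VIEW gold_fees_by_code AS
    SELECT violation_code, COUNT(*) AS ticket_count, SUM(fee_usd) AS total_fee_usd
    FROM silver_violations
    GROUP BY violation_code;
""")

gold_rows = conn.execute(
    "SELECT violation_code, ticket_count, total_fee_usd "
    "FROM gold_fees_by_code ORDER BY violation_code"
).fetchall()
print(gold_rows)
```

In a dbt project each view would typically live in its own model file and reference its upstream model with ref(), which is what gives the layers their lineage.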

Data

Core datasets and storage locations used by the pipeline.

  • DuckDB database: data/nyc_parking_violations.db
  • Raw tables: parking_violations_2023, parking_violation_codes
  • Reference CSVs: files in data/

Model summary

The layered dbt models that power the medallion architecture.

  • Bronze: bronze_parking_violations, bronze_parking_violation_codes
  • Silver: silver_parking_violations, silver_parking_violation_codes, silver_violation_tickets, silver_violation_vehicles
  • Gold: gold_ticket_metrics, gold_vehicle_metrics
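One way to picture the lineage is as a dependency graph that is built in topological order, which is how dbt orders models based on their ref() calls. In the sketch below the model names come from the list above, but the edges between them are assumptions made for illustration; the real dependencies are defined in the repo.

```python
from graphlib import TopologicalSorter

# Keys are models, values are the models they read from.
# The edges are ASSUMED for illustration and may differ from the project.
depends_on = {
    "bronze_parking_violations": set(),
    "bronze_parking_violation_codes": set(),
    "silver_parking_violations": {"bronze_parking_violations"},
    "silver_parking_violation_codes": {"bronze_parking_violation_codes"},
    "silver_violation_tickets": {
        "silver_parking_violations",
        "silver_parking_violation_codes",
    },
    "silver_violation_vehicles": {"silver_parking_violations"},
    "gold_ticket_metrics": {"silver_violation_tickets"},
    "gold_vehicle_metrics": {"silver_violation_vehicles"},
}

# static_order() yields every model after all of its upstream models.
build_order = list(TopologicalSorter(depends_on).static_order())
for model in build_order:
    print(model)
```

Any order printed this way builds bronze before the silver models that read from it, and silver before gold.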

Tests

Data quality coverage baked into the project.

  • Built-in tests on key fields (unique, not_null).
  • Custom generic test generic_not_null for column null checks.
  • Singular test violation_codes_revenue (severity: warn, so failures are reported as warnings rather than errors).
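A dbt test is essentially a query that selects failing rows, and the test passes when that query returns nothing. The sketch below mimics the idea behind the unique and not_null checks with plain SQL; it again uses sqlite3 in place of DuckDB, and the tickets table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: one NULL violation_code, one duplicated key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (summons_number TEXT, violation_code TEXT);
    INSERT INTO tickets VALUES ('1001', '21'), ('1002', NULL), ('1002', '46');
""")


def failing_rows(sql):
    """Run a test query; an empty result means the test passes."""
    return conn.execute(sql).fetchall()


# not_null-style check: select rows where the column is missing.
not_null_failures = failing_rows(
    "SELECT summons_number FROM tickets WHERE violation_code IS NULL"
)

# unique-style check: select key values that appear more than once.
unique_failures = failing_rows(
    "SELECT summons_number, COUNT(*) FROM tickets "
    "GROUP BY summons_number HAVING COUNT(*) > 1"
)

print("not_null failures:", not_null_failures)
print("unique failures:", unique_failures)
```

A singular test follows the same pattern: it is a single SQL file whose returned rows count as failures.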

Explore & Repo

Source code and documentation for the full dbt project.