2024 to 2025 · KPMG client work · built 2025

A data platform for oil & gas.

A large oil & gas client, a team of developers, and one lakehouse to bring the data together.

What it is

Together with a team of developers, I built and managed a data platform for a large oil & gas client on Microsoft Fabric. The architecture followed the medallion pattern (bronze, silver, gold) on a lakehouse foundation, fed by metadata-driven data pipelines with change data capture and Dataflow Gen2, and transformed with PySpark and Spark SQL.
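To make "metadata-driven" concrete, here is a minimal sketch of the idea, not the client's actual implementation: each table is described by one metadata record, and the upsert statement that applies captured changes from bronze to silver is generated from that record instead of being written by hand. The `TableConfig` class, the `build_merge_sql` helper, and the table and column names are all hypothetical; the generated statement assumes Delta-style `MERGE INTO` syntax in Spark SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """One hypothetical metadata record: how to sync a table from bronze to silver."""
    source: str   # bronze table holding the captured changes
    target: str   # silver table to keep in sync
    keys: tuple   # key columns used to match source rows to target rows


def build_merge_sql(cfg: TableConfig) -> str:
    """Generate a Spark SQL upsert (MERGE) statement from a metadata record."""
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in cfg.keys)
    return (
        f"MERGE INTO {cfg.target} AS t "
        f"USING {cfg.source} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical metadata; on a real platform this would live in a control table.
TABLES = [
    TableConfig("bronze.wells_changes", "silver.wells", ("well_id",)),
    TableConfig("bronze.production_changes", "silver.production", ("well_id", "reading_date")),
]

for cfg in TABLES:
    print(build_merge_sql(cfg))
```

In a Spark session, each generated statement could be passed to `spark.sql(...)`. Adding a table then means adding a metadata record, not writing a new pipeline.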

My personal contribution: re-architecting how the pipelines run. The platform originally ran its pipelines sequentially; I introduced a parallel execution pattern that cut processing time by more than 5x. I also slimmed down bloated tables, reducing row counts for faster, cheaper ingestion.
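The move from sequential to parallel execution can be illustrated with a small sketch. This is an assumption-laden stand-in, not the orchestration used on the platform: `load_table` is a hypothetical placeholder for one pipeline run, the table names are invented, and a thread pool stands in for whatever concurrency mechanism the orchestrator provides. Tables within a layer are assumed to be independent of each other, while the layers themselves still run in order.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata: the tables to load in each medallion layer, in layer order.
LAYERS = {
    "bronze": ["wells", "production", "equipment"],
    "silver": ["wells", "production", "equipment"],
}


def load_table(layer: str, table: str) -> str:
    """Placeholder for one pipeline run; sleeps to simulate I/O-bound work."""
    time.sleep(0.05)
    return f"{layer}.{table}"


def run_layer_parallel(layer: str, tables: list, max_workers: int = 4) -> list:
    """Run all loads of one layer concurrently and wait for every one to finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(load_table, layer, t) for t in tables]
        # result() re-raises any exception from a failed load.
        return sorted(f.result() for f in futures)


def run_platform(layers: dict) -> list:
    """Process layers one after another; tables inside a layer run in parallel."""
    done = []
    for layer, tables in layers.items():
        done.extend(run_layer_parallel(layer, tables))
    return done


if __name__ == "__main__":
    print(run_platform(LAYERS))
```

The actual speedup depends on how many loads are independent and how much capacity the compute can give to concurrent runs, so `max_workers` is a tuning knob and not a fixed recommendation.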

The work

Co-built and managed the client's Fabric data platform end to end with a team of developers
Implemented medallion architecture on a lakehouse with metadata-driven pipelines
Change data capture and Dataflow Gen2 for ingestion; PySpark and Spark SQL for transformation
Re-architected pipeline orchestration from sequential to parallel: more than 5x faster processing
Trimmed bloated tables, cutting row counts for faster, cheaper ingestion
Worked in an Agile team with git-based workflows and CI/CD

Concepts

Microsoft Fabric · Lakehouse · Medallion architecture · Metadata-driven · CDC · Dataflow Gen2 · PySpark · Spark SQL

What it taught me

Sequential pipelines are a default, not a law. The biggest performance wins came from questioning how the work was scheduled, not how it was written.