Data Engineering Projects
Welcome to my data engineering projects. No small-boy stuff here.
You won't find toy projects here: no Spotify streams, crypto price trackers, or COVID-19 dashboards 🤮.
Instead, I work with real, meaningful datasets to build serious projects that reflect the kind of problems you'd encounter in real-world data engineering.
Project 1: Building Robust Data Pipelines with Metadata: A Python MCP Pattern Project
Data pipelines are the backbone of modern data systems, but they can often become complex and brittle. A common pain point is managing state and passing context between different processing stages. Relying on cryptic filename conventions or implicit directory structures often leads to errors, difficult debugging, and maintenance headaches.
What if there was a better way? What if we could explicitly pass instructions and track the state of our data directly alongside it? This is where metadata-driven pipelines come in.
This project explores a pattern for building more robust pipelines using explicit metadata files for context passing. It showcases a simple Python project demonstrating this technique, inspired by the Model Context Protocol (MCP) concept, to manage a file validation and loading workflow.
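To make the idea concrete, here is a minimal sketch of context passing through a metadata file. It is not the code from the repository: the sidecar naming convention (`<file>.meta.json`), the field names (`expected_columns`, `status`, `history`), and the functions `validate`, `load`, and `run_demo` are all assumptions made for illustration. Each stage reads the metadata, does its work, and records what happened, so the next stage never has to guess from a filename or directory.

```python
import csv
import json
import tempfile
from pathlib import Path


def meta_path(data_file: Path) -> Path:
    # Hypothetical convention: the sidecar sits next to the data file.
    return data_file.with_name(data_file.name + ".meta.json")


def read_meta(data_file: Path) -> dict:
    return json.loads(meta_path(data_file).read_text())


def write_meta(data_file: Path, meta: dict) -> None:
    meta_path(data_file).write_text(json.dumps(meta, indent=2))


def validate(data_file: Path) -> dict:
    """Check the CSV header against the columns the metadata asks for."""
    meta = read_meta(data_file)
    with data_file.open(newline="") as fh:
        header = next(csv.reader(fh), [])
    missing = [c for c in meta["expected_columns"] if c not in header]
    meta["status"] = "rejected" if missing else "validated"
    meta["history"].append({"stage": "validate", "missing_columns": missing})
    write_meta(data_file, meta)
    return meta


def load(data_file: Path) -> list:
    """Load rows only if the previous stage marked the file as validated."""
    meta = read_meta(data_file)
    if meta["status"] != "validated":
        raise ValueError(
            f"refusing to load {data_file.name}: status={meta['status']}"
        )
    with data_file.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    meta["status"] = "loaded"
    meta["history"].append({"stage": "load", "row_count": len(rows)})
    write_meta(data_file, meta)
    return rows


def run_demo() -> dict:
    """Run both stages on a throwaway file and return the final metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "orders.csv"
        data_file.write_text("order_id,amount\n1,9.50\n2,12.00\n")
        write_meta(
            data_file,
            {"expected_columns": ["order_id", "amount"], "status": "new", "history": []},
        )
        validate(data_file)
        load(data_file)
        return read_meta(data_file)


if __name__ == "__main__":
    print(json.dumps(run_demo(), indent=2))
```

Because the status and history travel with the file, a failed run can be debugged by reading one JSON document, and the load stage refuses any file that validation did not explicitly approve.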
Link to the blog post here: Building Robust Data Pipelines with Metadata: A Python MCP Pattern Project
Link to the GitHub repository: https://github.com/scriptstar/metadata-driven-mcp-pipeline
Project 2: Building Your First Local AI Agent: Ollama + smolagents Setup Guide
Learn how to build a powerful AI agent that runs entirely on your computer using Ollama and Hugging Face's smolagents. Unlike traditional AI chatbots, this agent thinks in Python code to solve problems, from complex calculations to multi-step reasoning. The complete setup guide is included, and it requires no API keys, cloud services, or recurring costs.
Link to the blog post here: Building Your First Local AI Agent: Ollama + smolagents Setup Guide