Project Build Log

Systems I've architected, bugs I've hunted down, and experiments that taught me something new. Each commit represents real problems solved with real code that runs in production.

a8f5c92 2024 Kellogg x Overtone

feat: publisher-brand content alignment at scale

Built semantic matching system processing 1M+ articles using BERT embeddings and custom distance metrics. K-means clustering + LDA topic modeling achieved 85% alignment accuracy, reducing content mismatches by 67%. The system that actually solved the "needle in haystack" problem for brand-publisher partnerships.
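
A minimal sketch of the matching step, not the production code: it ranks articles against a brand vector by similarity. Plain cosine similarity stands in for the custom distance metrics, the 3-dimensional vectors are made-up stand-ins for BERT embeddings, and `rank_articles` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def rank_articles(brand_vec, article_vecs, top_k=2):
    """Return the top_k (article_id, score) pairs, best match first."""
    scored = [(aid, cosine_similarity(brand_vec, vec))
              for aid, vec in article_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors (assumed for illustration only).
brand = [1.0, 0.0, 0.0]
articles = {
    "a1": [0.9, 0.1, 0.0],
    "a2": [0.0, 1.0, 0.0],
    "a3": [0.5, 0.5, 0.0],
}
top = rank_articles(brand, articles)
print([aid for aid, _ in top])
```

At 1M+ articles a brute-force scan like this would likely give way to clustering or an approximate index, which is where K-means fits in.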

BERT, PyTorch, K-means, LDA, Semantic Search

e7c2b41 2024 IDEAL x Michelin

build: deep Q-learning for fleet tire optimization

Predictive maintenance pipeline processing real-time IoT data from 10K+ vehicles using TensorFlow and Spark. Deep Q-Learning agent learned optimal maintenance scheduling, cutting costs by 25% and extending tire life by 30%. When machine learning actually saves millions of dollars in the real world.
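
To illustrate the scheduling idea, here is a sketch using tabular Q-learning, a deliberately simplified stand-in for the deep Q-learning agent. The wear levels, rewards, and the `step` and `train` helpers are all assumptions invented for this toy example, not values from the project.

```python
import random

N_STATES = 4               # wear levels 0..3; continuing at level 3 causes a failure
CONTINUE, REPLACE = 0, 1   # the two maintenance actions

def step(wear, action):
    """Return (reward, next_wear) for a toy, deterministic tire model."""
    if action == REPLACE:
        return -3.0, 0         # pay for a new tire, wear resets
    if wear == N_STATES - 1:
        return -20.0, 0        # ran a worn tire to failure
    return 1.0, wear + 1       # one more period of service

def train(steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Off-policy Q-learning with a purely random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    wear = 0
    for _ in range(steps):
        action = rng.choice((CONTINUE, REPLACE))
        reward, nxt = step(wear, action)
        target = reward + gamma * max(q[nxt])          # Bellman target
        q[wear][action] += alpha * (target - q[wear][action])
        wear = nxt
    return q

q_table = train()
# Greedy policy: the best action for each wear level.
policy = [max((CONTINUE, REPLACE), key=lambda a: q_table[s][a])
          for s in range(N_STATES)]
print(policy)
```

In this toy model the learned policy keeps a tire in service until the last safe wear level and replaces it just before failure. A deep Q-network swaps the table for a neural network so the same update can handle continuous sensor readings.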

Deep Q-Learning, TensorFlow, Apache Spark, IoT Sensors, Predictive Maintenance

c9f1a87 2024 Chan Zuckerberg Biohub

refactor: biomedical RAG system processing 6M research papers

Scaled RAG pipeline with intelligent XML parsing to preserve document structure for semantic chunking. FAISS vector index + horizontal scaling reduced latency by 65% while maintaining research-grade accuracy. The system that makes 6 million papers actually searchable and useful.
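
A rough sketch of structure-preserving chunking, assuming JATS-like `<sec>`, `<title>`, and `<p>` tags. The sample document and the `chunk_by_section` helper are hypothetical; the real parser handled far messier input.

```python
import xml.etree.ElementTree as ET

# Made-up sample article for illustration.
SAMPLE = """<article><body>
<sec><title>Methods</title><p>We sequenced samples.</p><p>Reads were aligned.</p></sec>
<sec><title>Results</title><p>Alignment rate was high.</p></sec>
</body></article>"""

def chunk_by_section(xml_text):
    """Emit one chunk per section, keeping the section title as metadata."""
    root = ET.fromstring(xml_text)
    chunks = []
    for sec in root.iter("sec"):
        title = sec.findtext("title", default="").strip()
        # Only direct <p> children, so nested sections become their own chunks.
        paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
        if paragraphs:
            chunks.append({"section": title, "text": " ".join(paragraphs)})
    return chunks

chunks = chunk_by_section(SAMPLE)
for chunk in chunks:
    print(chunk["section"], "->", chunk["text"])
```

Keeping the section title next to each chunk means a retrieved passage can be traced back to where it sat in the paper, which splitting on raw character counts would lose.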

RAG, LLaMA 3.1, FAISS, PMC API, Airflow

f4b8e31 2024 Feinberg School

feat: computer vision models for ocular disease detection

ResNet18/34 architectures with transfer learning achieved 95% accuracy on 10K+ retinal images. StyleGAN2-ADA synthetic data generation improved minority class robustness by 20%. Grad-CAM interpretability made models actually usable by medical staff in clinical settings.

ResNet, StyleGAN2, Grad-CAM, Medical Imaging, PyTorch

b3d7c94 2024 CASMI

test: comprehensive bias evaluation framework for LLM recommendations

Built experimentation pipeline on AWS EC2 with MLflow and Docker for parallel testing across 100K+ user profiles. DVC for experiment version control ensured reproducibility across demographic groups. Framework that revealed 20% accuracy gaps and actually fixed them.
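
The core measurement can be sketched in a few lines: accuracy per demographic group, then the spread between the best and worst group. The records below are fabricated toy data, and `accuracy_by_group` and `max_gap` are hypothetical helper names rather than the framework's API.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, was_correct) pairs -> {group: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, was_correct in records:
        totals[group] += 1
        hits[group] += int(was_correct)
    return {group: hits[group] / totals[group] for group in totals}

def max_gap(accuracies):
    """Difference between the best- and worst-served group."""
    return max(accuracies.values()) - min(accuracies.values())

# Toy evaluation results (assumed): group A gets 4/5 right, group B gets 3/5.
records = (
    [("A", True)] * 4 + [("A", False)] * 1 +
    [("B", True)] * 3 + [("B", False)] * 2
)
acc = accuracy_by_group(records)
gap = max_gap(acc)
print(acc, round(gap, 2))
```

A single gap number is a blunt instrument, so in practice it would sit alongside per-group sample sizes and confidence intervals.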

LLM Evaluation, AWS EC2, MLflow, Docker, DVC

d8e4a72 2023-2024 Healee

analytics: Bayesian marketing mix models for budget optimization

Bayesian models analyzed 8-channel marketing spend with Pipedrive CRM + Google Analytics integration. Lead scoring algorithms improved qualified conversion by 35%, and budget allocation optimization boosted ROI by 27%. The system that proved which marketing dollars actually worked.

Bayesian Modeling, Marketing Mix, Tableau, Pipedrive, Google Analytics