Led product quality strategy for large language models (comparable in scale to GPT-5; details under NDA), defining success metrics and ensuring outputs met real-world production standards for hundreds of millions of users.
Drove a 35% increase in user satisfaction by identifying friction points, prioritizing fixes, and implementing Playwright-based automated testing to catch real-user edge cases before release.
Established quality gates and performance dashboards that reduced recurring regressions by ~30% and got engineering and product aligned on what "good" actually looked like.
Technical Product Manager | Analogue Shifts | Jul 2022 – Mar 2024
Took ownership of platform delivery for backend and cloud initiatives, coordinating priorities across engineering and business.
Led release readiness for cloud systems, introducing workflows that improved execution speed and reduced operational risk.
Acted as a bridge between infrastructure, application, and business concerns to keep work moving forward.
Software Engineer | GoldenOx Partners | Jan 2022 – Jun 2022
Owned technical delivery of mobile app modernization, migrating core services to serverless architecture.
Coordinated rollout sequencing and deployment to minimize disruption while improving scalability.
Case Study: Improving AI Model Quality at Turing
The challenge: Models passed technical benchmarks but failed in real-world use. Engineers optimized for metrics, not usability, and there was no systematic way to catch the gaps before release.
My role & ownership
As AI Product Manager, I owned the evaluation process and bridged engineering output with user needs.
Analyzed failure patterns across thousands of evaluations and identified the root cause: tests checked for correctness, not real-world interaction.
Pushed for a rigorous approach: built automated testing with Playwright to simulate user journeys, capturing edge cases unit tests missed.
Established quality gates so nothing shipped until it passed both technical AND real-world checks.
Set up dashboards to track what actually broke for users, prioritizing fixes that mattered.
Result
Recurring regressions dropped by ~30%. Engineering and product started speaking the same language. The framework turned subjective concerns into actionable data, which is exactly the systematic approach I'd bring to a platform like Pear.