Engineering Manager – Product and Platform

Posted 15 January 2026
Location United States of America
Job type Permanent
Reference 34464

Job description

AI is rapidly transforming the world. Whether it’s developing the next generation of human-level intelligence, enhancing voice assistants, or enabling researchers to analyze genetic markers at scale, AI is increasingly integrated into various aspects of our daily lives.
This company is the leading AI observability and evaluation platform, empowering AI engineers to build and deploy high-performing, reliable models. As the AI landscape shifts from traditional ML to generative AI and agentic systems, they ensure teams have the tools to monitor, troubleshoot, and improve AI in production.
As Engineering Manager for the Event Platform team, you’ll report directly to the Head of Engineering, leading the team responsible for high-throughput event processing infrastructure. This critical system powers the company’s LLM evaluation and observability platform through a custom storage engine, real-time event processing infrastructure, and scalable query systems. This role is ideal for engineers passionate about optimizing distributed systems for durability, high availability, low latency, and scalability at massive scale. You’ll have the opportunity to shape mission-critical infrastructure that’s fundamental to the platform’s success.
The Role
We’re seeking an exceptional Engineering Manager who combines deep technical expertise in distributed systems with strong leadership abilities. You’ll lead a team of high-performing engineers while remaining hands-on with complex technical challenges across the event processing stack. The ideal candidate is as comfortable designing high-throughput distributed storage systems as they are mentoring engineers and driving technical strategy.

What You’ll Work On
• Lead and scale a team of engineers building next-generation event processing and storage infrastructure that handles millions of events per second
• Drive architectural decisions for mission-critical distributed systems, focusing on reliability, performance, and scalability
• Design and implement sophisticated storage and query engines optimized for AI observability workloads
• Partner with product teams to evolve platform capabilities while maintaining strict performance and reliability SLAs
• Mentor and grow engineering talent while fostering a culture of distributed systems excellence

What Will Set You Apart
• Strong background in performance optimization, particularly around low-latency storage and retrieval systems
• Track record of leading engineering teams working on distributed infrastructure while maintaining technical depth
• Experience with modern infrastructure (Kubernetes, cloud-native architectures) and distributed systems patterns
• Focus on pragmatic solutions while meeting strict performance and reliability requirements



Research indicates that men will apply to a role when they meet only 50-60% of the description, whereas women and other minority groups may look for up to a 99% match before applying. If you feel you are a fit for our role, please apply even if you don’t tick every single box. We’d still love to hear from you. We encourage underrepresented talent to apply to all our roles, and we support accessibility needs.