Why This Job is Featured on The SaaS Jobs
This DevOps Engineer role stands out in the SaaS ecosystem because it sits at the infrastructure layer for an API product that supports AI agent workflows, where reliability and latency are part of the customer experience. The remit spans Kubernetes, GitOps deployments, observability, and multi-region operations, which are common pressure points for SaaS platforms that serve external developers and enterprise users.
For a SaaS career, the work maps closely to the skills that travel across modern product-led and API-first companies: building repeatable delivery systems, treating infrastructure as code, and establishing monitoring and incident practices that scale with usage. Exposure to capacity planning and cloud cost management also aligns with how SaaS businesses balance performance and unit economics as adoption increases. The emphasis on production debugging across services reflects the operational maturity expected in subscription software.
This position is likely to suit an engineer who prefers end-to-end accountability over a narrow platform slice and who enjoys methodical incident response and systems thinking. It fits someone comfortable partnering closely with a small engineering group and taking ownership of foundational technical decisions that influence how a SaaS product is shipped and operated.
The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.
Job Description
Employment Type
Full time
About Tavily
We're building the infrastructure layer for agentic web interaction at scale. Our API is designed from the ground up to power Retrieval-Augmented Generation (RAG) and real-time reasoning in AI systems. By connecting LLMs to high-quality, trustworthy web content, we help developers build agents that are not only intelligent but also informed.
We work with some of the most innovative teams in AI — from small startups shaping the ecosystem to the largest enterprises deploying AI at scale. Whether it's powering sales assistants, research copilots, or internal knowledge tools, we're the missing link between LLMs and the real world.
The Role: DevOps Engineer
Managing Kubernetes clusters across multiple environments and regions
Owning infrastructure as code for all resources
Maintaining and improving CI/CD pipelines and GitOps-based deployments
Maintaining and optimizing real-time data pipelines that process billions of events per day across distributed queues and stream processors
Building out monitoring, alerting, and observability
Debugging production issues across services
Managing cloud costs and capacity planning
Working closely with a small engineering team — you'd own infra, not a slice of it
What we're looking for
3+ years in a DevOps or platform engineering role, working in production environments
Proven experience designing and operating large-scale, distributed systems, with a solid understanding of API design, reliability, and performance at scale
Strong Kubernetes experience in a managed cloud environment
Proficiency with infrastructure as code (Terraform or similar)
Experience with GitOps-based deployment workflows
Experience building or maintaining observability stacks (logging, metrics, alerting)
Experience handling production incidents calmly and methodically
Nice to have:
Why Tavily?
Full ownership — small team, you own the entire infrastructure, not a slice of it
Real scaling challenges — bursty scraping workloads, cache invalidation, multi-region, millions of daily requests
AI-native company — your infra directly powers AI agents used by leading companies in the space
NYC-based — working closely with engineering, short feedback loops