Why This Job is Featured on The SaaS Jobs
Site reliability work is increasingly central to modern SaaS, where customer expectations are defined by uptime, latency, and predictable releases. This role stands out because it sits on the production-critical path for an enterprise platform with broad geographic reach and a large customer base, making reliability engineering a direct lever on product trust. The remit spans observability, incident response, and infrastructure management, a combination that reflects how mature SaaS operators run services at scale.
For a SaaS career, the long-term value is in building repeatable operational systems rather than one-off fixes. Experience with infrastructure as code, automated rollouts, and capacity planning translates across subscription businesses that need disciplined change management and measurable service health. The emphasis on postmortems and durable remediation is also a strong signal of learning loops that improve engineering outcomes over time.
This is best suited to an engineer who prefers ownership of end-to-end reliability outcomes and enjoys cross-functional work with product and security stakeholders. It will fit someone comfortable making pragmatic trade-offs between resilience, complexity, and cost, and who wants their technical decisions to shape how a SaaS platform scales globally.
The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.
Job Description
Why Harvey
At Harvey, we’re transforming how legal and professional services operate. By combining frontier agentic AI, an enterprise-grade platform, and deep domain expertise, we’re reshaping how critical knowledge work gets done for decades to come.
This is a rare chance to help build a generational company at a true inflection point. With 1500+ customers in 60+ countries, strong product-market fit, and world-class investor support, we’re scaling fast and defining a new category in real time. The work is ambitious, the bar is high, and the opportunity for growth — personal, professional, and financial — is unmatched.
Our team moves fast, takes ownership, and is deeply committed to the mission — operating with intensity, staying close to our customers, and pushing each other for excellence. We live by three values: Decisiveness, Simplicity, and Job's Not Finished. We act quickly on clear judgment over perfect information, we believe simplicity is what scales, and we're never satisfied with where we are. If you want to do the best work of your career alongside people who share that drive, we'd love to build with you.
At Harvey, the future of professional services is being written today — and we’re just getting started.
Role Overview
As a Software Engineer on the Site Reliability team at Harvey, you will ensure the reliability, scalability, and performance of our legal AI platform. You’ll join a high-leverage team that sits at the intersection of infrastructure and product, owning the systems that keep our platform fast, secure, and always on. From scaling across 50+ regions to automating mission-critical operations, your work will ensure that Harvey remains resilient as we grow. If you’re passionate about building robust systems and reducing complexity through automation, we’d love to work with you.
What You'll Do
Design, implement, and manage monitoring, alerting, and infrastructure resources (compute, storage, networking) across 50+ global regions
Lead incident management processes, including postmortems and root cause analyses, and drive actionable improvements
Automate operational tasks and workflows, building tools and processes for capacity planning, graceful rollouts, and safe data access to maintain high reliability and reduce manual intervention
Collaborate across teams to drive reliability, security, and compliance throughout the software lifecycle
Optimize infrastructure costs through strategic capacity planning and build-versus-buy decisions while maintaining system performance, reliability, and functionality
What You Have
3+ years of experience in Site Reliability Engineering or similar roles supporting production environments
Expertise in infrastructure as code (IaC) tools (Pulumi, Terraform, CloudFormation, etc.)
Deep familiarity with observability tools (Datadog, Sentry, etc.) and incident response practices (PagerDuty, incident.io, etc.)
Proficiency with cloud infrastructure platforms (Azure, GCP, AWS, etc.)
Strong programming skills (Python, Bash, Go, or similar languages)
Proven track record of diagnosing complex system problems and implementing durable solutions
Solid understanding of CI/CD, Kubernetes, containerization, networking, databases, and cloud security principles
Excellent problem-solving skills, meticulous attention to detail, and a commitment to operational excellence
Additional Information
Depending on your location, an Applicant Privacy Notice may apply to you. You can find all of our Applicant Privacy Notices [here].
Harvey is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made by emailing accommodations@harvey.ai.