Software Engineer, Compute Infrastructure

Glean • Mountain View, CA

Why This Job is Featured on The SaaS Jobs

This Software Engineer, Compute Infrastructure role sits at a core layer of a modern SaaS stack: the shared runtime platform that makes customer-facing AI search and assistant features reliable in production. The emphasis on Kubernetes primitives, multi-cloud foundations, and cost-aware execution reflects how SaaS companies are increasingly differentiating through operational excellence in AI-heavy workloads, not only through model choice or UX.

For a long-term SaaS engineering career, work like this builds durable platform instincts: designing “golden paths” that let product teams ship safely, translating performance goals into SLOs and observability, and balancing latency, utilization, and spend as first-class engineering constraints. Experience operating multitenant runtime systems and supporting both online services and batch pipelines also transfers well across SaaS businesses that run data-intensive products at scale.

The role fits engineers who prefer systems ownership over narrow feature delivery, and who like working at the intersection of infrastructure and application behavior. It will suit someone comfortable collaborating across platform, data, and product engineering, and who views incident response and iterative hardening as part of building the product. The hybrid setup also signals a preference for frequent in-person technical collaboration.

The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.

Job Description

About the Role:

Glean is seeking a Software Engineer, Compute Infrastructure to help design, build, and operate the core compute and runtime platform that powers our AI search, assistant, and agentic workloads. Sitting within the Platforms organization, this role focuses on Kubernetes-based runtime systems, multi-cloud infrastructure, and cost‑efficient, low‑latency execution for production services and pipelines that serve our customers at scale.

You will:

Design, build, and own backend/platform services that power Glean’s runtime infrastructure, with a focus on reliability, scalability, and performance for AI and search workloads.
Develop and evolve Kubernetes‑based runtime primitives (e.g., service orchestration, scheduling integrations, autoscaling patterns) across our multi‑cloud foundation (GCP, AWS, Azure).
Collaborate with platform, data, and product engineering teams to make it easy and safe to spin up new services and batch workloads, with clear golden paths for deployment, configuration, and runtime operations.
Drive end‑to‑end improvements in latency, resource utilization, and cost for core platform services, including multitenant runtime environments and experimental AI workloads.
Implement and harden infrastructure‑as‑code patterns, observability, and guardrails so teams can confidently ship and run services in production (e.g., SLOs, dashboards, alerts, safe rollout/rollback).
Partner with the Costs and Runtime teams to build shared mechanisms for attribution, guardrails, and automation that keep our runtime layer efficient as we 5x customers and traffic.
Participate in an on‑call rotation for critical platform services, lead incident response when needed, and translate learnings into better reliability, tooling, and documentation.
Contribute to technical direction for Runtime Infra: help define roadmaps around multitenancy, autoscaling, capacity/placement, and platformized patterns that reduce per‑team hand‑holding.

About you:

You are a backend/platform engineer who enjoys working close to the metal—where application behavior, infrastructure, and cost all intersect—and you are motivated by building shared systems that many teams depend on.
You have strong distributed systems fundamentals and experience operating high‑throughput, low‑latency services or batch pipelines in production environments.
You are comfortable owning systems end‑to‑end: design, implementation, testing, deployment, observability, and ongoing operations.
You think in terms of reliability and guardrails: SLOs, incident response, safe deployment strategies, and clear operational runbooks are part of how you build.
You are pragmatic and execution‑oriented: you can balance ideal architectures with the constraints of a fast‑moving startup and ship iterative improvements.
You communicate clearly with both infra and product engineers, and you like collaborating across teams to understand requirements and translate them into platform capabilities.
You are excited to work in a multi‑cloud, multi‑tenant environment and to help define best practices for running AI workloads efficiently at scale.

Location:

This role is hybrid (4 days a week in our Mountain View office).

Compensation & Benefits:

The standard base salary range for this position is $140,000 - $220,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

We offer a comprehensive benefits package including competitive compensation, medical, vision, and dental coverage, a generous time-off policy, and the opportunity to contribute to your 401(k) plan to support your long-term goals. When you join, you'll receive a home office improvement stipend, as well as annual education and wellness stipends to support your growth and wellbeing. We foster a vibrant company culture through regular events and provide healthy lunches daily to keep you fueled and focused.

We are a diverse bunch of people, and we want to continue to attract and retain a diverse range of people in our organization. We're committed to an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
