Senior Site Reliability Engineer

Glean • Palo Alto, CA

Why This Job is Featured on The SaaS Jobs

This Senior Site Reliability Engineer role sits at the heart of how a cloud SaaS product stays dependable as usage grows. The listing points to a hybrid cloud environment and a strong emphasis on automation, which are common markers of a platform that is moving beyond basic uptime work into deliberate reliability engineering. The remit spans availability, performance, and resilience, making it relevant to SaaS organisations where customer experience is tightly coupled to production stability.

From a SaaS career perspective, the role offers durable exposure to the operational disciplines that scale with recurring-revenue products: incident response with blameless postmortems, monitoring and alerting design, and cost-aware performance optimisation. It also signals meaningful proximity to the software development lifecycle through design and launch reviews, which is where SRE work becomes a multiplier for product engineering rather than a separate function. Experience across Kubernetes, infrastructure as code, and major cloud platforms remains highly portable across modern SaaS stacks.

This position is best suited to an experienced engineer who enjoys balancing hands-on engineering with setting standards across teams. It will fit someone comfortable taking on on-call responsibility, translating reliability goals into tooling and process, and collaborating with security and application engineers to reduce operational risk over time.

The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.

Job Description

About the Role:

We are seeking a skilled and motivated Senior Site Reliability Engineer (SRE) to join our dynamic and innovative team. As an SRE, you will play a critical role in ensuring the reliability, availability, and performance of our cloud-based services and applications. You will work closely with our engineering teams to design, build, and maintain robust, scalable, and highly available cloud infrastructure.

Much of our software development focuses on building infrastructure to scale our operations in a hybrid cloud environment and on eliminating toil through automation. On the SRE team, you'll have the opportunity to tackle the complex challenges of scale and fast growth that are unique to Glean, while using your expertise in coding, algorithms, problem-solving, and SRE practices. We keep Glean applications up and running, ensuring our customers have the best and most reliable experience possible.

You are:

Technical Leadership and Mentorship: Play a key role in driving technical excellence and fostering a culture of reliability across engineering teams. You will lead by example, setting best practices for incident management, performance optimization, and automation. Drive cross-team collaboration and contribute to the execution of key objectives in alignment with engineering leadership and cross-functional partners. Establish strong technical credibility, shaping architectural decisions and ensuring the delivery of high-quality, reliable systems.
Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure.
Incident Management: Participate in the primary on-call rotation; cultivate technical curiosity, a growth mindset, and a blameless postmortem culture within the team. Continuously optimize the on-call process for sustainability and efficiency.
Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
Software Development Consultation: Engage actively across the entire software development lifecycle. Participate in system design reviews and provide SRE insights during launch reviews, influencing and improving system architecture.

About you:

Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
8+ years of experience in a senior-level role in Site Reliability Engineering or a similar field, particularly in managing cloud-based services and infrastructure.
5+ years of experience with software development in one or more programming languages.
2+ years of experience managing people or teams, leading projects, and designing, analyzing, and troubleshooting distributed systems running in the cloud.
Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure.
Practical experience with containerization technologies, including Docker and Kubernetes. Familiarity with infrastructure-as-code tools such as Terraform is essential.
Solid understanding of networking, security principles, and SRE and security best practices.
Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.

Location:

This role is hybrid (4 days a week in one of our Bay Area offices).

Compensation & Benefits:

The standard base salary range for this position is $155,000 - $250,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

We offer a comprehensive benefits package including competitive compensation, medical, vision, and dental coverage, a generous time-off policy, and the opportunity to contribute to your 401(k) plan to support your long-term goals. When you join, you'll receive a home office improvement stipend, as well as annual education and wellness stipends to support your growth and wellbeing. We foster a vibrant company culture through regular events, and provide healthy lunches daily to keep you fueled and focused.

We are a diverse bunch of people and we want to continue to attract and retain a diverse range of people into our organization. We're committed to an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
