Why This Job is Featured on The SaaS Jobs
Site Reliability Engineering remains a core capability for mature SaaS platforms where uptime, latency, and safe change management directly shape customer trust. This listing stands out for its clear emphasis on reliability engineering as a product-adjacent function, pairing observability, incident response, and architecture influence across cloud services rather than limiting the remit to infrastructure maintenance.
For a SaaS career, the role builds durable experience in the operational disciplines that scale with usage and complexity: defining service health signals, automating repeatable runbooks, and turning post-incident learning into system improvements. Exposure to Kubernetes-based deployments, infrastructure as code, and APM tooling maps closely to how modern subscription software is delivered, and the cross-functional collaboration with Product and Engineering reflects how reliability work is prioritized in SaaS organizations.
This position fits engineers who prefer measurable outcomes and systems thinking, and who are comfortable sharing ownership through on-call rotations and documented post-mortems. It will suit someone who enjoys balancing hands-on automation with governance-minded work, including security and compliance controls, while communicating clearly across technical and non-technical stakeholders.
The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.
Job Description
Site Reliability Engineer
Location: Remote, United States
Employment Type: Full-Time
Benefits Offered: Vision, Medical, Life, Dental, 401K
Gross Annual Base Salary: USD 114,000 - 148,000
Additional variable compensation and benefits may apply. Total compensation is based on experience, skills, and location using objective, job-related criteria.
Summary
As a Site Reliability Engineer, you will focus on ensuring the platform and services customers rely on are reliable, performant, and highly available. If you enjoy staying at the forefront of technology and automating infrastructure deployments, then this is the job for you. This vital role within Cloud Services requires knowledge and experience designing, implementing, and monitoring scalable and secure cloud services. The employee is expected to work well in a small team and be willing to share responsibilities with other team members as needed. You will interact with internal staff, managers, and customers to implement and maintain operations. A passion for technology and learning and the ability to grow others are vital for success in this role.
Primary Duties and Responsibilities
- Implement application/infrastructure observability solutions to ensure desired application availability, reliability, and performance.
- Participate in regular On-Call rotations and share details related to incidents and their resolution through post-mortem reports and regular review meetings.
- Proactively partner with Product and Engineering teams to identify, develop, deploy, and maintain reliable systems and services.
- Influence and create new designs, architectures, standards, and methods for large-scale systems.
- Sustain a high level of reliability for key services and automated systems.
- Automate processes to improve reliability, performance, and availability.
- Update technical documentation, workflows, and knowledge base articles.
- Provide feedback in pull requests and peer coding reviews.
- Implement codified automated solutions that build integrations between Dynatrace, Azure DevOps and Jira.
- Maintain solid knowledge of focused areas of OneStream Software.
- Mentor others in several technical areas.
- Apply a practical understanding of SOC/FedRAMP controls to assist Compliance and Security teams.
Required Education and Experience
- BS/BA in computer science, engineering, or technology-related field (or equivalent work experience).
- Proven work experience as a Site Reliability Engineer or in a similar role.
- 6+ years of cloud infrastructure and software development experience.
- 2+ years of hands-on experience with Azure Kubernetes Service (AKS) and container-based deployments, or with other platforms such as OpenShift, GKE, or EKS.
- Advanced understanding of APM and observability tools such as Dynatrace, AppInsights, Datadog, Log Analytics, New Relic, Prometheus, and Grafana.
- Advanced understanding of Infrastructure-as-Code (IaC) concepts and tooling (Terraform, CloudFormation templates, Bicep or ARM templates) on Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP).
- Deep knowledge of Configuration Management/Orchestration utilities such as Ansible, PowerShell DSC, Chef, and Puppet.
- Advanced understanding of cloud concepts including elasticity, security, and identity management.
- Strong familiarity with Agile development methodologies using Jira or Azure DevOps Boards.
- 6+ years of hands-on experience with the following technologies, tools, and concepts:
  - Automating processes using PowerShell, Bash, CLI, REST APIs, Python, ARM templates, or other scripting languages.
  - Source control tools such as Git, Azure DevOps, or GitHub.
  - Container orchestration platforms and tooling such as Kubernetes, OpenShift, AKS, GKE, or Helm.
  - Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP).
Preferred Education and Experience
- Experience working for a cloud service provider (CSP), managed service provider (MSP), or SaaS provider.
- 6+ years of relevant Azure experience deploying and managing resources using Infrastructure-as-Code (IaC) concepts.
- Experience with Microsoft and .NET (.NET, C#, SQL).
- Experience writing efficient and reliable code in a development environment.
- Experience with Debian, Ubuntu, Alpine, or other distributions of the Linux operating system.
- Deep knowledge and understanding of containerized applications, with special attention to reliability and monitoring of those containerized applications.
Knowledge, Skills, and Abilities
- Ability to deal well with ambiguous or undefined problems.
- Ability to self-motivate and work independently.
- Strong organizational and prioritization skills.
- Ability to find and apply effective solutions to emerging problems and challenges.
- Strong attention to detail.
- Comfortable communicating with all levels of management and engineering.
- Ability to get up to speed quickly with modern technologies and services.
- Ability to multitask on a variety of projects.
Travel
- Travel Requirement: Travel is not expected to exceed 5%.
Who We Are
OneStream is how today’s Finance teams can go beyond just reporting on the past and Take Finance Further™ by steering the business to the future. It’s the only enterprise finance platform that unifies financial and operational data, embeds AI for better decisions and productivity, and empowers the CFO to become a critical driver of business strategy and execution. Our vision is to be the operating system for modern finance, digitizing core financial functions and empowering the CFO to become a critical driver of business strategy. To learn more, visit www.onestream.com.
Why Join The OneStream Team
- Transparency around corporate structure, salary, and benefits.
- Core value of customer success.
- Variety of project work (not industry-specific).
- Strong culture and camaraderie.
- Multiple training opportunities.
Benefits at OneStream
OneStream employees are passionate, hardworking individuals who go above and beyond to keep our customers happy and follow through on our mission statement. They consistently deliver their best and, in turn, we make every effort to keep them cared for and happy. A sample of the benefits we provide:
- Excellent Medical Plan.
- Dental & Vision Insurance.
- Life Insurance.
- Short & Long Term Disability.
- Vacation Time.
- Paid Holidays.
- Professional Development.
- Retirement Plan.
All candidates must be legally authorized to work for any company in the country where this position is located without sponsorship.
OneStream is an Equal Opportunity Employer.