Why This Job is Featured on The SaaS Jobs
This Senior Machine Learning Engineer role sits at the intersection of SaaS observability and the current shift toward agentic AI. Building systems that reason over high-volume machine log streams is a core problem for cloud-native SaaS platforms, where reliability, security, and real-time insight are product features rather than back-office concerns. The emphasis on heterogeneous data, context engineering, and evaluation reflects the practical constraints of deploying LLM-based capabilities into production-grade SaaS.
From a SaaS career standpoint, the work maps to durable platform competencies: creating repeatable evaluation methods, curating “golden” datasets, and instrumenting AI behavior with reliability and observability in mind. Experience balancing model quality with operational considerations (latency, cost, monitoring) translates across SaaS companies introducing AI into customer-facing workflows. The cross-functional delivery described also mirrors how ML work is typically productized in SaaS—tightly coupled to infrastructure and user outcomes.
This position is best suited to an engineer who prefers ownership in a small team and is comfortable defining scope amid ambiguity. It will fit someone motivated by applied ML and LLM systems—especially those who enjoy turning research-adjacent ideas into maintainable components that can be tested, monitored, and iterated in production.
The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.
Job Description
Senior Machine Learning Engineer I
The proliferation of machine log data has the potential to give organizations unprecedented real-time visibility into their infrastructure and operations. With this opportunity come tremendous technical challenges around ingesting, managing, and understanding high-volume streams of heterogeneous data.
As a Machine Learning Engineer, you’ll build the intelligence behind the next generation of agentic AI systems that reason over massive, heterogeneous log data. You’ll combine machine learning, prompt engineering, and rigorous evaluation to create autonomous AI agents that help organizations understand and act on their data in real time.
You’ll be part of a small, high-impact team shaping how AI agents understand complex machine data. This is an opportunity to work on cutting-edge LLM infrastructure and contribute to defining best practices in context engineering and AI observability.
Responsibilities
- Design, implement, and optimize agentic AI components including context engineering, memory management, and prompts.
- Develop and maintain golden datasets by defining sourcing strategies, working with data vendors, and ensuring quality and representativeness at scale.
- Prototype and evaluate novel prompting strategies and reasoning chains to improve model reliability and interpretability.
- Collaborate cross-functionally with product, data, and infrastructure teams to deliver end-to-end AI-powered insights.
- Operate autonomously in a fast-paced, ambiguous environment: defining scope, setting milestones, and driving outcomes.
- Ensure reliability, performance, and observability of deployed agents through rigorous testing and continuous improvement.
- Maintain a strong bias for action—delivering incremental, well-tested improvements that directly enhance customer experience.
Required Qualifications
- B.Tech, M.Tech, or Ph.D. in Computer Science, Data Science, or a related field.
- 4-6 years of hands-on industry experience with demonstrable ownership and delivery.
- Strong understanding of machine learning fundamentals, data pipelines, and model evaluation.
- Proficiency in Python and ML/data libraries (NumPy, pandas, scikit-learn); familiarity with JVM languages is a plus.
- Working knowledge of LLM core concepts, prompt design, and agentic design patterns.
- Strong communication skills and a passion for shaping emerging AI paradigms.
Desired Qualifications
- Prior experience building and deploying AI agents or LLM applications in production.
- Familiarity with modern agentic AI frameworks (e.g., LangGraph, LangChain, CrewAI).
- Experience with ML infrastructure and tooling (PyTorch, MLflow, Airflow, Docker, AWS).
- Exposure to LLM Ops: infrastructure optimization, observability, latency, and cost monitoring.
About Us
Sumo Logic, Inc. helps make the digital world secure, fast, and reliable by unifying critical security and operational data through its Intelligent Operations Platform. Built to address the increasing complexity of modern cybersecurity and cloud operations challenges, we empower digital teams to move from reaction to readiness—combining agentic AI-powered SIEM and log analytics into a single platform to detect, investigate, and resolve modern challenges. Customers around the world rely on Sumo Logic for trusted insights to protect against security threats, ensure reliability, and understand their digital environments. For more information, visit www.sumologic.com.
Sumo Logic Privacy Policy: Employees will be responsible for complying with applicable federal privacy laws and regulations, as well as organizational policies related to data protection.