Why This Job is Featured on The SaaS Jobs
This Senior AI/MLOps Engineer role sits at a core SaaS intersection: taking applied machine learning for search and turning it into dependable, observable services that can be operated continuously. In the SaaS ecosystem, that “production ML” layer is where product differentiation meets platform reliability—especially for search, ranking, and recommendations that must perform consistently under real user traffic.
From a SaaS career standpoint, the work builds durable leverage: lifecycle ownership from packaging and deployment through monitoring, rollback strategies, and cost/performance trade-offs in cloud environments. Experience designing CI/CD for ML workloads, defining service-level indicators, and managing model versioning translates across many SaaS categories that are adding AI features without compromising uptime or governance. The role also connects engineering decisions to product iteration via experimentation and analytics, a common operating model in mature SaaS teams.
This position is best suited to an engineer who prefers end-to-end responsibility and operational rigor over isolated model development. It fits someone comfortable collaborating across data science, product, and reliability functions, and who values repeatable systems—pipelines, tooling, and standards—more than one-off deployments. The hybrid expectation also signals a working style that blends independent execution with periodic in-person alignment.
The section above is editorial commentary from The SaaS Jobs, provided to help SaaS professionals understand the role in a broader industry context.
Job Description
THE MISSION
We are building the next generation of AI-powered search products. We make AI explainable and help customers make data-driven decisions. You will work with the product function to guide product development through analytics and experimentation, and you will be an integral part of building the future of AI search. If you’re passionate about turning product data into actionable insights and driving product success, we’d love to hear from you.
THE OPPORTUNITY
We are seeking a skilled Senior AI / ML Ops Engineer to enable our Data Scientists to move faster and our customers to receive smarter search & discovery experiences by turning prototypes into robust, scalable, and observable AI services. You will own the end-to-end engineering life-cycle—packaging, deploying, operating, and continuously improving machine-learning models that power search ranking, recommendations, and related information-retrieval features on our e-commerce platform.
What you'll be doing:
- Productionization & Packaging: Convert notebooks and research codebases into production-ready Python and Go microservices, libraries, or Kubeflow pipelines; design reproducible build pipelines (Docker, Conda, Poetry) and manage artifacts in centralized registries.
- Scalable Deployment: Orchestrate real-time and batch inference workloads on Kubernetes, AWS/GCP managed services, or similar platforms, ensuring low latency and high throughput; implement blue-green / canary rollouts, automatic rollback, and model versioning strategies (SageMaker, Vertex AI, KServe, MLflow, BentoML, etc.).
- MLOps & CI/CD: Build and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, Argo) covering unit, integration, data-quality, and performance tests; automate feature store updates, model retraining triggers, and scheduled batch jobs using Airflow, Dagster, or similar orchestration tools.
- Observability & Reliability: Define and monitor SLIs/SLOs for model latency, throughput, accuracy, drift, and cost; integrate logging, tracing, and metrics (e.g., Datadog) and establish alerting and on-call practices.
- Data & Feature Engineering: Collaborate with data engineers to create scalable pipelines that ingest clickstream logs, catalog metadata, images, and user signals; implement real-time and offline feature extraction, validation, and lineage tracking.
- Performance & Cost Optimization: Profile models and services; leverage hardware acceleration (GPU, TPU), inference-optimization libraries (ONNX, OpenVINO), and caching and vector-search tooling (Redis, Faiss) to meet aggressive latency targets; right-size clusters and workloads to balance performance with cloud spend.
- Governance & Compliance: Embed security, privacy, and responsible-AI checks in pipelines; manage secrets, IAM roles, and data-access controls via Terraform or CloudFormation; ensure auditability and reproducibility through comprehensive documentation and artifact tracking.
- Collaboration & Mentorship: Partner closely with Data Scientists, Product Owners, and Site Reliability Engineers to align technical solutions with business goals; coach junior engineers on MLOps best practices and contribute to internal knowledge-sharing sessions.
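To make the drift-monitoring duty above concrete, here is a minimal, generic sketch of one common approach, a Population Stability Index (PSI) check comparing a live feature or score distribution against its training-time reference. This is an illustrative implementation using only the standard library, not the company's actual monitoring stack; the threshold rule of thumb (PSI above 0.2 signals drift) is a widely used convention, not a company requirement.

```python
import math
import random

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference (training-time) and a live distribution.
    Rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 drift alert."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def proportions(values):
        # Histogram values into the reference-defined bins.
        counts = [0] * bins
        for v in values:
            if lo <= v <= hi:
                counts[min(int((v - lo) / width), bins - 1)] += 1
        total = sum(counts) or 1
        eps = 1e-6  # keeps log() defined for empty bins
        return [c / total + eps for c in counts]

    e_pct = proportions(expected)
    a_pct = proportions(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_pct, e_pct))

# Illustrative data: a drifted score distribution has a shifted mean.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
no_drift = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
drifted = [rng.gauss(0.8, 1.0) for _ in range(10_000)]
```

In a real pipeline a check like this would typically run on a schedule (e.g., an Airflow task) and emit a metric to the observability stack, where an SLO alert fires when the index crosses the threshold.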
Role Requirements:
- Spend 1-2 days per week in a local coworking space to collaborate with your teammates in person.
- 5+ years of experience in software engineering with 2+ years focused on deploying ML/AI systems at scale.
- Strong coding skills in Python (preferred) and at least one statically typed language (Go preferred).
- Hands-on expertise with containerization (Docker), orchestration (Kubernetes/EKS/GKE/AKS), and cloud platforms (AWS, GCP, or Azure).
- Proven record of building CI/CD pipelines and automated testing frameworks for data or ML workloads.
- Deep understanding of REST/gRPC APIs, message queues (Kafka, Kinesis, Pub/Sub), and stream/batch data processing frameworks (Spark, Flink, Beam).
- Experience implementing monitoring, alerting, and logging for mission-critical services.
- Familiarity with common ML lifecycle tools (MLflow, Kubeflow, SageMaker, Vertex AI, feature stores, etc.).
- Working knowledge of ML concepts such as feature engineering, model evaluation, A/B testing, and drift detection.
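As a hypothetical illustration of the canary-rollout and A/B-testing concepts in the list above, an automatic rollback gate might compare the canary's error rate against the stable baseline with a one-sided two-proportion z-test. Everything here (function name, traffic split, threshold) is an assumption for the sketch, not the company's actual tooling:

```python
import math

def canary_should_rollback(stable_errors, stable_total,
                           canary_errors, canary_total,
                           z_threshold=2.33):
    """Roll back if the canary's error rate is significantly higher
    than the stable baseline (one-sided two-proportion z-test)."""
    p_stable = stable_errors / stable_total
    p_canary = canary_errors / canary_total
    pooled = (stable_errors + canary_errors) / (stable_total + canary_total)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / stable_total + 1 / canary_total))
    if se == 0:
        return False  # no errors anywhere: nothing to act on
    z = (p_canary - p_stable) / se
    return z > z_threshold  # ~99% one-sided confidence
```

A gate like this would run per evaluation window while the canary takes a small slice of traffic (say 5%), triggering the deployment platform's rollback path when it returns true; the z-threshold trades alert sensitivity against false rollbacks.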
#LI-Hybrid