Location: Remote (USA, Canada, United Kingdom, Europe)
Sponsorship: Not available. We cannot support visas, work permits, or extensions in any country (including OPT/CPT, PGWP, Graduate Route, or similar programs).
Salary: Varies by region — see details below
Summary
DistroKid is the world’s largest distributor of music to Spotify, Apple Music, YouTube, and beyond. Most new music today is released through DistroKid.
We are seeking a highly skilled Senior Systems Operations Engineer with deep expertise in cloud infrastructure, Infrastructure-as-Code (IaC), and AI-enhanced operations. This role is a critical technical leadership position on the Systems Operations (SysOps) team, responsible for architecting and managing our cloud environment, driving IaC maturity, and integrating AI-powered practices that improve reliability, reduce toil, and scale our operational capabilities.
You will serve as a subject matter expert in infrastructure domains, own complex workstreams end-to-end, and partner strategically with peers, engineering teams, and leadership to deliver impactful outcomes across the organization. This is a fully remote position, and success in the role depends on clear, open, and proactive communication to keep distributed teammates informed, aligned, and unblocked.
What You’ll Do
Cloud & Infrastructure Architecture
- Design, deploy, and manage scalable and highly available cloud infrastructure on AWS, with deep expertise in core services (EC2, EKS, S3, RDS, IAM, VPC, and beyond).
- Develop and maintain disaster recovery plans leveraging AWS capabilities for backup and replication to ensure business continuity.
- Collaborate with engineering and security teams to improve infrastructure health, security, and long-term scalability.
Infrastructure as Code (IaC)
- Design reusable Terraform/OpenTofu modules following DRY principles and organizational standards; implement module versioning and lifecycle strategies.
- Direct the migration of manual infrastructure to code; establish patterns and best practices for IaC adoption across the team.
- Implement IaC testing strategies, including validation, linting, and integration testing, using tools such as Terraform-Compliance or Checkov.
- Architect and maintain complex Bitbucket Pipelines configurations for multi-environment IaC deployments; implement pipeline security best practices.
AI-Enhanced Operations (AIOps)
- Implement AIOps practices, leveraging AI tools to enhance monitoring, incident response, and predictive alerting.
- Use AI-assisted development and operations tools (e.g., Cursor, Claude) to accelerate troubleshooting, code review, and documentation generation.
- Evaluate and implement AI-powered automation to reduce operational toil, improve repeatability, and scale platform capabilities.
Reliability & Observability
- Define and implement SLOs for services; guide and/or participate in incident response and conduct blameless postmortems.
- Implement chaos engineering practices to proactively identify system weaknesses before they impact production.
- Build and maintain comprehensive monitoring solutions using tools such as CloudWatch and Datadog to track performance and drive optimization.
Automation, Developer Experience & Internal Developer Portal
- Develop automation scripts and tools in Python, Bash, or similar languages to streamline operations and eliminate manual toil.
- Build self-service capabilities for development teams to reduce cognitive load and enable developer autonomy across the organization.
- Guide the solution architecture and end-to-end implementation of DistroKid’s first Internal Developer Portal (IDP).
- Define the IDP roadmap and success criteria in partnership with engineering leadership; establish golden paths, service catalogs, and self-service workflows that reduce deployment friction and accelerate developer productivity.
- Drive adoption of the IDP across engineering teams; gather feedback, iterate on the platform, and measure impact through developer experience metrics and reduced time-to-deploy.
Cost Optimization
- Guide cost optimization initiatives; implement rightsizing recommendations, reserved-capacity strategies, and tagging standards for cost allocation.
- Monitor and optimize AWS resource usage; select appropriate services and configurations to meet performance requirements cost-effectively.
Technical Leadership & Collaboration
- Direct planning, decision-making, and execution for infrastructure projects; own workstreams end-to-end.
- Partner cross-functionally with engineering, security, and product teams; communicate impact in terms of company strategy and OKRs.
- Provide technical mentorship to junior and mid-level engineers; invest in team growth and foster a culture of continuous learning.
- Maintain and contribute to infrastructure documentation, runbooks, and architectural decision records to ensure knowledge sharing and operational consistency.
Qualifications
Education
- Bachelor’s degree in Computer Science, Information Technology, a related field, or equivalent practical experience.
Experience
- 5+ years of experience in systems operations, platform engineering, or DevOps with a focus on cloud infrastructure and containerized environments.
- Proven production experience with AWS services (EC2, EKS, S3, RDS, IAM, VPC, API Gateway, EventBridge, etc.) and Kubernetes.
- 5+ years of hands-on experience with Infrastructure as Code tools, specifically Terraform and/or OpenTofu, including module design, state management, remote backends, and IaC testing.
Technical Skills
- Strong knowledge of Linux/Unix administration, systems, and shell scripting.
- Proficiency in Python, Go, or similar programming languages.
- Experience with CI/CD pipelines for infrastructure deployments (Bitbucket Pipelines, Jenkins, or similar).
- Experience with monitoring and observability tools (Prometheus, Grafana, CloudWatch, or Datadog).
- Demonstrated experience implementing or working with AIOps tools, practices, or AI-assisted operations in a professional context.
- Experience using AI-assisted development tools (e.g., Cursor, Warp, Claude, or similar) to accelerate engineering work.
Soft Skills
- Strong communication skills with the ability to engage effectively across technical and non-technical audiences.
- Practices open, transparent, and proactive communication in a fully remote environment; defaults to over-communication to keep distributed teammates informed and aligned across time zones and async workflows.
- Demonstrated ability to guide and influence without formal authority.
- Excellent problem-solving skills, with the composure to guide teams through incidents under pressure.
- Ability to work in a fast-paced, dynamic environment with shifting priorities while maintaining a high quality bar.
Preferred Qualifications
- AWS Certified Solutions Architect, DevOps Engineer, or equivalent certification.
- Prior experience designing or implementing an Internal Developer Portal (IDP) using platforms such as Backstage, Port, Cortex, or equivalent.
- Experience with policy-as-code tools such as OPA, Checkov, or Sentinel.
- Experience with service mesh technologies (Istio, Linkerd, or similar).
- Familiarity with Docker and container orchestration tools beyond Kubernetes.
This salary range ONLY applies to candidates living in the USA for this job. Rates may differ in other regions.
USA salary range
$155,000–$170,000 USD
This salary range ONLY applies to candidates living in the UK for this job. Rates may differ in other regions.
UK salary range
£100,000–£120,000 GBP
This salary range ONLY applies to candidates living in the EU for this job. Rates may differ in other regions.
EU salary range
€55,000–€110,000 EUR
This salary range ONLY applies to candidates living in Canada for this job. Rates may differ in other regions.
Canada salary range
$160,000–$180,000 CAD
What We Offer
- Retirement plans (401(k), SIPP, etc.)
- Health insurance
- Generous paid time off
- Parental leave
- Home office allowance
- Flexible work schedules
- Paid and discounted subscriptions
- Regular engagement activities
About DistroKid
DistroKid helps millions of independent artists get their music into streaming services and keep 100% of their earnings. We move fast, stay curious, and build tools that empower creativity.
If you want your work to directly impact how artists share their music with the world, we’d love to hear from you.
DistroKid is an Equal Opportunity Employer
We are committed to building a diverse and inclusive team and strongly encourage applications from individuals of all backgrounds, identities, and experiences. We value a wide range of perspectives and believe that our differences make us stronger.