Your role
As a Sr. System Administrator on the Data Centers team, you’ll own both our hardware and cloud infrastructure, with a particular focus on reliability, scalability, and operational excellence. You’ll manage the administration and lifecycle of our global server fleet and lead key initiatives in automation, observability, and performance.
You’ll also take the lead on capacity planning, including physical data center needs, and will be a technical partner to stakeholders across Engineering as we evolve our infrastructure footprint. In addition, you’ll evaluate, propose, and implement new initiatives such as intrusion detection/prevention, system/security hardening, and the automation of manual processes.
This position reports to our Engineering Manager, Data Centers and Networking, and is based in Vancouver, Canada (with flexibility to collaborate across global time zones).
What you’ll do
- Scout, evaluate, and compare hardware options and colocation facilities, partnering with Engineering to align decisions with performance and cost objectives.
- Design and deploy a cloud expansion strategy that balances reliability, performance, and efficiency across providers and regions.
- Steer capacity planning and our expansion/upgrade strategy, using data to anticipate growth and proactively mitigate bottlenecks.
- Design and deploy servers at scale into data centers around the globe, ensuring consistent standards and automation from day one.
- Develop and maintain automation for a large fleet of servers, VMs, and containers, reducing toil and improving consistency across environments.
- Work with vendors to obtain quotes, make purchases, and schedule services, including coordinating logistics for data center installations and maintenance.
- Set up and evolve monitoring for server, network, and data center health, including alerting, dashboards, and SLO-oriented metrics.
- Develop and maintain proper documentation for engineering staff, including runbooks, standards, and architectural diagrams.
- Participate in a rotating on-call schedule within the larger Infrastructure Engineering division, helping drive rapid incident response and robust post-incident reviews.
- Lead complex systems and network troubleshooting, fault analysis, and resolution, acting as an escalation point for the broader team.
- Provide technical mentorship to other System Administrators and engineers, sharing best practices around Linux, automation, and operational excellence.
- Partner with Security and other stakeholders on initiatives such as system hardening, compliance, and intrusion detection/prevention.
- Occasionally travel for on-site work when remote hands are not available; a valid passport is required.
Skills you’ll bring
- Background in Systems and/or Software Engineering, with a strong focus on infrastructure and operations.
- Extensive experience with Linux, both on-premises and in the cloud, including performance tuning, troubleshooting, and automation at scale.
- Familiarity with networking technologies: TCP/IP, DHCP, DNS, routing, firewalls, and load balancing concepts.
- Data center setup/deployment experience, including racking/stacking, cabling standards, and remote management.
- Exposure to cloud platforms such as GCP or AWS, and experience working in hybrid environments.
- Demonstrated ability to keep abreast of industry standards and trends, and to translate them into practical improvements in a production environment.
- Proven experience in a senior or lead capacity (typically 5+ years in systems administration or similar roles), including driving cross-team initiatives and mentoring others.
- Strong communication skills and the ability to collaborate effectively with distributed teams.