Job Details
Job Location: Fort Collins, CO
Position Type: Full Time
Salary Range: $132,800.00 - $196,500.00 Salary
Travel Percentage: Negligible
Job Shift: Day
Description
Position Overview
The Lead Platform Engineer is a senior technical leader responsible for driving BillGO’s infrastructure strategy and evolution across Platform Engineering, Site Reliability Engineering (SRE), and DevOps. This role emphasizes architectural leadership, technical influence, and enablement over day-to-day operations.
A key outcome of this role is the successful design and deployment of self-service capabilities and tools that empower autonomous software engineering teams—reducing dependency on centralized infrastructure resources and accelerating delivery velocity.
This leader will also play a critical role in advancing end-to-end observability and telemetry across infrastructure and applications to improve reliability, reduce MTTR, and enable faster incident triage. As BillGO expands its use of AI, this role will help design and operationalize AI infrastructure components, including model-serving platforms and vector databases, in a scalable, compliant, and cost-optimized way within AWS.
While not responsible for people management or full-time domain ownership, this leader will contribute hands-on as needed based on business priorities and team workload. They will collaborate extensively across teams, aligning infrastructure strategy with BillGO’s broader business objectives and 10x growth strategy.
What You’ll Do
All other duties as assigned, plus…
Technical & Architectural Leadership
- Define, evolve, and communicate the architectural vision for infrastructure, platform, and delivery systems across AWS, Kubernetes, and CI/CD ecosystems.
- Establish and drive adoption of infrastructure, reliability, security, and observability standards, including reference architectures, patterns, and best practices.
- Provide expert guidance across infrastructure, reliability, and automation domains—ensuring scalability, resilience, security, and cost effectiveness.
- Deliver self-service infrastructure, platform, and deployment capabilities to enable autonomous development teams.
- Contribute hands-on within key domains (Platform, SRE, DevOps) when necessary to drive outcomes, validate patterns, or remove critical blockers.
Self-Service Enablement
- Lead initiatives that simplify and automate infrastructure provisioning, deployment, and operational processes.
- Drive adoption of internal developer platforms, infrastructure-as-code modules, and standardized automation patterns that reduce dependence on a centralized cloud engineering team.
- Enable teams to onboard easily to shared platforms, standards, and pipelines through documentation, tooling, and guardrails.
- Establish metrics to measure self-service maturity, engineering autonomy, and dependency reduction on centralized teams.
Monitoring, Observability & Telemetry
- Design and operationalize monitoring and observability architectures that provide end-to-end visibility across infrastructure, applications, and third-party integrations.
- Advance telemetry capabilities (metrics, logs, traces, events) to support proactive detection, faster incident triage, and data-driven reliability improvements.
- Define and standardize SLIs/SLOs, error budgets, and alerting practices, and influence their adoption across teams.
- Partner with engineering teams to embed observability into services by default (e.g., instrumentation standards, dashboards, runbooks).
AI & Future-State Platform Capabilities
- Design and operationalize AI infrastructure components within AWS, including model-serving platforms, vector databases, and related data/feature pipelines.
- Ensure AI workloads are secure, compliant, observable, and cost-optimized, aligned with regulatory and business requirements.
- Collaborate with data, AI/ML, and product teams to understand use cases and translate them into scalable, supportable infrastructure patterns.
- Evaluate emerging technologies (e.g., new AI services, managed vector stores, observability tools) and make recommendations that align with BillGO’s 10x strategy.
Cross-Functional Collaboration
- Build strong, trusted relationships with engineering, security, product, data/AI, and leadership stakeholders.
- Facilitate alignment across diverse priorities, finding common ground to achieve business and technical objectives.
- Champion collaboration and shared ownership across teams to support BillGO’s 10x strategy and long-term scalability.
- Represent infrastructure interests in strategic planning and roadmap discussions, ensuring alignment with organizational goals and platform standards.
Team Prioritization & Workload Management
- Partner with the Sr. Director of Platform Engineering & Technology Operations to shape and maintain the team’s backlog, priorities, and delivery roadmap.
- Help balance strategic initiatives, technical debt, and operational work across the team based on capacity and business impact.
- Assist with grooming and organizing work items so engineers have clear, actionable priorities and understand trade-offs.
- Provide input and feedback on workload distribution to ensure a sustainable pace and effective use of team skills.
Coaching & Technical Mentorship
- Mentor engineers across Platform, SRE, DevOps, and related domains, promoting a culture of continuous learning and improvement.
- Support the Sr. Director of Platform Engineering & Technology Operations in developing technical competencies, architectural maturity, and standards adoption across the organization.
- Serve as a role model for technical excellence, accountability, and pragmatic problem-solving, especially around observability, automation, and platform design.
Reliability & Delivery Excellence
- Define and influence reliability and observability standards (SLIs/SLOs, error budgets, incident response patterns, chaos engineering practices).
- Promote modern DevSecOps practices, CI/CD automation, and operational resilience across services and platforms.
- Partner with teams to identify and eliminate friction in software delivery and operations, using telemetry and incident data to drive continuous improvement.
- Ensure that monitoring, alerting, and runbooks are in place and effective for critical systems and services.
Qualifications
What You Bring
- 8+ years of experience in infrastructure, platform, SRE, or DevOps engineering, with proven architectural or technical leadership.
- Strong technical depth in cloud architecture (AWS), Kubernetes, Terraform, CI/CD, and automation frameworks.
- Demonstrated experience delivering self-service platforms and developer enablement capabilities that increase team autonomy.
- Hands-on experience with observability and monitoring platforms, and a track record of improving reliability and incident response through better telemetry.
- Exposure to or experience with AI/ML infrastructure (e.g., model serving, vector databases, or data/feature pipelines) in a cloud environment is a strong plus.
- Skilled at balancing strategic vision with hands-on execution as business needs dictate.
- Proven ability to build consensus and promote productive partnerships across diverse technical and business teams.
- Experience assisting with or influencing team prioritization, backlog management, or workload planning in an engineering context.
- Excellent communication, systems thinking, and mentoring skills.
- Strong alignment with BillGO’s culture of innovation, autonomy, and 10x strategic growth.