SRE Lead

Nexthink • Full-time • Bengaluru, Karnataka, India • 3d ago

Company Description

Nexthink is the leader in digital employee experience management software. The company provides IT leaders with unprecedented insight, allowing them to see, diagnose, and fix issues at scale that affect employees anywhere, with any application or network, before employees notice the issue. As the first solution to allow IT to progress from reactive problem solving to proactive optimization, Nexthink enables its more than 1,200 customers to provide better digital experiences to more than 15 million employees. Dual-headquartered in Lausanne, Switzerland, and Boston, Massachusetts, Nexthink has 9 offices worldwide.

#LI-Hybrid

Job Description

Nexthink is looking for a Lead Site Reliability Engineer who is passionate about building and running a high-performance cloud platform and enabling best-in-class site reliability and operations practices. This role will support Nexthink operations globally. The candidate will drive the development of modern, cloud-native SRE processes and the management and operation of Nexthink’s multi-tenant, microservices-based cloud platform, which has multiple instances deployed across the globe.

This role involves working closely with cross-functional teams to integrate reliability and security into our systems, ensuring they meet the relevant standards. The ideal candidate will have extensive experience in both software engineering and systems administration, with a strong understanding of SRE concepts, requirements, and security practices.

Leadership and Team Management:

Lead, mentor, and develop a team of India-based Site Reliability Engineers.
Foster a culture of continuous improvement, collaboration, and innovation.

Infrastructure Management:

Oversee the design, deployment, and management of scalable and secure cloud infrastructure.
Drive automation of infrastructure provisioning, configuration, and management using Infrastructure as Code (IaC) tools.

Monitoring and Performance:

Develop and maintain comprehensive monitoring, logging, and alerting systems to ensure high availability and performance.
Lead efforts in performance tuning and optimization for applications and infrastructure.

Security and Compliance:

Ensure implementation and maintenance of security controls and best practices to achieve compliance with standards and certifications.
Conduct and oversee regular security assessments, vulnerability scans, and penetration testing.
Collaborate with the compliance team to prepare for and respond to audits.

Incident Management:

Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis.
Develop and implement strategies for improving incident response and minimizing downtime.

Collaboration and Communication:

Work closely with development, operations, and security teams to integrate reliability and security into the software development lifecycle.
Communicate effectively with stakeholders, providing regular updates on system performance, reliability, and compliance status.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
5+ years of experience in site reliability engineering, DevOps, or a related role, with at least 2 years in a leadership position.
Proficiency in cloud platforms (AWS, Azure, GCP) and cloud-native services.
Strong scripting and programming skills (Python, Bash, Go, or similar).
Experience with Infrastructure as Code (IaC) tools such as Terraform, CrossPlane, CloudFormation, or Ansible.
Knowledge of containerization and orchestration (Docker, Kubernetes).
Familiarity with CI/CD pipelines and tools (Jenkins, GitLab, GitHub, etc.).
In-depth knowledge of the requirements and best practices of standards such as ISO and SOC2.
Experience with security tools and practices (SIEM, IDS/IPS, firewalls).
Understanding of network security, encryption, and secure software development practices.
Ability to collaborate with and foster effective communication with global and multicultural engineering teams in EU and US timezones.
Ability to report to upper engineering management in a timely and effective manner.

Additional Information

We are the pioneers and trailblazers of a global IT Market Category (DEX) that is shaping the future of how the world works, giving our customers’ IT Teams total digital visibility across their enterprise. Our innovative solutions integrate real-time analytics, automation, and employee feedback across all endpoints. This enables our IT teams to solve complex technical challenges, create ever more productive workplaces, and deliver happy, satisfied employees in the digital workplace.

With over 1,000 employees across 5 continents, Nexthink operates as One Team, connecting, collaborating, and innovating to continuously grow. We call our employees ‘Nexthinkers’ and our commitment to diversity, inclusion, and equity is second to none. We currently have over 75 nationalities working with us, from all cultures and backgrounds, speaking many different languages.

If you are looking for a change and like a nice atmosphere, lots of challenges, and having fun while working, this is a great opportunity for you! Check what we offer:

💼 Permanent Contract and a competitive compensation package (including stock options).
🏡 Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
🏖️ Flexible Hours and unlimited vacation (employees have unlimited paid time off on top of the 22 days of holidays we offer) plus 3 company-paid volunteer days.
🍉 Fresh fruit, cookies, and soft drinks.
🤝 Regular company and team events such as Voluntary Days, Pizza talks, Team Building activities, Meetups hosted at the office, and more!
📣 Bonuses for referring successful hires after three months of continuous employment.

Please note that not all the benefits listed above are available for temporary, contract, and internship roles. To ensure you have the most up-to-date information, we recommend checking with your Recruitment Partner.