About the Role:
As a Cloud Ops Engineer at Wrike, you have advanced skills in supporting cloud and data center infrastructure with security in mind. You know how to work with monitoring and logging systems, containers, networking, and automation, and how to debug a reasonably complex infrastructure. You feel comfortable defining your own work based on the team OKRs, and you can help others do so when necessary. You are used to proposing meaningful improvements to the existing infrastructure in alignment with architects and tech leads, and you can drive their execution.
In this role, you would join a core development team of 250+ engineers developing Wrike and become part of the operations department, which is exposed to various technologies and systems. Does this sound like you? If your answer is yes, we'd love to speak with you!
Team Dynamics:
We have two dozen folks in the SysOps Department, consisting of three teams distributed across Prague, Cyprus, and Tallinn. As a core member of our team, you will be:
- Managing the Wrike product infrastructure
- Implementing reliable solutions to ensure a product uptime SLA of 99.9%
- Working with GCP, AWS and other cloud providers in the IaC paradigm
- Introducing and supporting new infrastructure services
- Actively participating in incident response and management, including on-call duties
- Developing and maintaining professional connections within and outside of the team
Technical Environment:
We run 150+ Java-based SaaS applications in Kubernetes for a massive audience of over 20,000 organizations in three data centers, both on-premises and in the cloud.
Key technologies and tools include:
- Linux (core platform for all services)
- Kubernetes and Argo CD (service-oriented architecture)
- Nginx, HAProxy and Istio for load balancing
- GCP, AWS and Cloudflare as cloud providers
- Zabbix and Prometheus (VictoriaMetrics) for monitoring and alerting
- Graylog, Elasticsearch/Logstash, Fluentd for centralized logging
- Puppet, Ansible and Terraform for defining everything as code
- Python and Bash for automation and tooling
- Jenkins and GitLab CI for CI/CD
- PostgreSQL as the database platform
- Kafka and RabbitMQ for messaging
Your Impact:
- Lead the evolution of our enterprise-grade logging and monitoring platforms (Graylog, Elasticsearch, Zabbix, Prometheus), ensuring they scale with business growth.
- Design and extend observability pipelines (data ingestion, storage, correlation, and alerting).
- Partner with developers, SysOps, and security teams to proactively improve visibility, reliability, and incident response.
- Ensure availability, performance, and security of mission-critical Linux-based infrastructure.
- Drive automation-first approaches for infrastructure and operational tasks using Python, Bash, and configuration management/IaC tools.
- Influence architectural decisions with a strong site reliability engineering mindset, balancing performance, cost, and resilience.
Your Qualifications:
- Deep expertise in monitoring and logging ecosystems, ideally with Zabbix, Prometheus (VictoriaMetrics), Graylog, and Elasticsearch.
- Expert-level Linux administration skills with proven experience running large-scale, highly available infrastructure for SaaS/web applications.
- Hands-on production experience with Kubernetes.
- Strong background in infrastructure as code (Terraform, Ansible, or Puppet).
- Experience operating across multi-cloud and hybrid environments (AWS, GCP, on-prem).
- Experience working with cross-functional teams, driving initiatives, and acting as a technical authority.
- Effective communication skills (Upper-Intermediate English or higher).
Why Join Wrike?
- 28 calendar days of paid vacation
- Sick leave compensation
- Life insurance plan
- Health insurance plan
- Fitness plan (800 EUR/year)
- Parental leave
- 2 volunteer days
- Fully remote work and on-demand access to a co-working space
- Utility allowance (30 EUR/month, subject to taxation)
Your recruitment buddy will be Alexandra Vorobyova, Lead Recruiter.