Performance and concurrency are recurring differentiators for SaaS products, and this contract focuses squarely on that layer of reliability. The scope spans API endpoints, workflow orchestration (Temporal), and AWS service interactions, reflecting the kind of distributed architecture common in modern SaaS platforms where customer experience is tightly coupled to latency and throughput under load.
From a SaaS career standpoint, the work builds durable expertise in defining and validating non-functional requirements, designing realistic load scenarios, and translating findings into engineering and product decisions. The emphasis on observability, service limits, and cross-component behavior maps well to roles in platform engineering, SRE-adjacent QA, and performance engineering across subscription products that must handle predictable concurrency spikes and sustained usage.
This role suits practitioners who prefer investigative, systems-level testing over feature QA, and who are comfortable forming a testing strategy with incomplete documentation. It also aligns with engineers who communicate clearly across technical and non-technical stakeholders, since success depends on making performance constraints legible and actionable within a short delivery window.
Position Overview
Monolith AI is seeking an experienced QA Engineer to lead load testing efforts for a critical system release focused on improving concurrency and high request load handling. This fast-paced, short-term engagement requires someone who can quickly understand complex distributed systems, design comprehensive load tests, and work collaboratively with a rapidly growing engineering team to ensure our new environment meets performance requirements.
Primary Responsibilities
Design and Implement Automated Load Testing Framework
◦ Develop comprehensive load tests for FastAPI endpoints, Temporal workflows/
activities, and AWS service interactions
◦ Create realistic test scenarios simulating concurrent workflow execution patterns,
including graph-based workflow orchestration
◦ Build automated test suites that measure system behavior under varying concurrency
levels and request loads
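One minimal shape such a harness can take is sketched below, stdlib-only for illustration; a real suite for this engagement would drive live FastAPI endpoints with a dedicated tool such as k6 or Locust, and the `call` target and request counts here are placeholders, not project specifics.

```python
import concurrent.futures
import time


def run_load(call, concurrency, total_requests):
    """Run `call` total_requests times across `concurrency` worker threads
    and report latency percentiles (seconds) from the sorted samples."""
    def timed():
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(total_requests)]
        latencies = sorted(f.result() for f in futures)

    return {
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[min(int(len(latencies) * 0.95), len(latencies) - 1)],
        "max": latencies[-1],
    }
```

Sweeping `concurrency` across runs (e.g. 10, 50, 200) while recording these percentiles is one straightforward way to chart behavior under varying concurrency levels.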
Performance Analysis and Bottleneck Identification
◦ Monitor and analyze system performance across the entire stack (API layer,
Temporal workers, AWS services)
◦ Identify concurrency limitations in Temporal workflow execution, AWS service
limits (Athena, ECS), and inter-component communication
◦ Document performance characteristics including response times, throughput limits,
and failure modes under load
Collaborate on Non-Functional Requirements (NFR) Definition
◦ Work with Customer Success and Product teams to understand business
requirements and translate them into measurable performance criteria
◦ Iterate on acceptable concurrency thresholds, latency targets, and throughput requirements
◦ Validate that proposed NFRs are realistic and achievable given architectural constraints
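Agreed NFRs are easiest to validate when expressed as machine-checkable thresholds. A minimal sketch of that idea follows; the metric names and numbers are illustrative assumptions, not the project's agreed targets, and this simple form only handles upper-bound metrics (latency, error rate).

```python
# Hypothetical NFR targets expressed as upper bounds on measured metrics.
TARGETS = {"p95_latency_s": 0.5, "error_rate": 0.01}


def check_nfrs(measured: dict, targets: dict) -> dict:
    """Return the targets that the measured results fail to meet,
    so gaps can be documented against the agreed criteria."""
    gaps = {}
    for metric, limit in targets.items():
        value = measured.get(metric)
        if value is None or value > limit:
            gaps[metric] = {"target": limit, "measured": value}
    return gaps
```

Running the same check after every load-test round makes "meets or documents gaps against agreed NFR criteria" a mechanical report rather than a judgment call.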
System Documentation and Knowledge Extraction
◦ Build understanding of the existing system through code review, discussions with the development team, and exploratory testing
◦ Create clear documentation of test methodologies, results, and recommendations for
future testing
Recommendation and Optimization Guidance
◦ Provide actionable recommendations for removing identified bottlenecks
◦ Suggest configuration optimizations for Temporal (worker pools, task queues) and
AWS services (Athena concurrency, ECS capacity)
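For the Temporal side, worker-pool tuning largely means adjusting the concurrency caps exposed by the Python SDK. The sketch below is a configuration illustration only: the server address, task queue name, activity, and cap values are all placeholder assumptions, not this system's real settings.

```python
import asyncio

from temporalio import activity
from temporalio.client import Client
from temporalio.worker import Worker


@activity.defn
async def noop() -> None:
    """Placeholder activity; a real worker registers the system's activities."""


async def run_worker() -> None:
    # Assumed local dev server; production would point at the real cluster.
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="graph-execution",       # hypothetical task queue name
        activities=[noop],                  # register real activities/workflows here
        max_concurrent_workflow_tasks=50,   # cap on parallel workflow tasks
        max_concurrent_activities=200,      # cap on parallel activity executions
    )
    await worker.run()
```

Load tests that sweep these two caps are a direct way to find where Temporal worker concurrency, rather than the API layer or AWS limits, becomes the bottleneck.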
Rapid Communication and Status Reporting
◦ Maintain daily/frequent communication with the Tech Lead regarding project
progress, blockers, and findings
◦ Quickly escalate issues that could impact the aggressive timeline
◦ Present findings and recommendations to technical and non-technical stakeholders
Cross-Component Integration Testing
◦ Test complex scenarios involving graph execution triggering node workflows across
multiple system boundaries
◦ Validate S3 read/write operations under concurrent load
◦ Ensure inter-component communication (API → Temporal, Temporal Activity →
API triggers) performs reliably at scale
Key Performance Indicators
Test Coverage and Execution
◦ Complete automated load test suite covering all critical components within first 3
weeks
◦ Execute baseline and progressive load tests identifying maximum sustainable
concurrency levels
Bottleneck Identification and Impact
◦ Identify and document top 5-7 performance bottlenecks with clear impact analysis
◦ Provide actionable remediation recommendations with estimated effort and impact
for each bottleneck
NFR Definition and Validation
◦ Collaborate with stakeholders to define measurable NFRs within first 2 weeks
◦ Validate system meets or document gaps against agreed NFR criteria by project end
Documentation and Knowledge Transfer
◦ Deliver comprehensive test documentation, results analysis, and system performance
characteristics
◦ Conduct knowledge transfer sessions ensuring team can maintain and extend testing
framework
Project Velocity and Communication
◦ Meet weekly milestone targets in this fast-paced 2-month engagement
◦ Maintain proactive communication rhythm (daily standups, weekly detailed reports
to Tech Lead)
Required Qualifications
Experience:
• architectures
• Proven experience with load testing tools (e.g., k6, JMeter, Locust, Gatling, Artillery)
• ECS, S3)
Technical Skills:
• Temporal)
• limiting
• Familiarity with AWS infrastructure and service limits
• Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
Immediate Availability:
Preferred Qualifications
Essential Soft Skills
Self-Direction and Initiative:
• documentation
Communication and Collaboration:
• existing team members
• audiences
• Comfortable asking clarifying questions and challenging assumptions respectfully
Adaptability and Learning Agility:
Pragmatism and Results Orientation:
Stakeholder Management:
• trade-offs
Key Challenges in This Role
Rapid Knowledge Acquisition with Limited Documentation
◦ The existing system lacks comprehensive documentation, requiring you to quickly
build understanding through code review, system exploration, and frequent
discussions with the development team
◦ Success requires comfort with ambiguity and strong investigative skills
Aggressive Timeline with High Impact
◦ A 3-month timeline to design tests, execute comprehensive load testing, identify
bottlenecks, and deliver actionable recommendations is extremely tight
◦ Must balance thoroughness with pragmatism; prioritize ruthlessly to ensure critical
areas are covered
Complex Distributed System with Multiple Integration Points
◦ The system involves multiple layers (FastAPI, Temporal, AWS services) with complex inter-component communication patterns (graph → node workflows)
◦ Must understand the entire stack sufficiently to design realistic, comprehensive load tests that expose real-world bottlenecks