Performance and concurrency are recurring differentiators for SaaS products, and this contract focuses squarely on that layer of reliability. The scope spans API endpoints, workflow orchestration (Temporal), and AWS service interactions, reflecting the kind of distributed architecture common in modern SaaS platforms where customer experience is tightly coupled to latency and throughput under load.
From a SaaS career standpoint, the work builds durable expertise in defining and validating non-functional requirements, designing realistic load scenarios, and translating findings into engineering and product decisions. The emphasis on observability, service limits, and cross-component behavior maps well to roles in platform engineering, SRE-adjacent QA, and performance engineering across subscription products that must handle predictable concurrency spikes and sustained usage.
This role suits practitioners who prefer investigative, systems-level testing over feature QA, and who are comfortable forming a testing strategy with incomplete documentation. It also aligns with engineers who communicate clearly across technical and non-technical stakeholders, since success depends on making performance constraints legible and actionable within a short delivery window.
Position Overview
Monolith AI is seeking an experienced QA Engineer to lead load testing efforts for a critical system release focused on improving concurrency and high request load handling. This fast-paced, short-term engagement requires someone who can quickly understand complex distributed systems, design comprehensive load tests, and work collaboratively with a rapidly growing engineering team to ensure our new environment meets performance requirements.
Primary Responsibilities
Design and Implement Automated Load Testing Framework
◦ Develop comprehensive load tests for FastAPI endpoints, Temporal workflows/
activities, and AWS service interactions
◦ Create realistic test scenarios simulating concurrent workflow execution patterns,
including graph-based workflow orchestration
◦ Build automated test suites that measure system behavior under varying concurrency
levels and request loads
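One minimal shape such a harness can take is sketched below, stdlib-only for illustration; a real suite for this engagement would drive live FastAPI endpoints with a dedicated tool such as k6 or Locust, and the `call` target and request counts here are placeholders, not project specifics.

```python
import concurrent.futures
import time


def run_load(call, concurrency, total_requests):
    """Run `call` total_requests times across `concurrency` worker threads
    and report latency percentiles (seconds) from the sorted samples."""
    def timed():
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(total_requests)]
        latencies = sorted(f.result() for f in futures)

    return {
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[min(int(len(latencies) * 0.95), len(latencies) - 1)],
        "max": latencies[-1],
    }
```

Sweeping `concurrency` across runs (e.g. 10, 50, 200) while recording these percentiles is one straightforward way to chart behavior under varying concurrency levels.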
Performance Analysis and Bottleneck Identification
◦ Monitor and analyze system performance across the entire stack (API layer,
Temporal workers, AWS services)
◦ Identify concurrency limitations in Temporal workflow execution, AWS service
limits (Athena, ECS), and inter-component communication
◦ Document performance characteristics including response times, throughput limits,
and failure modes under load
Collaborate on Non-Functional Requirements (NFR) Definition
◦ Work with Customer Success and Product teams to understand business
requirements and translate them into measurable performance criteria
◦ Iterate on acceptable concurrency thresholds, latency targets, and throughput requirements
◦ Validate that proposed NFRs are realistic and achievable given architectural constraints
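Agreed NFRs are easiest to validate when expressed as machine-checkable thresholds. A minimal sketch of that idea follows; the metric names and numbers are illustrative assumptions, not the project's agreed targets, and this simple form only handles upper-bound metrics (latency, error rate).

```python
# Hypothetical NFR targets expressed as upper bounds on measured metrics.
TARGETS = {"p95_latency_s": 0.5, "error_rate": 0.01}


def check_nfrs(measured: dict, targets: dict) -> dict:
    """Return the targets that the measured results fail to meet,
    so gaps can be documented against the agreed criteria."""
    gaps = {}
    for metric, limit in targets.items():
        value = measured.get(metric)
        if value is None or value > limit:
            gaps[metric] = {"target": limit, "measured": value}
    return gaps
```

Running the same check after every load-test round makes "meets or documents gaps against agreed NFR criteria" a mechanical report rather than a judgment call.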
System Documentation and Knowledge Extraction
◦ Build understanding of the existing system through code review, discussions with the development team, and exploratory testing
◦ Create clear documentation of test methodologies, results, and recommendations for
future testing
Recommendation and Optimization Guidance
◦ Provide actionable recommendations for removing identified bottlenecks
◦ Suggest configuration optimizations for Temporal (worker pools, task queues) and
AWS services (Athena concurrency, ECS capacity)
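For the Temporal side, worker-pool tuning largely means adjusting the concurrency caps exposed by the Python SDK. The sketch below is a configuration illustration only: the server address, task queue name, activity, and cap values are all placeholder assumptions, not this system's real settings.

```python
import asyncio

from temporalio import activity
from temporalio.client import Client
from temporalio.worker import Worker


@activity.defn
async def noop() -> None:
    """Placeholder activity; a real worker registers the system's activities."""


async def run_worker() -> None:
    # Assumed local dev server; production would point at the real cluster.
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="graph-execution",       # hypothetical task queue name
        activities=[noop],                  # register real activities/workflows here
        max_concurrent_workflow_tasks=50,   # cap on parallel workflow tasks
        max_concurrent_activities=200,      # cap on parallel activity executions
    )
    await worker.run()
```

Load tests that sweep these two caps are a direct way to find where Temporal worker concurrency, rather than the API layer or AWS limits, becomes the bottleneck.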
Rapid Communication and Status Reporting
◦ Maintain daily/frequent communication with the Tech Lead regarding project
progress, blockers, and findings
◦ Quickly escalate issues that could impact the aggressive timeline
◦ Present findings and recommendations to technical and non-technical stakeholders
Cross-Component Integration Testing
◦ Test complex scenarios involving graph execution triggering node workflows across
multiple system boundaries
◦ Validate S3 read/write operations under concurrent load
◦ Ensure inter-component communication (API → Temporal, Temporal Activity →
API triggers) performs reliably at scale
Key Performance Indicators
Test Coverage and Execution
◦ Complete automated load test suite covering all critical components within first 3
weeks
◦ Execute baseline and progressive load tests identifying maximum sustainable
concurrency levels
Bottleneck Identification and Impact
◦ Identify and document top 5-7 performance bottlenecks with clear impact analysis
◦ Provide actionable remediation recommendations with estimated effort and impact
for each bottleneck
NFR Definition and Validation
◦ Collaborate with stakeholders to define measurable NFRs within first 2 weeks
◦ Validate system meets or document gaps against agreed NFR criteria by project end
Documentation and Knowledge Transfer
◦ Deliver comprehensive test documentation, results analysis, and system performance
characteristics
◦ Conduct knowledge transfer sessions ensuring team can maintain and extend testing
framework
Project Velocity and Communication
◦ Meet weekly milestone targets in this fast-paced 2-month engagement
◦ Maintain proactive communication rhythm (daily standups, weekly detailed reports
to Tech Lead)
Required Qualifications
Experience:
• architectures
• Proven experience with load testing tools (e.g., k6, JMeter, Locust, Gatling, Artillery)
• ECS, S3)
Technical Skills:
• Temporal)
• limiting
• Familiarity with AWS infrastructure and service limits
• Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
Immediate Availability:
Preferred Qualifications
Essential Soft Skills
Self-Direction and Initiative:
• documentation
Communication and Collaboration:
• existing team members
• audiences
• Comfortable asking clarifying questions and challenging assumptions respectfully
Adaptability and Learning Agility:
Pragmatism and Results Orientation:
Stakeholder Management:
• trade-offs
Key Challenges in This Role
Rapid Knowledge Acquisition with Limited Documentation
◦ The existing system lacks comprehensive documentation, requiring you to quickly
build understanding through code review, system exploration, and frequent
discussions with the development team
◦ Success requires comfort with ambiguity and strong investigative skills
Aggressive Timeline with High Impact
◦ A 3-month timeline to design tests, execute comprehensive load testing, identify
bottlenecks, and deliver actionable recommendations is extremely tight
◦ Must balance thoroughness with pragmatism; prioritize ruthlessly to ensure critical
areas are covered
Complex Distributed System with Multiple Integration Points
◦ The system involves multiple layers (FastAPI, Temporal, AWS services) with complex inter-component communication patterns (graph → node workflows)
◦ Must understand the entire stack sufficiently to design realistic, comprehensive load tests that expose real-world bottlenecks