Position Overview
Monolith AI is seeking an experienced QA Engineer to lead load testing efforts for a critical system
release focused on improving concurrency and high request load handling. This fast-paced, short-
term engagement requires someone who can quickly understand complex distributed systems,
design comprehensive load tests, and work collaboratively with a rapidly growing engineering team
to ensure our new environment meets performance requirements.
Primary Responsibilities
Design and Implement Automated Load Testing Framework
◦ Develop comprehensive load tests for FastAPI endpoints, Temporal workflows/
activities, and AWS service interactions
◦ Create realistic test scenarios simulating concurrent workflow execution patterns,
including graph-based workflow orchestration
◦ Build automated test suites that measure system behavior under varying concurrency
levels and request loads
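As an illustration of what measuring behavior under varying concurrency levels could look like, here is a minimal sketch using only the Python standard library. It is not the team's framework: `fake_endpoint`, `run_level`, and `sweep` are hypothetical names, and the simulated 10 ms call stands in for a real HTTP request to a FastAPI endpoint. In practice one of the load testing tools named under Required Qualifications would likely replace it.

```python
import asyncio
import statistics
import time


async def fake_endpoint(delay_s: float = 0.01) -> int:
    """Hypothetical stand-in for an HTTP call to an API endpoint (assumption)."""
    await asyncio.sleep(delay_s)
    return 200


async def run_level(concurrency: int, total_requests: int) -> dict:
    """Issue total_requests calls with at most `concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)
    latencies: list[float] = []
    statuses: list[int] = []

    async def one_request() -> None:
        async with semaphore:
            start = time.perf_counter()
            status = await fake_endpoint()
            latencies.append(time.perf_counter() - start)
            statuses.append(status)

    wall_start = time.perf_counter()
    await asyncio.gather(*(one_request() for _ in range(total_requests)))
    wall_s = time.perf_counter() - wall_start

    return {
        "concurrency": concurrency,
        "requests": total_requests,
        "errors": sum(1 for s in statuses if s >= 400),
        "throughput_rps": total_requests / wall_s,
        # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile
        "p95_latency_s": statistics.quantiles(latencies, n=20)[18],
    }


async def sweep(levels: list[int], requests_per_level: int) -> list[dict]:
    """Run each concurrency level in turn and collect one summary row per level."""
    return [await run_level(level, requests_per_level) for level in levels]


if __name__ == "__main__":
    for row in asyncio.run(sweep([1, 5, 20], 40)):
        print(row)
```

Each row reports throughput, error count, and p95 latency for one concurrency level, which is the kind of per-level summary the automated suites would need to produce.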
Performance Analysis and Bottleneck Identification
◦ Monitor and analyze system performance across the entire stack (API layer,
Temporal workers, AWS services)
◦ Identify concurrency limitations in Temporal workflow execution, AWS service
limits (Athena, ECS), and inter-component communication
◦ Document performance characteristics including response times, throughput limits,
and failure modes under load
Collaborate on Non-Functional Requirements (NFR) Definition
◦ Work with Customer Success and Product teams to understand business
requirements and translate them into measurable performance criteria
◦ Iterate on acceptable concurrency thresholds, latency targets, and throughput
requirements
◦ Validate that proposed NFRs are realistic and achievable given architectural
constraints
System Documentation and Knowledge Extraction
◦ Build an understanding of the existing system through code review, discussions with the
development team, and exploratory testing
◦ Create clear documentation of test methodologies, results, and recommendations for
future testing
Recommendation and Optimization Guidance
◦ Provide actionable recommendations for removing identified bottlenecks
◦ Suggest configuration optimizations for Temporal (worker pools, task queues) and
AWS services (Athena concurrency, ECS capacity)
Rapid Communication and Status Reporting
◦ Maintain daily/frequent communication with the Tech Lead regarding project
progress, blockers, and findings
◦ Quickly escalate issues that could impact the aggressive timeline
◦ Present findings and recommendations to technical and non-technical stakeholders
Cross-Component Integration Testing
◦ Test complex scenarios involving graph execution triggering node workflows across
multiple system boundaries
◦ Validate S3 read/write operations under concurrent load
◦ Ensure inter-component communication (API → Temporal, Temporal Activity →
API triggers) performs reliably at scale
Key Performance Indicators
Test Coverage and Execution
◦ Complete an automated load test suite covering all critical components within the first 3
weeks
◦ Execute baseline and progressive load tests identifying maximum sustainable
concurrency levels
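One possible way to turn a progressive load test into a "maximum sustainable concurrency" figure is sketched below. Everything here is an assumption for illustration: `simulated_measure` is a made-up system model that saturates at 50 concurrent requests, and the thresholds in the usage lines are placeholders, since the actual latency and error-rate limits are to be agreed as NFRs. In a real run, `measure` would execute a load test at the given level and return its observed p95 latency and error rate.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Measurement(NamedTuple):
    p95_latency_s: float
    error_rate: float


def simulated_measure(concurrency: int) -> Measurement:
    """Hypothetical system model (assumption): flat latency up to 50 concurrent
    requests, then latency and errors grow with the overload ratio."""
    capacity = 50
    base_latency_s = 0.2
    if concurrency <= capacity:
        return Measurement(base_latency_s, 0.0)
    overload = concurrency / capacity
    return Measurement(base_latency_s * overload ** 2, min(1.0, 0.05 * (overload - 1)))


def find_max_sustainable(
    levels: Iterable[int],
    measure: Callable[[int], Measurement],
    max_p95_s: float,
    max_error_rate: float,
) -> Optional[int]:
    """Step through levels in ascending order and return the highest level
    tested before the first threshold breach, or None if even the lowest
    level breaches a threshold."""
    best = None
    for level in sorted(levels):
        result = measure(level)
        if result.p95_latency_s > max_p95_s or result.error_rate > max_error_rate:
            break
        best = level
    return best


if __name__ == "__main__":
    # Placeholder thresholds: p95 under 0.5 s and at most 1% errors
    print(find_max_sustainable([10, 25, 50, 100, 200], simulated_measure, 0.5, 0.01))
```

Stopping at the first breach keeps the ramp from pushing an already degraded environment further. A finer-grained search between the last passing level and the first failing one could follow if a tighter estimate is needed.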
Bottleneck Identification and Impact
◦ Identify and document top 5-7 performance bottlenecks with clear impact analysis
◦ Provide actionable remediation recommendations with estimated effort and impact
for each bottleneck
NFR Definition and Validation
◦ Collaborate with stakeholders to define measurable NFRs within the first 2 weeks
◦ By project end, validate that the system meets the agreed NFR criteria, or document any gaps against them
Documentation and Knowledge Transfer
◦ Deliver comprehensive test documentation, results analysis, and system performance
characteristics
◦ Conduct knowledge transfer sessions ensuring team can maintain and extend testing
framework
Project Velocity and Communication
◦ Meet weekly milestone targets in this fast-paced 2-month engagement
◦ Maintain proactive communication rhythm (daily standups, weekly detailed reports
to Tech Lead)
Required Qualifications
Experience:
architectures
Proven experience with load testing tools (e.g., k6, JMeter, Locust, Gatling, Artillery)
ECS, S3)
Technical Skills:
Temporal)
limiting
Familiarity with AWS infrastructure and service limits
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or
similar)
Immediate Availability:
Preferred Qualifications
Essential Soft Skills
Self-Direction and Initiative:
documentation
Communication and Collaboration:
existing team members
audiences
Comfortable asking clarifying questions and challenging assumptions respectfully
Adaptability and Learning Agility:
Pragmatism and Results Orientation:
Stakeholder Management:
trade-offs
Key Challenges in This Role
Rapid Knowledge Acquisition with Limited Documentation
◦ The existing system lacks comprehensive documentation, requiring you to quickly
build understanding through code review, system exploration, and frequent
discussions with the development team
◦ Success requires comfort with ambiguity and strong investigative skills
Aggressive Timeline with High Impact
◦ A 3-month timeline to design tests, execute comprehensive load testing, identify
bottlenecks, and deliver actionable recommendations is extremely tight
◦ Must balance thoroughness with pragmatism; prioritize ruthlessly to ensure critical
areas are covered
Complex Distributed System with Multiple Integration Points
◦ The system involves multiple layers (FastAPI, Temporal, AWS services) with
complex inter-component communication patterns (graph → node workflows)
◦ Must understand the entire stack sufficiently to design realistic, comprehensive load
tests that expose real-world bottlenecks