QA Engineer - Load Testing Specialist (2 months contract)
Position Overview
Monolith AI is seeking an experienced QA Engineer to lead load testing efforts for a critical system
release focused on improving concurrency and high request load handling. This fast-paced,
short-term engagement requires someone who can quickly understand complex distributed systems,
design comprehensive load tests, and work collaboratively with a rapidly growing engineering team
to ensure our new environment meets performance requirements.
Primary Responsibilities
Design and Implement Automated Load Testing Framework
◦ Develop comprehensive load tests for FastAPI endpoints, Temporal workflows/
activities, and AWS service interactions
◦ Create realistic test scenarios simulating concurrent workflow execution patterns,
including graph-based workflow orchestration
◦ Build automated test suites that measure system behavior under varying concurrency
levels and request loads
Performance Analysis and Bottleneck Identification
◦ Monitor and analyze system performance across the entire stack (API layer,
Temporal workers, AWS services)
◦ Identify concurrency limitations in Temporal workflow execution, AWS service
limits (Athena, ECS), and inter-component communication
◦ Document performance characteristics including response times, throughput limits,
and failure modes under load
Collaborate on Non-Functional Requirements (NFR) Definition
◦ Work with Customer Success and Product teams to understand business
requirements and translate them into measurable performance criteria
◦ Iterate on acceptable concurrency thresholds, latency targets, and throughput
requirements
◦ Validate that proposed NFRs are realistic and achievable given architectural
constraints
System Documentation and Knowledge Extraction
◦ Build an understanding of the existing system through code review, discussions with the
development team, and exploratory testing
◦ Create clear documentation of test methodologies, results, and recommendations for
future testing
Recommendation and Optimization Guidance
◦ Provide actionable recommendations for removing identified bottlenecks
◦ Suggest configuration optimizations for Temporal (worker pools, task queues) and
AWS services (Athena concurrency, ECS capacity)
Rapid Communication and Status Reporting
◦ Maintain daily/frequent communication with the Tech Lead regarding project
progress, blockers, and findings
◦ Quickly escalate issues that could impact the aggressive timeline
◦ Present findings and recommendations to technical and non-technical stakeholders
Cross-Component Integration Testing
◦ Test complex scenarios involving graph execution triggering node workflows across
multiple system boundaries
◦ Validate S3 read/write operations under concurrent load
◦ Ensure inter-component communication (API → Temporal, Temporal Activity →
API triggers) performs reliably at scale
Key Performance Indicators
Test Coverage and Execution
◦ Complete an automated load test suite covering all critical components within the first 3
weeks
◦ Execute baseline and progressive load tests identifying maximum sustainable
concurrency levels
Bottleneck Identification and Impact
◦ Identify and document top 5-7 performance bottlenecks with clear impact analysis
◦ Provide actionable remediation recommendations with estimated effort and impact
for each bottleneck
NFR Definition and Validation
◦ Collaborate with stakeholders to define measurable NFRs within the first 2 weeks
◦ Validate that the system meets the agreed NFR criteria, or document the gaps, by project end
Documentation and Knowledge Transfer
◦ Deliver comprehensive test documentation, results analysis, and system performance
characteristics
◦ Conduct knowledge transfer sessions so the team can maintain and extend the testing
framework
Project Velocity and Communication
◦ Meet weekly milestone targets in this fast-paced 2-month engagement
◦ Maintain proactive communication rhythm (daily standups, weekly detailed reports
to Tech Lead)
Required Qualifications
Experience:
4+ years of experience in QA/performance testing roles
2+ years of hands-on experience with load testing distributed systems and microservices
architectures
Proven experience with load testing tools (e.g., k6, JMeter, Locust, Gatling, Artillery)
Experience testing workflow orchestration systems (Temporal, Airflow, Prefect, or similar)
Demonstrated ability to test systems integrating with AWS services (particularly Athena,
ECS, S3)
Technical Skills:
Strong proficiency in Python (required for test automation and working with FastAPI/
Temporal)
Experience with REST API testing and performance validation
Understanding of distributed systems concepts: concurrency, queueing, backpressure, rate
limiting
Familiarity with AWS infrastructure and service limits
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or
similar)
Proficiency with Git and CI/CD pipelines
Ability to read and understand code in order to design effective tests
Immediate Availability:
Ability to start in early January 2025 and commit to a focused 2-month engagement
Availability for full-time contract work during project duration
Preferred Qualifications
Direct experience with Temporal.io (workflows, activities, workers)
Experience with containerized workloads and Docker/ECS
Prior work in fast-paced startup or scale-up environments
Experience with infrastructure-as-code (Terraform, CloudFormation)
Background in Site Reliability Engineering (SRE) or DevOps practices
Familiarity with data processing pipelines and analytics systems
Previous contract/consulting experience with rapid knowledge acquisition
Experience with graph-based workflow systems or DAG execution engines
Knowledge of AWS service limits and optimization strategies
Essential Soft Skills
Self-Direction and Initiative:
Ability to operate independently in an ambiguous, fast-moving environment with minimal
documentation
Proactive problem-solving mindset; doesn't wait for perfect information before taking action
Comfortable making pragmatic decisions quickly in a time-constrained project
Communication and Collaboration:
Exceptional communication skills for extracting knowledge through conversations with
existing team members
Ability to translate technical findings into clear, actionable recommendations for diverse
audiences
Comfortable asking clarifying questions and challenging assumptions respectfully
Strong written communication for documentation and status updates
Adaptability and Learning Agility:
Quick learner who can rapidly understand complex, poorly documented systems
Flexible and comfortable with changing priorities in a 15-person team that's doubling in size
Thrives in fast-paced environments with aggressive timelines
Pragmatism and Results Orientation:
Focused on delivering practical, actionable outcomes within tight timeframes
Understands the balance between thoroughness and speed in a 2-month engagement
Comfortable with "good enough" when perfect isn't achievable within constraints
Stakeholder Management:
Skilled at managing expectations with technical leadership about realistic timelines and
trade-offs
Diplomatic when delivering difficult news about performance limitations or bottlenecks
Collaborative approach when working with CS and Product on NFR definition
Key Challenges in This Role
Rapid Knowledge Acquisition with Limited Documentation
◦ The existing system lacks comprehensive documentation, requiring you to quickly
build understanding through code review, system exploration, and frequent
discussions with the development team
◦ Success requires comfort with ambiguity and strong investigative skills
Aggressive Timeline with High Impact
◦ A 2-month timeline to design tests, execute comprehensive load testing, identify
bottlenecks, and deliver actionable recommendations is extremely tight
◦ Must balance thoroughness with pragmatism; prioritize ruthlessly to ensure critical
areas are covered
Complex Distributed System with Multiple Integration Points
◦ The system involves multiple layers (FastAPI, Temporal, AWS services) with
complex inter-component communication patterns (graph → node workflows)
◦ Must understand the entire stack sufficiently to design realistic, comprehensive load
tests that expose real-world bottlenecks
Department: Software
Locations: London, United Kingdom
Remote status: Temporarily Remote