Senior QA Engineer (AI Solutions)
Elixirr Digital
Elixirr Digital is a dynamic and innovative global consulting firm, recognized for delivering transformative solutions to our clients across a wide range of industries. As part of our growing AI practice, we are committed to developing enterprise-grade products, accelerators and agentic solutions that drive innovation for our clients and our people. We help leading organizations harness advanced AI technologies — from agentic workflows and retrieval-augmented generation (RAG) to multi-agent orchestration — grounded in their proprietary data and operating within the security and compliance standards their industries demand.
We are looking for a Senior QA Engineer to take ownership of quality for Elixirr’s AI-enabled solutions — from our internal agent platform to the growing portfolio of accelerators we build to deliver faster and grow the business.
Quality in AI systems is a different discipline. The fundamental shift is from deterministic assertions to probabilistic scoring: you’re testing for meaning, grounding, accuracy and safety, not exact outputs. Behaviour drifts as models change, and a single regression in a prompt, tool or retrieval step can cascade across an entire agent workflow. We need a QA leader who embraces that reality, automates aggressively, and uses AI itself to accelerate how we surface, reproduce and triage issues.
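To make the shift from deterministic assertions to probabilistic scoring concrete, here is a minimal illustrative sketch in Python. It is not Elixirr's harness: the function names, the sample answers, the substring-based scoring and the 0.8 threshold are all assumptions chosen for illustration. A real evaluation would use richer signals such as LLM-as-judge or rubric-based scoring.

```python
from statistics import mean


def grounding_score(answer: str, required_facts: list[str]) -> float:
    """Fraction of required facts mentioned in the answer.

    Hypothetical, deliberately crude proxy for "meaning": a
    case-insensitive substring match per fact.
    """
    if not required_facts:
        raise ValueError("required_facts must not be empty")
    text = answer.lower()
    hits = sum(1 for fact in required_facts if fact.lower() in text)
    return hits / len(required_facts)


def passes(samples: list[str], required_facts: list[str],
           threshold: float = 0.8) -> bool:
    """Gate on the mean score across several sampled outputs,
    rather than asserting that one output equals an exact string."""
    scores = [grounding_score(s, required_facts) for s in samples]
    return mean(scores) >= threshold


# Three differently worded model outputs for the same question (made-up data).
samples = [
    "Refunds are accepted within 30 days with a receipt.",
    "You can get a refund in 30 days if you have your receipt.",
    "We accept refund requests for 30 days.",
]
facts = ["refund", "30 days", "receipt"]

if __name__ == "__main__":
    # An exact-match assertion against the first sample would reject the
    # other two; the scored gate tolerates rewording but still notices
    # that the third answer omits the receipt requirement.
    print(passes(samples, facts))
```

The design choice worth noting is that the pass/fail decision moves from a single string comparison to a threshold over a distribution of scores, which is what lets a suite tolerate rewording while still flagging missing content.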
This is an SDET-shaped role: you’ll shape our test strategy, build the automation and evaluation harnesses that keep our agents honest, and partner with engineering to bake quality into the way we build — shift-left into requirements and design, and shift-right into production observability.
Candidates applying for an employment contract, kindly note that this is an on-site position based at our locations in Cape Town or Johannesburg.
What will you be doing as a Senior QA Engineer at Elixirr Digital?
QA Strategy for AI-Enabled Solutions
- Own the end-to-end QA strategy for Elixirr’s agent platform and AI-enabled accelerators — functional, non-functional, behavioural and safety.
- Define what “good” looks like for agent behaviour: accuracy, grounding, tool use, escalation, latency, cost, safety and user experience.
- Shift left: engage at requirements and design so gaps, edge cases and testability issues are caught before code is written.
- Shift right: use production traces, evals and user feedback as part of the regression loop, so quality keeps improving after release.
- Establish deployment gates so AI features ship with evidence, not hope.
Automation & Evaluation
- Build and maintain automated test suites across UI, API, contract, integration and end-to-end layers, running as a first-class part of CI/CD.
- Design LLM and agent evaluation harnesses using a layered approach: automated checks, LLM-as-judge scoring, rubric-based evaluation and targeted human review.
- Maintain golden datasets, regression suites and red-teaming scenarios that evolve with the product.
- Bring LLM quality metrics into the same unified reporting as functional, performance and security results.
- Drive automation-first: if a check can be automated, it should be.
AI-Accelerated QA
- Use AI tooling (code-generation, LLM-based test generation, synthetic data, AI-assisted exploratory testing, self-healing test frameworks) to accelerate coverage and reduce manual toil.
- Practise prompt engineering for QA: constrain AI tools with precise specifications so they produce tests and triage artefacts worth keeping.
- Build or integrate AI-assisted triage — clustering failures, summarizing root causes, drafting repro steps and proposing fixes to engineers.
- Continuously evolve the QA tooling stack as the AI developer ecosystem matures.
Reliability, Performance & Safety
- Run performance, load and chaos testing against agent workflows and backend services.
- Stress-test guardrails, prompt injection defences, tool-use restrictions and data boundaries.
- Cover accessibility, visual regression and security checks (SAST/DAST) as part of the standard test pipeline.
- Partner with security and compliance on safety reviews for client-facing AI features.
Process & Collaboration
- Work day-to-day with backend, front-end, AI and platform engineers — quality is a team sport and you’re the coach.
- Feed defect patterns, evaluation results and production telemetry back into engineering and product priorities.
- Coach engineers on writing better tests, better evals and more observable code — and keep humans in the loop where AI judgement isn’t enough.
Experience
- 5+ years in QA / SDET / Test Engineering, including hands-on ownership of automation frameworks.
- Experience testing modern cloud-native applications (microservices, APIs, event-driven systems) in AWS and/or Azure.
- Exposure to testing AI, ML or agent-based systems — or a clear, demonstrable plan for how you’d go about it.
Technical Expertise
- Strong proficiency in at least one automation language/stack (e.g., Python with pytest, TypeScript with Playwright, Java with JUnit/RestAssured).
- Comfort with API testing, contract testing and service virtualization.
- Familiarity with CI/CD pipelines (GitHub Actions, GitLab CI, Azure DevOps) and running tests as a first-class part of the pipeline.
- Solid understanding of observability tooling — logs, metrics, traces — and how to use it in testing and production monitoring.
AI & Evaluation Skills
- Hands-on use of AI developer tools for test generation, code review and triage.
- Understanding of LLM evaluation concepts: reference-based vs reference-free metrics, LLM-as-judge, rubric-based scoring, regression harnesses and continuous evaluation.
- Awareness of AI-specific failure modes: hallucination, prompt injection, tool misuse, retrieval failures, context window issues and model drift.
Soft Skills
- Relentlessly curious — you want to know why something failed, not just that it did.
- Clear written communication; defect reports and evaluation summaries that engineers actually enjoy reading.
- Self-driven in a distributed, multi-timezone team.
Preferred Qualifications
- Experience with performance testing tools (k6, Locust, JMeter).
- Familiarity with evaluation frameworks (e.g., OpenAI Evals, Ragas, DeepEval, Promptfoo, Langfuse).
- Exposure to security testing, accessibility testing and compliance-driven environments.
Why is Elixirr Digital the right next step for you?
From working with cutting-edge technologies to solving complex challenges for global clients, we make sure your work matters. And while you’re building great things, we’re here to support you.
Compensation & Equity
- Performance bonus
- Employee Stock Options Grant
- Employee Share Purchase Plan (ESPP)
- Competitive compensation
Health & Wellbeing
- Health benefits plan
- Flexible working hours
- Pension plan
Projects & Tools
- Modern equipment
- Big clients and interesting projects
- Cutting-edge technologies
Learning & Growth
- Growth and development opportunities
- Internal LMS & knowledge hubs
We don’t just offer a job: we create space for you to grow, thrive, and be recognized.
Intrigued? Apply now!