Data Scientist
Imizizi
Reference: JHB001446-NS-1
ESSENTIAL SKILLS
- Proven experience designing and building agentic system architectures using Amazon Bedrock AgentCore and agent frameworks (e.g., LangChain, LangGraph, Strands Agents).
- Strong expertise in orchestrating multi-step reasoning, tool invocation, state management, and workflow automation for AI agents.
- Deep hands-on knowledge of training and deploying models with PyTorch and TensorFlow.
- Experience defining model strategy, including architecture selection, fine-tuning approaches, inference patterns, and cost/performance trade-offs.
- Containerization and orchestration skills: Docker and Kubernetes for scalable, fault-tolerant ML/GenAI deployments.
- Solid understanding of networking for ML workloads, including VPC design, ingress/egress, private and internet-facing communication patterns, and low-latency design.
- Systems-level performance engineering: selecting CPU/GPU/accelerator hardware, plus experience with load testing, stress testing, and capacity planning for ML systems.
- MLOps and GenAI Ops experience: CI/CD for models, model versioning, observability, logging, monitoring, and incident response practices.
- Strong software engineering skills in Python and familiarity with building robust, production-ready APIs and back-end services.
- Experience integrating foundation models with Retrieval-Augmented Generation (RAG) pipelines, tool use, and agentic workflows for enterprise use cases.
ADVANTAGEOUS SKILLS
- Prior experience working with Amazon Bedrock and other cloud-managed foundation model services.
- Familiarity with LangChain extensions, LangGraph, Strands Agents or similar orchestration toolkits at scale.
- Experience with infrastructure-as-code tools (e.g., Terraform, Terragrunt) for reproducible cloud infrastructure.
- Knowledge of serverless components (Lambda, Step Functions, EventBridge) for orchestration and event-driven workflows.
- Background in secure cloud architectures, IAM best practices, and security hardening for AI platforms.
- Experience with data engineering and building reliable ETL/data pipelines for model training and feature stores.
- Familiarity with observability stacks (Prometheus, Grafana, CloudWatch) and distributed tracing for ML services.
- Experience optimizing inference costs through batching, quantization, and model distillation techniques.
- Prior work with enterprise customers in regulated industries (e.g., automotive, pharma, finance) and understanding of compliance considerations.
- Knowledge of hybrid and multi-cloud deployment patterns for AI workloads.
ROLE & RESPONSIBILITIES
- Define and build agentic system architectures that leverage Amazon Bedrock AgentCore and agent frameworks to enable multi-step reasoning and automated workflows.
- Lead technical strategy for model selection, fine-tuning, and inference, advising on cost vs. performance trade-offs.
- Design and implement containerized deployment standards using Docker and Kubernetes to ensure consistent, scalable, and fault-tolerant ML operations.
- Architect secure, low-latency networking for model-to-service and service-to-service communication across private and public networks.
- Perform systems-level performance engineering: select appropriate compute accelerators, run load and stress tests, and conduct capacity planning for production readiness.
- Establish and operate MLOps and GenAI Ops practices, including CI/CD pipelines, model versioning, and deployment automation.
- Implement observability, logging, monitoring, and incident response for production AI systems to ensure operational excellence.
- Own end-to-end system design for AI workloads: data pipelines, model training, inference, orchestration, and lifecycle management.
- Integrate foundation models into enterprise RAG and tool-use pipelines, enabling complex, real-world use cases.
- Provide technical leadership and mentorship to engineers and stakeholders on architecture, best practices, and operational standards.
QUALIFICATIONS/EXPERIENCE
- Appropriate academic qualification in a field such as Computer Science, Engineering, or Statistics.
- Demonstrated track record delivering large-scale AI solutions for enterprise customers, including end-to-end ownership of architecture, operations, and stakeholder engagement.
Submit your CV to: ***email_hidden*** and Subject line