Senior Infrastructure Engineer (SRE)

ExecutivePlacements.com

Recruiter

A 1L Realization (Pty) Ltd

Job Ref

JHB001004/Tshid

Date posted

Thursday, July 2, 2026

Location

Johannesburg, South Africa

SUMMARY

Our client in the telecoms sector is looking for a Senior Infrastructure Engineer (SRE) for a 6-month contract.

Role Purpose

The Senior Infrastructure Engineer (Site Reliability Engineer – SRE) is responsible for ensuring the reliability, availability, performance and scalability of enterprise data, analytics and AI platforms. The role focuses on observability, operational excellence and disaster recovery readiness across cloud-based infrastructure supporting DataCo’s data products, APIs, ML systems and digital platforms.

POSITION INFO

Key Responsibilities

Design, implement and maintain highly available and resilient infrastructure for data, analytics and AI platforms.
Define and drive site reliability engineering (SRE) practices including service-level objectives (SLOs), service-level indicators (SLIs) and service-level agreements (SLAs).
Implement observability frameworks including monitoring, logging, tracing and alerting across all platforms and services.
Ensure proactive detection, diagnosis and resolution of infrastructure and application reliability issues.
Lead disaster recovery (DR) planning, testing and readiness activities across cloud and on-prem environments.
Automate infrastructure health checks, failover processes and operational runbooks.
Collaborate with Cloud Engineers, Platform Engineers and Solution Architects to ensure resilient architecture design.
Support capacity planning, performance tuning and scalability improvements across systems.
Manage incident response, root cause analysis (RCA) and post-incident reviews to drive continuous improvement.
Ensure infrastructure complies with security, governance and enterprise architecture standards.
Drive reliability engineering best practices across DevOps, Data Engineering and Platform teams.
Support CI/CD and deployment reliability in collaboration with Platform Engineering teams.

Qualifications & Experience

Bachelor's degree in Computer Science, Information Technology, Engineering or a related discipline.
6-10 years' experience in infrastructure engineering, DevOps or Site Reliability Engineering roles.
Strong experience with cloud platforms (Azure, AWS or Google Cloud Platform).
Experience implementing observability stacks (e.g. Prometheus, Grafana, ELK/EFK, Azure Monitor or equivalent).
Strong understanding of distributed systems, high availability and fault tolerance.
Experience with automation and scripting (Python, Bash, PowerShell or similar).
Experience with container orchestration technologies such as Kubernetes is advantageous.
Strong experience in incident management, root cause analysis and production support.
Knowledge of disaster recovery strategies, backup systems and business continuity planning.
SRE or cloud certifications (e.g. Google SRE, Azure Administrator/Architect, AWS SysOps) are advantageous.

Key Competencies

Site Reliability Engineering (SRE) principles
Infrastructure reliability and resilience
Observability and monitoring
Incident management and root cause analysis
Disaster recovery and business continuity
Cloud infrastructure engineering
Automation and scripting
Performance and capacity management
CI/CD reliability support
Cross-functional collaboration