Data Engineer (Senior)
Imizizi
Reference: JHB001461-ZN-1
ESSENTIAL SKILLS
- Strong experience with Python (Python 3.x) and PySpark for developing data processing jobs.
- At least 3 years’ experience with AWS services commonly used by data engineers, such as Athena, Glue, Lambda, S3 and ECS.
- Hands-on experience with NoSQL databases such as DynamoDB and with relational databases (Oracle/PostgreSQL), including strong Oracle SQL skills.
- Experience with Oracle Cloud Infrastructure (OCI) services and tooling for databases, storage, and data processing.
- Expertise in data formats and schema design, including Parquet, Avro, JSON, XML and CSV, and in hands-on technical data modelling (not drag-and-drop tooling).
- ETL and data pipeline development experience, including building pipelines with AWS Glue or similar platforms.
- Experience with containerization and orchestration technologies such as Docker (Kubernetes/OpenShift advantageous).
- Proficiency with scripting for automation (Bash, PowerShell) and familiarity with Linux/Unix environments.
- Experience with data quality tooling and validation (e.g., Great Expectations) and performing thorough data testing and validation.
- Familiarity with cloud infrastructure as code and DevOps tools such as Terraform, CloudFormation, CI/CD pipelines, Git and Jenkins.
ADVANTAGEOUS SKILLS
- Knowledge of Kafka or other streaming technologies and AWS Kinesis for real-time data ingestion.
- Experience with AWS Redshift, EMR and other analytics/warehouse technologies.
- Familiarity with Cloud Data Hub (CDH) or similar organizational cloud data blueprints.
- Java / JEE experience and understanding of Java application servers.
- Experience with monitoring and observability tools such as CloudWatch and Grafana.
- AWS solution architecture experience and certifications (e.g., AWS Certified Cloud Practitioner) are advantageous.
- Familiarity with REST APIs and building integrations with external systems.
- Experience with schema design for BI and data warehousing, and preparing specifications for development.
- Experience with MongoDB or other NoSQL stores.
- Familiarity with Agile/Scrum delivery models and working within cross-functional teams.
ROLE & RESPONSIBILITIES
- Design, build and maintain scalable data pipelines and ETL workflows to ingest and transform data for analytics and reporting.
- Implement and optimize data storage solutions including data lakes and data warehouses on cloud platforms.
- Develop PySpark and Python applications for large-scale data processing and transformations.
- Ensure data quality, consistency and integrity through testing, validation and the use of data quality tools.
- Collaborate with stakeholders to translate business requirements into technical specifications and data models.
- Propose and review system and solution designs and evaluate technical alternatives.
- Maintain and operate cloud infrastructure and CI/CD pipelines for data platform components.
- Create and maintain technical documentation, runbooks and artefacts for developed solutions.
- Support production troubleshooting, monitoring and incident management for data services.
- Work closely with BI teams to prepare and optimize data for reporting tools such as Business Objects or Tableau.
- Coach and support fellow engineers, and help improve team capability through knowledge sharing and training.
- Participate in Agile ceremonies and contribute to continuous improvement of delivery processes.
QUALIFICATIONS/EXPERIENCE
- Minimum 3-5 years’ experience as a data engineer, with demonstrated hands-on experience in Python, PySpark and cloud data services (AWS and/or OCI).
- Relevant IT/Computer Science/Engineering degree or equivalent proven experience; advanced degrees advantageous.
- Certifications such as AWS Certified Cloud Practitioner, Oracle Cloud certifications or other relevant cloud/data engineering certifications preferred.
Submit your CV to: ***email_hidden*** and Subject line