Data Engineer (Senior)

Imizizi

Reference: JHB001461-ZN-1

ESSENTIAL SKILLS

  • Strong experience with Python (Python 3.x) and PySpark for developing data processing jobs.
  • At least 3 years’ experience with AWS services commonly used by data engineers, such as Athena, Glue, Lambda, S3 and ECS.
  • Hands-on experience with NoSQL databases such as DynamoDB and relational databases (Oracle/PostgreSQL), including strong Oracle SQL skills.
  • Experience with Oracle Cloud Infrastructure (OCI) services and tooling for databases, storage, and data processing.
  • Expertise in data formats and schema design, including Parquet, AVRO, JSON, XML and CSV, and technical data modelling (“not drag and drop”).
  • ETL and data pipeline development experience, including building pipelines with AWS Glue or similar platforms.
  • Experience with containerization and orchestration technologies such as Docker (Kubernetes/OpenShift advantageous).
  • Proficiency with scripting for automation (Bash, PowerShell) and familiarity with Linux/Unix environments.
  • Experience with data quality tooling and validation (e.g., Great Expectations) and performing thorough data testing and validation.
  • Familiarity with cloud infrastructure as code and DevOps tools such as Terraform, CloudFormation, CI/CD pipelines, Git and Jenkins.

ADVANTAGEOUS SKILLS

  • Knowledge of Kafka or other streaming technologies and AWS Kinesis for real-time data ingestion.
  • Experience with AWS Redshift, EMR and other analytics/warehouse technologies.
  • Familiarity with Cloud Data Hub (CDH) or similar organizational cloud data blueprints.
  • Java / JEE experience and understanding of Java application servers.
  • Experience with monitoring and observability tools such as CloudWatch and Grafana.
  • AWS solution architecture experience and certifications (e.g., AWS Certified Cloud Practitioner).
  • Familiarity with REST APIs and building integrations with external systems.
  • Experience with schema design for BI and data warehousing, and preparing specifications for development.
  • Experience with MongoDB or other NoSQL stores.
  • Familiarity with Agile/Scrum delivery models and working within cross-functional teams.

ROLE & RESPONSIBILITIES

  • Design, build and maintain scalable data pipelines and ETL workflows to ingest and transform data for analytics and reporting.
  • Implement and optimize data storage solutions including data lakes and data warehouses on cloud platforms.
  • Develop PySpark and Python applications for large-scale data processing and transformations.
  • Ensure data quality, consistency and integrity through testing, validation and the use of data quality tools.
  • Collaborate with stakeholders to translate business requirements into technical specifications and data models.
  • Propose and review system and solution designs and evaluate technical alternatives.
  • Maintain and operate cloud infrastructure and CI/CD pipelines for data platform components.
  • Create and maintain technical documentation, runbooks and artefacts for developed solutions.
  • Support production troubleshooting, monitoring and incident management for data services.
  • Work closely with BI teams to prepare and optimize data for reporting tools such as Business Objects or Tableau.
  • Coach and support fellow engineers, and help improve team capability through knowledge sharing and training.
  • Participate in Agile ceremonies and contribute to continuous improvement of delivery processes.

QUALIFICATIONS/EXPERIENCE

  • Minimum 3-5 years’ experience as a data engineer with demonstrated hands-on experience in Python, PySpark and cloud data services (AWS and/or OCI).
  • Relevant IT/Computer Science/Engineering degree or equivalent proven experience; advanced degrees advantageous.
  • Certifications such as AWS Certified Cloud Practitioner, Oracle Cloud certifications or other relevant cloud/data engineering certifications preferred.

Submit your CV to: ***email_hidden*** with the role title as the subject line.