Consulting Data Engineer

LexisNexis


Date: 12 hours ago
City: Cape Town, Western Cape
Contract type: Full time
About The Team

The Content Engineering teams at LexisNexis Intellectual Property (IP) are driving innovation in how patent data is processed, enriched, and leveraged. These teams comprise Data Engineers, Data Scientists, and Data Analysts, supported by subject matter experts in patent data. Our teams work closely with Databricks to migrate legacy ETL systems to a modern Strategic Data Platform built on Python, PySpark, and Databricks. This platform uses a medallion architecture to ingest, transform, and enrich global patent data. It will serve not only our flagship product PatentSight+ but also many other strategic products within our IP portfolio.

This is an opportunity to join a high-impact initiative still in its formative stages, where you can influence and evolve architecture, tooling, and best practices from the ground up.

About The Role

As the Consulting Data Engineer for one of the Data Platform teams, you will act as a technical lead in bringing new content and enrichments to the strategic Data Platform at LexisNexis IP. You will be instrumental in executing our data strategy for the Data Platform. Your role will be pivotal in developing and implementing advanced solutions for data integration, quality control, and continuous delivery, driving our data operations to new heights.

This is a senior technical individual contributor role and carries no line management responsibilities. You will technically lead a high-performing agile team, guiding them through complex delivery projects, identifying technical needs, and devising innovative solutions. Your expertise will be crucial in embedding best practices and state-of-the-art data engineering tools, ensuring that our workflows are both efficient and scalable.

You will work closely with a range of technical leaders, data scientists, and data analysts across the Data Platform and the wider technology department. With colleagues based in the UK and EU, you will also engage with a diverse range of stakeholders across the UK, Germany, Netherlands, and the USA.

What does success look like? In the first 3 months you will lead and deliver content expansion ETLs in our key pipelines, setting architectural best practices and standards as you go. You will build networks with other teams in Content, especially on the Data Platform.

Responsibilities

  • Architect and lead the development of our patent data ingestion pipeline using Databricks, Python, and PySpark.
  • Mentor and guide a team of data engineers, fostering a collaborative environment that encourages growth and innovation. You will enable and lead technical discussions within the team and with stakeholders.
  • Ensure the pipeline is efficient, scalable, and robust, capable of handling terabytes of data with low latency. Eliminate inefficiencies and teach the techniques to the team.
  • Work closely with the wider cross-functional engineering department, including data scientists, analysts, and product managers, to ensure the pipeline meets business needs.
  • Contribute to the overall data engineering strategy and drive the adoption of best practices in coding, architecture, and deployment.
  • Identify and resolve technical challenges, ensuring the smooth operation of the data ingestion pipeline.
  • Translate strategic business objectives into technical architecture and delivery plans.
  • Contribute to platform-wide standards, tooling, and architecture decisions.

Requirements

  • Expertise in Python and PySpark is essential for you to lead and develop the skills of the team.
  • Expertise in Databricks is highly desirable.
  • Demonstrated ability to design and implement scalable data architectures for both batch and streaming data processing.
  • Proficiency in using cloud platforms such as AWS, Azure, or Google Cloud for data infrastructure management would be beneficial.
  • Prior experience with patent data, or other complex data sources, is highly beneficial.
  • Knowledge of data governance practices, including data quality management, metadata management, and data lineage is also beneficial.
  • Exposure to CI/CD and orchestration for data pipelines using tools such as GitHub Actions, Azure DevOps, or Airflow.
  • Proven experience in technically leading and mentoring data engineering teams.

Work in a way that works for you

We promote a healthy work/life balance across the organization and offer an appealing working environment for our people. With numerous wellbeing initiatives, shared parental leave, study assistance, and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.

  • Flexible working hours - flex the times you work during the day so you can fit everything in and work when you are most productive

Working for you

Benefits

We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:

  • Medical Aid
  • Retirement Plan inclusive of Risk Benefits (Disability, Critical Illness, Life Cover & Funeral Cover)
  • Modern family benefits, including adoption and surrogacy
  • Study Leave

About The Business

LexisNexis Legal & Professional provides legal, regulatory, and business information and analytics that help customers increase their productivity, improve decision-making, achieve better outcomes, and advance the rule of law around the world. As a digital pioneer, the company was the first to bring legal and business information online with its Lexis and Nexis services.