PURPOSE
We are seeking a skilled and motivated Data Engineer to join our dynamic team.
As a Data Engineer, you will be responsible for designing, implementing, and maintaining our data infrastructure and pipelines. You will collaborate closely with our data scientists, analysts, and software engineers to ensure efficient and reliable data flows throughout the organization.
The ideal candidate has a strong background in data engineering, excellent problem-solving skills, and a passion for working with large datasets.
OBJECTIVES (main duties and responsibilities)
- Design, develop, and maintain scalable and efficient data pipelines and ETL processes to ingest, transform, and load data from various sources.
- Collaborate with cross-functional teams to understand data requirements and translate them into technical solutions.
- Optimise data infrastructure, including data storage, retrieval, and processing, for improved performance and scalability.
- Implement data quality and data governance processes to ensure accuracy, consistency, and integrity of data.
- Monitor and troubleshoot data pipelines to identify and resolve issues in a timely manner.
- Perform data profiling and analysis to identify data quality issues and propose improvements.
- Collaborate with data scientists and analysts to provide them with the necessary data sets for analysis and reporting.
- Stay up-to-date with emerging technologies and trends in data engineering and recommend new tools and frameworks to improve data infrastructure.
ROLE REQUIREMENTS
Formal Qualifications
- Bachelor's degree in Computer Science, Data Science, Information Systems, or a related field. Master's degree is a plus.
- Proven experience as a Data Engineer or in a similar role, with a strong understanding of data modelling, data warehousing, and ETL processes.
- Proficient in SQL and experience working with relational databases (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra).
- Strong programming skills in at least one scripting language (e.g., Python, Ruby) and experience with data manipulation and transformation libraries (e.g., Pandas, PySpark).
- Hands-on experience with cloud-based data technologies and services, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
- Familiarity with data pipeline orchestration tools (e.g., Apache Airflow, Luigi) and workflow management systems.
- Solid understanding of distributed computing and big data processing frameworks (e.g., Hadoop, Spark).
- Experience with version control systems (e.g., Git) and agile software development methodologies.
- Excellent problem-solving skills and the ability to work independently and collaboratively in a fast-paced environment.
- Strong communication skills and the ability to effectively present complex technical concepts to non-technical stakeholders.
Preferred Qualifications
- Advanced degree in Computer Science, Data Science, or a related field.
- Experience with real-time data streaming technologies (e.g., Apache Kafka, Apache Flink).
- Knowledge of containerisation technologies and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with machine learning concepts and frameworks (e.g., TensorFlow, scikit-learn).
- Experience with data visualisation tools (e.g., Tableau, Power BI) and dashboard development.
Knowledge, Skills & Experience
Knowledge:
- Data Storage & Warehousing: Understanding of data storage solutions, including relational and NoSQL databases (e.g., PostgreSQL, MongoDB).
- ETL Processes: Knowledge of Extract, Transform, Load (ETL) tools and techniques to move and prepare data for analysis (e.g., Apache Airflow, Talend).
- Programming Languages: Proficiency in languages like Python, Java, or Scala for data manipulation and pipeline creation.
- Cloud Platforms: Experience with cloud computing services (e.g., AWS, Google Cloud Platform, Azure), especially cloud storage and processing services like AWS S3, Redshift, or Google BigQuery.
- Big Data Tools: Understanding of big data ecosystems like Hadoop, Spark, Kafka, or Hive.
- APIs and Data Integration: Experience integrating data from various sources (REST APIs, file systems, streaming data).
- Data Modelling: Familiarity with database schema design, normalisation, and denormalisation.
- Data Governance & Security: Knowledge of data privacy laws (GDPR, CCPA), encryption standards, and access control policies.
Competencies:
- Data Pipeline Development: Ability to design and build scalable data pipelines that collect, process, and transform large data sets.
- Problem-Solving Skills: Competency in troubleshooting and optimising existing data infrastructure and workflows.
- Collaboration: Working closely with data scientists, analysts, software developers and stakeholders to understand data needs and support the business.
- Data Management: Competence in ensuring data integrity, quality, and availability across systems.
- Scalability & Performance Optimisation: Ability to build systems that can handle increasing data loads efficiently.
- Automation: Expertise in automating routine data tasks to improve efficiency and accuracy.
- Documentation: Strong ability to document workflows, code, and data architecture for ease of understanding and future development.
Skills:
- SQL & Query Optimisation: Advanced proficiency in writing efficient SQL queries for data extraction and performance tuning.
- Coding: Expertise in programming for building and automating data workflows (e.g., Python, Java, Scala).
- Distributed Computing: Familiarity with parallel data processing techniques and distributed computing frameworks.
- Version Control: Skills in using version control systems like Git for managing code changes.
- Data Visualisation Tools: Experience with tools like Tableau, Power BI, or Looker for basic reporting and visualisation, though not necessarily a primary responsibility.
- Testing & Debugging: Ability to write unit tests for code and debug issues in data pipelines.
- Agile Methodologies: Familiarity with agile practices such as Scrum or Kanban, ensuring iterative development and delivery of data solutions.
Job Related Experience
Intermediate Data Engineer: 3 to 5 years of experience working in data engineering or a related field. This includes experience in designing and maintaining data pipelines, working with cloud platforms, and handling various data storage solutions.
Senior Data Engineer: 5 to 8+ years of experience. A senior data engineer is expected to have deep expertise in advanced data processing techniques, big data frameworks, cloud infrastructure, and system optimisation. Senior engineers lead projects, mentor junior engineers, and drive architectural decisions.
Languages
English
Afrikaans (Beneficial)
CULTURE
We feel that culture and team fit are incredibly important. We believe you can upskill and learn new things, but you cannot necessarily learn culture. We like to hire lekker at AgrigateOne and look for people who share our interests, vision, mission, and values.
COMPANY OVERVIEW
Visit our About Us page to discover more about our culture, work ethic, and vision.