Position Objectives:
Design, develop, and operationalize scalable data pipelines and platforms that support advanced analytics, machine learning, and AI-driven applications. The role focuses on structured, unstructured, and streaming data, enabling seamless integration with hybrid AI systems, including Agentic AI and Retrieval-Augmented Generation (RAG).
Job Description & Responsibilities:
• Design and implement robust ETL/ELT pipelines for structured, semi-structured, and unstructured data.
• Build and optimize data warehouses, data lakes, and lakehouse architectures.
• Manage ingestion, transformation, and integration of sensor data, streaming data, big data, and time-series data from diverse sources.
• Ensure data quality, governance, lineage, and observability across all pipelines.
• Work with large-scale distributed systems (e.g., Spark, Hadoop, Kafka) for batch, real-time, and streaming data processing.
• Collaborate with ML/AI teams to provision high-quality datasets for training, evaluation, and deployment.
• Support modular architectures that integrate with RAG workflows, agentic systems, and AI inference pipelines.
• Communication: Explaining complex data flows, presenting insights, and writing clear documentation.
• Collaboration & Teamwork: Work with Data, AI/ML, MLOps, and Application teams.
• Problem-Solving: Debugging data issues, optimizing pipelines, and proposing scalable solutions.
• Adaptability: Working with evolving tools, frameworks, and emerging data technologies.
• Time Management: Delivering pipelines and integrations on schedule.
Languages:
• English – mandatory
• Arabic – welcomed
Qualifications & Experience:
• 5+ years in data engineering, data architecture, or related roles
• Bachelor’s or Master’s in Computer Science, Data Engineering, Software Engineering, or related field
• Strong coding skills in Python and SQL; experience with Spark, Hadoop, or Kafka
• Proficiency in building and managing relational and NoSQL databases (PostgreSQL, MongoDB, Cassandra, etc.)
• Experience with cloud-native data services (AWS/GCP/Azure) and containerized platforms (Docker, Kubernetes)
• Familiarity with data observability, governance, and reproducibility practices
• Knowledge of OpenShift AI and data pipeline integration with ML workflows is a plus
Relevant Certifications or Workshops:
• Data Engineering, Cloud Data Platforms (AWS/GCP/Azure), Big Data, or Streaming Technologies