Position Objectives:
Design, develop, and operationalize scalable data
pipelines and platforms to support advanced analytics, machine learning, and
AI-driven applications. Focus on structured, unstructured, and streaming
data, enabling seamless integration with hybrid AI systems, including Agentic
AI and Retrieval-Augmented Generation (RAG).

Job Description & Responsibilities:
Must-Have Skills:
- Design and implement robust ETL/ELT pipelines for structured,
semi-structured, and unstructured data.
- Strong command of SQL, NoSQL, and data warehousing/lakehouse design.
- Manage ingestion, transformation, and integration of sensor data,
streaming data, big data, and time-series data from diverse sources.
- Proficient with Apache Spark, Kafka, Airflow, and Python (PySpark/Pandas).
- Ensure data quality, governance, lineage, and observability
across all pipelines.
- Work with large-scale distributed systems (e.g., Spark, Hadoop,
Kafka) for batch, real-time, and streaming data processing.
- Collaborate with ML/AI teams to provision high-quality datasets
for training, evaluation, and deployment.
- Experience with Docker, Kubernetes, and cloud data platforms (AWS/GCP/Azure).
Desired Skills:
- Knowledge of Agentic AI frameworks (LangChain, LangGraph).
- Understanding of Responsible AI principles and model explainability (SHAP, LIME).
- Experience with OpenShift AI.
Soft Skills:
- Communication: Explaining complex data flows, presenting insights, and writing documentation.
- Collaboration & Teamwork: Working with Data, AI/ML, MLOps, and Application teams.
- Problem-Solving: Debugging data issues, optimizing pipelines, and proposing scalable solutions.
- Adaptability: Working with evolving tools, frameworks, and emerging data technologies.
- Time Management: Delivering pipelines and integrations on schedule.
Languages:
- English: mandatory
- Arabic: welcomed

Qualifications & Experience:
- 5+ years in data engineering, data architecture, or related roles
- Bachelor’s or Master’s in Computer Science, Data Engineering,
Software Engineering, or related field
- Strong coding skills in Python and SQL; experience with Spark, Hadoop, or Kafka
- Proficiency in building and managing relational and NoSQL
databases (PostgreSQL, MongoDB, Cassandra, etc.)
- Experience with cloud-native data services (AWS/GCP/Azure) and
containerized platforms (Docker, Kubernetes)
- Familiarity with data observability, governance, and reproducibility practices
- Knowledge of OpenShift AI and data pipeline integration with ML
workflows is a plus
Relevant Certifications or Workshops:
- Data Engineering, Cloud Data Platforms (AWS/GCP/Azure), Big Data, or Streaming Technologies