At CommerceIQ, you will:
- Design, implement, and maintain robust and scalable data pipelines that support machine learning applications and real-time decision-making systems.
- Work closely with ML engineers, analysts, and product teams to understand data needs and translate them into efficient data engineering solutions.
- Build and maintain workflows using tools like Apache Airflow, ensuring data is available reliably and on time across the platform (a minimal example of this kind of orchestration follows this list).
- Develop ETL/ELT pipelines using PySpark and Python, and optimize them for performance and cost at scale in a production environment.
- Own and manage critical parts of the data infrastructure, ensuring high availability, consistency, and security of large-scale distributed data processing systems.
- Proactively monitor, troubleshoot, and enhance data workflows to ensure quality, reliability, and performance SLAs are consistently met.
- Participate in code reviews and technical design discussions, and mentor junior data engineers within the team.
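
As a minimal sketch of the kind of orchestration work described above (assuming Airflow 2.4+ with the Apache Spark provider installed; the DAG name, schedule, script path, and connection ID are all hypothetical), a daily PySpark ETL job might be wired up like this:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_sales_etl",                        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                               # Airflow 2.4+ scheduling argument
    catchup=False,
) as dag:
    # Submit a PySpark application to the cluster configured under "spark_default".
    transform_sales = SparkSubmitOperator(
        task_id="transform_sales",
        application="/opt/jobs/transform_sales.py",  # hypothetical PySpark ETL script
        conn_id="spark_default",
    )
```

In practice, a production version of such a DAG would also carry retries, alerting, and SLA settings so the monitoring and reliability responsibilities above can be met.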
Experience:
4–6 years of hands-on experience designing, building, and deploying large-scale data processing pipelines in production environments.
Skillset:
- Proficiency in Python is a must, with strong software engineering fundamentals and experience writing clean, maintainable code.
- Extensive hands-on experience with PySpark and distributed data processing frameworks.
- Production experience with Apache Airflow or similar workflow orchestration tools.
- Solid understanding of data modeling, performance tuning, and optimization for large datasets.
- Experience working with cloud-based data infrastructure (e.g., AWS, GCP, or Azure) is a strong plus.
- Experience supporting ML pipelines or working in ML-driven environments is an advantage.
- Strong sense of ownership, attention to detail, and a passion for building high-quality data solutions that deliver business value.