Senior Principal ML Engineer

Location: Berlin
Contract: Full time

ABOUT THE TEAM

Zalando Research is at the forefront of driving innovation in fashion e-commerce. We are a dynamic and diverse group of scientists and ML engineers dedicated to solving complex challenges through cutting-edge machine learning and AI. Our work directly impacts the experience of millions of Zalando customers and empowers internal teams with state-of-the-art tools and capabilities. We foster a collaborative environment where ambitious research ideas are transformed into impactful solutions. As a Senior Principal MLOps Engineer, you will play a pivotal role in supercharging the productivity and innovation potential of our applied scientists by architecting and delivering world-class MLOps infrastructure. You will be the most senior engineer in this area. As we raise the bar in ML research, we are evolving our infrastructure to make MLOps seamless, reproducible, and future-proof, removing friction so that our scientists can focus on advancing the science.

WHERE YOUR EXPERTISE IS NEEDED

  • Ensure Persistent, Secure and Reproducible R&D Environments: Tackle bottlenecks and improve scalability, resilience, and cost-effectiveness in distributed training workloads across our research teams. Guarantee that scientists can resume work across sessions and share exact research setups, enabling robust experiment tracking and easy collaboration.

  • Curate the R&D ML Stack: Evaluate, select, and integrate best-in-class technologies for our end-to-end R&D ML stack, giving our scientists access to the most powerful tools while hardening the security of our cloud setup.

  • Enable Advanced Visualization: Implement and manage a streamlined setup process for GPU-backed 3D remote desktops in the cloud, with persistent storage and seamless RDP/VNC access, giving scientists powerful interactive research environments running on the latest GPUs.

  • Innovate with LLMs: Stay at the cutting edge of Large Language Model (LLM) advancements and spearhead their integration into the user experience of our applied scientists.

WHAT WE ARE LOOKING FOR

  • Proven MLOps Leadership: Extensive experience (6+ years) in designing, building, and maintaining scalable, reliable, and performant MLOps infrastructure, particularly on AWS with a strong focus on GPU-accelerated compute clusters.

  • Passion for Empowering Scientists: Always looking for ways to save users’ time, eliminate skill barriers, and amplify scientific impact.

  • HPC & GPU Optimization Expert: Deep understanding of HPC architectures, job scheduling, GPU utilization, and cost optimization strategies in a cloud environment.

  • Containerization, Orchestration & Technology Expert: Strong hands-on experience with Docker, EC2, AMIs, EFS, Lustre, S3, JupyterHub, SQL, Superset, Databricks, SageMaker, Slurm, Ray, Kubeflow, Kubernetes (EKS), Nix, Devbox, and other containerization, environment isolation, and orchestration technologies for ML workloads.

  • Infrastructure as Code (IaC) and Automation-First Mindset: Proficiency with IaC tools such as CloudFormation, Terraform, and Kubernetes CRDs to automate infrastructure provisioning and management, along with strong CI/CD skills.

  • Champion of Reproducibility: A passion for building systems that ensure experimental reproducibility, environment consistency, and end-to-end automation of ML workflows. Experience with tools such as MLflow, Weights & Biases, or similar for experiment tracking, sharing, and deployment. You are able to provide both ephemeral and persistent ML environments, depending on the use case.

  • Excellent Communicator & Collaborator: Ability to articulate complex technical concepts clearly to diverse audiences and work effectively with research scientists, engineers, heads, directors and product managers to understand their needs and drive solutions.

  • Able to understand ML-related scientific challenges and translate them into ergonomic, reliable MLOps solutions for diverse user groups.

  • Problem Solver & Strategic Thinker: A proactive approach to identifying pain points, devising innovative solutions, and thinking strategically about the long-term evolution of the MLOps landscape at Zalando Research.

PERKS AT WORK

Culture of trust, empowerment and constructive feedback, open source commitment, meetups, game nights, 70+ internal technical and fun guilds, knowledge sharing through tech talks, internal tech academy and blogs, product demos, parties & events.

Competitive salary, employee share shop, 40% Zalando shopping discount, discounts from external partners, centrally located offices, public transport discounts, municipality services, great IT equipment, flexible working times, additional holidays and volunteering time off, free beverages and fruits, diverse sports and health offerings.

Extensive onboarding, mentoring and personal development opportunities and an international team of experts.

Relocation assistance for internationals, PME family service and parent & child rooms* (*available in selected locations)

We celebrate diversity and are committed to building teams that represent a variety of backgrounds, perspectives and skills. All employment is decided on the basis of qualifications, merit and business need. 

Recruiter

Sohaib Rubnawaz

sohaib.rubnawaz.external@zalando.de

Please note that all applications must be submitted through the online form on this page; we do not accept applications by email. After reviewing your application, our recruiters will contact you from an official Zalando email address (@zalando.de).

In some cases, we also work with a selection of headhunters and agencies to fill certain positions. Please note that neither Zalando nor our recruitment partners will ever ask for any kind of payment for applying to a position or attending an interview.

If you have any questions about our recruitment process, please take a look at our FAQ page.

About Zalando

It's the perfect time to join Zalando on our journey to build the leading e-commerce ecosystem for the European fashion and lifestyle market. Help us offer around 50 million active customers in 25 markets an inspiring, quality-focused, one-stop shopping experience for fashion and lifestyle products from numerous brands. Or be part of our Zalando logistics, software, and service infrastructure, supporting brands and retailers with their e-commerce transactions across Europe, both on and off the Zalando platform. Join us and use this ecosystem to drive positive change in the fashion and lifestyle industry.

Learn more about our culture