Our ML Platform team builds the foundational intelligence layer that powers Zalando’s AI-native experiences. We provide low-latency features, embeddings, and real-time updates that enable applied science and product teams to deliver search, recommendations, personalization, forecasting, and emerging GenAI use cases. Today, we operate Zalando’s central Feature Store and are scaling it into a broader discovery hub for customer, product, and content understanding in real time.
As a Senior Software Engineer (ML Platform), you will play a key role in designing, building, and scaling these core ML infrastructure services. You’ll work hands-on with distributed systems, streaming pipelines, and feature platforms, while also mentoring peers and contributing to engineering best practices across the team.
INCLUSIVE BY DESIGN
At Zalando, our vision is to be inclusive by design. And this vision starts with our hiring - we do not discriminate on the basis of gender identity, sexual orientation, personal expression, ethnicity, religious belief, or disability status. You are welcome to leave out your picture, age, or marital status from your application. We only assess candidates on their qualifications and merit.
We want to provide you with a great candidate experience. Feel free to inform us of any accommodations you may need, so we can best support you throughout the hiring process.
- do.BETTER - our diversity & inclusion strategy: https://corporate.zalando.com/en/our-impact/dobetter-our-diversity-and-inclusion-strategy
- Our employee resource groups: https://corporate.zalando.com/en/our-impact/our-employee-resource-groups
WHAT WE’D LOVE YOU TO DO (AND LOVE DOING)
- ML Subject Matter Expertise: Own the design and implementation of golden paths for building scalable, real-time features and embedding systems. Bring strong technical judgment to ensure our foundations are reliable and reusable.
- Platform reliability: Deliver and maintain SLOs for feature freshness, data quality, and online/offline consistency; implement monitoring and safe deployment practices.
- Embed security by design: Build identity and access management, secrets management, network isolation, and data governance in from the start to ensure compliance and trustworthiness by default.
- Enable developer productivity: Drive automation and self-service (IaC, GitOps, CI/CD). Deliver documentation and onboarding assets that reduce time-to-first-success for applied scientists and engineers.
- Provide technical guidance: Act as a go-to engineer for complex ML infrastructure challenges, mentor junior colleagues, and raise the engineering bar through reviews, pairing, and knowledge sharing.
- Contribute to strategic decisions: Take ownership of technical design decisions within the team and bring informed input to long-term platform strategy discussions with product and senior engineering leadership.
- Grow the team: Play an active role in hiring, onboarding, and mentoring engineers, helping to build a strong technical culture around ML infrastructure.
WE’D LOVE TO MEET YOU IF YOU HAVE
- Strong ML/Data platform experience: 4+ years of experience building and operating ML infrastructure or large-scale data systems on a cloud platform, with a strong track record on AWS (EKS) or an equivalent cloud. Experience with mission-critical systems serving multiple teams.
- Feature platform or data transformation expertise: Hands-on experience with data/feature engineering pipelines, schema evolution, and ensuring online/offline consistency. Familiarity with feature stores (e.g., Feast, Hopsworks, SageMaker Feature Store) or a strong willingness to specialize.
- Distributed systems builder: Skilled in containers (Docker), orchestration (Kubernetes), and streaming/batch processing technologies (e.g., Kafka/Kinesis, Spark/Flink).
- Model serving and low-latency systems: Practical experience with production serving stacks (e.g., KServe/Triton/TorchServe/custom), request shaping, caching, and traffic management.
- Observability and reliability: Track record of building reliable systems with SLOs, monitoring, and deployment safeguards. Comfortable running incident response, capacity planning, and post-incident reviews.
- Security and governance: Proficiency in IAM, secrets management, network boundaries, and data protection. Experience embedding compliance and governance into engineering workflows.
- Collaboration and communication: Able to work closely with engineers, applied scientists, and product partners; able to translate requirements into reusable, reliable platform capabilities.
BONUS / NICE TO HAVE
- Streaming and batch data for ML: Experience with feature pipelines (batch + streaming) and exposure to data contracts and schema/versioning practices to reduce drift and breakage.
- Vector search and retrieval: Experience with similarity search and embeddings in production (e.g., FAISS, Milvus, OpenSearch KNN, pgvector) and their integration into ranking/retrieval workflows.
- GenAI infrastructure: Experience with LLM serving, prompt management, or RAG architectures at scale.
If you think you have what it takes, we encourage you to apply even if you don't meet every single requirement. You may just be the right candidate for this or other roles.
OUR OFFER
Zalando provides a range of benefits. Here’s an overview of what you can expect; ask your Talent Acquisition Partner to learn more about what we offer. Learn all about Zalando and our values here: https://jobs.zalando.com/en/?gh_src=22377bdd1us
Employee shares program
40% off fashion and beauty products sold and shipped by Zalando, 30% off Zalando Lounge, discounts from external partners
2 paid volunteering days a year
Hybrid working model with 60% remote per week; actual practice is up to each team to best support its collaboration
Work from abroad for up to 30 working days a year
27 days of vacation a year to start
Relocation assistance available (subject to prior agreement)
Family services, including counselling and support
Health and wellbeing options (including Gympass)
Mental health support and coaching available