Description & Requirements
We are a technology consulting firm building and operating next-generation AI supercompute infrastructure for the world's most ambitious organizations. As a Platform Engineer, you will work hands-on across the full infrastructure stack with a particular focus on the physical and logical networking layer that makes large-scale GPU clusters perform at their theoretical limits.
As a repeatedly awarded NVIDIA Consulting Partner of the Year in EMEA, we hold one of the deepest and most recognized NVIDIA partnerships in the region. This gives our engineers privileged access to adoption programmes and NVIDIA's engineering teams.
You will work with technology, and at a scale, that most engineers won't encounter for years.
This is a role for someone at a mid-career stage in platform or infrastructure engineering. You have solid foundations and real hands-on experience, and you are ready to level up by working on problems of genuine complexity and scale. You know enough to know what you don't know yet, and you are hungry to close that gap fast.
What We Expect
- 5–8 years of hands-on experience in infrastructure, networking, or systems engineering
- Solid understanding of networking fundamentals: OSI model, switching and routing (BGP, OSPF), VLANs, MTU, and traffic engineering
- Working knowledge of high-performance networking technologies: InfiniBand, RDMA, RoCE, or equivalent HPC interconnects
- Familiarity with Linux networking: interfaces, bridges, bonding, namespaces, tc/qdisc, and kernel network tuning
- Basic hands-on experience with Kubernetes or Slurm: enough to navigate cluster operations, understand pod or job scheduling, and troubleshoot node-level issues
- Experience with at least one monitoring stack: Prometheus, Grafana, Zabbix, or similar
- Experience with network automation and infrastructure as code (IaC)
- Comfort working directly with physical hardware: servers, switches, cabling and data centre environments
Bonus Experience
- Exposure to NVIDIA networking products: Mellanox/ConnectX NICs, Quantum InfiniBand switches, Spectrum Ethernet switches
- Familiarity with NCCL tuning, collective communication patterns, or distributed training networking requirements
- Hands-on time with DCGM, iperf3, perftest, or ibdiagnet for infrastructure benchmarking and validation
- Exposure to container networking
- Any experience in a consulting or client-facing technical role
What You Will Do
- Physical Network Configuration. You will not be pulling cables or mounting servers yourself, but you will own the correctness of it. You will define standards, review configurations, oversee data centre teams and third-party contractors performing physical installation, and be accountable for the outcome — every port, every cable, every label
- Network Fabric Management. Configure and operate InfiniBand (IB subnet manager, QoS policies, adaptive routing) and RoCEv2 fabrics purpose-built for GPU-to-GPU communication and distributed training
- Network Performance Optimisation. Profile and tune network throughput, latency, and congestion for AI workloads; work with NCCL, GPUDirect RDMA, and high-bandwidth interconnects including NVLink and NVSwitch
- Cluster Platform Operations. Support deployment, day-2 operations, and troubleshooting of Kubernetes and Slurm clusters; contribute to OS-level configuration, driver management, and node lifecycle automation
- Monitoring & Observability. Instrument network and cluster health using Prometheus, Grafana, and DCGM Exporter; build dashboards that surface GPU utilisation, link errors, and fabric saturation
- with rigour and clear documentation
Most platform engineers spend years in environments where the network is someone else's problem. Here, it is front and centre because at GPU cluster scale, the network is the performance. You will configure fabrics that move terabits per second between hundreds of GPUs, diagnose issues that block multi-million-dollar training runs, and build the intuition for distributed systems that takes most engineers a decade to develop. Backed by our standing as NVIDIA's top EMEA partner, you will learn directly from NVIDIA's hardware and software teams and grow faster than in almost any other environment in the industry.
- Flexible working hours
- Permanent employment or contract
- Medical and health insurance
- Multisport and other lifestyle benefits
- Language courses
- Friendly coworkers & team spirit
- Multiple geographies and clients
- Work for well-known brands
- Exposure to trailblazing business and technology projects
- A place in the first line of a digital transformation
- Everyday opportunities to influence how and where we do our business
- A development path to fit your needs
