
Cisco is a technology company focused on innovation and on revolutionizing how enterprises use AI. The AI Infrastructure Engineer designs and implements high-performance, reliable AI systems that support workloads across Cisco’s ecosystem. The position plays a crucial role in improving AI efficiency and reliability, and in fostering the collaboration needed to create solutions that redefine enterprise operations and uphold Cisco’s leadership in AI infrastructure.
Location: Bangalore, India
Technology Interest: Software Development
Area of Interest: Engineer - Software
Job Type: Professional
Job ID: 1445534
As an AI Infrastructure Engineer at Cisco, you will:
Design and develop node-level infrastructure components to support high-performance AI workloads.
Benchmark, analyze, and optimize the performance of AI infrastructure, including CUDA kernels and GPU memory management.
Minimize downtime by designing seamless configuration and upgrade architecture for software components.
Manage the installation and deployment of AI infrastructure on Kubernetes clusters, utilizing CRDs and operators.
Develop and deploy efficient telemetry collection systems for nodes and hardware components, without impacting workload performance.
Apply fundamental distributed systems concepts to ensure scalability, resilience, and reliability.
Collaborate across teams and time zones to shape the overall direction of AI infrastructure development and achieve shared goals.
Qualifications:
Experience with programming languages such as Rust, C/C++, Golang, or Python, or with eBPF.
Familiarity with Linux operating systems at both the user space and kernel level.
Experience with Linux user space development, including packaging, logging, telemetry, and lifecycle management of processes.
Familiarity with Kubernetes (K8s) and related technology, such as custom resource definitions (CRDs).
Ability to debug and solve complex problems at the system level.
A Bachelor's degree plus 5 years of relevant engineering experience.
Experience with the Linux kernel and device drivers is a plus.
Experience with GPU programming and optimization, including CUDA and UCX, is a plus.
Experience with high-speed data transfer technologies such as RDMA.
Experience with Nvidia GPU operators, the Nvidia Container Toolkit, Nsight, and CUPTI.
Experience with Nvidia MIG and MPS concepts for managing GPU consumption.
Cisco is a worldwide technology leader that is reinventing how enterprises leverage AI. Our innovation team operates like a start-up, building enterprise-level, cutting-edge AI infrastructure and software solutions, and its members collaborate on ideas and prototype new solutions at a rapid pace. Our culture is built on diversity, continuous education, and empowering our employees to create industry-defining breakthroughs.