AI Software System Engineer, HPC Infrastructure Engineering, AMD

Written By: Srinivas
Reviewed By: Sankha Ghosh

AMD is seeking an AI Software System Engineer for HPC Infrastructure Engineering to design, build, and optimize next-generation high-performance computing and AI systems. The role focuses on GPU cluster management, AI workload automation, and distributed computing innovation, offering engineers a chance to advance performance, automation, and adaptive computing solutions.

Location: Hyderabad, India

Job Type: Full Time

Job ID: 69060

Responsibilities

  • Design, build, and support GPU-intensive HPC cluster computing capabilities for AI workloads.

  • Maintain AI/ML services and applications running on distributed TensorFlow or PyTorch architectures, as well as inference systems built on large language models.

  • Implement cluster automation with tools such as Terraform and Ansible, and develop a Prometheus-based monitoring framework for the clusters.

  • Build collaborative working relationships with partners in North America and Europe to understand and meet their specific AI infrastructure needs.

  • Apply design thinking and AI/ML to optimize internal processes and the delivery of services to customers.

Requirements

  • Minimum 5 years of experience in Python-based HPC infrastructure engineering and AI application development. 

  • Strong experience with SLURM, Kubernetes, and GPU clusters.

  • Expertise with RoCEv2, KVM, Ubuntu, GPU drivers, and 400G network interconnects.

  • Working knowledge of automation and monitoring tools such as Terraform, SaltStack, and Prometheus.

  • Exceptional problem-solving, interpersonal, and communication skills.

Education

Bachelor's or Master's degree in Computer Science, Artificial Intelligence, or a related field.

About AMD

AMD is a global technology company specializing in high-performance computing, graphics, and AI solutions. It designs and develops processors, GPUs, and adaptive computing products for the data center, gaming, and professional markets. With a strong commitment to innovation and efficiency, the company continues to invest in AI and HPC infrastructure and digital transformation.
