Senior Research Engineer, Foundation Model Training Infrastructure - Santa Clara

Santa Clara, United States

1 day ago

$224,000 - $356,500 (USD)
Job description

NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the Generalist Embodied Agent Research (GEAR) group.

Our team is leading Project GR00T, NVIDIA's moonshot initiative to build foundation models and full-stack technology for humanoid robots.


You will work with an amazing and collaborative research team that consistently produces influential work on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation.

Our past projects include Eureka, VIMA, Voyager, MineDojo, MimicPlay, Prismer, and more. Your contributions will have a significant impact on our research projects and product roadmaps.

What You Will Be Doing

Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics.
Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets.
Implement scalable data loaders and preprocessors tailored for multimodal datasets, including videos, text, and sensor data.
Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters.
Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines.

What We Need To See

Bachelor's degree in Computer Science, Robotics, Engineering, or a related field.
10+ years of full-time industry experience in large-scale MLOps and AI infrastructure.
Proven experience designing and optimizing distributed training systems with frameworks like PyTorch, JAX, or TensorFlow.
Deep understanding of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes.
Strong programming skills in Python and a high-performance language such as C++ for efficient system development.
Strong experience with large-scale GPU clusters, HPC environments, and job scheduling/orchestration tools (e.g., SLURM, Kubernetes).

Ways To Stand Out From The Crowd

Master's or PhD degree in Computer Science, Robotics, Engineering, or a related field;
Demonstrated Tech Lead experience, coordinating a team of engineers and driving projects from conception to deployment;
Strong experience building large-scale LLM and multimodal LLM training infrastructure;
Contributions to popular open-source AI frameworks or research publications in top-tier AI conferences, such as NeurIPS, ICRA, ICLR, CoRL.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and productive people in the world. Please join us and be at the forefront of developing general-purpose robots and large-scale foundation models.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until January 13, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

JR1992361

