Senior Manager, Professional Services HPC Deployment - Santa Clara, United States - NVIDIA

NVIDIA Santa Clara, United States

3 weeks ago

Description

NVIDIA is in search of an HPC Deployment Manager to bolster its Professional Services division.

Across academia and industry, NVIDIA's products are driving ground-breaking advancements in deep learning, data analytics, and the optimization of data centers.

Join our team, where we are at the forefront of constructing some of the world's largest and fastest data centers. We seek an individual capable of supervising the deployment of cutting-edge InfiniBand and Ethernet technologies with a team comprising AI and HPC experts.

This role demands dynamic interpersonal abilities and a customer-centric approach.
The chosen candidate will engage with clients, collaborators, and internal units to assess, delineate, and complete large-scale AI/HPC initiatives. They will orchestrate the day-to-day operations, guidance, and cultivation of a multi-layered team of HPC service professionals. This entails ensuring the timely delivery of a varied spectrum of AI HPC data center projects.

Furthermore, this role offers an opportunity to thrive within a fast-paced, inventive, and technologically sophisticated atmosphere, emphasizing unparalleled performance and the exploration of an array of novel hardware and software technologies in AI supercomputing.

What you will be doing:

Directs and supervises the service HPC engineering functions in designing, developing, installing, and validating hardware and software for the Customer AI High-Performance Computing (HPC) systems.
Leads, manages, mentors, and builds a hardworking HPC service engineering team to deliver innovative advances in high-performance computing AI systems.
Leads the planning, implementation, and performance of our HPC projects. Improves the integrity of system services bring-up and related activities by applying advanced technical and operational knowledge to configure and maintain HPC AI network and server platforms.
Drives HPC team hardware and software deployment, plans, develops, and deploys procedures for system validation.
Leads team activities and drives test plans for customers' HPC AI system implementations, custom scripts, and testing procedures to ensure the operational reliability of the system.
Supports the HPC Engineering team, working with other internal collaborators to develop and run a well-rounded strategy for delivering service quality and continuous service improvement. Supports governance for software engineering through the implementation of standards and quality measures.
Leads team member development, helping them set and achieve goals for their career growth. Develops an inclusive environment that values team member differences, creating a sense of belonging and appreciation. Contributes to a culture of trust and clarity.
Builds strong relationships with NVIDIA leaders, customers, partners, and collaborators. Works closely with them to identify, implement, and support NVIDIA's leading AI solutions engineering, maintaining currency with industry standards and innovations. Provides input around process optimization, department budgeting, and the monitoring and management of resources.
Serves as the domain authority with customers from planning calls through implementation.

What we need to see:

8 years of overall experience in IT, high-performance computing, or another related field; 3 years of experience in a management or leadership role.
Demonstrated expertise in HPC systems design, configuration, and planning.
Proficiency with low-latency/high-bandwidth interconnect infrastructure (InfiniBand and Ethernet).
Expertise with HPC system software, including cluster management/provisioning tools and job schedulers (Slurm, Salt, xCAT).
Proficiency with shared and distributed memory parallelism (OpenMP, MPI, NCCL and HPL) and accelerators (GPUs).
Strong scripting ability (Bash, Perl, Python, etc.) and experience with programming fundamentals.
Expertise in administering, supervising, and maintaining secure Linux/Unix operating systems (CentOS, Solaris).
Experience establishing processes for maintaining system performance, managing best-in-class standards, and familiarity with cloud computing and container technologies.
Ability to understand and work with large, sophisticated systems, identify and resolve problems, handle performance, and troubleshoot network issues related to infrastructure.
Expertise with multi-vendor hardware/software management, security, and network/Internet protocols.
Strong communication and social skills, with the ability to provide detailed information and high-level summaries to management-level individuals and groups, present the business side of technical topics to non-technical audiences, and develop positive working relationships and strong rapport with team members.
Bachelor's degree in computer science, information systems, or a related field or equivalent experience
Solid knowledge of HPC storage
Exemplary communication and interpersonal skills, with the ability to present the business side of technical topics to non-technical audiences and to build persuasive, effective relationships with various stakeholders and diverse individuals and groups.

Ways to stand out from the crowd:

InfiniBand experience.
Experience with GPU-focused hardware/software.
Experience with MPI.
Automation tooling background (Ansible, Salt, Puppet, etc.).
Ethernet and Storage technologies such as Lustre or GPFS.

The base salary range is 208,000 USD - 327,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA is a Learning Machine
NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest industries and profoundly impacting society.

Project Manager

2 weeks ago

Super Micro Computer San Jose, United States

Project Manager · **Date**:May 14, 2024 · **Location**: San Jose, California, United States · **Company**:Super Micro Computer · **Job Req ID: 24383** · **About Supermicro**: · - Supermicro is a Top Tier provider of advanced server, storage, and networking solutions for Data Cent ...
Lead Reliability Engineer

1 week ago

Celestial Services Santa Clara, United States

Job Description: · We are looking for a Lead Reliability Engineer to spearhead reliability efforts specifically tailored for datacenter and high-performance computing (HPC) applications. The ideal candidate will have a strong background in reliability engineering with a focus on ...
Principal Software Architect

2 weeks ago

NVIDIA Santa Clara, United States

We are now looking for a Principal Software Architect for AI and HPC. · At NVIDIA, we are advancing the frontiers of AI capabilities. We seek an expert in high-performance computing and AI to design and develop software resiliency features for training AI models on the world's m ...
Lead Reliability Engineer

4 days ago

Celestial AI Santa Clara, United States

About Celestial AI · As the industry strives to meet the demands of the AI workloads, bottlenecks in data transfers between processors and memory have hindered progress. The Photonic Fabric based Memory Fabric provides an optically scalable solution to the 'Memory Wall' problem, ...
Lead Reliability Engineer

1 week ago

Celestial AI Santa Clara, United States

About Celestial AI · As the industry strives to meet the demands of the AI workloads, bottlenecks in data transfers between processors and memory have hindered progress. The Photonic Fabric based Memory Fabric provides an optically scalable solution to the 'Memory Wall' problem, ...
Principal Software Architect

4 days ago

NVIDIA Santa Clara, United States

We are now looking for a Principal Software Architect for AI and HPC. · At NVIDIA, we are advancing the frontiers of AI capabilities. We seek an expert in high-performance computing and AI to design and develop software resiliency features for training AI models on the world's mo ...
Principal Software Architect

3 days ago

NVIDIA Santa Clara, United States

We are now looking for a Principal Software Architect for AI and HPC. · At NVIDIA, we are advancing the frontiers of AI capabilities. We seek an expert in high-performance computing and AI to design and develop software resiliency features for training AI models on the world's mo ...
Solutions Architect

3 weeks ago

NVIDIA Santa Clara, United States

Solutions Architect - AI and HPC Cloud · Locations: US, CA, Santa Clara · Time type: Full time · Posted 3 Days Ago · Job requisition ID: JR · NVIDIA is looking for a Solutions Architect to w ...
Senior Systems Engineer

3 weeks ago

NVIDIA Santa Clara, United States

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. NVIDIA GPUs serve as the visual cortex of modern computers and are at the heart of our products and services. Our work opens up new universes to expl ...
AI Engineer

2 weeks ago

Diverse Lynx Santa Clara, United States

AI+HPC infra requirement · looking for someone with Architectural and design experience also along with experience in handling 1000+ nodes. · Technical/Functional Skills - · Proficiency in RoCEv2, K8s, KVM, Ubuntu, Python, Shell, Go, Rust, GPU drivers, and Cluster interconnect ...
DC Technician 3

2 weeks ago

World Wide Technology Santa Clara, United States

World Wide Technology is looking for a Data Center Engineer. This job is part of WWT's Strategic Resourcing services. The candidate will be supporting a WWT customer and will be employed by one of WWT's preferred partners. · Position: DC Technician 3 · Number of roles open: 2 · Lo ...
AI Engineer

3 weeks ago

Diverse Lynx Santa Clara, United States

AI+HPC infra requirement · looking for someone with Architectural and design experience also along with experience in handling 1000+ nodes. · Technical/Functional Skills - · Proficiency in RoCEv2, K8s, KVM, Ubuntu, Python, Shell, Go, Rust, GPU drivers, and Cluster interconnect ...
Backend Software Engineer

2 weeks ago

Picarro Santa Clara, United States

Title: Backend Software Engineer · Location: Santa Clara, CA, USA · Job Term: Full-time (hybrid) · The Opportunity: · We are seeking a Backend Software Engineer to help drive development of the next generation of laser-based analytical instrumentation capable of high precisi ...
Backend Software Engineer

1 week ago

Picarro Santa Clara, United States

Position Title: · Backend Software Engineer · Location: · Santa Clara, CA, USA · Job Term: (Full-Time) · The Opportunity: · We are seeking a Backend Software Engineer to help drive development of the next generation of laser-based analytical instrumentation capable of high precis ...
Backend Software Engineer

2 weeks ago

Picarro Santa Clara, United States

Title: Backend Software Engineer · Location: Santa Clara, CA, USA · Job Term: Full-time (hybrid) · The Opportunity: · We are seeking a Backend Software Engineer to help drive development of the next generation of laser-based analytical instrumentation capable of high precision, ...
Senior Systems Engineer

3 weeks ago

Santa Clara University Santa Clara, United States

Position Title: · Senior Systems Engineer · Position Type: · Regular · Hiring Range: · $83, $98,200 per year; commensurate with experience. · Pay Frequency: · Annual · POSITION PURPOSE · The Senior Systems Engineer is responsible for the design, implementation, operation ...
Senior Solutions Architect, Networking

3 weeks ago

NVIDIA Santa Clara, United States Full time

NVIDIA is looking for an experienced network infrastructure Solutions Architect. Do you want to be part of a team that brings Artificial Intelligence (AI) hardware and software technologies to production in the field? We are looking for a networking savvy Solutions Architect to j ...
Senior Systems Engineer

1 week ago

Santa Clara University Santa Clara, United States

Position Title: · Senior Systems Engineer · Position Type: · Regular · Hiring Range: · $83, $98,200 per year; commensurate with experience. · Pay Frequency: · Annual · POSITION PURPOSE · The Senior Systems Engineer is responsible for the design, implementation, operation ...
Manager, At Scale Networking Engineering US, CA, Santa Clara

3 weeks ago

NVIDIA Santa Clara, United States

NVIDIA has been continually redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It's an outstanding legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the ...
