Director, Global Network Reliability Engineering - Santa Clara

Santa Clara, United States

1 day ago

$268,000 - $408,250 (USD)
Job description
NVIDIA is seeking a Director of Network Reliability Engineering within the Enterprise Networking organization in IT.

As the #1 AI Company in the world builds and delivers infrastructure, join the Enterprise networking organization to lead NVIDIA's Network Reliability Engineering team.

From Data Centers to Contract Manufacturing sites, this org builds them all.


In this role, you will be responsible for NVIDIA's global network operations, ensuring reliability, scalability and efficiency goals are defined and met.

You will lead a team of network reliability engineers to bring a data-driven approach to operations, with a focus on observability, well-defined success metrics, and continuous improvement.

You will lead the design and automation of all operations, provide architectural input based on outage patterns and observability trends, and be the keeper of excellence in Networking.

The successful candidate will apply an execution mindset to translate strategic plans into incremental delivery of impact to the business.

Structured thinking and problem-solving skills, along with exceptional communication abilities, will be crucial for success in this role as you build strong teams that partner with engineering and operations teams across NVIDIA.

The leader will join and lead global operations across several countries, covering all of NVIDIA's data centers, Labs, Super Labs, Offices and Contract Manufacturing sites.

What You Will Be Doing


Your main focus will be maturing the current support model and processes into a more data-driven, automated SRE model.

Build an in-house team of reliability experts for networking support and operations from the existing outsourced SMEs, providing leadership, direction, and strategy for a growing team.

Set the technical vision, strategy, and roadmap for network operations in partnership with the key infrastructure and partner teams.

Partner closely with Network Architecture and Network Engineering to establish runbooks and regular training sessions, and to ensure we build the network to be self-healing.

Understand root cause analyses (RCAs) from events and incidents, and work with our AI operations to enrich our observability tooling for a better full-stack view from the network to applications.

Influence the architecture of NVIDIA's networks, both on-prem and in the cloud.

What We Need To See

Bachelor's degree in Computer Science, related technical field, or equivalent experience

Experience building and growing geographically distributed teams that appreciate local operations and bring a global perspective while following standards.

Ability to do technical deep-dives into code, networking, operating systems, and storage, as well as being verbally and cognitively agile enough to hold your own in strategy discussions with NVIDIA's executive team and peer SMEs
Ability to identify trends and promote solutions that solve challenges efficiently across multiple product areas
Excellent innovative thinking, collaboration, and problem-solving skills.
12+ years of overall experience with system design, network architecture, network engineering, and network operations, and 7+ years of leadership experience

Ways To Stand Out From The Crowd

Experience transforming network operations using software driven methods
Experience in a Hyperscale Cloud Service Provider (public facing or not)
Knowledge of SRE principles (observability, SLOs, SLIs, logging, etc.)
Knowledge of software interface design & documentation for less technical end-users

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization.

The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.

Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and dedicated people in the world working for us. If you're creative and passionate about developing cloud services, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 268,000 USD - 408,250 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until January 13, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

JR2007400


Similar jobs

  • Work in company

    Reliability Engineer

    A client of Innova Solutions is immediately hiring a PCB Reliability Engineer. · As a PCB Reliability Engineer, · you will need: B.S. in an engineering discipline. · Minimum of 5 years' experience in reliability organizations within automotive and/or printed circuit board assembl ...

    Santa Clara $110,000 - $120,000 (USD)

    1 month ago

  • Work in company

    Senior Reliability Engineer

    Being a leader in AI, NVIDIA has entered the Automotive graphics' segment with GPUs since 2006. Since then, we have increased our product offerings to include complete graphics support for customer navigation and instrument clusters. It is an exciting time for the auto industry, ...

    Santa Clara $168,000 - $264,500 (USD)

    2 days ago

  • Work in company

    Reliability Engineer

    This position offers an exciting opportunity for an experienced Reliability Engineer committed to driving operational excellence through innovation and technical expertise. · ...

    Santa Clara, CA

    1 month ago

  • Work in company

    Reliability Testing Engineer

    We are partnering with a well-funded startup to hire a Reliability Testing Engineer. · This is a hands-on role ideal for early- to mid-career engineers who are passionate about understanding failure mechanisms and device reliability. · ...

    Santa Clara

    1 month ago

  • Work in company

    Reliability Testing Engineer

    Thrively is seeking a Reliability Testing Engineer to develop and execute reliability test programs ensuring the long-term performance of MEMS devices. · ...

    Santa Clara

    1 week ago

  • Work in company

    Reliability Engineer

    Figure is an AI robotics company developing autonomous general-purpose humanoid robots. · ...

    San Jose, CA

    4 days ago

  • Work in company

    Reliability Engineer

    We are looking for a Hardware Reliability Engineer in charge of driving design for reliability, designing and executing test plans for our humanoid robots to ensure they meet our lifetime specs. · Collect and analyze current and prospective product use cases to define hardware ...

    San Jose $120,000 - $250,000 (USD) Full time

    4 days ago

  • Work in company

    Reliability Test Engineer

    Job summary · Reliability testing for computer/server systems and consumer electronics. · B.S. in Engineering with 5+ years in reliability testing (automotive/PCB manufacturing). · ...

    Santa Clara

    1 month ago

  • Work in company

    Senior Reliability Engineer

    NVIDIA is at the forefront, providing high-end to mid-level performance solutions for these meaningful changes in the automotive industry. · We have some of the most thoughtful and talented people in the world working for us. · Developing and implementing reliability test plans and ...

    Santa Clara $168,000 - $264,500 (USD)

    2 weeks ago

  • Work in company

    Reliability Engineer

    The ideal candidate brings a strong technical foundation with a deep understanding of physics and mechanical systems. · ...

    San Jose

    3 days ago

  • Work in company

    Reliability Engineer

    Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is based ...

    San Jose, CA $85,000 - $145,000 (USD) per year

    2 days ago

  • Work in company

    Package Reliability Engineer

    We are seeking an experienced Package Reliability Engineer with expertise in 2.5D/3D advanced packaging. This role requires collaboration with external assembly and test partners, · internal design, · process, · and failure analysis teams, · & suppliers to ensure the reliability ...

    Santa Clara $185,000 - $225,000 (USD)

    4 weeks ago

  • Work in company

    Reliability Testing Engineer

    Job summary · Reliability Testing Engineer · Responsibilities: Support the development and execution of reliability test plans for MEMS, silicon photonic, and associated electronic components · Benefits ...

    Santa Clara, CA

    1 month ago

  • Work in company

    Package Reliability Engineer

    We are seeking an experienced Package Reliability Engineer with expertise in 2.5D/3D advanced packaging. · ...

    Santa Clara

    1 month ago

  • Work in company

    Database Reliability Engineer

    We are seeking a highly skilled Database Reliability Engineer (DBRE) with a minimum of 8+ years of proven experience in complex PostgreSQL and MongoDB administration and multi-cloud architecture. · ...

    Santa Clara

    1 month ago

  • Work in company

    Reliability Engineer

    We are seeking an experienced Reliability Engineer to support reliability qualification and evaluation of semiconductor products prior to volume production. · Own and execute reliability test plans for new products, including sample testing and full qualification prior to volume ...

    San Jose

    1 week ago

  • Work in company

    Senior Reliability Engineer

    NVIDIA has entered the Automotive graphics' segment with GPUs since 2006 and is now looking for a Senior Reliability Engineer. · ...

    Santa Clara $168,000 - $264,500 (USD) Full time

    2 weeks ago

  • Work in company

    Package Reliability Engineer

    We are seeking an experienced Package Reliability Engineer with expertise in advanced packaging and collaboration skills. · A strong background in physics of failure, materials science and experience working closely with OSATs is required for this role. · The ideal candidate will ...

    Santa Clara

    1 week ago

  • Work in company

    Principal Reliability Engineer

    Invent the future with us. A semiconductor design company for a new era leading the future of computing with an innovative approach to CPU design focused on high-performance energy efficient AI compute. · Serve as a lead reliability authority proactively engaging with all enginee ...

    Santa Clara $140,500 - $254,000 (USD)

    3 days ago

  • Work in company

    Site Reliability Engineer

    NVIDIA Site Reliability Engineering provides a chance to define production systems with high efficiency and availability. · ...

    Santa Clara $168,000 - $333,500 (USD)

    1 month ago

  • Work in company

    Database Reliability Engineer

    Job Title Database Reliability Engineer Dice is leading career destination for tech experts at every stage of their careers Our client Maxonic Inc is seeking following Apply via Dice today Job description Top Skills Min years of relevant experience Min Bachelors DBRE PostgreSQL M ...

    Santa Clara

    1 month ago