    Reliability, Availability and Serviceability Expert, Datacenter AI Products Development - Santa Clara, United States - NVIDIA

    Description


    For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU, the engine of modern AI technologies, the field has expanded to encompass AI-powered video games, social networking and web search, IC and other product design, medical diagnosis, and scientific research.

    Today, visual computing is the critical computing engine for deep learning-based AI, including ChatGPT, and is becoming increasingly central to how people entertain and interact. There has never been a more exciting time to join us and take visual computing and AI into the next chapter.

    We are looking for a product development engineer to serve as a subject-matter expert (SME) and drive key aspects of RAS/Resilience features, from chip to module to server, for our next-generation products for AI applications.

    We expect you to bring deep knowledge and experience in RAS/Resilience testing, characterization, analysis, benchmarking, and risk assessment of large AI training or HPC cluster systems with InfiniBand or enhanced Ethernet.


    What you'll be doing:


    Serve as the focal-point SME for manufacturing test requirements, test methodology, test plans, and test flows for AI system RAS/Resilience features, ensuring good test coverage and successful production ramp-ups.

    Own the AI system RAS/Resilience models, benchmarking, and risk assessment.

    Own the troubleshooting and root-causing of AI system RAS/Resilience-related failures at the factory and in the field.

    Drive the end-to-end RAS efforts of chip-board-system to reduce FIT rates.


    Lead the data analysis of RAS/Resilience logs to refine, revise and overhaul test methodology and manufacturing flows; influence and drive software tools/infrastructure required for new product development, validation, and productization.

    Work closely with architecture, hardware, software, and product engineering teams throughout the product development lifecycle.

    Be ready to be challenged to assess new hardware features and architect manufacturing RAS tests, flows, and methodologies.

    You'll nurture a deep understanding of NVIDIA's AI hardware and software architecture.

    What we need to see:
    BS or higher in EE, CE, CS, Mathematics, or equivalent experience.


    12+ years of proven hands-on experience in design, testing, benchmarking, and risk assessment of system RAS/Resiliency features of large compute, AI, or HPC systems.

    Proficient in Compute System RAS/Resilience model theory and methodology.

    Proficient in HPC or AI system architecture and Cluster Interconnect technologies.


    Proficient in using test equipment, Linux commands, and benchmark utilities to test and troubleshoot compute system RAS and Resiliency features.

    Strong problem-solving and troubleshooting expertise, and experience institutionalizing root-cause analysis.

    Initiative, strong interpersonal skills, and flexibility to adapt to new technologies.

    Solid knowledge of or experience in HPC or MLPerf benchmarking is a plus.


    NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us.

    If you're creative and autonomous, we want to hear from you.

    The base salary range is 188,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

    You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

    NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

    As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

  • Soarkidz, Inc.

    Product Development

    5 days ago


    Soarkidz, Inc. San Jose, United States

    About Us: · Soar Kidz, a rapidly expanding EdTech company, has quickly established itself as a leader in the online education industry. With its innovative live streaming platform, the company delivers high-quality classes to students worldwide, achieving exceptional retention ra ...

  • SilverLinx

    Product Developer

    1 week ago


    SilverLinx Centerville, United States

    Our renowned client in Immenstadt is currently seeking a full-time Product Developer (m/f/d) to join their team at Centerville. · AUGUSTA Personaldienstleistungen GmbH has been active in the field of personnel placement and leasing for over 30 years. Are you looking for a new cha ...


  • Radiant Ventures Mountain View, United States

    Your Responsibilities · Responsible for creating, maintaining, and prioritizing user stories and requirements with clear acceptance criteria, as well as product roadmaps · Optimization of business processes and analysis of complex issues · Observation of target markets and close ...


  • Google Mountain View, United States

    **Minimum qualifications**: · - Bachelor's degree or equivalent practical experience. · - 10 years of experience in product management, consulting, co-founder or related technical role. · - 5 years of experience in people management, with technical leadership. · - 5 years of expe ...


  • Kelly Santa Clara, United States

    Kelly Services is actively searching for a Design Quality Engineer - New Product Development supporting the system quality engineering team for a 6+ month onsite engagement with one of our global clients located in Santa Clara, California. · Pay rate starts at: $50/hour + benefit ...


  • NVIDIA Santa Clara, United States

    We are now looking for a Sr. Product Development Engineer - DataCenter · NVIDIA Corporation is a world leader in visual computing technology. The GPU, which the company invented, serves as the visual cortex of modern computers and is at the heart of our products and services. NV ...


  • NVIDIA Santa Clara, United States

    Senior Product Development Engineer - Boards · Location: US, CA, Santa Clara · Time type: Full time · Posted 3 days ago · Job requisition ID: JR · NVIDIA is an industry leader in high ...


  • NVIDIA Santa Clara, United States

    NVIDIA continues to reinvent itself. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing. NVIDIA is a ...


  • MSR Cosmos San Jose, United States

    Position: Product Development with AI/ML · Location: San Jose, CA - Onsite (can consider remote) · Type: Contract · Qualifications · Experience working on large feed products, especially in improving feed quality · 5+ years of experience in product management. · Prior experience with AI/ML ...


  • bioMérieux SA San Jose, United States

    Location: San Jose, CA, United States · Position Type: Unfixed Term · Job Function: Bioscience R&D · A family-owned company, bioMérieux has grown to become a world leader in the field of in vitro diagnostics. For almost 60 years and across the world, we have imagined and dev ...


  • The Mom Project San Jose, CA, United States

    The normal nurturing and discovery that school-aged children gain from families, community, and youth activities has decayed due to daily challenges and what many refer to as "algorithms", leaving families and youth today overwhelmed. We are focused on introducing a "community ma ...


  • eBay Inc. San Jose, United States

    At eBay, we're more than a global ecommerce leader — we're changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We're committed to pushing boundaries and leaving our mark as we reinvent the fut ...


  • bioMérieux San Jose, CA, United States

    Description · The estimated salary range for this role based in California is between $135,900 and $177,200 annually. This role is eligible to receive a variable annual bonus base ...


  • Charles Brown Recruitment San Jose, United States

    The product engineer will be responsible for NPI project management, product evaluation, characterization, qualification, and production support of our cutting edge Power IC products. · This position requires interface with various engineering and production groups within the com ...


  • KLA Milpitas, United States Full time

    Base Pay Range: $111, $189,200.00 Annually · Primary Location: USA-CA-Milpitas-KLA KLA's total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits identified below. Interns are eligible for some o ...


  • J&J Family of Companies Santa Clara, United States

    Staff Quality Engineer, New Product Development W · Description · J&J MedTech is recruiting for a Staff Quality Engineer, New Product Development , located in Cincinnati, OH or Santa Clara, CA. · At Johnson & Johnson, we believe health is everything. Our strength in healthcare in ...


  • NVIDIA Santa Clara, United States

    NVIDIA continues to reinvent itself. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing. NVIDIA is a le ...