    Site Reliability Engineer - Santa Clara, United States - NVIDIA

    Description
    NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology—and outstanding people. Today, we're tapping into the unlimited potential of AI to define the next era of computing.

    An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.

    Doing what's never been done before takes vision, innovation, and the world's best talent.

    As an NVIDIAN, you'll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.

    Come join the team and see how you can make a lasting impact on the world.

    We are looking for a Staff Site Reliability Engineer to join our team.

    You should have experience supporting and working with teams across the company to improve the usability, reliability, and performance of enterprise applications.

    What You'll Be Doing

    Design, develop, and evolve the Site Reliability Engineering practice.
    Deploy and support tools from a systems engineering perspective, and troubleshoot issues in depth.

    Help the SRE teams define technology and business strategies that deliver iterative enhancements to the tools and processes that improve availability, observability, and scalability.

    Recognize, validate, and publish emerging technologies and architectures that align with business objectives.
    Lead and build a proven foundation for the Infrastructure and Application lifecycle, covering installation, monitoring, observability, and user experience.

    Build tooling that lowers the barrier to entry for engineering teams to plug in and enjoy the benefits of Reliability.

    Document institutional knowledge.
    Build software to help operations and support teams.

    What We Need To See

    Bachelor's and/or Master's degree in computer science or a related field of study (or equivalent experience).
    8+ years of demonstrable experience deploying and supporting applications in a Cloud environment.
    Having Confluence, Jira, and Service Desk experience is a plus.
    Excellent Windows and Linux system skills.
    Good understanding of security components such as SSL, load balancers, firewalls, etc.
    Extensive experience supporting applications in high-availability environments.
    Scripting skills to automate repetitive and basic tasks.
    Experience in documenting processes and procedures.
    Strong interpersonal skills with the ability to understand and explain technical issues to a non-technical audience.


    Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.

    As you plan your future, see what we can offer to you and your family.

    The base salary range is 160,000 USD - 247,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

    You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

    NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

    As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


  • Natron Energy Santa Clara, United States

    Natron is seeking a Reliability Engineer to support the development and test of our high-power battery systems for data center UPS and EV charging applications. The occupant of this position will work with the Product Engineering, Reliability, Technology, and Operations teams to ...

  • COMTECH TELECOMMUNICATIONS

    Reliability Engineer

    3 weeks ago


    COMTECH TELECOMMUNICATIONS Santa Clara, United States

    Job Description · Comtech Telecommunications Corp. has an opportunity in Santa Clara, CA for a Reliability/Failure Analysis Engineer. In this important role, you will collaborate with a diverse team of technical professionals and interact with outside customers, pr ...

  • Comtech Telecom

    Reliability Engineer

    3 weeks ago


    Comtech Telecom Santa Clara, United States

    Comtech Telecommunications Corp. has an opportunity in Santa Clara, CA for a Reliability/Failure Analysis Engineer. In this important role, you will collaborate with a diverse team of technical professionals and interact with outside customers, providing solutions to a variety of ...


  • Advanced Micro Devices, Inc. San Jose, United States

    Overview: · WHAT YOU DO AT AMD CHANGES EVERYTHING · We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences the building bloc ...


  • Celestial AI Santa Clara, United States

    About Celestial AI · As the industry strives to meet the demands of AI workloads, bottlenecks in data transfers between processors and memory have hindered progress. The Photonic Fabric-based Memory Fabric provides an optically scalable solution to the 'Memory Wall' problem, ...


  • TEKsystems San Jose, United States Contract

    Description: · Adobe is looking for an experienced Site Reliability Engineer to join the internal tooling team to support, configure, integrate, upgrade, and automate the use of enterprise tools used across their large Engineering organization. Role will be focused on user interact ...


  • Apple Cupertino, United States

    Reliability Engineer · Cupertino, California, United States · Hardware · Do you ever wonder what goes into making Apple products an amazing user experience? Apple's innovative reliability team is responsible for ensuring that our products exceed our customers' expectations for rob ...


  • HCLTech San Jose, United States

    About HCLTech: · HCLTech is a global technology company, home to 221,000+ people across 60 countries, delivering industry-leading capabilities centered around digital, engineering and cloud, powered by a broad portfolio of technology services and products. We work with clients ac ...


  • Software Technology, Inc Santa Clara, United States

    Job Description · Position: Service Reliability Engineer / Sr. DevOps Engineer · Location: Santa Clara, CA · Duration: 1 Year+ · OK with any visa; no OPT please · Local consultants only · Customer will not provide letter for H1B candidates. Please check with t ...


  • Cryptoware Technologies Inc Santa Clara, United States

    Job Description · Responsibility · • Lead the global expansion of Huobi's globe-spanning infrastructure. · • Work with engineering teams to make sure new features and changes are deployed quickly and safely. · • Constantly improve our system performance and ...


  • ServiceNow Santa Clara, United States

    Company Description · At ServiceNow, our technology makes the world work for everyone, and our people make it possible. We move fast because the world can't wait, and we innovate in ways no one else can for our customers and communities. By joining ServiceNow, you are part of an ...

  • Comtech Telecom

    Reliability Engineer

    2 weeks ago


    Comtech Telecom Santa Clara, United States Full time Regular

    Comtech Telecommunications Corp. has an opportunity in Santa Clara, CA for a Reliability/Failure Analysis Engineer. In this important role, you will collaborate with a diverse team of technical professionals and interact with outside customers, providing solutions to a variety of ...


  • Palo Alto Networks Santa Clara, United States

    Our Mission · At Palo Alto Networks everything starts and ends with our mission: · Being the cybersecurity partner of choice, protecting our digital way of life. · Our vision is a world where each day is safer and more secure than the one before. We are a company built on the fou ...


  • Kofi Group Santa Clara, United States Direct Hire

    Principal Site Reliability Engineer · San Francisco Bay Area, CA · We are partnering with a late-stage Cloud Security company that is looking for a Principal Level SRE · The ideal candidate will have: · Strong sense of architecture and design f ...


  • Lawrence Harvey Sunnyvale, United States

    Site Reliability Engineer · Status: Full Time · Compensation: 120k to 145k · Hybrid Requirements: 3 days in office, 2 days remote · Lawrence Harvey has partnered with a leading Chinese fintech startup that is committed to democratizing payment services and empowering people and ...


  • Advantis Global is now INSPYR Solutions Sunnyvale, United States

    ABOUT THIS FEATURED OPPORTUNITY · The QoS Infrastructure Tools Team is responsible for building and maintaining tools that are essential for Site Reliability Engineers (SREs) and engineers across the organization. The team primarily develops applications using Golang for backend ...


  • MRINetwork Jobs San Jose, United States

    Job Description · We are working with a company operating in the best of both worlds – an innovative start-up inside of a $6 billion parent company building the next generation of solar. They have developed an industry-leading building-integrated solar technol ...


  • Intel San Jose, United States

    Job Details: · Job Description: · Microelectronic Quality Reliability Engineers provide project management, product and process design/development, and sustaining support for integrated circuit or semiconductor assemblies, various other electronic components, subsystems and/or com ...