
    Senior Site Reliability Engineer, Data Science Platform - Olympia, United States - NVIDIA

    Description
    We are now looking for a Sr. Site Reliability Engineer (SRE), Data Science Platform.

    At NVIDIA, we pride ourselves on data-driven decision-making, and the data science team is at the heart of this initiative.

    We are looking for an excellent Sr. Site Reliability Engineer with extensive data infrastructure experience for our data science platform supporting NVIDIA's cloud platform services.

    Our data science platform serves as the basis for advanced real-time data analytics, streaming, data lake and sophisticated ML/AI training with offline/online inferencing for NVIDIA's cloud services.

    Site Reliability Engineering is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and availability using a combination of software and systems engineering practices.

    SRE at NVIDIA ensures the reliability and uptime promised to users while enabling developers to make changes to the existing system through careful preparation and planning, keeping an eye on capacity, latency and performance.

    SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations.

    The person in this position will be responsible for Service Response and Workflows and will drive tools/service development to maintain and improve service SLOs.

    What You'll Be Doing

    Build tools to improve SRE observability and to rapidly debug and triage incidents and user-reported issues.
    Make valuable contributions to the overall health, performance, and reliability of NVIDIA's Cloud Data Science platform and Infrastructure Services.

    Take ownership of automation, scripting, and tooling, for both new and existing scripts, to help the team achieve 100% automation of daily tasks.

    Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management and launch reviews.

    Apply a clear understanding of SRE observability and experience in building new tools and automation using Python/Go.
    Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
    Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
    Practice balanced incident response and blameless postmortems.

    What We Need To See

    MS or BS in Computer Science/Engineering or a related field or equivalent experience.

    5+ years of site reliability engineering experience working on large-scale distributed microservices in a production environment, with a real passion for automation and tooling.

    An SRE approach and a solid understanding of error budgeting, SLOs, and SLAs.
    Clear understanding of incident management, change management, and problem management processes. Ability to detect all service-impacting issues and to handle accurate triage, partner communication, impact containment, service restoration, and post-incident follow-up.

    Proven strengths in problem-solving and root causing issues, while continuously seeking ways to drive optimization, efficiency and the bottom line.

    Strong experience with streaming data infrastructure services involving web services, Kafka, Spark, etc.
    Expert knowledge of building and operating large-scale observability platforms for monitoring and logging (ELK, Prometheus, etc.).
    Excellent interpersonal skills, including the ability to identify and communicate data-driven insights.

    Ways To Stand Out From The Crowd

    Experience operating large-scale distributed systems with strong SLAs.

    Excellent scripting skills in Python and Go.

    Strong experience operating data platforms.

    NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization.

    The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.

    Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

    NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence.

    The base salary range is 144,000 USD - 270,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

    You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

    NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

    As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.



  • Dell Olympia, United States Full time

    Account Executive - Direct Sales (remote) Pacific Northwest · Location: This is a work-from-home position within the Pacific Northwest US with travel up to 40% · Secureworks (NASDAQ: SCWX) is a global cybersecurity leader that secures human progress with Secureworks Taegis™, a S ...


  • Seneca Resources Olympia, United States

    NO C2C, OUTSIDE CONSULTANTS, BENCHES OR REFERRALS · Job Responsibilities · Responsible for developing UI to maximize the User Experience (UX) · Design dashboards of operational status for customer · Build scalable web applications, including developing mobile applications · Test web ...


  • State of Washington Olympia, United States

    Vital Signs Reporting Analyst · (Exempt) · Who We Are · The Puget Sound Partnership is a non-regulatory state agency under Governor Inslee's Office leading the region's collective effort to restore and protect the Puget Sound natural ecosystems. Our mission is to accelerate ecosy ...


  • Intel Olympia, United States

    Job Details: Job Description: Join Intel as a Cloud Hardware Product Manager. The Cloud Hardware Product Manager defines, plans, and manages activities for a specific product or group of products from product definition and planning through production, release, and end-of-life. Oth ...


  • Citizens Olympia, United States

    · Citizens Financial Group, Inc. (CFG) seeks a Senior Software Engineer for its Johnston, RI location. · Duties: Responsible for complex development activities within the Salesforce org that is dedicated to our Mortgage business line. They will work as a senior member of a cross ...

  • Canonical

    People Data Analyst

    5 days ago


    Canonical Olympia, United States

    Bring your people analytics, social science research and data mining/science skills to a unique team seeking to understand, and shape, the future of the digital workplace. We are interested in technology, of course, but we are also interested in the human mission of enabling the ...


  • Microsoft Corporation Olympia, United States

    Are you passionate about distributed systems, infrastructure services, massive scalability, and quality? Are you interested in solving the challenges of Cloud-scale Storage that supports the newest AI workloads? Are you excited about being at the forefront of cloud storage innova ...


  • Panaseer Olympia, United States

    Who we are: Panaseer is a unique business that is changing the face of enterprise security. We are a fast-growing Cyber Security and Data Science start-up that helps organizations manage their security posture using automated metrics and reporting; protecting the critical services ...


  • ECO Magazine Olympia, United States

    Ocean Career: Service Calibration Specialist - Sea-Bird Scientific · Sea-Bird Scientific, a wholly owned subsidiary of Danaher Corporation, is headquartered in Bellevue, Washington. Sea-Bird Scientific is the world's largest developer and manufacturer of products for the meas ...


  • Bristol-Myers Squibb Olympia, United States

    Engineer I, Non-Viral Delivery Technology Development - Cell Therapy · Seattle, Washington · Drug Dev and Preclinical Studies · R1552611 · Full Time · 01/31/2022 ...


  • Centene Corporation Olympia, United States

    You could be the one who changes everything for our 28 million members. Centene is transforming the health of our communities, one person at a time. As a diversified, national organization, you'll have access to competitive benefits including a fresh perspective on workplace flexi ...


  • DevSelect Olympia, United States

    Title: Android Automation Engineer · Location: Seattle, WA · Openings: 1 · Type: Long-Term Contract · Summary · The Client SDK team is seeking an experienced Android Automation Engineer to work on automating testing of new and existing features in our Mobile SDK. This individua ...


  • Marriott Olympia, United States

    Job Number · Job Category Information Technology · Location Marriott International HQ, 7750 Wisconsin Avenue, Bethesda, Maryland, United States · Schedule Full-Time · Located Remotely? Y · Relocation? N · Position Type Management · JOB SUMMARY · As a Senior Devops Software Engine ...


  • Softek Global Services Olympia, United States

    Softek Global Services provides both operations & strategic support to tribal 8(a) & commercial organizations. Our client is a Tribal-Owned, SBA certified 8(a) small business and trusted federal partner, offering low-risk, fast-path to agile development, lean operations and digit ...


  • Ford Motor Company Olympia, United States

    At Ford Motor Company, we believe freedom of movement drives human progress. We also believe in providing you with the freedom to define and realize your dreams. With our incredible plans for the future of mobility, we have a wide variety of opportunities for you to accelerate yo ...


  • Marriott Olympia, United States Paid Work

    Job Number · Job Category Information Technology · Location Marriott International HQ, 7750 Wisconsin Avenue, Bethesda, Maryland, United States · Schedule Full-Time · Located Remotely? Y · Relocation? N · Position Type Management · JOB SUMMARY · The Architect, Network DevOps wil ...


  • CleanChoice Energy Olympia, United States

    Job Title: Senior Product Manager · Full-Time 40 Hours/Week · Classification: exempt under FLSA · Location: 100% Remote · About: CleanChoice Energy, a national renewable energy company that empowers people and businesses to cut emissions and live cleaner, healthier lives, is seeking a Senior ...

  • Engine Room

    Payroll Manager

    2 weeks ago


    Engine Room Olympia, United States

    Killer opportunity for first-class payroll talent · Who we are: Engine Room serves as the finance and accounting nerve center for the West Coast's smartest venture-backed life science start-ups. · Whom we're looking for: Prime Payroll Manager talent. This position is full time. Respo ...



