    Senior High Performance Computing Cluster Administrator - Santa Clara, United States - NVIDIA

    Description


    NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains.

    As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high performance computing, and other computationally intensive workloads.

    We are looking for an expert to identify architectural changes and/or completely innovative approaches for our GPU Compute Cluster.

    In this role, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads and effective resource utilization in a heterogeneous compute environment.


    What you'll be doing:
    Administer Linux systems, ranging from powerful DGX servers to embedded systems and from bring-up hardware to publicly available systems.

    Coordinate Storage Solutions and plan for growth.

    Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).

    Actively communicate with management about any problems with the equipment and propose resolutions.

    Plan, build, and install or upgrade new systems that support NVIDIA DL software.

    What we need to see:
    You have a BA, BS, or MS in CS, EE, CE or equivalent experience

    4+ years of experience deploying and administering HPC clusters

    Familiarity with resource scheduling managers (Slurm preferred, LSF, etc.)

    Proven ability to script in Bash, Perl, or Python

    Experience with containers (Docker, Singularity, LXC)

    Deep understanding of operating systems, computer networks, and high-performance applications

    Ability to work well with developers & test engineers

    Hard-working dedication to providing quality support for your users

    Ways to stand out from the crowd:

    Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker

    Familiarity with GPU usage in compute clusters and with CUDA

    Experience with mobile and embedded systems

    Basic knowledge of Deep Learning.

    Experience coding/scripting in Perl/Python/Bash

    The base salary range is 148,000 USD - 230,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

    You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

    NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

    As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

    NVIDIA is a Learning Machine

    NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest industries and profoundly impacting society.

    Learn more about NVIDIA.



  • Quantum Ventures Santa Clara, United States

    Your tasks: You will work as a full-fledged team member in various areas of our work, including thin client support, printer and network administration, and creating system documentation. · Our training coordinator will be available to answer your questions and help you deepen you ...

  • Wipro Limited

    Oracle Apps Dba

    1 week ago


    Wipro Limited San Jose, United States

    San Jose, California · - Tech Hiring · **Job Description**: · **About Wipro**: · - Wipro Limited (NYSE: WIT, BSE: 507685, NSE: WIPRO) is a leading technology services and consulting company focused on building innovative solutions that address clients' most complex digital transf ...


  • Varada Consulting Mountain View, United States

    **Staff HPC and Infrastructure Administrator** · **Clearance**: US Citizenship is required / Ability to obtain a Public Trust · **Job Location**: AMES Research Center, Mountain View, CA (Mon-Fri Regular Business hours, Hybrid 3 days onsite/2 days remote) · **Relocation Assistance ...


  • Newaygo County RESA Fremont, United States

    **General Function** · The Career-Tech Center Classroom Teacher shall organize, coordinate, promote, and give direction to a comprehensive instructional program for the Newaygo County Career-Tech Center. · **Position Qualifications** · - Minimum of Bachelor's Degree, · Experience ...


  • Alexandria Real Estate Equities San Carlos, United States

    As a critical member of our Asset Services team, you would contribute directly to the success of our tenants and their scientific discoveries by providing white-glove services to the tenants, supporting our facilities team, and collaborating with building staff and other stakehol ...

  • Tech Mahindra

    DevOps Engineer

    1 week ago


    Tech Mahindra Santa Clara, United States

    Role : DevOps Engineer · Location : Santa Clara, CA (mandatory to be in office at least 4 days/week – no exception) · JD Summary : · Skilled engineer to join our team and contribute to the successful implementation and management of our cloud-based infrastructure. · Stay up to d ...


  • Pomeroy Pleasanton, United States

    **Title**: SQL DBA · **Company**: Pomeroy · **Location**: Pleasanton, CA (Hybrid: 2-3 per MONTH onsite) · **Pay Rate**: $45-50/hr · **Duration**: 6+ months · **Job Summary**: · **Responsibilities**: · - SQL Server database administration, configuration, load balancing, performance t ...


  • NVIDIA Santa Clara, United States

    NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation fueled by great technology—and dynamic people. Today, we're tapping into the unlimited potential of AI to define the next era of computing ...


  • NVIDIA Santa Clara, United States

    NVIDIA's Deep Learning Architecture and Libraries group is seeking excellent Software Engineers to design and develop the software stack for our next generation test and development cluster, the core infrastructure that provides a foundation for every stage of our product developm ...

  • Sustainable Talent

    HPC Cluster Engineer

    2 weeks ago


    Sustainable Talent Santa Clara, United States

    Are you ready to make your mark in the forefront of technological innovation? As an HPC Cluster Engineer, you'll play a pivotal role in shaping the future of AI, deep learning, and machine learning initiatives. Join us and leverage Nvidia's cutting-edge GPU technology to drive gr ...


  • NVIDIA Santa Clara, United States

    NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new unive ...

  • HCL Technologies

    Lead Consultant

    2 weeks ago


    HCL Technologies Santa Clara, CA, United States

    Job Family: Product / Domain Consulting · Job Description (Posting): · 7-10 years of hands-on experience with SAP Basis as a technical core engineer · SAP HANA Installation and Administration activities · SLT Administration, Installation/Upgrades of HANA · HANA DB and Client Patches, Table Parti ...


  • LigaDATA Santa Clara, United States

    About Ligadata: · Ligadata is a cutting-edge data analytics company that leverages machine learning to empower businesses with actionable insights. We are at the forefront of innovation, providing advanced solutions for data processing, analysis, and decision-making. Join our dy ...


  • Omega Solutions

    Devops Engineer

    1 week ago


    Omega Solutions Santa Clara, United States

    Devops Engineer · Client: Samsung · Jersey City, NJ (Onsite) · Must have 7+ years DevOps experience acting as an Individual Contributor. · Max Rate: $65/hr on c2c · Mandatory skillset · • Must be proficient in multiple scripting languages such as ruby, python, bash, etc. · ...

  • Tech Mahindra

    Databricks Admin

    3 weeks ago


    Tech Mahindra Santa Clara, United States

    Greetings · Databricks Admin · Santa Clara, CA · The candidate will work on building, scaling, and monitoring highly complex BigData platform on Databricks, Snowflake DB and ElasticSearch cloud. · The candidate will work on building, scaling, and monitoring Data science and ML ho ...


  • Omega Solutions Santa Clara, United States

    System Administrator · 6-9 months Contract to Perm · San Antonio, TX · Rate: Max $45/hr on w2 all inclusive · SAWS is looking for an experienced Systems Administrator. · JOB SUMMARY · The Systems Administrator is responsible for developing and implementing network server sta ...

  • NVIDIA

    Solutions Architect

    3 weeks ago


    NVIDIA Santa Clara, United States

    Solutions Architect - AI and HPC Cloud · Location: US, CA, Santa Clara · Time type: Full time · Posted 3 Days Ago · Job requisition ID: JR · NVIDIA is looking for a Solutions Architect to w ...

