Senior Engineer, GPU Infrastructure - West Palm Beach, United States - Vultr

    Description
    Join Vultr


    The Engineering team is a central pillar of our growth strategy, and we are looking for a Principal Engineer, GPU Infrastructure, to help build and support our GPU-based product offerings.

    You and your team will own the setup and provisioning of our GPU-based systems and help drive engineering and operational excellence across our GPU infrastructure.

    Our team's mission is to provide fast, performant, and stable infrastructure for all of our customers.


    What to expect:
    Develop and maintain GPU infrastructure in bare-metal and containerized environments

    Work directly with our networking team to build scalable and supportable GPU clusters

    Deliver an excellent customer experience by ensuring consistent, reliable provisioning of GPU infrastructure

    Build and maintain test automation of GPU-based products to ensure fast and reliable provisioning

    Implement and maintain GPU-based solutions to meet the needs of diverse applications and computational workloads

    Conduct in-depth benchmarking, performance testing, and troubleshooting of GPU systems to identify and resolve hardware and software limitations

    Work with vendors to obtain all supported drivers and packages

    Work with vendors on bugs, performance issues, hardware problems, and reference architectures

    Address any hardware, software, or performance issues promptly, coordinating with vendors, technical support, and internal teams as required

    Our ideal candidate will have:
    Hands-on experience with current high-performance GPUs and related technologies, primarily NVIDIA products (e.g., NVLink, InfiniBand, GRID drivers, vGPU, and NVAIE)

    In-depth, hands-on experience working with and automating bare-metal internals, including BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe

    Experience with Linux, package management, and device drivers

    Experience with commercial firmware

    Experience with Python, Bash, and PHP

    Experience with machine learning software

    Compensation
    $120,000 - $135,000

    This salary range may vary based on location, years of experience, background, and skill set.
