    HPC Infrastructure Engineer - Palo Alto, United States - Arc Institute

    Arc Institute Palo Alto, United States

    1 week ago

    Description
    About Arc Institute

    The Arc Institute is a new scientific institution that conducts curiosity-driven basic science and technology development to understand and treat complex human diseases. Headquartered in Palo Alto, California, Arc is an independent research organization founded on the belief that many important research programs will be enabled by new institutional models. Arc operates in partnership with Stanford University, UCSF, and UC Berkeley.

    While the prevailing university research model has yielded many tremendous successes, we believe in the importance of institutional experimentation as a way to make progress. Arc's experiments with the institutional model include:
    • Funding: Arc will fully fund Core Investigators' (PIs') research groups, liberating scientists from the typical constraints of project-based external grants.
    • Technology: Biomedical research has become increasingly dependent on complex tooling. Arc Technology Centers develop, optimize and deploy rapidly advancing experimental and computational technologies in collaboration with Core Investigators.
    • Support: Arc aims to provide first-class operational, financial, and scientific support that will enable scientists to pursue long-term, high-risk, high-reward research that can meaningfully advance progress toward cures for diseases including neurodegeneration, cancer, and immune dysfunction.
    • Culture: We believe that culture matters enormously in science and that excellence is difficult to sustain. We aim to create a culture that is focused on scientific curiosity, a deep commitment to truth, broad ambition, and selfless collaboration.
    Arc scaled to nearly 100 people in its first year. With $650M+ in committed funding and a state-of-the-art new lab facility in Palo Alto, Arc will continue to grow quickly to several hundred people in the coming years.

    About the position

    We are seeking an HPC Infrastructure Engineer to join our Software Infrastructure team. In this role, you will be responsible for administering and optimizing our High-Performance Computing (HPC) cluster orchestrated by Slurm. You will work closely with researchers, developers, and IT professionals to ensure the availability, reliability, and performance of our HPC infrastructure. Your work will fuel the development of biological foundation models (e.g., Evo, Arc's recently released DNA foundation model), the Virtual Cell Initiative, and other cutting-edge bioinformatics projects in the context of Institute-wide efforts.

    About you
    • You lead with empathy. You know that successful systems are more about the user than the tool. You enjoy building relationships and credibility with your colleagues.
    • You enjoy solving problems. Any new project is an interesting puzzle. So is a tricky troubleshooting issue. You get satisfaction from helping someone get to resolution.
    • You're curious. You like to keep track of the latest developments in your field, and to learn about the substance behind your employer's mission.
    In this position you will
    • Manage and maintain the Slurm-based HPC cluster, ensuring high availability and performance.
    • Monitor system performance, identify bottlenecks, and implement optimizations.
    • Develop and implement strategies for system automation and configuration management.
    • Troubleshoot and resolve hardware, software, and network issues.
    • Collaborate with researchers and developers to understand their computational needs and provide appropriate resources and support.
    • Perform regular system updates, patches, and security enhancements.
    • Manage user access, quotas, and job scheduling policies.
    • Develop and maintain documentation for system configurations, procedures, and policies.
    • Participate in on-call rotations to provide high-availability support for critical issues.
    Requirements
    • Bachelor's degree in Computer Science, Information Technology, or a related field.
    • Proven experience administering HPC clusters and managing Slurm or similar workload managers (Kubernetes, Grid Engine, Torque, etc.).
    • Strong knowledge of Linux operating systems (CentOS, Ubuntu, etc.).
    • Experience with configuration management tools such as Ansible, Puppet, or Chef.
    • Proficiency in scripting languages like Python, Bash, or Perl.
    • Familiarity with network protocols, storage systems, and high-speed interconnects (InfiniBand, Ethernet).
    • Experience with monitoring tools like Nagios, Prometheus, or Grafana.
    • Proficiency in software installation, configuration, and development (make, bazel, gcc, gdb, conda, pip).
    • Experience developing and maintaining software that interacts with Nvidia GPUs, including drivers and diagnostic tools (CUDA, nvcc, nccl, etc.).
    • Understanding of security best practices and experience implementing security measures.
    • Excellent problem-solving skills and the ability to work under pressure.
    • Strong communication and collaboration skills.
    The base salary range for this position is $122,250 - $146,050. These amounts reflect the range of base salary that the Institute reasonably would expect to pay a new hire or internal candidate for this position. The actual base compensation paid to any individual for this position may vary depending on factors such as experience, market conditions, education/training, skill level, and whether the compensation is internally equitable, and does not include bonuses, commissions, differential pay, other forms of compensation, or benefits. This position is also eligible to receive an annual discretionary bonus, with the amount dependent on individual and institute performance factors.


  • Tencent Americas Palo Alto, United States

    Position Overview · Tencent Overseas IT has the mission to empower Tencent's rapid global growth with future-ready, global IT platforms, applications, and services. We are chartered to lead the Overseas IT strategy, architecture, roadmap, and execution. Satisfying our internal an ...


  • Xai Palo Alto, United States

    About xAI · xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. · Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challengi ...


  • Crystal Equation Corporation Mountain View, United States

    Responsibilities: · 1. Linux Administration. · Understanding of basic concepts and commands. · OS installation · Understanding of RAID, LVM and filesystem management · User management including LDAP · Package management · Linux network configuration including bonding. · Administ ...


  • Obsidian Security Palo Alto, United States

    Job Description · About Us: · Obsidian Security was founded in 2017 to solve the unaddressed blindspot of SaaS Security. SaaS applications provide the tools employees need to succeed and hold the business' most critical information. If those tools become unavailabl ...


  • Bunkerhill Health Palo Alto, United States

    [Full Time] Infrastructure/DevOps Engineer at Bunkerhill Health (United States) | BEAMSTART Jobs · Infrastructure/DevOps Engineer · Bunkerhill Health United States · Date Posted · 05 Nov, 2022 · Work Location · Palo Alto, United States · Salary Offered · $150000 — $170000 yearl ...


  • Glean Palo Alto, United States

    About Glean · We're on a mission to make knowledge work faster and more humane. We believe that AI will fundamentally transform how people work. In the future, everyone will work in tandem with expert AI assistants who find knowledge, create and synthesize information, and execu ...


  • Mashgin Inc Palo Alto, United States

    Research & Development – Software Engineering / · At Mashgin, we are developing the future of checkout experiences. We aim to reduce the friction between customer desires and their ability to be on their way. · In May 2022, Mashgin closed a Series B funding round at a $1.5B valua ...


  • Aptos Palo Alto, United States

    Aptos is a people-first blockchain on a mission to help billions of people achieve universal and fair access to decentralized assets in a safe and scalable way. · Founded by some of the original creators and maintainers that researched, designed, and built the Diem blockchain to ...


  • BitGo, Inc. Palo Alto, United States

    BitGo is the leader in digital asset financial services, providing institutional investors with liquidity, custody, and security solutions. Founded in 2013, BitGo pioneered the multi-signature wallet and is the first digital asset company to focus exclusively on serving instituti ...


  • Matroid Stanford, United States

    About Matroid · Matroid's mission is to enable computers to visually understand the world. · With a "no programming required" product, Matroid empowers businesses and industry with our computer vision solutions. Our users can deploy cutting-edge, deep neural networks on the clo ...


  • Diverse Lynx Mountain View, United States

    Infrastructure Engineer · Mountain View, CA · Full time/Permanent · JD: Daily walkthrough of the Center and report/fix the infrastructure issues · Manage the Infrastructure of the center · Install and troubleshoot Experience/Demo · First line support for AV devices and engage OE ...


  • Mashgin Inc Palo Alto, United States

    About Mashgin · At Mashgin, we are developing the future of checkout experiences. We aim to reduce the friction between customer desires and their ability to be on their way. · In May 2022, Mashgin closed a Series B funding round at a $1.5B valuation. We are backed by some of Sil ...


  • AppLovin Stanford, United States

    About AppLovin · AppLovin makes technologies that help businesses of every size connect to their ideal customers. The company provides end-to-end software and AI solutions for businesses to reach, monetize and grow their global audiences. For more information about AppLovin, vis ...


  • Tech Mahindra Sunnyvale, United States

    Networks' IT Infrastructure team is chartered to provide compute, storage, and cloud services for the enterprise, including life-cycle environments, supply chain operations, business and customer facing applications. · The IT UNIX engineer role delivers in depth technical experti ...


  • SambaNova Systems Palo Alto, United States

    The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale. · SambaNova Suite ...


  • Lorven Technologies Mountain View, United States Full time

    Hi, · Our client is looking for a Testbench Infrastructure Engineer for a full-time project in Mountain View, CA (on-site); below are the detailed requirements. · Kindly share your updated resume with your best reachable number · Job Role: Testbench Infrastructure Engineer · Locatio ...


  • MultiOn AI Palo Alto, United States

    At MultiOn, we are looking for a few more A+ players who want to join a rocket ship startup. · About Us: · We believe in the power of Artificial Intelligence to revolutionize the way we interact with computers and the digital world. We're not just building a state-of-the-art pers ...