Member of Technical Staff, Supercomputing Platform - San Francisco - Magic Inc

    Magic Inc San Francisco

    2 days ago

    Description

    Magic's Mission

    Magic's mission is to build safe AGI that accelerates humanity's progress on the world's most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.

    About The Role

    As an engineer on the Supercomputing Platform & Infrastructure team, you will design, build, and operate the large-scale GPU infrastructure that powers Magic's model training and inference workloads.

    A core part of this role is building and maintaining our infrastructure using Terraform-driven infrastructure-as-code practices, ensuring reproducibility, reliability, and operational clarity across clusters spanning thousands of GPUs.

    Magic's long-context models create sustained pressure on compute, networking, and storage systems. Long-running distributed jobs, high-throughput data movement, and strict availability requirements demand infrastructure that is automated, observable, and resilient by design. You will own the systems and IaC foundations that make this possible.

    This role can evolve into broader ownership of supercomputing platform architecture, shaping how Magic scales GPU clusters and infrastructure reliability as model workloads grow.

    What You'll Work On

    • Design and operate large-scale GPU clusters for training and inference

    • Build and maintain infrastructure using Terraform across cloud and hybrid environments

    • Develop modular, scalable IaC patterns for compute, networking, and storage provisioning

    • Improve deployment reproducibility, environment consistency, and operational safety

    • Optimize networking and storage systems for high-throughput AI workloads

    • Automate fault detection and recovery across distributed clusters

    • Debug complex cross-layer issues spanning hardware, drivers, networking, storage, OS, and cloud

    • Improve observability, monitoring, and reliability of core platform systems

    What We're Looking For

    • Strong systems engineering fundamentals

    • Deep, hands-on experience with Terraform, including module design, state management, environment isolation, and large-scale deployments

    • Experience operating production GPU infrastructure or high-performance distributed systems

    • Strong understanding of networking and storage systems

    • Experience with major cloud platforms (GCP, AWS, Azure, OCI, etc.)

    • Track record of owning production-critical infrastructure end-to-end

    Compensation, Benefits, and Perks (US)

    • Annual salary between $200K and $550K, depending on experience

    • Equity is a significant part of total compensation, in addition to salary

    • 401(k) plan with 6% salary matching

    • Generous health, dental and vision insurance for you and your dependents

    • Unlimited paid time off

    • Visa sponsorship and relocation stipend to bring you to SF, if possible

    • A small, fast-paced, highly focused team

    Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.

    Our Culture

    • Integrity. Words and actions should be aligned

    • Hands-on. At Magic, everyone is building

    • Teamwork. We move as one team, not N individuals

    • Focus. Safely deploy AGI. Everything else is noise

    • Quality. Magic should feel like magic


