Staff Software Engineer, Training - San Francisco Bay Area

San Francisco Bay Area, United States

1 month ago

What You'll Do · Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels · Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU cluster ...



Similar jobs

  • Work in company

    Training Performance Engineer

    We are building a unified, modular runtime that meets researchers where they are and moves with them up the scaling curve. · Success for us is measured by raising both training throughput (how fast models train) and researcher throughput (how fast ideas become experiments and pro ...

    San Francisco $250,000 - $460,000 (USD)

    1 month ago

  • Work in company

    Training Performance Engineer

    About the Team · Training Runtime designs the core distributed machine-learning training runtime that powers everything from early research experiments to frontier-scale model runs. With a dual mandate to accelerate researchers and enable frontier scale, we're building a unified, ...

    San Francisco

    2 days ago

  • Work in company

    Distributed Training Engineer

    Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapid ...

    San Francisco

    6 days ago

  • Work in company

    Training Performance Engineer

    About The Team · Training Runtime designs the core distributed machine-learning training runtime that powers everything from early research experiments to frontier-scale model runs. With a dual mandate to accelerate researchers and enable frontier scale, we're building a unified, ...

    San Francisco, CA

    5 days ago

  • Work in company

    Distributed Training Engineer, Sora

    The Sora team is working on making video a key capability of OpenAI's foundation models. As a Distributed Systems/ML engineer, you will work on improving the training throughput for our internal training framework and enable researchers to experiment with new ideas. Collaborate with ...

    San Francisco, CA

    1 month ago

  • Work in company

    Distributed Training Engineer, Sora

    About the Team · The Sora team is working on making video a key capability of OpenAI's foundation models. We are a hybrid research and product team that seeks to understand and expand the capabilities of our video models, while ensuring their reliability and safety. We accomplish ...

    San Francisco

    2 days ago

  • Work in company

    Distributed Training Engineer, Sora

    The Sora team is working on making video a key capability of OpenAI's foundation models. As a Distributed Systems/ML engineer, you will work on improving the training throughput for our internal training framework and enable researchers to experiment with new ideas. ...

    San Francisco $380,000 - $555,000 (USD)

    1 month ago

  • Work in company

    Training: ML Framework Engineer

    About the Team · Training Runtime designs the core distributed machine-learning training runtime that powers everything from early research experiments to frontier-scale model runs. With a dual mandate to accelerate researchers and enable frontier scale, we're building a unified, ...

    San Francisco

    2 days ago

  • Work in company

    Research Engineer, Pre-training

    Job summary · We are seeking a Research Engineer to join our Pre-training team, responsible for developing the next generation of large language models. · Responsibilities · Conduct research and implement solutions in areas such as model architecture, data processing, ...

    San Francisco $340,000 - $425,000 (USD)

    1 month ago

  • Work in company

    Staff Software Engineer, Training

    Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels · Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability r ...

    San Francisco

    1 month ago

  • Work in company

    Research Engineer, Infrastructure, Training Systems

    Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals. · We are scientists, engineers, and buil ...

    San Francisco $350,000 - $475,000 (USD) per year

    3 days ago

  • Work in company

    Software Engineer, AI Training and Infrastructure

    We are building the world's first general purpose robotic intelligence that is robust and adapts to unseen scenarios without failing. · We believe massive scale through data-driven machine learning is the key to unlocking these capabilities for the widespread deployment of robots ...

    San Francisco $100,000 - $300,000 (USD)

    1 month ago

  • Work in company

    ML Engineer, FM Training Integration

    We are a group of engineers supporting the training of foundation models at Apple. We build infrastructure to support training foundation models with general capabilities such as understanding and generation of text, images, speech, videos, and other modalities, and apply these models to Appl ...

    San Francisco $147,400 - $220,900 (USD)

    1 month ago

  • Work in company

    Machine Learning Engineer, Training Infrastructure

    Job Title: Machine Learning Engineer, Training Infrastructure · Position Type: Full time · Location: San Francisco, CA, USA · Salary Range: $150,000 - $250,000 (USD) · Job ID#: 158135 · Job Description: We are looking for an ML Engineer with 3+ YOE in high-performance computin ...

    San Francisco, CA

    2 days ago

  • Work in company

    LLM Training Frameworks and Optimization Engineer

    We are building cutting-edge infrastructure to enable efficient and scalable training of large language models (LLMs). We focus on optimizing training frameworks, algorithms, and infrastructure to push the boundaries of AI performance, scalability, and cost-efficiency. We invite ...

    San Francisco $160,000 - $230,000 (USD)

    1 month ago

  • Work in company

    Staff Infrastructure Engineer, Pre-training

    Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working togethe ...

    San Francisco $340,000 - $425,000 (USD)

    1 month ago

  • Work in company

    DevOps / IaC Engineer AI Training

    We're looking for a DevOps/IaC Engineer to shape the data that powers frontier AI. · Fluent in English with strong writing and communication skills. · Expertise in DevOps and Infrastructure as Code (IaC): containers (Docker), orchestration (Kubernetes), CI/CD (GitHub Actions, Cir ...

    San Francisco $25 - $100 (USD)

    3 weeks ago

  • Work in company

    Engineer in Training

    Perform structural analyses under supervision · Design new structures and repairs of existing structures · Work on routine problems independently and on project teams · Assist in preparation of proposals, letters, reports, calculations, drawings, specifications, budgeting, scheduling, and ...

    Oakland, California, United States

    2 weeks ago

  • Work in company

    LLM Training Dataset and Checkpoint Optimization Engineer

    We are seeking a Training Dataset and Checkpoint Acceleration Engineer to optimize data pipelines and checkpoint mechanisms for large-scale machine learning workloads. In this role, you will work at the intersection of data engineering and distributed systems, ensuring that traini ...

    San Francisco $160,000 - $230,000 (USD)

    1 month ago

  • Work in company

    Entry Level Designer/Engineer in Training, Mechanical

    We are seeking an entry-level Mechanical Designer/Engineer-in-Training to join our team. · As a member of our Buildings team, you will work with guidance and direction on tasks and smaller projects, and as a team member of a larger project under the guidance of a Senior Enginee ...

    San Francisco $85,400 - $119,600 (USD)

    3 weeks ago

  • Work in company

    Bilingual Large Model Training Acceleration Engineer

    We are an AI platform engineering group focused on large-scale model training systems and performance acceleration. · Optimize large model training pipelines for performance and scalability · Design and improve distributed training systems · ...

    San Francisco

    3 weeks ago