Staff Software Engineer, Training - San Carlos - Genesis AI

    Genesis AI
    Genesis AI San Carlos

    1 week ago

    Description

    What You'll Do

    • Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels

    • Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization

    • Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks

    • Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking

    • Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures

    What You'll Bring

    • Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)

    • Production-grade expertise in Python

    • Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization

    • Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism

    • System-level mindset with a track record of tuning hardware–software interactions for maximum utilization


  • Work in company

    Software Engineer, Training

    About the Company · Models are what they eat. But a large portion of training compute is wasted training on data that are already learned, irrelevant, or even harmful, leading to worse models that cost more to train and deploy. · At DatologyAI, we've built a state of the art data ...

    Redwood City

    1 week ago

  • Work in company

    Research Engineer, Post-training

    Company Overview · At Skild AI, we are building the world's first general purpose robotic intelligence that is robust and adapts to unseen scenarios without failing. We believe massive scale through data-driven machine learning is the key to unlocking these capabilities for the w ...

    San Mateo $100,000 - $300,000 (USD)

    2 weeks ago

  • Work in company

    Robotics Engineer, Post Training

    We are looking for a Robotics Engineer specialising in Post Training & Deployment to help further the performance and capabilities of Skild AI's world-class high-level software and products. · ...

    San Mateo $100,000 - $300,000 (USD) Full time

    1 month ago

  • Work in company

    Research Engineer, Post-training

    Company Overview · At Skild AI, we are building the world's first general purpose robotic intelligence that is robust and adapts to unseen scenarios without failing. We believe massive scale through data-driven machine learning is the key to unlocking these capabilities for th ...

    San Mateo, CA $100,000 - $300,000 (USD) per year

    1 week ago

  • Work in company

    Research Engineer, Post-training

    Company Overview · At Skild AI, we are building the world's first general purpose robotic intelligence that is robust and adapts to unseen scenarios without failing. We believe massive scale through data-driven machine learning is the key to unlocking these capabilities for the w ...

    San Mateo, CA

    1 week ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park $58.65 - $181,000 (USD)

    6 days ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park $88.46 - $257,000 (USD)

    6 days ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park $74.04 - $217,000 (USD) Full time

    1 week ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park $74.04 - $217,000 (USD)

    6 days ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park $58.65 - $181,000 (USD) Full time

    1 week ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park $88.46 - $257,000 (USD) Full time

    1 week ago

  • Work in company

    Research Engineer, Post-Training

    Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will b ...

    Menlo Park, CA

    5 days ago

  • Work in company

    Research Engineer, Distributed Training

    We are seeking a highly motivated and skilled Research Engineer with expertise in model training, · Conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimization. · Optimize and scale our training infrastructure to imp ...

    Palo Alto

    2 months ago

  • Work in company

    Research Engineering Manager, Post-Training

    Meta is seeking a Research Engineering Manager to lead the Post-Training team within Meta Superintelligence Labs. · ...

    Menlo Park $219,000 - $301,000 (USD) Full time

    2 weeks ago

  • Work in company

    Research Engineering Manager, Post-Training

    Meta is seeking a Research Engineering Manager to lead the Post-Training team within Meta Superintelligence Labs. High-quality data is the core of AI progress at MSL, fueling our frontier capabilities and determining how our models interact with the world. In this leadership role ...

    Menlo Park $219,000 - $301,000 (USD)

    2 weeks ago

  • Work in company

    Research Engineer, Distributed Training

    We are seeking a highly motivated and skilled Research Engineer with expertise in model training, · Conduct research and implement solutions in areas such as model architecture, · Optimize and scale our training infrastructure to improve efficiency and reliability in a reinforcem ...

    Palo Alto, CA

    1 month ago

  • Work in company

    Research Engineering Manager, Post-Training

    Meta is seeking a Research Engineering Manager to lead the Post-Training team within Meta Superintelligence Labs. High-quality data is the core of AI progress at MSL, fueling our frontier capabilities and determining how our models interact with the world. In this leadership role ...

    Menlo Park, CA

    2 weeks ago

  • Work in company

    Training Performance Engineer

    We are building a unified, modular runtime that meets researchers where they are and moves with them up the scaling curve. · Success for us is measured by raising both training throughput (how fast models train) and researcher throughput (how fast ideas become experiments and pro ...

    San Francisco $250,000 - $460,000 (USD)

    2 months ago

  • Work in company

    Train Control Engineer

    JOB · Pay Rate: Pay Band E06 (Non-Rep) $128, minimum) - $194, maximum) · The negotiable initial salary offer will be between $128, $149,055.16 commensurate with education and experience. · Reports To: Manager of Train Control Engineering or designee. · Current Assignment: This is a capital posi ...

    Oakland, CA

    2 weeks ago

  • Work in company

    Train Control Engineer

    Ride BART to a satisfying career that lets you both: 1) make a difference to Bay Area residents, and 2) enjoy excellent pay, benefits, and employment stability. · BART offers a competitive salary, · comprehensive health benefits, · paid time off, · and the CalPERS retirement prog ...

    Oakland $49,055 - $128,685 (USD)

    2 weeks ago

  • Work in company

    Training Infrastructure Engineer

    About Black Forest Labs · We're a team of world-class researchers and engineers creating the generative models that power how people make images and video—tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the ...

    San Francisco $180,000 - $300,000 (USD)

    3 days ago
