Research Engineer, Infrastructure, Inference - San Francisco - Thinking Machines Lab

Description

Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.

We are scientists, engineers, and builders who've created some of the most widely used AI products, including ChatGPT, open-weights models like Mistral, and popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

We're looking for an infrastructure research engineer to design, optimize, and scale the systems that power large AI models. Your work will make inference faster, more cost-effective, more reliable, and more reproducible, enabling our teams to focus on advancing model capabilities rather than managing bottlenecks.

Our focus is performant, efficient model inference, both to power real-world applications and to accelerate research. This role is responsible for the infrastructure that ensures every experiment, evaluation, and deployment runs smoothly at scale.

Note: This is an "evergreen" role that we keep open on an ongoing basis as a way to express interest. We receive many applications, and there may not always be an immediate opening that aligns with your experience and skills, but we still encourage you to apply. We continuously review applications and reach out to applicants as new opportunities open. You're welcome to reapply as you gain more experience, but please avoid applying more than once every six months. We also post individual roles for specific project or team needs; in those cases, you're welcome to apply directly in addition to an evergreen role.

What You'll Do

- Work alongside researchers and engineers to bring cutting-edge AI models into production.
- Collaborate with research teams to enable high-performance inference for novel architectures.
- Design and implement new techniques, tools, and architectures that improve performance, latency, throughput, and efficiency.
- Optimize our codebase and compute fleet (e.g., GPUs) to fully utilize hardware FLOPs, bandwidth, and memory.
- Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Establish standards for reliability, observability, and reproducibility across the inference stack.
- Publish and share learnings through internal documentation, open-source libraries, and technical reports that advance the field of scalable AI infrastructure.

Minimum qualifications:

- Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
- Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.
- Experience with inference serving systems optimized for throughput and latency (e.g., SGLang, vLLM).
- Thrive in a highly collaborative environment involving many different cross-functional partners and subject-matter experts.
- A bias for action: you take the initiative to work across stacks and teams wherever you spot an opportunity to make sure something ships.
- Strong engineering skills: the ability to contribute performant, maintainable code and debug complex codebases.

Preferred qualifications:

- Experience training or supporting large-scale language models with hundreds of billions of parameters or more.
- Understanding of distributed compute systems, GPU parallelism, and hardware-aware optimizations.
- Contributions to open-source ML or systems infrastructure projects (e.g., SGLang, vLLM, PyTorch, Triton, DeepSpeed, XLA).
- Track record of improving research productivity through infrastructure design or process improvements.

Logistics:

- Location: This role is based in San Francisco, California.
- Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.
- Visa sponsorship: We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.
- Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.