Senior Software Engineer - San Francisco - Inference

1 week ago

Description

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'd love to meet you.
About
We train and host specialized language models for companies that need frontier-quality AI at a fraction of the cost. The models we train match GPT-5 accuracy but are smaller, faster, and up to 90% cheaper. Our platform handles everything end-to-end: distillation, training, evaluation, and planet-scale hosting.
We are a well-funded ten-person team of engineers who work in-person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years, and has founded and run their own software companies. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard, and deeply enjoy the work that we do. Most of us are in the office 4 days a week in SF; hybrid works for Bay Area candidates.
About the Role
You will be responsible for making our inference stack as fast and efficient as possible. Your work spans from implementing known optimization techniques to experimenting with novel approaches, always with the goal of serving models faster and cheaper at scale.
Your north star is inference performance: latency, throughput, cost efficiency, and how quickly we can bring new model architectures into production. You'll work across the full inference stack, from CUDA kernels to serving frameworks, to find and eliminate bottlenecks. This role reports directly to the founding team. You'll have autonomy, a large compute budget, and the technical support to push the limits of what's possible in model serving.
Key Responsibilities

Implement and productionize optimization techniques including quantization, speculative decoding, KV cache optimization, continuous batching, and LoRA serving
Deep dive into inference frameworks (vLLM, SGLang, TensorRT-LLM) and underlying libraries to debug and improve performance
Profile and optimize CUDA kernels and GPU utilization across our serving infrastructure
Add support for new model architectures, ensuring they meet our performance standards before going to production
Experiment with novel inference techniques and bring successful approaches into production
Build tooling and benchmarks to measure and track inference performance across our fleet
Collaborate with applied ML engineers to ensure trained models can be served efficiently

Requirements

2+ years of experience in ML systems, inference optimization, or GPU programming
Strong proficiency in Python and familiarity with C++
Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
Deep understanding of GPU architecture and experience profiling GPU workloads
Familiarity with LLM optimization techniques (quantization, speculative decoding, continuous batching, KV cache management)
Experience with PyTorch and understanding of how models execute on hardware
Track record of measurably improving system performance

Nice-to-Have

Experience with CUDA programming
Familiarity with serving non-LLM models (TTS, vision, embeddings)
Experience with distributed inference and multi-GPU serving
Contributions to open-source inference frameworks
Experience with Docker and Kubernetes

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.
Compensation
We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $220,000 - $320,000, depending on experience.
Equal Opportunity
We are an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.
