Member of Technical Staff, Performance and Scale - San Francisco
1 week ago

Job summary
We're looking for an infrastructure engineer to build the distributed systems that power inference at global scale. You'll design and implement the foundational layers that enable vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability.
Similar jobs
We're looking for an infrastructure engineer to build the distributed systems that power inference at global scale. · Bachelor's degree or equivalent experience in computer science, engineering, or similar. · Strong systems programming skills in Rust, Go, or C++. · Experience des ...
3 weeks ago
We're looking for an infrastructure engineer to build the distributed systems that power inference at global scale. · ...
3 weeks ago
A market leader in cloud infrastructure and developer tooling is seeking a Senior Software Engineer (Go) to help design, scale, and optimize a next-generation global CDN. · ...
1 month ago
The company is seeking a hands-on Infrastructure Tech Lead to help scale the platform through a period of rapid growth and increasing customer demand (and beyond). After recently emerging from private beta, the company has seen strong adoption from hundreds of businesses signing u ...
3 days ago
This is a hands-on role at the intersection of ML, software engineering, and scalable infrastructure. You will help scale and optimize our training systems and core model code. · Own training/inference infrastructure: Design, implement, and maintain systems for large-scale model trai ...
3 weeks ago
The company is seeking a hands-on Infrastructure Tech Lead to help scale the platform through a period of rapid growth and increasing customer demand (and beyond). · This role will be central to ensuring the platform remains fast, reliable, and resilient as the company expands in ...
4 days ago
You'll build and operate the data infrastructure that powers large-scale robot learning. · ...
1 week ago
You'll take ownership of large-scale pre-training pipelines: data, compute, and model optimization. · Run and iterate large-scale pre-training experiments · Design scalable systems for distributed compute and efficient checkpointing · Collaborate with research scientists on tra ...
8 hours ago
In this role you will help scale and optimize our training systems and core model code. · You'll own critical infrastructure for large-scale training, from managing GPU/TPU compute and job orchestration to building reusable and efficient JAX training pipelines. · Hands-on high-le ...
3 weeks ago
We are looking for a Senior Python Software Engineer (Core Infrastructure) to play a central role in architecting and scaling the distributed systems that orchestrate thousands of intelligent AI agents in dynamic, real-world environments. · ...
5 days ago
You'll build and operate the data infrastructure that powers large-scale robot learning. Your systems will sit directly between raw data sources and training/evaluation, enabling us to move faster while maintaining performance, correctness, and reliability at scale. · Data Ingest ...
3 weeks ago
Job summary · As an ML Infra Engineer (Data Systems), you'll build and operate the data infrastructure that powers large-scale robot learning. · Responsibilities · Data Ingestion & Processing: Design and build high-throughput pipelines that validate, transform, and featurize raw mul ...
3 weeks ago
In this role you will help scale and optimize our training systems and core model code. · ...
3 weeks ago
Job summary · Next-generation autonomous AI systems are being built to move far beyond simple interaction, toward software capable of executing complex, real-world operations at scale. · Responsibilities · Build a high-performance, fault-tolerant infrastructure layer that: · Orch ...
6 days ago
As an ML Infra Engineer (Data Systems), you'll build and operate the data infrastructure that powers large-scale robot learning. · ...
3 weeks ago
We build AI systems that power mission-critical workflows for some of the most important institutions in the world. · Design and scale Kubernetes- and Terraform-based infrastructure across customer environments. · Define standards for networking, security, CI/CD, and multi-region ...
6 days ago
As a performance engineer at Databricks, you will work closely with multiple teams across the company to evaluate the performance of products and features, identify performance bottlenecks, and partner with engineers to solve performance and scalability issues. · ...
6 days ago
Pantograph is building generally intelligent robots that learn by doing — exploring, failing, and continuously improving through self-supervised reinforcement learning. · We own the entire stack, from hardware and firmware to training infrastructure and learning algorithms, · all ...
4 weeks ago
Help OpenAI evaluate and bring up new compute platforms that can support large-scale AI training and inference. · ...
6 days ago
Job summary · The company is seeking a Principal Eng AI to architect, optimize, and extend its AI compute infrastructure. · Design and refine large-scale distributed training infrastructure optimized for high-throughput, low-latency operations. · Develop frameworks for resource s ...
1 week ago