Staff Software Engineer, Inference - United States
3 days ago

Job summary
Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics. Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization. Implement efficient low-level code (CUDA, Triton, custom kernels) that integrates seamlessly into high-level frameworks. Optimize workloads for both throughput (batching, scheduling, quantization) and latency (caching, memory management, graph compilation). Develop monitoring and debugging tools that guarantee reliability, determinism, and rapid diagnosis of regressions across both stacks.
Similar jobs
Inference Optimization Engineer
3 days ago
BentoML is a leading inference platform provider that helps AI teams run large language models and other generative AI workloads at scale. · You will improve the speed and efficiency of large language models at the GPU kernel level through the inference engine and across distribu ...
Generative AI Inference Engineer
3 days ago
Job summary · Generative AI Inference Engineer · We are seeking passionate Machine Learning Engineers to join our Inference team. · ...
Senior Engineer
3 weeks ago
We are looking for an experienced Field-Applications Engineer to help deploy a new generation of code translation tools enabled by AI and modern verification techniques. · Deploy and manage containerized services using Docker. · Deploy and run Python based GenAI pipelines interac ...
ML Engineer
3 days ago
Elevating the quality of human life through every conversation. · ...
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. · This is a globally remote opportunity for exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU ke ...
Senior II Software Engineer Lead
1 week ago
Join the Akamai Inference Cloud Team to drive innovation at the intersection of AI and edge computing. · 8 years of relevant experience and a Bachelor's degree or its equivalent · ...
Lead Tech Recruiter
3 days ago
Nebius is leading a new era in cloud computing to serve the global AI economy. · We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. · ...
Lead Tech Recruiter
11 hours ago
Nebius is leading a new era in cloud computing to serve the global AI economy. · Deep Technical Sourcing (LLM, Inference, Systems, GPU): Proactively identify and engage senior-level engineers and researchers across a wide range of AI/ML and systems domains. Use advanced sourcing ...
Forward Deployed Engineer
3 days ago
Forward Deployed Engineers sit at the intersection of core engineering, product strategy, and customer success. You'll partner directly with customers to understand real-world problems, then design, build, and launch production-ready AI solutions on the Bento platform. · ...
Senior Engineering Manager
1 week ago
We're seeking an experienced Senior Engineering Manager to lead our AI Inference Cloud team in designing and developing our globally-distributed AI inference platform. · Responsibilities · ...
AI Principal Software Developer
1 week ago
We are seeking experienced AI Developers to help us shape the future of OCI Networking with AI. · This position offers an opportunity to work on cutting-edge AI applications and includes a collaborative work environment, competitive benefits, and the chance to contribute to tran ...
Statistics Specialist
3 days ago
We're looking for statistics specialists who live and breathe probability theory, statistical modeling, data analysis, hypothesis testing, regression analysis, multivariate statistics, Bayesian inference, time series analysis, experimental design, and machine learning algor ...
Developer Advocate
6 days ago
Job summary · We're looking for a scrappy, resourceful Developer Advocate to help grow Token Factory. · Qualifications: Strong understanding of AI developers and building patterns. · ...
AI Engineer
3 days ago
We're looking for an AI Engineer to design, implement and optimize advanced AI systems that balance quality, performance and cost. · ...
Developer Advocate
11 hours ago
We're looking for a scrappy, resourceful Developer Advocate to help grow Token Factory, Nebius' high-performance inference platform built for teams running real production AI workloads at scale. · Help developers know, adopt and use Token Factory for inference use cases. · Build ...
Principal Software Engineer
1 week ago
We are seeking a Principal Software Engineer to join our Akamai Inference Cloud Team. As a technical leader and architect, you will drive critical technical decisions and guide the team in building a platform that delivers OpenAI-compatible endpoints while orchestrating inference ...
Senior Engineer
3 days ago
We are looking for an experienced Field-Applications Engineer to help deploy a new generation of code translation tools enabled by AI and modern verification techniques. · ...
Senior ML Solutions Architect
3 days ago
Nebius is seeking an experienced Senior ML Solutions Architect to support customers leveraging Nebius Token Factory's serverless inference platform for open-source LLMs across multiple modalities. · Responsibilities: Design and implement LLM-based solutions using Nebius Token Fac ...
GenAI Intern
3 days ago
The selected candidate will have the opportunity to contribute to a Machine Learning Operations (MLOps) platform. · ...
GenAI/ML Systems Research Intern
3 days ago
The selected candidate will contribute to a Machine Learning Operations (MLOps) platform supporting state-of-the-art training and inference features. The candidate will also contribute to GenAI and AI/ML systems. · Advanced AI/ML systems with a focus on next-generation model trai ...